CN116701438B

CN116701438B - Data association analysis method, device, electronic equipment and computer storage medium

Info

Publication number: CN116701438B
Application number: CN202310981753.4A
Authority: CN
Inventors: 石志林
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2023-08-07
Filing date: 2023-08-07
Publication date: 2024-01-30
Anticipated expiration: 2043-08-07
Also published as: CN116701438A

Abstract

The embodiments of the application disclose a data association analysis method, a device, electronic equipment and a computer storage medium. The method comprises the following steps: acquiring a set of data to be associated; grouping, based on grouping conditions, the first data to be associated and the second data to be associated in the set to obtain a plurality of data packets (that is, data groups); sorting the first data to be associated and the second data to be associated in each data packet in ascending order of interval end point; for each data packet, sequentially performing a grouped forward scanning operation on the sorted first data to be associated and second data to be associated to obtain the associated data pairs of that data packet, wherein the scanning areas corresponding to the first data to be associated and the second data to be associated after each scan do not overlap; and traversing all data packets with the grouped forward scanning operation until all associated data pairs of the set of data to be associated are obtained. The method and the device can improve the efficiency of real-time data association analysis.

Description

Data association analysis method, device, electronic equipment and computer storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a data association analysis method, a data association analysis device, an electronic device, and a computer storage medium.

Background

With the continuous development of technology, the performance of devices is continuously improved, and the data analysis technology of big data is mature and starts to be applied in various scenes. In the data analysis technology of big data, a data association analysis algorithm can be used to perform association analysis on a large amount of data in a database so as to determine data which are needed by the object and have association with each other.

However, association analysis performed with a data association analysis algorithm requires a large amount of computation, which can become a performance bottleneck for the device. As a result, the association analysis is inefficient, can only be performed offline, and cannot be carried out online in real time.

Disclosure of Invention

The embodiment of the application provides a data association analysis method, a device, electronic equipment and a storage medium, which can effectively improve the efficiency of carrying out online real-time association analysis on mass data.

An embodiment of the present application provides a data association analysis method, where the method includes:

Acquiring a set of data to be associated, wherein the data to be associated is a section with object information and time information, the starting point of the section represents the starting time, and the end point of the section represents the ending time;

respectively grouping first data to be associated and second data to be associated in the set of data to be associated based on grouping conditions to obtain a plurality of data packets, wherein the first data to be associated and the second data to be associated are divided based on object information, and the grouping conditions comprise that overlapping areas exist among the data to be associated of the same data packet and that the starting point of an interval is smaller than a preset value;

sorting the first data to be associated and the second data to be associated in each data packet in ascending order of interval end point;

for each data packet, performing preset forward scanning operation on the first data to be associated and the second data to be associated after sequencing in sequence to obtain an associated data pair of each data packet, wherein the associated data pair is a pair of the first data to be associated and the second data to be associated with overlapping areas, and scanning areas corresponding to the first data to be associated and the second data to be associated after scanning are not overlapped;

Traversing all data packets based on the preset forward scanning operation until all associated data pairs of the set of data to be associated are obtained.

A second aspect of embodiments of the present application provides a data association analysis apparatus, the apparatus including:

the data collection acquisition unit is used for acquiring a collection of data to be associated, wherein the data to be associated is a section with object information and time information, the starting point of the section represents the starting time, and the end point of the section represents the ending time;

the data grouping unit is used for respectively grouping the first data to be associated and the second data to be associated in the set of data to be associated based on grouping conditions to obtain a plurality of data groups, wherein the first data to be associated and the second data to be associated are divided based on object information, the grouping conditions comprise the existence of overlapping areas among the data to be associated of the same data group, and the starting point of an interval is smaller than a preset value;

the data sorting unit is used for sorting the first data to be associated and the second data to be associated in each data packet in ascending order of interval end point;

the data scanning unit is used for sequentially carrying out preset forward scanning operation on the first data to be associated and the second data to be associated after sequencing aiming at each data packet to obtain an associated data pair of each data packet, wherein the associated data pair is a pair of the first data to be associated and the second data to be associated with a coincidence area, and the scanning areas corresponding to the first data to be associated and the second data to be associated after each scanning are not coincident;

and the data association unit is used for traversing all data packets based on the preset forward scanning operation until all associated data pairs of the set of data to be associated are obtained.

Optionally, the data scanning unit includes:

a first scanning area first determining subunit, configured to perform the preset forward scanning operation with the first piece of first data to be associated in the data packet as a reference, so as to obtain a first scanning area;

a data association first subunit, configured to combine the second data to be associated covered by the first scanning area with each first data to be associated in the data packet to form the associated data pairs;

a second scanning area determining subunit, configured to perform the preset forward scanning operation with the second piece of first data to be associated in the data packet as a reference, so as to obtain a second scanning area, where the start point of the second scanning area is the end point of the first scanning area;

a data association second subunit, configured to combine the second data to be associated covered by the second scanning area with the second piece of first data to be associated, and with each first data to be associated located after it, to form the associated data pairs;

and a step iteration first subunit, configured to repeat the above steps for the first data to be associated that has not been scanned, until all associated data pairs of the data packet are obtained.

Optionally, the data associating the first subunit includes:

a first endpoint matching condition subunit, configured to obtain, from the sorted second to-be-associated data, second to-be-associated data that satisfies a first endpoint matching condition as candidate scan data, where the first endpoint matching condition is that a starting point of the second to-be-associated data is greater than a starting point of the first to-be-associated data;

and the second determination subunit of the first scanning area is configured to execute the packet scanning operation by using a starting point of first candidate scanning data as a scanning starting point and a starting point of first target scanning data meeting a second endpoint matching condition as a scanning end point, so as to obtain the first scanning area, where the second endpoint matching condition is that the starting point of the second data to be associated is greater than the end point of the first data to be associated.

Optionally, the first scan area second determining subunit is further specifically configured to: comparing the end point of the first data to be associated with the start point of the first candidate scanning data;

If the starting point of the first candidate scanning data is smaller than the ending point of the first data to be associated, the ending point of the first data to be associated is continuously compared with the starting point of the next candidate scanning data after the first candidate scanning data;

repeating the steps until the target scanning data are obtained, and stopping the preset forward scanning operation at the starting point of the target scanning data.
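The grouped scan described by the scanning subunits above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: intervals are modelled as `(start, end)` tuples, the function name `scan_group` and the parameter `first` are hypothetical, and the caller is assumed to guarantee that every interval in `group` starts before `s_sorted[first]` and that `s_sorted` is sorted by start point.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), closed interval; assumed representation


def scan_group(group: List[Interval], s_sorted: List[Interval],
               first: int) -> List[Tuple[Interval, Interval]]:
    """Join one group of R intervals against S, scanning S only once.

    Assumes every r in `group` has r.start < s_sorted[first].start and that
    `s_sorted` is sorted by ascending start point.
    """
    ordered = sorted(group, key=lambda r: r[1])  # ascending end point
    pairs: List[Tuple[Interval, Interval]] = []
    k = first  # scan position in S; it never moves backwards
    for idx, r in enumerate(ordered):
        # Scanning area of r: from where the previous scan stopped up to the
        # first s whose start point is greater than r.end.
        while k < len(s_sorted) and s_sorted[k][0] <= r[1]:
            s = s_sorted[k]
            # r and every later member of the group end at or after s.start,
            # so s pairs with all of them without further comparisons.
            for later in ordered[idx:]:
                pairs.append((later, s))
            k += 1
    return pairs
```

Because the scan position only advances, the scanning areas of successive group members do not overlap, which is the property the text attributes to the preset forward scanning operation.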

Optionally, the data association analysis device further includes:

the data comparison first unit is used for comparing the first data to be associated which is not scanned with the second data to be associated;

and a preset forward scanning first unit, configured to perform the preset forward scanning operation with the second data to be associated as a reference if there is second data to be associated whose start point is smaller than that of the first data to be associated.

Optionally, the first scan area first determining subunit includes:

a sub-bucket obtaining subunit, configured to obtain a first sub-bucket that is completely covered by the first data to be associated in the bucket index, and a second sub-bucket where an endpoint of the first data to be associated is located;

the second data to be associated determining subunit is configured to determine, according to coverage information of the second data to be associated on the first sub-bucket and the second sub-bucket, scan coverage data from the second data to be associated, where the scan coverage data is second data to be associated that participates in the preset forward scanning operation;

and a first scanning area third determining subunit, configured to obtain the coverage area from the start point of the first scan coverage data to the start point of the last scan coverage data, so as to obtain the first scanning area.

Optionally, the second data to be associated determining subunit is further specifically configured to:

establishing a reference relation between the second data to be associated and the bucket index;

traversing the candidate second data to be associated whose start points are located in the second sub-bucket; and determining the first candidate second data to be associated, together with the second data to be associated whose start points are located in the first sub-bucket, as the scan coverage data.
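One possible reading of the bucket-assisted scan is sketched below. It is an assumption-laden illustration rather than the method of the application: the buckets are assumed to be fixed-width ranges over the start points of the second data to be associated, and the names `bucket_scan`, `s_starts` and `width` are hypothetical. Buckets completely covered by the reference interval contribute all of their entries without comparisons, while a boundary bucket (for example the one containing the end point of the reference interval) is checked entry by entry.

```python
from typing import Dict, List


def bucket_scan(s_starts: List[int], width: int,
                r_start: int, r_end: int) -> List[int]:
    """Return positions of S intervals whose start lies in [r_start, r_end].

    Assumes non-negative start points and a positive bucket width.
    """
    # Bucket index: bucket number -> positions of the S start points it holds.
    buckets: Dict[int, List[int]] = {}
    for pos, start in enumerate(s_starts):
        buckets.setdefault(start // width, []).append(pos)

    hits: List[int] = []
    for b in range(r_start // width, r_end // width + 1):
        lo, hi = b * width, (b + 1) * width - 1
        fully_covered = r_start <= lo and hi <= r_end
        for pos in buckets.get(b, []):
            # Fully covered bucket: every entry qualifies, no comparison needed.
            # Boundary bucket: compare the start point with the reference interval.
            if fully_covered or r_start <= s_starts[pos] <= r_end:
                hits.append(pos)
    return hits
```

For example, `bucket_scan([1, 3, 10, 12, 25, 31], 10, 0, 27)` takes buckets 0 and 1 wholesale and compares only the entries of bucket 2.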

Optionally, the first scan area first determining subunit includes:

a subunit, configured to obtain a preset expansion coefficient of the preset forward scanning operation, where the preset expansion coefficient is an integer greater than zero;

a spreading coefficient grouping subunit, configured to group the second sorted data to be associated according to the preset spreading coefficient, where the number of data included in each group is the same as the preset spreading coefficient;

a preset forward scanning subunit, configured to, for each group, if the last second data to be associated in the group satisfies a third endpoint matching condition, determine the area from the start point of the first second data to be associated in the group to the start point of the last second data to be associated as a coverage area of the first scanning area, and perform the preset forward scanning operation on the next group, where the third endpoint matching condition is that the start point of the second data to be associated is smaller than or equal to the end point of the first data to be associated;

and a step iteration second subunit, configured to repeat the above steps on the remaining groups in sequence until all coverage areas of the first scanning area are obtained, so as to obtain the first scanning area.

Optionally, the data association analysis device further includes:

a third endpoint condition matching unit, configured to, if the last second data to be associated in the group does not satisfy the third endpoint matching condition, determine in sequence whether the remaining second data to be associated in the group satisfy the third endpoint matching condition;

a first scanning area fourth determining unit, configured to obtain, from the remaining second data to be associated, first last data that satisfies the third endpoint matching condition, and determine the area from the start point of the first second data to be associated in the group to the start point of the first last data as the first scanning area;

and a first scanning area fifth determining unit, configured to, if none of the remaining second data to be associated satisfies the third endpoint matching condition, obtain second last data in the previous group, and determine the area from the start point of the first second data to be associated in the group to the start point of the second last data as the first scanning area.
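The expansion-coefficient scan described above can be sketched as follows, assuming the start points of the second data to be associated are held in an ascending list. The function name `scan_end_unrolled` and its parameters are hypothetical; `k` plays the role of the preset expansion coefficient. Only the last element of each group of `k` is compared first, and the individual elements are examined only in the group where that comparison fails.

```python
from typing import List


def scan_end_unrolled(s_starts: List[int], first: int, r_end: int, k: int = 4) -> int:
    """Return the index one past the last start point <= r_end, scanning from `first`.

    Assumes `s_starts` is sorted in ascending order and k > 0.
    """
    if k <= 0:
        raise ValueError("the expansion coefficient must be greater than zero")
    pos = first
    n = len(s_starts)
    # Whole groups: if the last start point of a group qualifies, so do all
    # earlier ones in that group, because the list is sorted.
    while pos + k <= n and s_starts[pos + k - 1] <= r_end:
        pos += k
    # Remaining elements: the group whose last element failed the test, or a
    # final partial group, is checked one element at a time.
    while pos < n and s_starts[pos] <= r_end:
        pos += 1
    return pos
```

With `k` comparisons replaced by one for every fully covered group, the number of endpoint comparisons drops roughly by a factor of `k` on long scans; the actual gain depends on the data and is not claimed here.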

Optionally, the data grouping unit includes:

the data sorting subunit is used for sorting each first data to be associated and each second data to be associated in ascending order of interval start point;

a first sub-unit of data packet, configured to divide each first data to be associated that satisfies the packet condition into the same data packet if the starting point of the first data to be associated is smaller than the starting point of the first second data to be associated, where the preset value is the starting point value of the first second data to be associated;

a second sub-unit of data packet, configured to divide each second data to be associated that satisfies the packet condition into the same data packet if the starting point of the first data to be associated is greater than the starting point of the first second data to be associated, where the preset value is the starting point value of the first data to be associated;

and the step iterating third subunit is used for repeatedly executing the steps on the first data to be associated which are not grouped and the second data to be associated which are not grouped until all the data to be associated complete the grouping operation.
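The grouping steps above can be sketched as follows. This is a simplified illustration that applies only the start-point threshold part of the grouping condition; the function name `make_groups`, the `"R"`/`"S"` tags, the handling of equal start points, and the treatment of leftover intervals as a final group are all assumptions made here, not details stated in the application.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end); assumed representation


def make_groups(R: List[Interval], S: List[Interval]) -> List[Tuple[str, List[Interval]]]:
    """Split R and S into alternating groups of consecutive intervals."""
    R, S = sorted(R), sorted(S)  # ascending start point
    i = j = 0
    groups: List[Tuple[str, List[Interval]]] = []
    while i < len(R) and j < len(S):
        if R[i][0] < S[j][0]:
            # Preset value = start point of the current first S interval.
            threshold, k = S[j][0], i
            while k < len(R) and R[k][0] < threshold:
                k += 1
            groups.append(("R", R[i:k]))
            i = k
        else:
            # Preset value = start point of the current first R interval.
            # Assumption: equal start points are placed in the S group.
            threshold, k = R[i][0], j
            while k < len(S) and S[k][0] <= threshold:
                k += 1
            groups.append(("S", S[j:k]))
            j = k
    # Assumption: whatever remains on one side forms a final group.
    if i < len(R):
        groups.append(("R", R[i:]))
    if j < len(S):
        groups.append(("S", S[j:]))
    return groups
```

Each group produced this way contains consecutive intervals from one set whose start points all precede the next interval of the other set, which is what allows a single forward scan to serve the whole group.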

Optionally, the data association analysis device further includes:

The starting point array construction unit is used for acquiring a starting point array formed by the starting point of the first data to be associated and the starting point of the second data to be associated;

the terminal array construction unit is used for acquiring a terminal array formed by the terminal of the first data to be associated and the terminal of the second data to be associated;

a preset forward scanning second unit, configured to perform the preset forward scanning operation based on the start point array and the end point array.
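As a rough illustration of scanning over separate start point and end point arrays, the sketch below stores each set as two parallel arrays so that the scan loop touches start points only. The per-set layout, the helper names `decompose` and `scan_starts`, and the use of Python's `array` module are assumptions made for this example; the application does not specify them.

```python
from array import array
from typing import Iterable, Tuple


def decompose(intervals: Iterable[Tuple[int, int]]) -> Tuple[array, array]:
    """Sort by start point and split into a start array and an end array."""
    ordered = sorted(intervals)
    starts = array("q", (s for s, _ in ordered))  # signed 64-bit integers
    ends = array("q", (e for _, e in ordered))
    return starts, ends


def scan_starts(s_starts: array, first: int, r_end: int) -> int:
    """Advance over the start array while start <= r_end; end points are not read."""
    k = first
    while k < len(s_starts) and s_starts[k] <= r_end:
        k += 1
    return k
```

Keeping the start points contiguous means the forward scan reads one compact array instead of interleaved start and end values, which is the usual motivation for this kind of data layout.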

An electronic device provided in a third aspect of an embodiment of the present application includes:

a processor and a storage medium;

the processor is used for realizing each instruction;

the storage medium is configured to store a plurality of instructions for loading and executing the data correlation analysis method described above by the processor.

The fourth aspect of the embodiments of the present application further provides a computer readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps in any of the data association analysis methods provided by the embodiments of the present application.

The fifth aspect of the embodiments of the present application also provides a computer program product, including a computer program or instructions, which when executed by a processor implements any of the data association analysis methods provided by the embodiments of the present application.

Therefore, in the embodiments of the application, the first data to be associated and the second data to be associated are grouped separately based on the grouping conditions. Because the grouping conditions require that the data to be associated within the same group overlap and that the start point of each interval is smaller than a preset value, and because the grouped data to be associated are further sorted in ascending order of interval end point, consecutive intervals from the same data set can be processed as one group, and the scanning areas corresponding to the data to be associated within a group do not overlap. Repeated comparison over the common areas of the data to be associated in the same group is thereby avoided, which can significantly reduce the computation required for data association analysis, improve the efficiency of the data association analysis algorithm, and improve the efficiency of online real-time association analysis of massive data.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is an application scenario schematic diagram of a data association analysis system provided in an embodiment of the present application;

fig. 2a is a schematic flow chart of a data association analysis method according to an embodiment of the present application;

fig. 2b is a schematic flow chart of a data association analysis method according to an embodiment of the present application;

FIG. 3a is a schematic diagram showing data to be associated according to an embodiment of the present application;

FIG. 3b is a schematic diagram of performing a forward scan provided by an embodiment of the present application;

FIG. 3c is a schematic diagram of performing a preset forward scan according to an embodiment of the present application;

FIG. 3d is a schematic diagram of introducing bucket indexes in a preset forward scan provided by an embodiment of the present application;

FIG. 3e is a schematic diagram of an application of the enhanced loop unrolling algorithm provided by an embodiment of the present application;

FIG. 3f is a diagram illustrating a comparison of data layouts for decomposing forward scan and forward scan according to an embodiment of the present application;

fig. 4 is a schematic structural diagram of a data association analysis device according to an embodiment of the present application;

fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.

Cloud technology: a hosting technology that integrates hardware, software, network and other resources in a wide area network or a local area network to realize the computation, storage, processing and sharing of data. It can also be understood as a general term for the network technology, information technology, integration technology, management platform technology, application technology and the like applied on the basis of the cloud computing business model; these can form a resource pool that is used on demand, flexibly and conveniently. The background services of technical network systems, such as video websites, picture websites and portal websites, require a large amount of computing and storage resources. With the rapid development and application of the internet industry, each object may come to have its own identifier that needs to be transmitted to a background system for logical processing, data of different levels are processed separately, and all kinds of industry data require strong backend system support, so cloud technology needs cloud computing as its support. Cloud computing is a computing model that distributes computing tasks over a resource pool formed by a large number of computers, enabling various application systems to acquire computing power, storage space and information services as needed. The network that provides the resources is referred to as the "cloud". From the user's point of view, resources in the cloud appear infinitely expandable and can be acquired at any time, used on demand, expanded at any time and paid for according to use. As a basic capability provider of cloud computing, a cloud computing resource pool platform (cloud platform for short, generally referred to as Infrastructure as a Service, IaaS) is established, and multiple types of virtual resources are deployed in the resource pool for external clients to select and use. The cloud computing resource pool mainly comprises computing devices (virtualized machines, including operating systems), storage devices and network devices.

In order to improve the efficiency of association analysis on different data, the embodiments of the application provide a data association analysis method, a device, a medium and equipment. Examples of the embodiments are illustrated in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions.

It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

In order to facilitate understanding of the technical solutions and the technical effects thereof described in the embodiments of the present application, the embodiments of the present application explain related terms:

Spatiotemporal data: data having spatiotemporal properties, that is, data comprising both spatial information and temporal information. It is data collected or recorded across space and over time. Spatiotemporal data is typically used to describe and analyze phenomena, events, or objects that are related in time and space.

Interval join: finding, in two sets of intervals, the interval pairs that satisfy a specific condition. Specifically, given two sets of intervals R and S, the goal of an interval join is to find all interval pairs (r, s) such that r belongs to R, s belongs to S, and a given join condition is satisfied.

Forward scan: a data processing algorithm that traverses and processes an ordered data set sequentially. The algorithm processes the data items one by one, from the start of the data set until its end is reached. For each interval r being scanned, the interval list of the other set is scanned forward, and every interval whose start point is no later than r.end forms a join result with r.
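A minimal sketch of a plain forward-scan interval join is given below for orientation. It shows the baseline technique only, not the grouped variant of the application; intervals are assumed to be closed `(start, end)` tuples, and the function name `forward_scan_join` is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), closed interval; assumed representation


def forward_scan_join(R: List[Interval], S: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return all intersecting (r, s) pairs using a forward scan over sorted inputs."""
    R, S = sorted(R), sorted(S)  # ascending start point
    out: List[Tuple[Interval, Interval]] = []
    i = j = 0
    while i < len(R) and j < len(S):
        if R[i][0] < S[j][0]:
            # r starts first: scan forward in S while s.start <= r.end.
            r, k = R[i], j
            while k < len(S) and S[k][0] <= r[1]:
                out.append((r, S[k]))
                k += 1
            i += 1
        else:
            # s starts first (or at the same point): scan forward in R.
            s, k = S[j], i
            while k < len(R) and R[k][0] <= s[1]:
                out.append((R[k], s))
                k += 1
            j += 1
    return out
```

Every interval triggers its own scan here, so consecutive intervals from the same set repeatedly re-scan the same region of the other set; this repeated work is what grouping is meant to remove.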

Bucket index: a data index structure for accelerating data lookup and retrieval operations. It divides the data into a plurality of buckets, each containing a set of data items with similar characteristics. Each bucket has a unique identifier (bucket number) for quickly locating and accessing a particular data item.

Loop unrolling: an optimization technique that aims to improve the efficiency of loop execution. It reduces loop control overhead and improves instruction-level parallelism by reducing the number of loop iterations, in the compiler or in the program, and executing several iteration steps within each pass of the loop body.

Enhanced loop unrolling: an extension and improvement of the traditional loop unrolling technique that aims to further improve loop execution efficiency. Through deeper unrolling and optimization strategies, it further reduces loop control overhead and improves instruction-level parallelism, and it uses hardware characteristics and optimization techniques to optimize the computation instructions in the loop body as far as possible.

Data layout: refers to the manner in which data is organized and arranged in computer memory. It relates to how data is stored in memory and how it is accessed and manipulated. Proper data placement can significantly impact the performance and efficiency of a program, especially for computationally intensive applications with large amounts of data access.

Fig. 1 is a schematic diagram of an implementation environment of a data association analysis method according to an embodiment of the present application, where, as shown in fig. 1, the implementation environment may include at least a terminal and a server.

The embodiment of the application provides a data association analysis method, a data association analysis device, electronic equipment and a computer readable storage medium. The data association analysis device may be integrated in an electronic device, which may be a server or a terminal.

The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery network (CDN) services, and big data and artificial intelligence platforms.

The terminal may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted terminal, etc. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited herein.

For example, referring to fig. 1, taking an example that the data association analysis device is integrated in a server, the server may be configured to obtain a set of data to be associated, where the data to be associated is a section with object information and time information, a start point of the section represents a start time, and an end point of the section represents an end time; respectively grouping first data to be associated and second data to be associated in the set of data to be associated based on grouping conditions to obtain a plurality of data groups, wherein the first data to be associated and the second data to be associated are divided based on object information, the grouping conditions comprise that overlapping areas exist among the data to be associated of the same data group, and the starting point of an interval is smaller than a preset value; ordering the first data to be associated and the second data to be associated in each data packet by taking the end point of the interval as the condition of ascending arrangement; for each data packet, performing preset forward scanning operation on the first data to be associated and the second data to be associated after sequencing in sequence to obtain an associated data pair of each data packet, wherein the associated data pair is a pair of the first data to be associated and the second data to be associated with overlapping areas, and scanning areas corresponding to the first data to be associated and the second data to be associated after scanning are not overlapped; traversing all data packets based on the preset forward scanning operation until all associated data pairs of the set of data to be associated are obtained.

The embodiments of the present application may be applied to a variety of scenarios including, but not limited to, data analysis, cloud technology, artificial intelligence, intelligent transportation, and the like.

In addition, "plurality" in the embodiments of the present application means two or more. "first" and "second" and the like in the embodiments of the present application are used for distinguishing descriptions and are not to be construed as implying relative importance.

It will be appreciated that the specific embodiments of the present application involve related data such as the data to be associated of an object. When the embodiments of the present application are applied to specific products or technologies, the permission or consent of the object needs to be obtained, and the collection, use and processing of the related data need to comply with the relevant laws, regulations and standards of the relevant countries and regions.

The embodiment of the application can also be realized by combining Cloud technology, wherein Cloud technology (Cloud technology) refers to a hosting technology for integrating hardware, software, network and other series resources in a wide area network or a local area network to realize calculation, storage, processing and sharing of data, and can also be understood as a general term of network technology, information technology, integration technology, management platform technology, application technology and the like applied based on a Cloud computing business model. Cloud technology requires cloud computing as a support. Cloud computing is a computing model that distributes computing tasks over a large number of computer-made resource pools, enabling various application systems to acquire computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". Specifically, the server and the database are located in the cloud, and the server may be a physical machine or a virtualized machine.

The following describes a data association analysis method provided by the application. FIG. 2a is a flow chart of a data association analysis method provided in an embodiment of the present application. The present application provides the operational steps of the method as described in the embodiments or flow charts, but more or fewer operational steps may be included on the basis of conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only order of execution. When implemented in a real system or server product, the methods illustrated in the embodiments or figures may be performed sequentially or in parallel (e.g., in a parallel processor or multithreaded environment). Referring to fig. 2a, a data association analysis method provided in an embodiment of the present application may include the following steps:

step 101, acquiring a set of data to be associated.

For easy understanding of the data to be associated, refer to fig. 3a, and fig. 3a is a schematic diagram showing the data to be associated according to an embodiment of the present application.

The data to be associated may be an interval carrying object information and time information, where the starting point of the interval represents a start time and the end point of the interval represents an end time. In the embodiment of the present application, the data to be associated may be regarded as an interval corresponding to spatio-temporal data.

In general, the endpoint values of the interval corresponding to the data to be associated, that is, the starting point value start and the end point value end, may come from the domain of non-negative integers N, expressed as start, end ∈ N. The interval i of the data to be associated is defined as i = [start, end], which is a subset of N containing all non-negative integers x that satisfy start ≤ x ≤ end. Assuming that R and S are two sets of intervals, the interval connection R ⋈ S consists of all intersecting interval pairs (r, s) with r ∈ R and s ∈ S such that r.start ≤ s.start ≤ r.end or s.start ≤ r.start ≤ s.end.
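The definition above can be illustrated with a minimal Python sketch. The names `Interval`, `intersects` and `naive_interval_join` are hypothetical and chosen only for illustration; the nested-loop join is a reference baseline for checking results, not the method of the present application.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Closed interval [start, end] over the non-negative integers."""
    start: int
    end: int


def intersects(r: Interval, s: Interval) -> bool:
    """Intersection predicate from the definition:
    r.start <= s.start <= r.end  or  s.start <= r.start <= s.end."""
    return r.start <= s.start <= r.end or s.start <= r.start <= s.end


def naive_interval_join(R, S):
    """Reference join that tests every pair (|R| * |S| predicate checks)."""
    return [(r, s) for r in R for s in S if intersects(r, s)]
```

For instance, `intersects(Interval(2011, 2021), Interval(2015, 2018))` evaluates to `True`, while two intervals such as `[1, 3]` and `[4, 5]` do not intersect. Because the intervals are closed, `[1, 3]` and `[3, 5]` count as intersecting.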

It will be appreciated that the above data to be associated may be stored in a corresponding temporal database. Each piece of data to be associated stored in the temporal database may be regarded as a tuple with a valid time interval, that is, the time range within which the tuple is valid and active in a given time domain. For example, the valid time interval of an employee-data tuple may be defined as the time range between the employee's date of entry and date of departure. Each tuple also has an explicit attribute relationship describing the data, that is, the relationship between the data structure and the attribute definitions in the database. For example, the explicit attribute relationship of an employee-data tuple may include the employee's name, department, date of entry, date of departure, and the like.

Referring now to the scenario example, as shown in FIG. 3a, two employees in department A, whose names are name1 and name2, respectively, and four employees in department B, whose names are name5, name6, name7, and name8, respectively, are shown in the presentation interface. Accordingly, the start date and end date of each employee's work at the respective department are also presented. It will be understood that, by taking the name and the division of each employee as object information and taking the start date and the end date corresponding to each employee and the time range between the start date and the end date as time information, a plurality of intervals as illustrated in the right half of fig. 3a can be generated according to the object information and the time information, the start point of each interval corresponds to the start date, the end point of each interval corresponds to the end date, and each interval can be regarded as one data to be associated, for example, the first data to be associated corresponding to the name1 employee is the interval [2011, 2021] of the working time of the name1 employee in the division a.

According to the above example, the data to be associated includes not only time information (the period during which an employee held a post) but also spatial information (the department in which the employee held it), so the data to be associated in the present application is a time-series connection interval belonging to spatio-temporal data. The time series of a piece of data to be associated is the sequence of time points covered by the employee's tenure; for example, the time series of the name6 employee is 2014, 2015 and 2016. It should be noted that, in addition to the above examples, the time granularity of the data to be associated may be set to hours, days, months, years, etc. according to requirements, which is not limited in this embodiment.

In the present scenario, the data to be associated may be subjected to data association analysis by an interval connection operation, and each intersecting interval is found to determine a data pair having an association from a set of data to be associated. For example, it may be found that the data to be associated of the name1 employee and the data to be associated of the name2 employee are intersecting interval pairs having intersecting validity periods, where the intersecting validity periods represent effective intersecting time between two intervals, and the two intersecting interval pairs represent a certain association between the two intersecting interval pairs, which represents a portion where working time between the name1 employee and the name2 employee is intersected.

Regarding the generation of the data to be associated, a one-dimensional discrete domain or continuous domain may be given, depending on whether the type of the data to be associated is discrete data or continuous data, the interval of the domain is defined by a start point and an end point, for example, the discrete data in the aforementioned scene corresponds to date data of the employee working in the respective departments. Further, when constructing the data to be associated, the starting point value of the data to be associated is considered to be smaller than the ending point value.

It should be noted that, the method of the present application adopts domain-based partition to process the data to be associated. It will be appreciated that since the data to be associated is spatiotemporal data, the data to be associated may be partitioned based on the time domain or the spatial domain. For example, the data to be associated may be divided into different time domains, and still referring to the foregoing example, a partition may be created according to the time domain from 2008 to 2023, and the data to be associated from name1 employee to name2 employee and from name5 employee to name8 employee may be divided into the partition. For another example, the data to be associated may be divided according to different spatial domains, a partition is created according to the spatial domains that were occupied in the department a and the department B, and the data to be associated of the name1 employee to the name2 employee and the name5 employee to the name8 employee is divided into the partition.

For this example, if the above data is hashed by using a conventional hash algorithm, for example, the ID and the work number of the employee are hashed, the obtained hash value is mapped to different partitions according to the preset hash value range, for example, two data to be associated with a hash value of 1 and a hash value of 2 may be divided into hash partitions with hash value ranges of 1-100.

It can be appreciated that the domain-based partitioning of the present application overcomes defects of conventional hash partitioning. With hash partitioning, employee numbers may be close to one another or concentrated in a certain range, so that the partition data becomes skewed, that is, one partition holds far more data to be associated than the others. In addition, hash partitioning involves a large amount of computation, so its communication overhead and maintenance cost are also large. The domain-based partitioning of the data to be associated in the present application does not have these defects, and it can also be processed in a multithreaded manner; for example, the interval connection tasks of different partitions can be distributed to different threads for processing.

Furthermore, the partition data can be further subdivided by means of small connections. For example, the interval connection task within a partition can be subdivided into a plurality of independent subtasks, each responsible for the interval connection of a part of the data to be associated, and each small-connection subtask can be scheduled on an available CPU thread to realize parallel computation. Adaptive domain partitioning of the data can also be realized effectively, where adaptive domain partitioning refers to dynamically adjusting the partition size, the amount of data to be associated in each partition, the number of subtasks and the like according to the data processing status of each thread, so that the whole computing system achieves load balance.
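As an illustration of domain-based partitioning with per-partition threads, the following Python sketch cuts a time domain into contiguous ranges and joins each range in a worker thread. It is a sketch under stated assumptions, not the implementation of the present application: the helper names (`split_domain`, `assign`, `join_partition`, `partitioned_join`) are hypothetical, every interval is assumed to lie inside [lo, hi], an interval spanning several ranges is assumed to be copied into each of them, and a pair is reported only in the range that contains the start of its overlap so that copies do not produce duplicates. A simple nested loop stands in for the per-partition join.

```python
from concurrent.futures import ThreadPoolExecutor


def split_domain(lo, hi, k):
    """Cut [lo, hi] into at most k contiguous ranges of nearly equal width."""
    width = (hi - lo + k) // k  # ceiling of (hi - lo + 1) / k
    ranges = [(lo + i * width, min(hi, lo + (i + 1) * width - 1)) for i in range(k)]
    return [rg for rg in ranges if rg[0] <= hi]  # drop empty trailing ranges


def assign(intervals, ranges):
    """Copy each (start, end) interval into every range it overlaps."""
    parts = [[] for _ in ranges]
    for iv in intervals:
        for i, (p_lo, p_hi) in enumerate(ranges):
            if iv[0] <= p_hi and p_lo <= iv[1]:
                parts[i].append(iv)
    return parts


def join_partition(task):
    """Join the two interval lists of one range (nested loop as a stand-in)."""
    (p_lo, p_hi), part_r, part_s = task
    out = []
    for r in part_r:
        for s in part_s:
            if r[0] <= s[1] and s[0] <= r[1]:
                # Assumption: report the pair only in the range that holds
                # the start of the overlap, so copies yield no duplicates.
                if p_lo <= max(r[0], s[0]) <= p_hi:
                    out.append((r, s))
    return out


def partitioned_join(R, S, lo, hi, k=4):
    """Run one join task per domain range on a thread pool."""
    ranges = split_domain(lo, hi, k)
    tasks = zip(ranges, assign(R, ranges), assign(S, ranges))
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        results = pool.map(join_partition, tasks)
    return [pair for part in results for pair in part]
```

Note that in CPython the global interpreter lock limits the speed-up of pure-Python work on threads; the sketch only shows how tasks could be split and scheduled.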

Step 102, based on grouping conditions, grouping operation is performed on the first data to be associated and the second data to be associated in the set of data to be associated respectively, so as to obtain a plurality of data groups.

Wherein the first data to be associated and the second data to be associated may be divided based on the object information. Taking the foregoing scenario as an example, the first to-be-associated data and the second to-be-associated data may be divided based on departments where different employees are located, for example, employee data of the department a may be used as the first to-be-associated data, and employee data of the department B may be used as the second to-be-associated data.

Accordingly, the division may be performed based on the names of different employees, for example, employee data of name1 employee may be used as the second data to be associated. Different object information can be adopted to divide the first data to be associated and the second data to be associated according to the requirements of data association analysis, and the embodiment is not limited.

The grouping condition may include that a coincidence region exists between the data to be associated in the same data group and that the starting point of each interval is smaller than a preset value. Referring to fig. 3c, fig. 3c is a schematic diagram of performing a preset forward scan according to an embodiment of the present application. Let r denote a piece of first data to be associated, s a piece of second data to be associated, R the set of first data to be associated, and S the set of second data to be associated, so that r ∈ R and s ∈ S, and each connection pair (r, s) satisfies (r, s) ∈ R × S.

As shown in fig. 3b, the starting point of s2 may be defined as the preset value; since there is a coincidence region between the intervals of r1 and r2, and the starting points of r1 and r2 are both smaller than the starting point of s2, r1 and r2 satisfy the grouping condition and may be divided into the same data packet.

Optionally, step 102 may include:

ordering each first data to be associated and each second data to be associated under the condition that the starting point of the interval is arranged in an ascending order;

if the starting point of the first data to be associated is smaller than the starting point of the first second data to be associated, dividing each first data to be associated meeting the grouping condition into the same data group, wherein the preset value is the starting point value of the first second data to be associated;

if the starting point of the first data to be associated is larger than the starting point of the first second data to be associated, dividing each second data to be associated meeting the grouping condition into the same data group, wherein the preset value is the starting point value of the first data to be associated;

and repeatedly executing the steps on the first data to be associated which is not grouped and the second data to be associated which is not grouped until all the data to be associated complete the grouping operation.

In some embodiments, the execution pseudocode that performs the grouping operation on the data to be associated is as follows:

sort R and S by start endpoint              // order by the starting point of each interval
r ← first interval in R                     // first interval of R
s ← first interval in S                     // first interval of S
while R and S not depleted do               // traverse R and S
    if r.start < s.start then               // the starting point of r is smaller than that of s
        G_r ← next group from R w.r.t. r, s // assign the next group of R to G_r

Here G_r is the data packet corresponding to the first data to be associated r.

The process of performing the grouping operation for r1 and r2 is now further illustrated in connection with the data to be associated in fig. 3 c. As shown in fig. 3c, the ordered data to be associated may be obtained first according to the condition that the start point of the interval is in ascending order, where the ordered data to be associated includes the first data to be associated r1 and r2, and the ordered second data to be associated s1, s2, s3, s4 and s5. Because s1 is obviously disjoint with r1 and r2, assuming that s1 is the scanned second data to be associated, and is not considered in the current sorting operation, the first data to be associated is r1, and the first second data to be associated is s2.

As can be seen from fig. 3c, the preset value is defined as the starting point of s2; since the starting point of r1 is smaller than the starting point of s2, r1 opens a new data packet. Traversing in sequence to the next ungrouped r2, the starting point of r2 is also smaller than the starting point of s2 and a coincidence region exists between r2 and r1, so r2 is divided into the data packet of r1, and so on until all first data to be associated r satisfying the grouping condition have been obtained. Correspondingly, if the starting point of the ungrouped second data to be associated s is smaller than the starting point of the ungrouped first data to be associated r, the grouping operation described for r1 and r2 is executed symmetrically on the second data to be associated.
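The grouping of r1 and r2 can be sketched in Python as follows. The function name `next_group` and the tuple representation `(start, end)` are hypothetical. The sketch adopts one possible reading of the grouping condition, namely that a candidate joins the group when its starting point is below the preset value and it overlaps the first member of the group; other readings of "coincidence region" are possible.

```python
def next_group(sorted_intervals, pos, bound):
    """Collect the data group that begins at index pos.

    sorted_intervals: (start, end) tuples in ascending order of start.
    bound: the preset value, i.e. the starting point of the first
           ungrouped interval of the other set.
    Returns (group, index of the first interval after the group).
    The caller is expected to ensure sorted_intervals[pos][0] < bound.
    """
    first = sorted_intervals[pos]
    group = [first]
    i = pos + 1
    while i < len(sorted_intervals):
        start = sorted_intervals[i][0]
        # Assumed reading of the grouping condition: the start lies below
        # the preset value and the interval overlaps the group's first member.
        if start >= bound or start > first[1]:
            break
        group.append(sorted_intervals[i])
        i += 1
    return group, i
```

With illustrative values r1 = (1, 15), r2 = (3, 12), a third interval (9, 11) and a preset value of 5, the call `next_group([(1, 15), (3, 12), (9, 11)], 0, 5)` yields the group containing r1 and r2 and the index 2 of the first ungrouped interval.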

In the above manner, the forward scanning operation may be performed on a per data packet basis on the data to be correlated, so as to implement the preset forward scanning operation of the present application, and for details on the preset forward scanning, please refer to the description in the subsequent steps.

And step 103, ordering the first data to be associated and the second data to be associated in each data packet by taking the end point of the interval as the ascending arrangement condition.

Specifically, taking the first data to be associated r1 and r2 as an example, after r1 and r2 are divided into the same data packet, since the end point of r2 is smaller than the end point of r1, that is, r2.end < r1.end, r2 is ordered before r1, so as to obtain the corresponding data packet {r2, r1}.

By sorting the data to be associated in each data packet, the fact that consecutive intervals in a packet share a coincidence region can be fully exploited in the subsequent preset forward scanning process, avoiding redundant scanning areas and redundant interval endpoint comparisons during forward scanning. The beneficial effect of sorting the data to be associated within a data packet is further described under step 104.

Step 104, for each data packet, performing a preset forward scanning operation on the first to-be-associated data and the second to-be-associated data after being sequenced in sequence to obtain an associated data pair of each data packet.

The association data pair is a pair of first to-be-associated data and second to-be-associated data with overlapping areas, and scanning areas corresponding to the first to-be-associated data and the second to-be-associated data after scanning are not overlapped.

From the foregoing, each associated data pair (r, s) satisfies (r, s) ∈ R × S. In some embodiments, the preset forward scanning operation may be an improved scanning operation based on the forward scanning operation. Referring to fig. 3b, which is a schematic diagram of performing forward scanning according to an embodiment of the present application, assume that forward scanning is performed on each piece of data to be associated in fig. 3b; the corresponding pseudocode may be expressed as:

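1  sort R and S by start endpoint                    // order by the starting point of each interval
2  r ← first interval in R                           // first interval of R
3  s ← first interval in S                           // first interval of S
4  while R and S not depleted do                     // traverse R and S
5      if r.start < s.start then                     // the starting point of r is smaller than that of s
6          s' ← s                                    // assign s to s'
7          while s' ≠ null and r.end ≥ s'.start do   // r and s' intersect
8              output (r, s')                        // output the intersecting pair (r, s')
9              s' ← next interval in S               // assign the next interval of S to s'
10         r ← next interval in R                    // assign the next interval of R to r
11     else                                          // the starting point of r is greater than or equal to that of s
12         r' ← r                                    // assign r to r'
13         while r' ≠ null and s.end ≥ r'.start do   // r' and s intersect
14             output (r', s)                        // output the intersecting pair (r', s)
15             r' ← next interval in R               // assign the next interval of R to r'
16         s ← next interval in S                    // assign the next interval of S to s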

As can be seen in conjunction with fig. 3b, the forward scanning operation stops the scan line at the starting point of each piece of data to be associated in R and S; for example, when the scan line stops at the starting point of s2, the scan for s2 is complete. The core of the forward scanning pseudocode is the while loop, which compares the end point r.end of each r with the starting point s.start of s to determine whether the first data to be associated intersects the second data to be associated. Excluding the cost of sorting R and S, forward scanning performs a total of |R| + |S| + |R ⋈ S| endpoint comparisons, where |R ⋈ S| is the number of associated data pairs.

As can be seen from fig. 3b, since r2.end ≤ r1.end holds, r2.end > s2.start implies that r1.end > s2.start also holds. Therefore, once r2.end and s2.start have been compared to obtain r2.end > s2.start, r1.end does not need to be compared with s2.start again. Forward scanning thus suffers from redundant comparisons between interval endpoints, and the scanning areas corresponding to different data to be associated may overlap, as shown in fig. 3b. This wastes computing resources and reduces the efficiency of data association analysis. The improved grouped forward scanning operation effectively avoids these defects of the forward scanning operation.
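For reference, plain forward scanning can be sketched in Python as below. The function name `forward_scan`, the tuple representation `(start, end)` and the comparison counter are hypothetical additions for illustration; the counter covers only the endpoint comparisons made in the inner loops, so that the repeated comparisons discussed above can be observed on small inputs.

```python
def forward_scan(R, S):
    """Plain forward scan over (start, end) tuples.

    Returns (pairs, comparisons); comparisons counts the endpoint
    comparisons performed in the two inner loops.
    """
    R, S = sorted(R), sorted(S)  # tuples sort by start first
    pairs, comparisons = [], 0
    i = j = 0
    while i < len(R) and j < len(S):
        r, s = R[i], S[j]
        if r[0] < s[0]:
            # Scan S forward from s while r.end >= s'.start.
            k = j
            while k < len(S):
                comparisons += 1
                if r[1] < S[k][0]:
                    break
                pairs.append((r, S[k]))
                k += 1
            i += 1
        else:
            # Symmetric case: scan R forward from r while s.end >= r'.start.
            k = i
            while k < len(R):
                comparisons += 1
                if s[1] < R[k][0]:
                    break
                pairs.append((R[k], s))
                k += 1
            j += 1
    return pairs, comparisons
```

Each interval restarts its scan from the current position in the other set, which is why consecutive intervals such as r1 and r2 revisit the same s2 and s3.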

Optionally, step 104 may include:

taking the first data to be associated in the data packet as a reference, performing a preset forward scanning operation to obtain a first scanning area;

forming the second data to be associated covered by the first scanning area and each first data to be associated in the data packet into the associated data pair;

performing preset forward scanning operation by taking second first data to be associated in the data packet as a reference to obtain a second scanning area, wherein the starting point of the second scanning area is the end point of the first scanning area;

forming the association data pair by the second data to be associated covered by the second scanning area, the second first data to be associated and each first data to be associated positioned behind the second first data to be associated;

repeating the steps on the first data to be associated which is not scanned until all the associated data pairs of the data packet are obtained.

In order to describe the preset forward scan operation more accurately, the execution pseudocode given for the preset forward scan operation is as follows:

1  sort R and S by start endpoint                     // order by the starting point of each interval
2  r ← first interval in R                            // first interval of R
3  s ← first interval in S                            // first interval of S
4  while R and S not depleted do                      // traverse R and S
5      if r.start < s.start then                      // the starting point of r is smaller than that of s
6          G_r ← next group from R w.r.t. r, s        // assign the next group of R to G_r
7          sort G_r by end endpoint                   // order G_r by end point
8          s' ← s                                     // assign s to s'
9          foreach r_i ∈ G_r in order do              // traverse the packet G_r
10             while s' ≠ null and s'.start ≤ r_i.end do   // r_i and s' intersect
11                 foreach r_j ∈ G_r, j ≥ i do        // traverse the packet G_r from r_i onward
12                     output (r_j, s')               // output each intersecting pair (r_j, s')
13                 s' ← next interval in S            // assign the next interval of S to s'
14         r ← first interval in R after G_r          // assign the first interval after G_r to r
15     else
16         G_s ← next group from S w.r.t. s, r        // assign the next group of S to G_s
17         sort G_s by end endpoint                   // order G_s by end point
18         r' ← r                                     // assign r to r'
19         foreach s_i ∈ G_s in order do              // traverse the packet G_s
20             while r' ≠ null and r'.start ≤ s_i.end do   // r' and s_i intersect
21                 foreach s_j ∈ G_s, j ≥ i do        // traverse the packet G_s from s_i onward
22                     output (r', s_j)               // output each intersecting pair (r', s_j)
23                 r' ← next interval in R            // assign the next interval of R to r'
24         s ← first interval in S after G_s          // assign the first interval after G_s to s

Please refer to fig. 3c, which is a schematic diagram of performing a preset forward scan according to an embodiment of the present application. As can be seen from fig. 3c, the core improvement of the preset forward scanning operation lies in the while loops on lines 10-12 and 20-22 of the pseudocode. As shown in fig. 3c, when the preset forward scanning operation is performed with the first data to be associated r2 as the reference and the associated data pair {r2, s2} is output, the associated data pairs of the first data to be associated located after r2 in the packet are output directly, without further endpoint comparison. That is, since r1 is located after r2 in the data packet, when the associated data pair {r2, s2} is output, the associated data pair {r1, s2} is directly output.

As shown in fig. 3c, after the preset forward scanning operation performed based on r2 is finished, the associated data pairs { r2, s2}, { r1, s2}, { r2, s3} and { r1, s3} are obtained, and the preset forward scanning operation is continuously performed based on r1, because s2 and s3 have already been covered by the first scanning area corresponding to r2 and the associated data pairs { r1, s2} and { r1, s3} corresponding to r1 have already been obtained, when the while loop in the preset forward scanning is performed, the scanning line only needs to start from the start point of s4 to the end point of s5, so that the first scanning area does not coincide with the second scanning area, and no redundant section end point comparison operation exists. Therefore, the preset forward scanning can effectively improve the efficiency of data association analysis by simultaneously processing a group of continuous intervals from the same set, and further reduce the complexity of forward scanning.
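The grouped scan described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the implementation of the present application: the function names are hypothetical, intervals are `(start, end)` tuples, and the grouping condition is read as "starting point below the preset value and overlap with the first member of the group". The interval values in the usage note are invented to mimic the layout of fig. 3c.

```python
def _next_group(items, pos, bound, inclusive):
    """Group starting at items[pos]; returns (group, next index)."""
    first = items[pos]
    end = pos + 1
    while end < len(items):
        start = items[end][0]
        below = start <= bound if inclusive else start < bound
        if not below or start > first[1]:  # assumed grouping condition
            break
        end += 1
    return items[pos:end], end


def _scan_group(group, other, pos, emit):
    """Scan `other` from index pos once for the whole group."""
    group = sorted(group, key=lambda iv: iv[1])  # ascending end point
    k = pos
    for idx, g in enumerate(group):
        # k never moves backwards, so the scan areas of successive
        # group members do not overlap.
        while k < len(other) and other[k][0] <= g[1]:
            for later in group[idx:]:  # members ending at or after g
                emit(later, other[k])
            k += 1


def grouped_forward_scan(R, S):
    """Return all intersecting (r, s) pairs using group-wise scanning."""
    R, S = sorted(R), sorted(S)
    pairs = []
    i = j = 0
    while i < len(R) and j < len(S):
        if R[i][0] < S[j][0]:
            group, i_next = _next_group(R, i, S[j][0], inclusive=False)
            _scan_group(group, S, j, lambda r, s: pairs.append((r, s)))
            i = i_next
        else:
            group, j_next = _next_group(S, j, R[i][0], inclusive=True)
            _scan_group(group, R, i, lambda s, r: pairs.append((r, s)))
            j = j_next
    return pairs
```

With illustrative values r1 = (2, 20), r2 = (4, 12), s1 = (0, 1), s2 = (6, 8), s3 = (10, 14), s4 = (13, 16) and s5 = (18, 22), the first four pairs produced are (r2, s2), (r1, s2), (r2, s3) and (r1, s3); the scan for r1 then resumes at s4 instead of returning to s2.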

Optionally, the step of "performing a preset forward scanning operation with reference to the first data to be associated in the data packet to obtain a first scanning area" includes:

acquiring second data to be associated meeting a first end point matching condition from the sorted second data to be associated as candidate scanning data, wherein the first end point matching condition is that the starting point of the second data to be associated is larger than the starting point of the first data to be associated;

And taking the starting point of the first candidate scanning data as a scanning starting point, and taking the starting point of the first target scanning data meeting the second endpoint matching condition as a scanning end point, and executing the grouping scanning operation to obtain the first scanning area, wherein the second endpoint matching condition is that the starting point of the second data to be associated is larger than the end point of the first data to be associated.

Still taking fig. 3c as an example, the first scanning area is the scanning area corresponding to r2. In the process of generating the first scanning area, candidate scanning data, that is, data that may be scanned in the preset forward scanning, is determined from the second data to be associated through the first endpoint matching condition. For example, in fig. 3c, since the starting point of s1 is smaller than the starting point of r2, s1 does not satisfy the first endpoint matching condition, and the preset forward scanning operation based on r2 does not need to be performed on s1.

The second endpoint matching condition may correspond to the while conditions on lines 10 and 20 of the preset forward scanning pseudocode, that is, a comparison between the starting point s.start of the second data to be associated and the end point r.end of the first data to be associated. It can be understood that when target scanning data satisfying the second endpoint matching condition exists in the second data to be associated, the starting point of the target scanning data is taken as the scanning end point. For example, the target scanning data in fig. 3c is s4, and the region from the starting point of s2 to the starting point of s4 is the first scanning area corresponding to r2.

Optionally, the step of "taking the starting point of the first candidate scan data as the scanning starting point and the starting point of the first target scan data satisfying the second endpoint matching condition as the scanning end point" includes: comparing the end point of the first data to be associated with the start point of the first candidate scanning data;

Still referring to the preset forward scanning operation in fig. 3c, the first data to be associated taken as the reference is r2, and the first candidate scanning data is s2. When the starting point of s2 is smaller than the end point of r2, that is, s2.start < r2.end, r2 continues with the endpoint comparison against s3, the candidate scanning data following s2, comparing r2.end with s3.start, and the above steps are repeated until the target scanning data appears. For example, in fig. 3c, since the starting point of s4 is greater than the end point of r2, s4 is determined as the target scanning data corresponding to r2, and the preset forward scanning operation based on r2 stops at the starting point of the target scanning data s4.

Optionally, the method of the present application may further include:

comparing the first data to be associated which is not scanned with the second data to be associated;

and if the second data to be associated with the starting point of which is smaller than the first data to be associated exists, carrying out the preset forward scanning operation by taking the second data to be associated as a reference.

This step corresponds to the else branch of the preset forward scanning pseudocode, namely lines 15-24. It can be understood that when r.start is greater than or equal to s.start, the preset forward scanning operation is executed with the second data to be associated s as the reference; its execution is symmetrical to the preset forward scanning operation that takes the first data to be associated r as the reference, and if the end point of s is greater than or equal to the starting point of r, the associated data pair (r, s) is output.

It should be noted that, in practical applications, the preset forward scan can be implemented in two ways. The first copies each group into a dedicated array managed in main memory; the second keeps pointers to the start and end indexes of each group and reorders the corresponding segment of the set in place. Tests show that the first method reduces cache misses and is faster but occupies more memory, whereas the second occupies less memory but is relatively slower. In either case, optimizing the forward scanning operation through the preset forward scanning operation can remarkably improve the efficiency of the interval connection algorithm.

acquiring a first sub-bucket which is completely covered by the first data to be associated in a bucket index and a second sub-bucket which is located at the end point of the first data to be associated;

determining scanning coverage data from the second data to be associated according to coverage information of the second data to be associated on the first sub-bucket and the second sub-bucket, wherein the scanning coverage data is the second data to be associated which participates in the preset forward scanning operation;

and acquiring the starting point of the first scanning coverage data and the coverage area corresponding to the starting point of the last scanning coverage data to obtain the first scanning area.

The embodiment of the application combines the bucket index with the preset forward scanning operation, so as to further improve the efficiency of data association. Referring to fig. 3d, fig. 3d is a schematic diagram of introducing bucket indexes in a preset forward scan according to an embodiment of the present application.

The R bucket index corresponding to the first data to be associated is mapped onto the second data to be associated and stored as the S bucket index of the second data to be associated. As shown in fig. 3d, taking r1 as an example, the R bucket index contains three first sub-buckets that are completely covered by r1 and one second sub-bucket. Therefore, in the subsequent endpoint comparison with the second data to be associated, it is only necessary to obtain how the second data to be associated covers the first sub-buckets and the second sub-bucket; the interval endpoint comparison can then be skipped while still determining whether the two pieces of data intersect.

Optionally, the step of determining scan coverage data from the second data to be associated according to coverage information of the second data to be associated on the first sub-bucket and the second sub-bucket includes:

The scan coverage data is the data through which the scan line of the preset forward scanning operation passes. As shown in fig. 3d, since the starting points of s2 and s3 fall within the range of the first sub-buckets, s2 and s3 must be scan coverage data for the data packet in which r1 is located. The starting point of s4 falls within the range of the second sub-bucket, and s4 is the first candidate second data to be associated there, so the preset forward scanning operation of the data packet in which r1 is located will also cover s4; s4 can therefore be determined as scan coverage data as well.

It should be noted that, in the subsequent preset forward scanning operation, since the starting points of s4 and s5 are both in the second sub-bucket, the magnitudes between the end point of r1 and the starting points of s4 and s5 still need to be compared. It can be seen that the benefits of combining the bucket index to the preset forward scan operation are: the end point comparison operation is not needed for the data to be associated with the starting point in the first sub-bucket, so that the efficiency of data association analysis can be further improved on the basis of preset forward scanning.
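The effect of the bucket index can be sketched in Python as follows. This is an illustrative sketch under assumptions not stated in the source: the domain is cut into equal-width buckets starting at `lo`, the index stores for each bucket the position of the first interval of S that starts in that bucket or later, and the function names `build_bucket_index` and `scan_with_buckets` as well as the interval values are hypothetical.

```python
from bisect import bisect_left


def build_bucket_index(sorted_s, lo, width, n_buckets):
    """first_pos[b] is the index of the first interval in sorted_s whose
    start is at or after the lower bound of bucket b."""
    starts = [s[0] for s in sorted_s]
    return [bisect_left(starts, lo + b * width) for b in range(n_buckets)]


def scan_with_buckets(r, sorted_s, pos, first_pos, lo, width):
    """Return (matches of r in sorted_s[pos:], endpoint comparisons).

    Assumes sorted_s is in ascending order of start, every interval in
    sorted_s[pos:] starts at or after r.start, and r.end lies inside
    the indexed domain.
    """
    last_bucket = (r[1] - lo) // width  # the bucket holding r.end
    boundary = max(pos, first_pos[last_bucket])
    # Intervals starting in earlier buckets start before r.end by
    # construction, so they are accepted without any comparison.
    matches = list(sorted_s[pos:boundary])
    comparisons = 0
    k = boundary
    while k < len(sorted_s):  # only the last bucket needs comparisons
        comparisons += 1
        if sorted_s[k][0] > r[1]:
            break
        matches.append(sorted_s[k])
        k += 1
    return matches, comparisons
```

With buckets of width 5 over the domain starting at 0, an interval r = (0, 18) and second data (6, 8), (10, 14), (16, 17), (18, 22), (19, 21), the first two intervals are accepted through the index alone, and comparisons are made only for the intervals starting in the bucket [15, 19] that holds the end point of r.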

acquiring a preset expansion coefficient of the preset forward scanning operation, wherein the preset expansion coefficient is an integer greater than zero;

grouping the sorted second data to be associated according to the preset expansion coefficient, wherein the number of data items contained in each group is equal to the preset expansion coefficient;

for each group, if the last second data to be associated in the group meets a third endpoint matching condition, determining the region from the starting point of the first second data to be associated in the group to the starting point of the last second data to be associated as a coverage region of the first scanning region, and performing the preset forward scanning operation on the next group, wherein the third endpoint matching condition is that the starting point of the second data to be associated is less than or equal to the end point of the first data to be associated;

and repeatedly executing the steps on the rest groups in sequence until all coverage areas of the first scanning area are obtained, so as to obtain the first scanning area.

In the present application, loop expansion (loop unrolling) is also optimized, yielding an enhanced loop expansion algorithm. Loop expansion is a technique that can improve program performance by rewriting a loop as a sequence of similar, independent statements. For example, a loop that processes 1000 elements of an array can be expanded to perform 100 iterations, each of which processes 10 independent elements.
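As a minimal illustration of plain loop expansion (not code from the present application), the sketch below sums an array with an expansion factor of 4. The function names are hypothetical, and Python is used only for readability; the saving in loop overhead is mainly realized in compiled languages.

```python
def sum_rolled(values):
    """Reference loop: one element and one loop test per iteration."""
    total = 0
    i = 0
    while i < len(values):
        total += values[i]
        i += 1
    return total


def sum_unrolled_by_4(values):
    """Same result, but four elements are handled per loop test."""
    total = 0
    n = len(values)
    limit = n - n % 4  # largest multiple of 4 not exceeding n
    i = 0
    while i < limit:
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        i += 4
    # Remainder loop for the last n % 4 elements.
    while i < n:
        total += values[i]
        i += 1
    return total
```

The unrolled version executes roughly a quarter of the loop tests of the rolled version, at the cost of a longer body and a separate remainder loop.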

In the preset forward scan operation algorithm of the present application, loop unrolling may be used to optimize the while loop on lines 7-9 of the execution pseudocode. Specifically, by unrolling the loop by a factor of x, the new loop performs only 1/x of the iterations of the original loop; each iteration processes x independent interval pairs (r, s) at once and checks the exit condition "s ≠ null" only for every x-th piece of data to be processed. Furthermore, each iteration of the new loop checks the overlap condition "r.end ≥ s.start" for the next x pairs (r, s) and outputs the pairs for which the condition is satisfied.

It can be seen that, while loop unrolling can reduce loop costs, its optimization objective is only the while loop itself; it cannot be applied to the endpoint comparison condition "s.start ≤ r.end" used to output associated data pairs, because the intersection of each pair of intervals still needs to be checked. Thus, loop unrolling may not significantly improve the efficiency of the algorithm, and it may also cause the forward scan code to become lengthy and difficult to maintain. Therefore, the optimization principle of enhanced loop expansion, which builds on loop expansion, is explained below.

Referring to fig. 3e, fig. 3e is a schematic diagram illustrating an application of the enhanced loop expansion algorithm according to the embodiment of the present application. First, a value of a preset expansion coefficient may be set, and the value of the preset expansion coefficient is applied to an endpoint comparison condition of the output associated data pair.

Specifically, assuming that the preset expansion coefficient is 3, every 3 pieces of second data to be associated s may be divided into one group, and when the endpoint comparison condition "s.start ≤ r.end" for outputting associated data pairs, namely the third endpoint matching condition, is executed, it only needs to be executed on the last second data to be associated in each group. For ease of understanding, in the example of fig. 3e, assume that the first data to be associated r covers 9 pieces of second data to be associated s. According to the preset expansion coefficient, the comparison is then made once every 3 pieces of second data to be associated; that is, the end point of r only needs to be compared with the starting points of s3, s6 and s9, which are the last second data to be associated in their respective groups.

It will be appreciated that, because the starting point of the last second data to be associated in each group is the largest in that group (for example, the starting point of s3 is greater than the starting points of s1 and s2), when the last second data to be associated satisfies the third endpoint matching condition, the other second data to be associated in the same group also satisfy it. Thus, combining the preset forward scan with enhanced loop expansion, like combining it with the bucket index, reduces the number of endpoint comparison operations.

Optionally, the method of the present application may further include:

if the last second data to be associated in the group does not meet the third endpoint matching condition, sequentially determining whether the remaining second data to be associated in the group meet the third endpoint matching condition;

acquiring, from the remaining second data to be associated, first last data, namely the last data that meets the third endpoint matching condition, and determining the region from the starting point of the first second data to be associated in the group to the starting point of the first last data as the first scanning region;

and if none of the remaining second data to be associated meets the third endpoint matching condition, acquiring second last data, namely the last data in the group preceding the group, and determining the region from the starting point of the first second data to be associated in the group to the starting point of the second last data as the first scanning region.

For ease of understanding, the example of fig. 3e is used again. As shown in fig. 3e, assume that the starting point of the second data to be associated s9 is greater than the end point of the first data to be associated r; that is, s9 does not satisfy the third endpoint matching condition and does not intersect r. Because the enhanced loop expansion algorithm skipped the check of whether the other second data to be associated in the same group, s7 and s8, satisfy the third endpoint matching condition, this check is now triggered for s7 and s8. This prevents wrong associated data pairs from being output and ensures the accuracy of data association analysis by the preset forward scan.
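The enhanced loop expansion described above, including the fallback check within a failing group, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the code of the present application: s_starts holds the starting points of the second data to be associated in ascending order, r_end is the end point of the first data to be associated, x is the preset expansion coefficient, and the function name is hypothetical.

```python
def scan_with_enhanced_unrolling(r_end, s_starts, x):
    """Return (number of matched s items, number of endpoint comparisons).

    Because s_starts is ascending, the matched items are s_starts[:matched].
    """
    if x <= 0:
        raise ValueError("expansion coefficient must be a positive integer")
    n = len(s_starts)
    matched = 0
    comparisons = 0
    # Full groups: compare only the last item of each group.
    while matched + x <= n:
        comparisons += 1
        if s_starts[matched + x - 1] <= r_end:
            matched += x  # the whole group satisfies s.start <= r.end
            continue
        # Last item failed: check the remaining x - 1 items one by one.
        for _ in range(x - 1):
            comparisons += 1
            if s_starts[matched] > r_end:
                break
            matched += 1
        return matched, comparisons
    # Tail shorter than x: fall back to item-by-item comparison.
    while matched < n:
        comparisons += 1
        if s_starts[matched] > r_end:
            break
        matched += 1
    return matched, comparisons
```

For nine hypothetical starting points 1 to 9 and x = 3, an end point of 9 yields all nine matches with three comparisons (against s3, s6 and s9), whereas an end point of 8 yields eight matches with five comparisons, because s7 and s8 are checked individually after s9 fails.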

Optionally, the method of the present application may further include:

acquiring a starting point array composed of the starting point of the first data to be associated and the starting point of the second data to be associated;

acquiring an end point array consisting of the end points of the first data to be associated and the end points of the second data to be associated;

and executing the preset forward scanning operation based on the starting point array and the ending point array.

To improve the performance of loop expansion in main memory, the preset forward scanning operation can be made more efficient by decomposing the data to be associated. The decisive factor in outputting an associated data pair is whether the second data to be associated satisfies the third endpoint matching condition "s.start ≤ r.end". In this process, the start point r.start of the first data to be associated r and the end point s.end of the second data to be associated s are not used by the preset forward scanning operation; on this basis, the data to be associated can be decomposed into two separate arrays.

In some embodiments, the data to be processed may be decomposed into a start point array and an end point array. Taking the first data to be associated r1 as an example, and assuming that the interval of r1 is [2,7], r1 can be decomposed into a start point array [2] and an end point array [7]. When the preset forward scanning operation is performed with r1 as a reference and the third endpoint matching condition is evaluated on the second data to be associated, only the end point array of r1 and the start point array of the second data to be associated are required as input.

Through the above decomposition of the data to be processed, when the preset forward scanning operation advances, the algorithm only needs to iterate over the corresponding start point array or end point array, which reduces the space occupied in main memory and the number of cache misses.
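A minimal sketch of this decomposition is given below. It assumes integer endpoints and uses Python's array module to keep each array contiguous; the function names decompose and scan_from are hypothetical and are not taken from the present application.

```python
from array import array


def decompose(intervals):
    """Split [(start, end), ...] into a start point array and an end point array."""
    starts = array("q", (start for start, _ in intervals))
    ends = array("q", (end for _, end in intervals))
    return starts, ends


def scan_from(r_end, s_starts, first):
    """Forward scan that touches only the start point array of s.

    Returns the position one past the last s whose start is <= r_end,
    assuming s_starts is in ascending order.
    """
    pos = first
    while pos < len(s_starts) and s_starts[pos] <= r_end:
        pos += 1
    return pos
```

For example, decomposing r1 = [2,7] gives the start point array [2] and the end point array [7]; scanning hypothetical second data with start points [3, 6, 10] against the end point 7 stops at position 2 without reading any end point of the second data.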

The data layout obtained in this manner may be referred to as a split forward scan, and its combination with the preset forward scanning operation may be referred to as a split preset forward scan. Using the split forward scan and the split preset forward scan can further improve the performance of loop expansion in main memory. Fig. 3f compares several data layouts of forward scanning operations; the layouts of the split forward scan and the split preset forward scan improve algorithm efficiency by reducing unnecessary data access and cache misses, thereby improving the overall performance and efficiency of data association analysis.

Step 105, traversing all data packets based on the preset forward scanning operation until all associated data pairs of the set of data to be associated are obtained.

It will be appreciated that the foregoing operations corresponding to steps 101-104 are specific embodiments of the preset forward scanning operation based on one data packet, so when the preset forward scanning operation of the present application is performed to find the associated data pairs, all the data packets need to be traversed to obtain all the associated data pairs, so as to avoid missing the associated data pairs.

Another specific flow of the data association analysis method is as follows. Referring to fig. 2b, the method includes the following steps:

step 201, acquiring a set of data to be associated;

step 202, based on grouping conditions, grouping the first data to be associated and the second data to be associated in the set of data to be associated respectively to obtain a plurality of data groups;

step 203, sorting the first data to be associated and the second data to be associated in each data packet in ascending order of the end point of the interval;

step 204, performing a preset forward scanning operation with the first data to be associated in the data packet as a reference to obtain a first scanning area;

step 205, forming a correlation data pair by the second data to be correlated covered by the first scanning area and each first data to be correlated in the data packet;

step 206, performing a preset forward scanning operation with reference to the second first data to be associated in the data packet to obtain a second scanning area;

step 207, forming a correlation data pair by the second data to be correlated covered by the second scanning area, the second first data to be correlated and the first data to be correlated located behind the second first data to be correlated;

step 208, repeating the above steps for the first data to be associated which is not scanned until all associated data pairs of the data packet are obtained;

step 209, traversing all data packets based on a preset forward scanning operation until all associated data pairs of the set of data to be associated are obtained.
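Steps 204 to 208 for a single data packet can be sketched as follows. This is a simplified illustration, not the implementation of the present application. It assumes that intervals are (start, end) tuples, that every candidate second data item starts no earlier than any first data item of the packet, and that the candidates are scanned in ascending order of their start points; the function name scan_group is hypothetical.

```python
def scan_group(group_r, candidates_s):
    """Return all intersecting (r, s) pairs for one data packet.

    Assumption: every s in candidates_s starts at or after the start of
    every r in group_r, so r and s intersect exactly when s.start <= r.end.
    """
    rs = sorted(group_r, key=lambda iv: iv[1])       # ascending end point
    ss = sorted(candidates_s, key=lambda iv: iv[0])  # ascending start point
    pairs = []
    pos = 0  # scan position; it never moves backwards, so scan areas do not overlap
    for k, r in enumerate(rs):
        # Scan area of the k-th r: candidates from pos whose start is <= r.end.
        while pos < len(ss) and ss[pos][0] <= r[1]:
            # Every r from position k onwards ends no earlier than this r,
            # so each of them also intersects the current s.
            for later_r in rs[k:]:
                pairs.append((later_r, ss[pos]))
            pos += 1
    return pairs
```

Each candidate is visited once per packet, and the pairs for all later first data items are emitted without further endpoint comparisons. Step 209 would then apply the same routine to every data packet.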

The method described in the above embodiments will be described in further detail below.

As shown in fig. 4, a schematic structural diagram of a data association analysis device according to an embodiment of the present application is provided, where the device includes:

a data set obtaining unit 301, configured to obtain a set of data to be associated, where the data to be associated is an interval with object information and time information, the start point of the interval represents a start time, and the end point of the interval represents an end time;

a data grouping unit 302, configured to perform a grouping operation on a first data to be associated and a second data to be associated in the set of data to be associated, respectively, to obtain a plurality of data groups, where the first data to be associated and the second data to be associated are divided based on object information, the grouping condition includes that overlapping areas exist between the data to be associated in the same data group, and a start point of an interval is smaller than a preset value;

a data sorting unit 303, configured to sort the first data to be associated and the second data to be associated in each data packet in ascending order of the end point of the interval;

the data scanning unit 304 is configured to sequentially perform a preset forward scanning operation on the first to-be-associated data and the second to-be-associated data after being sequenced for each data packet, so as to obtain an associated data pair of each data packet, where the associated data pair is a pair of the first to-be-associated data and the second to-be-associated data with overlapping areas, and the scanning areas corresponding to the first to-be-associated data and the second to-be-associated data after each scanning are not overlapped;

the data association unit 305 is configured to traverse all data packets based on the preset forward scanning operation until all associated data pairs of the set of data to be associated are obtained.

Optionally, the data scanning unit 304 includes:

Optionally, the data associating the first subunit includes:

Optionally, the data association analysis device further includes:

Optionally, the first scan area first determining subunit includes:

Optionally, the data packet unit 302 includes:

Optionally, the data association analysis device further includes:

The embodiment of the application also provides electronic equipment which can be a terminal, a server and other equipment. As shown in fig. 5, a schematic structural diagram of an electronic device according to an embodiment of the present application is shown, specifically:

the electronic device may include a processor 401 with one or more processing cores, a memory 402 with one or more computer-readable storage media, a power supply 403, an input unit 404, a communication unit 405, and other components. Those skilled in the art will appreciate that the electronic device structure shown in fig. 5 does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or arrange the components differently. Wherein:

the processor 401 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs and/or units stored in the memory 402, and calling data stored in the memory 402. In some embodiments, processor 401 may include one or more processing cores; in some embodiments, processor 401 may integrate an application processor that primarily processes operating systems, presentation interfaces, applications, etc., with a modem processor that primarily processes wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.

The memory 402 may be used to store software programs and units, and the processor 401 executes various functional applications and data processing by running the software programs and units stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the electronic device, etc. In addition, memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.

The electronic device also includes a power supply 403 for powering the various components, and in some embodiments, the power supply 403 may be logically connected to the processor 401 by a power management system, such that charge, discharge, and power consumption management functions are performed by the power management system. The power supply 403 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.

The electronic device may further comprise an input unit 404, which input unit 404 may be used for receiving input digital or character information and generating keyboard, mouse, joystick, optical or trackball signal inputs in connection with object settings and function control.

The electronic device may also include a communication unit 405, and in some embodiments the communication unit 405 may include a wireless unit, through which the electronic device may wirelessly transmit over a short distance, thereby providing wireless broadband internet access to the subject. For example, the communication unit 405 may be used to assist an object in e-mail, browsing web pages, accessing streaming media, and the like.

Although not shown, the electronic device may further include a display unit or the like, which is not described herein. In particular, in this embodiment, the processor 401 in the electronic device loads executable files corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 401 executes the application programs stored in the memory 402, so as to implement various functions as follows:

acquiring scene change data of a scene where a display page is located, wherein the scene change data is acquisition data changing along with time in the scene;

The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.

To this end, embodiments of the present application provide a computer readable storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform steps in any of the data correlation analysis methods provided by embodiments of the present application. For example, the instructions may perform the steps of:

The storage medium may include: a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disc, and the like.

Because the instructions stored in the storage medium can perform the steps of any data association analysis method provided in the embodiments of the present application, they can achieve the beneficial effects of any such method; these are detailed in the previous embodiments and are not repeated here.

The foregoing describes in detail a data association analysis method, apparatus, electronic device and storage medium provided in the embodiments of the present application, and specific examples are applied to illustrate principles and implementations of the present application, where the foregoing description of the embodiments is only used to help understand the method and core idea of the present application; meanwhile, those skilled in the art will have variations in the specific embodiments and application scope in light of the ideas of the present application, and the present description should not be construed as limiting the present application in view of the above.

Claims

1. A method of data association analysis, the method comprising:

respectively grouping first data to be associated and second data to be associated in the set of data to be associated based on grouping conditions to obtain a plurality of data groups, wherein the first data to be associated and the second data to be associated are divided based on object information, the grouping conditions comprise that overlapping areas exist among the data to be associated of the same data group, and the starting point of an interval is smaller than a preset value;

ordering the first data to be associated and the second data to be associated in each data packet by taking the end point of the interval as the condition of ascending arrangement;

taking the first data to be associated in the data packet as a reference, performing packet forward scanning operation to obtain a first scanning area;

forming a correlation data pair by the second data to be correlated covered by the first scanning area and each first data to be correlated in the data packet;

Performing grouping forward scanning operation by taking second first data to be associated in the data grouping as a reference to obtain a second scanning area, wherein the starting point of the second scanning area is the end point of the first scanning area;

performing the grouping forward scanning operation based on the first data to be associated in the data grouping and taking the first data to be associated in the data grouping as a reference, until all associated data pairs of the data grouping are obtained, wherein the associated data pairs are a pair of first data to be associated and second data to be associated with overlapping areas, and the scanning areas corresponding to the first data to be associated and the second data to be associated after each scanning are not overlapped;

traversing all data packets based on the packet forward scanning operation until all associated data pairs of the set of data to be associated are obtained.

2. The method according to claim 1, wherein performing a packet forward scanning operation based on first data to be associated in the data packet to obtain a first scanning area includes:

3. The method of claim 2, wherein performing the packet scan operation with a start point of the first candidate scan data as a scan start point and a start point of the first target scan data satisfying the second endpoint matching condition as a scan end point comprises:

comparing the end point of the first data to be associated with the start point of the first candidate scanning data;

And returning to the step of executing the end point of the first data to be associated and the start point of the first candidate scanning data until the target scanning data is obtained, and stopping the grouping forward scanning operation at the start point of the target scanning data.

4. The method as recited in claim 1, further comprising:

and if the second data to be associated with the starting point of which is smaller than the first data to be associated exists, carrying out the grouping forward scanning operation by taking the second data to be associated as a reference.

5. The method according to claim 1, wherein performing a packet forward scanning operation based on first data to be associated in the data packet to obtain a first scanning area includes:

determining scanning coverage data from the second data to be associated according to the coverage information of the second data to be associated on the first sub-bucket and the second sub-bucket, wherein the scanning coverage data is the second data to be associated which participates in the grouping forward scanning operation;

6. The method of claim 5, wherein determining scan coverage data from the second data to be associated based on coverage information of the second data to be associated for the first and second buckets, comprises:

7. The method according to claim 1, wherein performing a packet forward scanning operation based on first data to be associated in the data packet to obtain a first scanning area includes:

acquiring a preset expansion coefficient of the grouping forward scanning operation, wherein the preset expansion coefficient is an integer greater than zero;

for each packet, if the last second data to be associated in the packet meets a third endpoint matching condition, determining the region from the starting point of the first second data to be associated in the packet to the starting point of the last second data to be associated as a coverage region of the first scanning region, and performing the packet forward scanning operation on the next packet, wherein the third endpoint matching condition is that the starting point of the second data to be associated is less than or equal to the end point of the first data to be associated;

and returning to the step of executing the preset expansion coefficient for acquiring the forward scanning operation of the group based on the rest groups until all coverage areas of the first scanning area are obtained, so as to obtain the first scanning area.

8. The method as recited in claim 7, further comprising:

9. The method according to claim 1, wherein the grouping the first data to be associated and the second data to be associated in the set of data to be associated based on the grouping condition includes:

And based on the first data to be associated which is not grouped and the second data to be associated which is not grouped, returning to execute the condition of ascending arrangement taking the start point of the interval as the starting point, and sequencing each first data to be associated and each second data to be associated until all the data to be associated complete the grouping operation.

10. The method of claim 1, further comprising, prior to sequentially performing a packet forward scan operation on the sorted first to-be-associated data, the second to-be-associated data for each data packet:

and performing the packet forward scanning operation based on the start point array and the end point array.

11. A data correlation analysis device, the device comprising:

the data scanning unit is used for carrying out grouping forward scanning operation by taking the first data to be associated in the data grouping as a reference to obtain a first scanning area; forming a correlation data pair by the second data to be correlated covered by the first scanning area and each first data to be correlated in the data packet; performing grouping forward scanning operation by taking second first data to be associated in the data grouping as a reference to obtain a second scanning area, wherein the starting point of the second scanning area is the end point of the first scanning area; forming the association data pair by the second data to be associated covered by the second scanning area, the second first data to be associated and each first data to be associated positioned behind the second first data to be associated; performing the grouping forward scanning operation based on the first data to be associated in the data grouping and taking the first data to be associated in the data grouping as a reference, until all associated data pairs of the data grouping are obtained, wherein the associated data pairs are a pair of first data to be associated and second data to be associated with overlapping areas, and the scanning areas corresponding to the first data to be associated and the second data to be associated after each scanning are not overlapped;

And the data association unit is used for traversing all data packets based on the packet forward scanning operation until all associated data pairs of the set of data to be associated are obtained.

12. An electronic device, comprising:

a processor and a storage medium;

the processor is used for realizing each instruction;

the storage medium is configured to store a plurality of instructions for loading and executing by a processor the data correlation analysis method of any one of claims 1 to 10.

13. A computer readable storage medium storing executable instructions which when executed by a processor implement the data correlation analysis method of any one of claims 1 to 10.

14. A computer program product comprising a computer program or instructions which, when executed by a processor, implements the data correlation analysis method of any of claims 1 to 10.