WO2023207832A1 - Control method and apparatus for a data processing device - Google Patents

Control method and apparatus for a data processing device

Info

Publication number
WO2023207832A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
key
value
key value
processing
Prior art date
Application number
PCT/CN2023/090020
Inventor
戴国浩
朱振华
汪玉
肖世海
傅天予
张学仓
Original Assignee
华为技术有限公司
清华大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司, 清华大学
Publication of WO2023207832A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2237 Vectors, bitmaps or matrices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24558 Binary matching operations
    • G06F16/2456 Join operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data

Definitions

  • A graph mining algorithm is a representative graph processing and data mining algorithm. It is used to find specific subgraph patterns in a complete graph data structure and to count how often each pattern occurs.
  • Graph mining algorithms are widely used. Common application cases include community network analysis in social media, protein analysis in bioinformatics, and drug discovery in the field of computational chemistry.
  • The neighbor set of a point indicates whether the point has a relationship with other points in the relationship graph.
  • The neighbor set of the point includes multiple numbers, and each number is the sequence number of a point that has a relationship with the point.
  • A data processing device can be used to compare the neighbor sets of two points in the relationship graph to determine the equal sequence numbers.
  • An equal sequence number is the sequence number of a point in the relationship graph that has a relationship with both points.
  • The processing power of the data processing device is limited.
  • The data processing device can compare two column indexes in which the number of entries does not exceed a preset value. When the number of entries in one of the two column indexes exceeds the preset value, the entries in that column index need to be grouped to obtain multiple arrays. The data processing device can then be used to compare each of the arrays with the other column index.
  • The present application provides a control method and apparatus for a data processing device that can reduce the amount of calculation and shorten the data processing time.
  • In the second order, each key value in any data group in each data set is greater than each key value in the data groups located after that data group. Multiple iterations are performed, each iteration including: inputting the first target data group and the second target data group into the data processing device, the data processing device being used to determine the equal key values in the first target data group and the second target data group; and, when the data groups in each data set are arranged in the first order and the first key value is less than or equal to the second key value, obtaining the first data group located after the first target data group in the first data set as the first target data group.
  • The data groups in each data set are arranged in order of key value.
  • There are multiple first data groups in the first data set.
  • A first data group whose key values all fall outside the key value range of the second target data group is no longer input to the data processing device, that is, it is no longer compared with the second target data group. This reduces the number of first data groups compared with the second target data group, and thereby the amount of calculation.
  • The key value range of the second target data group may be the range of values greater than or equal to the smallest key value in the second target data group. That is to say, when the minimum key value in the first target data group is less than or equal to the maximum key value in the second target data group, the subsequent first data groups are no longer taken as the first target data group to be compared with this second target data group.
  • There are multiple second data groups. When the data groups in each data set are arranged in the first order and the first key value is greater than or equal to the second key value, the second data group located after the second target data group in the second data set is obtained as the second target data group. When the data groups in each data set are arranged in the second order and the third key value is less than or equal to the fourth key value, the second data group located after the second target data group in the second data set is obtained as the second target data group.
  • In this way, the data groups in one data set that are compared with a given data group in the other data set include key values within the range from the minimum key value to the maximum key value of the given data group. This reduces the likelihood of comparing the given data group with data groups that include only key values outside that range, which improves operating efficiency and reduces the amount of calculation.
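The group-advancing rule described above can be sketched in Python for the first (ascending) order. This is an illustrative sketch only: the function names `iterate_groups` and `compare_groups` are hypothetical, and `compare_groups` merely stands in for the data processing device by using a set intersection. Key values are assumed to be unique within each data set.

```python
def compare_groups(first_group, second_group):
    """Stand-in for the data processing device: return the equal key values."""
    return sorted(set(first_group) & set(second_group))


def iterate_groups(first_groups, second_groups, compare=compare_groups):
    """Compare only the pairs of data groups whose key ranges can overlap.

    Both inputs are lists of data groups arranged in the first (ascending)
    order: every key in a group is smaller than every key in later groups.
    Returns the equal key values and the number of group comparisons made.
    """
    i = j = 0            # indexes of the first and second target data groups
    equal_keys = []
    comparisons = 0
    while i < len(first_groups) and j < len(second_groups):
        first_target, second_target = first_groups[i], second_groups[j]
        equal_keys.extend(compare(first_target, second_target))
        comparisons += 1
        first_key = max(first_target)    # largest key in the first target group
        second_key = max(second_target)  # largest key in the second target group
        # first key <= second key: move on to the next first data group.
        if first_key <= second_key:
            i += 1
        # first key >= second key: move on to the next second data group.
        if first_key >= second_key:
            j += 1
    return equal_keys, comparisons


keys, count = iterate_groups([[1, 3], [5, 7], [9, 11]], [[3, 4], [7, 9]])
print(keys, count)
```

In this example four group comparisons are made instead of the six that comparing every first data group with every second data group would need.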
  • The data processing device includes a processing matrix, the processing matrix includes v × v processing units, and v is a positive integer. The number of data in each of the first target data group and the second target data group is less than or equal to v.
  • The i-th first data in the first target data group is input, in the j-th input period of the iteration, to the j-th processing unit along the second direction among the v processing units located at the first edge.
  • The p-th second data in the second target data group is input, in the q-th input period of the iteration, to the q-th processing unit along the first direction among the v processing units located at the second edge.
  • The first edge is adjacent to the second edge, and different data in each target data group are input to different processing units.
  • The first direction is the direction from the second edge toward the inside of the processing matrix, perpendicular to the second edge, and the second direction is the direction from the first edge toward the inside of the processing matrix, perpendicular to the first edge. Each processing unit in the processing matrix is used to determine whether the key value in the first data input to the processing unit in a given input period is equal to the key value in the second data input in the same period.
  • Each processing unit in the processing matrix is also used to, in the input cycle following the one in which it receives the first data and the second data, transmit the first data to the next processing unit along the first direction and transmit the second data to the next processing unit along the second direction.
  • Using a processing matrix to process the first target data group and the second target data group can improve processing efficiency.
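A software simulation can make the data flow through the processing matrix easier to follow. The sketch below is an assumption-laden illustration, not the patented hardware: it maps the two directions onto rows and columns, feeds the k-th data of each group into the k-th edge unit in input period k, and always forwards data regardless of the comparison result. The function name `systolic_equal_keys` is hypothetical.

```python
def systolic_equal_keys(first_keys, second_keys, v):
    """Simulate a v x v processing matrix comparing two groups of key values.

    Assumed layout: first data enter column 0 of row k in period k and move
    one column per period; second data enter row 0 of column k in period k
    and move one row per period. Unit (r, c) therefore sees first_keys[r]
    and second_keys[c] together in period r + c.
    Returns a list of (period, row, column, key) tuples for equal keys.
    """
    if len(first_keys) > v or len(second_keys) > v:
        raise ValueError("each target data group may hold at most v data")
    a_reg = [[None] * v for _ in range(v)]  # first data held by each unit
    b_reg = [[None] * v for _ in range(v)]  # second data held by each unit
    matches = []
    for period in range(2 * v - 1):  # the last pair meets in period 2v - 2
        new_a = [[None] * v for _ in range(v)]
        new_b = [[None] * v for _ in range(v)]
        # Every unit forwards what it received in the previous period.
        for r in range(v):
            for c in range(v):
                if c + 1 < v:
                    new_a[r][c + 1] = a_reg[r][c]
                if r + 1 < v:
                    new_b[r + 1][c] = b_reg[r][c]
        # Feed the next data of each group into the edge units.
        if period < len(first_keys):
            new_a[period][0] = first_keys[period]
        if period < len(second_keys):
            new_b[0][period] = second_keys[period]
        a_reg, b_reg = new_a, new_b
        # Each unit compares the two key values it holds in this period.
        for r in range(v):
            for c in range(v):
                if a_reg[r][c] is not None and a_reg[r][c] == b_reg[r][c]:
                    matches.append((period, r, c, a_reg[r][c]))
    return matches


for period, row, col, key in systolic_equal_keys([1, 3, 5, 7], [3, 4, 7, 9], 4):
    print(f"period {period}: unit ({row}, {col}) found equal key {key}")
```

All v × v comparisons complete in 2v − 1 input periods, because the units on each anti-diagonal work in parallel.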
  • The key values in different data in each data set are different, and each processing unit in the processing matrix is used to: when the key value in the first data is not equal to the key value in the second data, transmit the first data to the next processing unit along the first direction and transmit the second data to the next processing unit along the second direction.
  • The processing unit transmits the first data to the next processing unit along the first direction, and the second data to the next processing unit along the second direction, only when the key value in the first data is not equal to the key value in the second data. This reduces data transmission and the amount of calculation.
  • the data processing device further includes a filter matrix, the filter matrix includes v filter units, and the v filter units are respectively located along the first edge of the processing matrix.
  • Each processing unit in the processing matrix is also used to: when the key value in the first data is equal to the key value in the second data, transmit, in the input cycle following the one in which the first data and the second data are received, the processing result of the processing unit along the second direction to the next unit, where the next unit is a processing unit or a filtering unit and the processing result includes the equal key value; or, in the input cycle following the one in which a processing result is received, transmit that processing result to the next unit along the second direction. The method further includes: when the first key value is greater than or equal to the second key value, controlling the v filtering units along the first direction to sequentially output, according to the input period, the processing results corresponding to the second target data group.
  • the processing results corresponding to each key value in a certain second data group can be uniformly output after the second data group is input into the data processing device for the last time, thereby improving the flexibility of processing result output.
  • The data processing device further includes a compressed triangular matrix. The compressed triangular matrix includes v rows of compression units along the first direction, and the number of compression units increases row by row.
  • Each compression unit is configured to: receive the processing result output by the filtering unit that precedes the compression unit along the second direction, or receive the processing result output by the compression unit in the previous row along the first direction; and, in the input cycle following the one in which the processing result is received, transmit the processing result to the compression unit in the next row along the first direction.
  • the processing result corresponding to a certain second data group can be output within the same input period, thereby improving the flexibility of processing result output.
  • Different key values correspond to different point sets in the relationship graph.
  • The first data is used to represent whether there is a relationship between the first target point in the relationship graph and at least one point in the point set corresponding to the key value in the first data.
  • The second data is used to represent whether there is a relationship between the second target point in the relationship graph and at least one point in the point set corresponding to the key value in the second data.
  • The processing matrix is also used to output a processing result when the key value in the first data is equal to the key value in the second data. The processing result indicates a query point in the relationship graph whose relationship with the two target points meets a preset condition.
  • In this way, the query points that have the preset connection relationship with the two target points in the relationship graph can be determined, so that subgraphs with a given subgraph pattern structure can be identified in the relationship graph and graph mining can be realized.
  • The first data also includes a first relationship value group of the first target point corresponding to the key value, and the second data also includes a second relationship value group of the second target point corresponding to the key value.
  • Each processing unit in the processing matrix is also used to, when the key value in the first data is equal to the key value in the second data, perform a preset operation on each bit of the first relationship value group and the second relationship value group.
  • The same bit in the first relationship value group and the second relationship value group corresponding to the equal key value corresponds to the same point in the point set corresponding to the equal key value, and the result of the preset operation on each bit indicates whether the relationship between the point corresponding to that bit and the two target points meets the preset condition.
  • When the key value in the first data is equal to the key value in the second data, the two key values indicate the same point set. Therefore, the query point can be determined from the result of the bitwise preset operation on the first relationship value group and the second relationship value group.
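The bitwise step can be illustrated with a short Python sketch. Several details here are assumptions made only for illustration: each relationship value group is modelled as an integer bitmask of `width` bits, key value k is taken to cover points k × width to k × width + width − 1, bit b maps to point k × width + b, and the preset operation defaults to a bitwise AND (which would select points related to both target points). The function name `matching_points` is hypothetical.

```python
import operator


def matching_points(key, first_group, second_group, width, op=operator.and_):
    """Apply a preset bitwise operation to two relationship value groups.

    key          -- the equal key value shared by both data
    first_group  -- bitmask of the first target point for this key
    second_group -- bitmask of the second target point for this key
    width        -- number of points in the point set of each key (assumed)
    op           -- preset operation; AND is only an example choice
    Returns the sequence numbers of the points whose result bit is 1.
    """
    bits = op(first_group, second_group) & ((1 << width) - 1)
    return [key * width + bit for bit in range(width) if (bits >> bit) & 1]


# Points related to both target points (bitwise AND).
print(matching_points(2, 0b1011, 0b0110, width=4))
# Points related to the first target point but not the second.
print(matching_points(2, 0b1011, 0b0110, width=4, op=lambda a, b: a & ~b))
```

Because every bit is handled by the same integer operation, all points of a point set are tested at once rather than one sequence number at a time.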
  • The key values in each first data group are arranged from small to large.
  • The key values in each first data group are arranged from large to small.
  • The key values in the first data set are arranged from small to large or from large to small. Therefore, at least one data can be taken from the first data set as the first target data group, and each first target data group obtained in this way over the multiple iterations is a first data group. This makes the division into first data groups more flexible.
  • a second aspect provides a control device for a data processing device, including an acquisition module and a processing module.
  • the acquisition module is configured to acquire a first target data group and a second target data group.
  • the first target data group is a first data group among multiple first data groups of the first data set.
  • The second target data group is a data group among at least one second data group of the second data set, and each data group in each of the first data set and the second data set includes at least one data.
  • The data groups in each data set are arranged in the first order or the second order. When arranged in the first order, each key value in any data group in each data set is less than each key value in the data groups located after that data group.
  • When arranged in the second order, each key value in any data group in each data set is greater than each key value in the data groups located after that data group.
  • The processing module is used to perform multiple iterations. Each iteration includes inputting the first target data group and the second target data group into the data processing device, and the data processing device is used to determine the equal key values in the first target data group and the second target data group.
  • Each iteration also includes, when the data groups in each data set are arranged in the first order and the first key value is less than or equal to the second key value, obtaining the first data group located after the first target data group in the first data set as the first target data group. The first key value is the largest key value in the first target data group, and the second key value is the largest key value in the second target data group.
  • Each iteration also includes, when the data groups in each data set are arranged in the second order and the third key value is greater than or equal to the fourth key value, obtaining the first data group located after the first target data group in the first data set as the first target data group. The third key value is the smallest key value in the first target data group, and the fourth key value is the smallest key value in the second target data group.
  • There are multiple second data groups.
  • When the data groups in each data set are arranged in the first order and the first key value is greater than or equal to the second key value, the second data group located after the second target data group in the second data set is obtained as the second target data group.
  • When the data groups in each data set are arranged in the second order and the third key value is less than or equal to the fourth key value, the second data group located after the second target data group in the second data set is obtained as the second target data group.
  • The data processing device includes a processing matrix, the processing matrix includes v × v processing units, and v is a positive integer. The number of data in each of the first target data group and the second target data group is less than or equal to v.
  • The i-th first data in the first target data group is input, in the j-th input period of the iteration, to the j-th processing unit along the second direction among the v processing units located at the first edge.
  • The p-th second data in the second target data group is input, in the q-th input period of the iteration, to the q-th processing unit along the first direction among the v processing units located at the second edge.
  • The first edge is adjacent to the second edge, and different data in each target data group are input to different processing units. The first direction is the direction from the second edge toward the inside of the processing matrix, perpendicular to the second edge.
  • The second direction is the direction from the first edge toward the inside of the processing matrix, perpendicular to the first edge.
  • Each processing unit in the processing matrix is used to determine whether the key value in the first data input to the processing unit in a given input period is equal to the key value in the second data input in the same period.
  • Each processing unit in the processing matrix is also configured to, in the input cycle following the one in which it receives the first data and the second data, transmit the first data to the next processing unit along the first direction and transmit the second data to the next processing unit along the second direction.
  • The key values in different data in each data set are different, and each processing unit in the processing matrix is used to: when the key value in the first data is not equal to the key value in the second data, transmit the first data to the next processing unit along the first direction and transmit the second data to the next processing unit along the second direction.
  • The data processing device further includes a filter matrix. The filter matrix includes v filter units, and the v filter units are respectively located after the last processing unit, along the second direction, of each of the v rows of the processing matrix arranged along the first direction.
  • Each processing unit in the processing matrix is further configured to: when the key value in the first data is equal to the key value in the second data, transmit, in the input cycle following the one in which the first data and the second data are received, the processing result of the processing unit to the next unit along the second direction, where the next unit is a processing unit or a filtering unit and the processing result includes the equal key value; or, in the input cycle following the one in which a processing result is received, transmit that processing result to the next unit along the second direction.
  • The processing module is also configured to, when the first key value is greater than or equal to the second key value, control the v filtering units along the first direction to sequentially output, according to the input period, the processing results corresponding to the second target data group.
  • The first data also includes a first relationship value group of the first target point corresponding to the key value, and the second data also includes a second relationship value group of the second target point corresponding to the key value.
  • Each processing unit in the processing matrix is also configured to: when the key value in the first data is equal to the key value in the second data, perform a preset operation on each bit of the first relationship value group and the second relationship value group. The same bit in the first relationship value group and the second relationship value group corresponding to the equal key value corresponds to the same point in the point set corresponding to the equal key value, and the result of the preset operation on each bit indicates whether the relationship between the point corresponding to that bit and the two target points meets the preset condition.
  • The key values in each first data group are arranged from small to large.
  • The key values in each first data group are arranged from large to small.
  • A control device for a data processing device is provided, including a memory and at least one processor.
  • The memory is used to store a program.
  • The control device is used to perform the method described in the first aspect.
  • A data processing method is provided, including: obtaining at least one key-value pair of the first target point and at least one key-value pair of the second target point among two target points in the relationship graph, where each key-value pair of a target point includes a key value of the target point and a relationship value group of the target point corresponding to the key value.
  • Different key values correspond to different point sets in the relationship graph.
  • The relationship value group of a target point corresponding to a key value of the target point indicates whether the target point has a relationship with each point in the point set corresponding to the key value.
  • For each key-value pair of a target point, the point set corresponding to the key value contains a point that has a relationship with the target point. The method further includes: determining an equal key value among the at least one key value of the first target point and the at least one key value of the second target point; and determining a query point in the relationship graph according to the relationship value group of the first target point and the relationship value group of the second target point corresponding to the equal key value, where the relationship between the query point and the two target points meets the preset condition.
  • That is to say, for each target point, the point set corresponding to the key value in each key-value pair contains a point that has a relationship with the target point.
  • Obtaining at least one key-value pair for each of the two target points in the relationship graph includes: obtaining relationship graph data, where the relationship graph data includes a row offset vector, a key value vector, and a relationship value vector; and determining, according to the relationship graph data, at least one key-value pair of each of the two target points. The key value vector includes at least one key value of each of multiple points in the relationship graph.
  • The row offset vector indicates the position, in the key value vector, of the at least one key value of each point.
  • The relationship value vector includes, for each of the multiple points, the relationship value group of the point corresponding to each key value of the point in the key value vector. The order of the key values of each point in the key value vector is the same as the order, in the relationship value vector, of the relationship value groups of the point corresponding to those key values.
  • In this way, the key-value pairs of each point in the relationship graph can be determined from the relationship graph data. Storing the relationship graph in the format of relationship graph data can reduce storage space.
  • The point sets corresponding to different key values contain equal numbers of points.
  • Because the point sets corresponding to different key values contain equal numbers of points, it is easier to determine the key-value pairs of a target point from the relationship graph data.
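The three vectors resemble a compressed sparse row layout in which each stored entry is a bitmask instead of a single column index. The Python sketch below shows one way such data could be built and read back. It rests on assumptions not stated in the text: the row offset vector has one more entry than there are points, key value k covers points k × width to k × width + width − 1, and bit b of a relationship value group maps to point k × width + b. The function names `encode_graph` and `key_value_pairs` are hypothetical.

```python
def encode_graph(neighbors, width):
    """Build the row offset, key value and relationship value vectors.

    neighbors -- neighbors[p] lists the sequence numbers of the points that
                 have a relationship with point p
    width     -- number of points in the point set of each key value
    """
    row_offsets = [0]
    key_values = []
    relation_values = []
    for point_neighbors in neighbors:
        groups = {}  # key value -> bitmask of related points in its point set
        for n in point_neighbors:
            key, bit = divmod(n, width)
            groups[key] = groups.get(key, 0) | (1 << bit)
        # Only keys whose point set holds at least one related point are stored.
        for key in sorted(groups):
            key_values.append(key)
            relation_values.append(groups[key])
        row_offsets.append(len(key_values))
    return row_offsets, key_values, relation_values


def key_value_pairs(point, row_offsets, key_values, relation_values):
    """Return the (key value, relationship value group) pairs of one point."""
    start, end = row_offsets[point], row_offsets[point + 1]
    return list(zip(key_values[start:end], relation_values[start:end]))


graph = [[1, 2, 5], [0, 2], [0, 1, 5], [], [], [0, 2]]
offsets, keys, values = encode_graph(graph, width=4)
print(offsets, keys, values)
print(key_value_pairs(0, offsets, keys, values))
```

Point sets that contain no related point produce no entry at all, which is where the storage saving comes from in sparse graphs.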
  • Determining the query point in the relationship graph according to the relationship value group of the first target point and the relationship value group of the second target point corresponding to the equal key value includes: performing a preset operation on each bit of the relationship value group of the first target point and the relationship value group of the second target point corresponding to the equal key value. The same bit in the relationship value groups of different points corresponding to a key value corresponds to the same point in the point set corresponding to that key value, and the result of the preset operation on each bit indicates whether the relationship between the point corresponding to that bit and the two target points meets the preset condition.
  • The preset operations on the individual bits of the relationship value groups of the two target points corresponding to the equal key value can be performed in parallel, which shortens the operation time.
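Putting the two steps together, a query over two target points can be sketched as key matching followed by a bitwise operation. The Python below is a software stand-in under stated assumptions: key-value pairs are sorted by ascending, unique key value; key matching uses a sequential two-pointer merge rather than the processing matrix; the preset operation is a bitwise AND; and bit b of key k maps to point k × width + b. The function name `query_points` is hypothetical.

```python
def query_points(first_pairs, second_pairs, width):
    """Find points related to both target points.

    first_pairs, second_pairs -- lists of (key value, relationship value
    group) pairs, sorted by ascending key value with unique keys.
    width -- number of points in the point set of each key value (assumed).
    """
    i = j = 0
    result = []
    while i < len(first_pairs) and j < len(second_pairs):
        first_key, first_group = first_pairs[i]
        second_key, second_group = second_pairs[j]
        if first_key < second_key:
            i += 1
        elif first_key > second_key:
            j += 1
        else:
            # Equal key value: both groups describe the same point set, so one
            # AND tests every point of that set at once.
            common = first_group & second_group
            result.extend(
                first_key * width + bit
                for bit in range(width)
                if (common >> bit) & 1
            )
            i += 1
            j += 1
    return result


# Point 0 is related to points 1, 2, 5; point 2 is related to points 0, 1, 5.
print(query_points([(0, 0b0110), (1, 0b0010)], [(0, 0b0011), (1, 0b0010)], 4))
```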
  • Determining an equal key value among the at least one key value of the first target point and the at least one key value of the second target point includes: inputting the at least one key-value pair of the first target point, sequentially according to the input cycle and along the second direction, into a plurality of processing units located at the first edge of the processing matrix, and inputting the at least one key-value pair of the second target point, sequentially according to the input cycle and along the first direction, into a plurality of processing units located at the second edge of the processing matrix, to determine the equal key value.
  • The processing matrix includes v × v processing units, where v is a positive integer greater than 1. The at least one key-value pair of the first target point and the at least one key-value pair of the second target point are input into the processing matrix starting from the same input period. The first edge is adjacent to the second edge, the first direction is a direction away from the second edge, and the second direction is a direction away from the first edge. Each processing unit is used to determine whether the key value of the first target point input to the processing unit is equal to the key value of the second target point. Determining the query point in the relationship graph according to the relationship value group of the first target point and the relationship value group of the second target point corresponding to the equal key value includes: using the processing matrix to process the relationship value group of the first target point and the relationship value group of the second target point corresponding to the equal key value, to determine the query point.
  • Each processing unit in the plurality of processing units is used to perform processing on the key of the first target point.
  • Each processing unit in the plurality of processing units is also used to, according to the input cycle, transmit the key-value pair of the first target point to the next processing unit along the first direction and transmit the key-value pair of the second target point to the next processing unit along the second direction.
  • The processing matrix can be understood as a systolic array.
  • The key-value pairs of each target point "flow" rhythmically among the processing units of the processing matrix in a "streaming" manner, and all processing units process the flowing data in parallel. This increases processing speed and reduces processing time.
  • each processing unit in the plurality of processing units is specifically configured to: compare all the key values of the first target point and the second target point. If the key values are not equal, the key-value pair of the first target point is transmitted to the next processing unit along the first direction, and the key-value pair of the second target point is transmitted. to the next processing unit along the second direction.
  • the key values in each key-value pair of a target point are different.
  • the processing unit transmits the key-value pairs of the first target point and the second target point only when their key values are not equal.
  • when the key values are equal, the key-value pairs are no longer transmitted. Therefore, the equal key value is no longer compared with other key values, which reduces the amount of calculation.
  • when the number of key-value pairs of the first target point is greater than v, the method further includes: dividing, in order of the key values in the key-value pairs, the at least one key-value pair of the first target point into a plurality of first key-value pair groups, and the at least one key-value pair of the second target point into at least one second key-value pair group.
  • the number of key-value pairs in each of the plurality of first key-value pair groups and the at least one second key-value pair group is less than or equal to v.
  • the first key-value pair group with the smallest key values among the plurality of first key-value pair groups is taken as the third key-value pair group.
  • the second key-value pair group with the smallest key values among the at least one second key-value pair group is taken as the fourth key-value pair group.
  • multiple iterations are performed until the maximum key value in the third key-value pair group is greater than the maximum key value in the fourth key-value pair group. Each iteration includes: sequentially inputting, according to the input cycle, at least one key-value pair in the third key-value pair group to the processing units located at the first edge, and at least one key-value pair in the fourth key-value pair group to the processing units located at the second edge; and, when the largest key value in the third key-value pair group is smaller than the largest key value in the fourth key-value pair group, taking the first key-value pair group that follows the third key-value pair group, among the plurality of first key-value pair groups arranged in ascending order of key values, as the new third key-value pair group.
  • sequentially inputting at least one key-value pair of the first target point, along the second direction according to the input cycle, to the processing units located at the first edge of the processing matrix, and sequentially inputting at least one key-value pair of the second target point, along the first direction according to the input cycle, to the processing units located at the second edge of the processing matrix, includes: before performing the multiple iterations and after each iteration, sequentially inputting the key-value pairs of the third key-value pair group to the processing units located at the first edge according to the input cycle, and sequentially inputting the key-value pairs of the fourth key-value pair group to the processing units located at the second edge according to the input cycle.
  • this method can significantly reduce the amount of calculation and reduce the calculation time.
  • this method can significantly reduce the amount of calculation and reduce the calculation time.
  • a data processing device including: an acquisition module and a processing module. The acquisition module is used to acquire at least one key-value pair of the first target point and at least one key-value pair of the second target point among the two target points in the relationship diagram. Each key-value pair of a target point includes a key value of the target point and a relationship value group of the target point corresponding to the key value. Different key values correspond to different point sets in the relationship diagram. The relationship value group of the target point corresponding to a key value of the target point is used to indicate whether the target point has a relationship with each point in the point set corresponding to the key value. In each key-value pair of the target point, the point set corresponding to the key value contains a point that has a relationship with the target point.
  • the processing module is configured to determine a query point in the relationship diagram according to the at least one key-value pair of the first target point and the at least one key-value pair of the second target point, where the relationship between the query point and the two target points meets the preset condition.
  • the acquisition module is specifically configured to: acquire the relationship graph data, where the relationship graph data includes a row offset vector, a key value vector, and a relationship value vector; and determine, according to the relationship graph data, at least one key-value pair for each of the two target points. The key value vector includes at least one key value for each of the plurality of points in the relationship graph. The row offset vector is used to indicate the position of the at least one key value of each point in the key value vector. The relationship value vector includes the relationship value group of the point corresponding to each key value of each point in the plurality of points. The order of the at least one key value of each point in the key value vector is the same as the order of the corresponding relationship value groups in the relationship value vector.
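  • The three-vector layout described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function names, the use of consecutive blocks of `block_size` points as the point set for each key value, and the choice to write the lowest-numbered point as the leftmost bit are all assumptions made for the example.

```python
def build_relationship_graph_data(neighbors, block_size):
    """Encode neighbor lists as (row offset, key value, relationship value) vectors.

    Assumption: key value k stands for the point set
    {k * block_size, ..., (k + 1) * block_size - 1}.
    """
    row_offset, keys, values = [0], [], []
    for nbrs in neighbors:
        groups = {}
        for n in nbrs:
            key, bit = divmod(n, block_size)
            # The lowest-numbered point of a set is written as the leftmost bit.
            groups[key] = groups.get(key, 0) | (1 << (block_size - 1 - bit))
        # Only key values whose point set contains a related point are stored.
        for key in sorted(groups):
            keys.append(key)
            values.append(groups[key])
        row_offset.append(len(keys))
    return row_offset, keys, values


def key_value_pairs(point, row_offset, keys, values):
    """Return the (key value, relationship value group) pairs of one point."""
    start, end = row_offset[point], row_offset[point + 1]
    return list(zip(keys[start:end], values[start:end]))


# Hypothetical two-point input: point 0 is related to 1, 3, 6, 7 and
# point 1 is related to 0, 2, 4, 5, 6.
data = build_relationship_graph_data([[1, 3, 6, 7], [0, 2, 4, 5, 6]], block_size=4)
pairs_of_point_0 = key_value_pairs(0, *data)
pairs_of_point_1 = key_value_pairs(1, *data)
```

  • With a block size of 4, each point in this example ends up with two key-value pairs, one per block of four points.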
  • the number of points in the point sets corresponding to different key values is equal.
  • the processing module is specifically configured to perform a preset operation on each bit of the relationship value group of the first target point and the relationship value group of the second target point corresponding to the equal key value.
  • the same bit in the relationship value groups of different points corresponding to each key value corresponds to the same point in the point set corresponding to the key value, and the result of the preset operation on each bit is used to indicate whether the relationship between the point corresponding to the bit and the two target points meets the preset condition.
  • each processing unit is used to determine whether the key value of the first target point input to the processing unit is equal to the key value of the second target point, and, in the case where the key value of the first target point is equal to the key value of the second target point, to determine a query point in the relationship diagram according to the relationship value group of the first target point and the relationship value group of the second target point input to the processing unit.
  • the relationship between the query point and the two target points meets the preset condition.
  • according to the input cycle, the key-value pair of the first target point is transmitted to the next processing unit along the first direction, and the key-value pair of the second target point is transmitted to the next processing unit along the second direction.
  • each of the plurality of processing units is specifically configured to: when the key value of the first target point and the key value of the second target point are not equal, transmit the key-value pair of the first target point to the next processing unit along the first direction, and transmit the key-value pair of the second target point to the next processing unit along the second direction.
  • the controller is also used to: divide, in order of the key values in the key-value pairs, at least one key-value pair of the first target point into a plurality of first key-value pair groups.
  • at least one key-value pair of the second target point is divided into at least one second key-value pair group.
  • the number of key-value pairs in each of the plurality of first key-value pair groups and the at least one second key-value pair group is less than or equal to v, wherein the first key-value pair group with the smallest key values among the plurality of first key-value pair groups is the third key-value pair group.
  • the second key-value pair group with the smallest key values among the at least one second key-value pair group is the fourth key-value pair group; multiple iterations are performed until the maximum key value in the third key-value pair group is greater than the maximum key value in the fourth key-value pair group.
  • each iteration includes: sequentially inputting at least one key-value pair in the third key-value pair group to the processing units located at the first edge among the plurality of processing units according to the input cycle, and sequentially inputting at least one key-value pair in the fourth key-value pair group to the processing units located at the second edge among the plurality of processing units according to the input cycle.
  • if the largest key value in the third key-value pair group is less than the largest key value in the fourth key-value pair group, the first key-value pair group that follows the third key-value pair group, among the plurality of first key-value pair groups arranged in ascending order of key values, is used as the third key-value pair group.
  • the controller is specifically configured to, before performing the multiple iterations and after each iteration, sequentially input the key-value pairs of the third key-value pair group to the processing units located at the first edge of the processing matrix according to the input cycle, and sequentially input the key-value pairs of the fourth key-value pair group to the processing units located at the second edge of the processing matrix according to the input cycle.
  • a data processing device including: a controller and a processing matrix, where the processing matrix includes v × v processing units and v is a positive integer greater than 1.
  • the controller is used to sequentially input, along the second direction according to the input cycle, at least one key-value pair of the first target point among the two target points of the relationship diagram to the processing units located at the first edge of the processing matrix, and to sequentially input, along the first direction according to the input cycle, at least one key-value pair of the second target point among the two target points to the processing units located at the second edge of the processing matrix.
  • the at least one key-value pair of the first target point and the at least one key-value pair of the second target point start to be input in the same input period.
  • each key-value pair of a target point includes a key value of the target point and a relationship value group of the target point corresponding to the key value.
  • different key values correspond to different point sets in the relationship diagram, and the relationship value group of the target point corresponding to a key value of the target point is used to indicate whether the target point has a relationship with each point in the point set corresponding to the key value.
  • each processing unit in the plurality of processing units is used to: determine whether the key value of the first target point input to the processing unit is equal to the key value of the second target point; and, when the key value of the first target point is equal to the key value of the second target point, determine a query point in the relationship diagram based on the relationship value group of the first target point and the relationship value group of the second target point input to the processing unit.
  • each processing unit among the plurality of processing units is specifically configured to: when the key value of the first target point and the key value of the second target point are not equal, transmit the key-value pair of the first target point to the next processing unit along the first direction, and transmit the key-value pair of the second target point to the next processing unit along the second direction.
  • the key-value pairs of each point in the relationship diagram can be determined based on the relationship diagram data. Storing relationship diagrams in the format of relationship diagram data can reduce storage space.
  • the controller is further configured to determine at least one key-value pair of the first target point and at least one key-value pair of the second target point according to the stored relationship graph data.
  • the number of points in the point set corresponding to different key values is equal, which makes it easier to determine the key-value pair of the target point based on the relationship diagram data.
  • each processing unit in the plurality of processing units is configured to: when the key value of the first target point and the key value of the second target point are equal, process the relationship value groups corresponding to the equal key value.
  • a preset operation is performed on each bit of the relationship value group of the first target point and the relationship value group of the second target point corresponding to the equal key value.
  • the same bit in the relationship value groups of different points corresponding to the key value corresponds to the same point in the point set corresponding to the key value, and the result of the preset operation on each bit is used to indicate whether the relationship between the point corresponding to the bit and the two target points meets the preset condition.
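  • As a hedged illustration of the per-bit preset operation, the sketch below uses a bitwise AND, which is one possible choice when the preset condition is "related to both target points"; the text itself does not fix the operation. The function name, the `(key value, relationship value group)` tuple layout, the assumption that key value k covers points `k * block_size` onward, and the leftmost-bit-first ordering are all introduced here for the example.

```python
def common_points(pair_a, pair_b, block_size):
    """Return the points related to both target points for one pair of inputs.

    Each pair is (key value, relationship value group); the group is an integer
    whose leftmost bit (of block_size bits) stands for the first point of the
    point set selected by the key value.
    """
    key_a, bits_a = pair_a
    key_b, bits_b = pair_b
    if key_a != key_b:
        # Different key values select different point sets: nothing to combine.
        return []
    both = bits_a & bits_b  # assumed preset operation: per-bit AND
    return [
        key_a * block_size + i
        for i in range(block_size)
        if (both >> (block_size - 1 - i)) & 1
    ]


# Hypothetical inputs: key value 1 covers points 4..7.
result = common_points((1, 0b0011), (1, 0b1110), block_size=4)
```

  • A different preset condition (for example "related to the first target point but not to the second") would only change the single line that combines the two groups.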
  • when the number of key-value pairs of the first target point is greater than v, the controller divides the key-value pairs into groups as follows.
  • the controller is specifically configured to: divide, in order of the key values in the key-value pairs, at least one key-value pair of the first target point into a plurality of first key-value pair groups, and divide at least one key-value pair of the second target point into at least one second key-value pair group. The number of key-value pairs in each of the plurality of first key-value pair groups and the at least one second key-value pair group is less than or equal to v. The first key-value pair group with the smallest key values among the plurality of first key-value pair groups is the third key-value pair group, and the second key-value pair group with the smallest key values among the at least one second key-value pair group is the fourth key-value pair group. Multiple iterations are performed until the maximum key value in the third key-value pair group is greater than the maximum key value in the fourth key-value pair group.
  • each iteration includes: sequentially inputting the key-value pairs in the third key-value pair group to the processing units located at the first edge among the plurality of processing units according to the input cycle, and sequentially inputting the key-value pairs in the fourth key-value pair group to the processing units located at the second edge among the plurality of processing units according to the input cycle; and, when the largest key value in the third key-value pair group is smaller than the largest key value in the fourth key-value pair group, using the first key-value pair group that follows the third key-value pair group, among the plurality of first key-value pair groups arranged in ascending order of key values, as the third key-value pair group.
  • the controller is specifically configured to, before performing the multiple iterations and after each iteration, sequentially input the key-value pairs of the third key-value pair group to the processing units located at the first edge of the processing matrix according to the input cycle.
  • the controller likewise sequentially inputs the key-value pairs of the fourth key-value pair group to the processing units located at the second edge of the processing matrix according to the input cycle.
  • a data processing device including a memory and at least one processor, the memory being used to store a program.
  • the processor is configured to execute the method in any implementation manner of the first aspect.
  • a computer-readable medium stores program code for execution by a device.
  • the program code includes instructions for executing the method in any one of the implementations of the first aspect or the fourth aspect.
  • a ninth aspect provides a computer program product containing instructions, which when the computer program product is run on a computer, causes the computer to execute the method in any one of the implementations of the first aspect or the fourth aspect.
  • in a tenth aspect, a chip is provided, which includes a processor and a data interface.
  • the processor reads instructions stored in the memory through the data interface and executes the method in any one of the implementations of the first aspect or the fourth aspect.
  • the chip may further include a memory, in which instructions are stored, and the processor is configured to execute the instructions stored in the memory.
  • the processor is configured to execute the method in any implementation manner of the first aspect or the fourth aspect.
  • the method of the first aspect may specifically refer to the method of the first aspect and any of the various implementations of the first aspect.
  • Figure 1 is a schematic structural diagram of a graph mining algorithm.
  • Figure 2 is a schematic diagram of a data format.
  • Figure 3 is a schematic diagram of a queue-based set merging.
  • Figure 4 is a schematic structural diagram of a data processing method provided by an embodiment of the present application.
  • Figure 5 is a schematic flow chart of a method for generating relationship diagram data provided by an embodiment of the present application.
  • Figure 6 is a schematic flow chart of another data processing method provided by an embodiment of the present application.
  • Figure 7 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • Figure 8 is a schematic flow chart of yet another data processing method provided by an embodiment of the present application.
  • Figure 9 is a schematic structural diagram of a processing unit provided by an embodiment of the present application.
  • Figure 10 is a schematic diagram of a key-value pair set provided by an embodiment of the present application.
  • Figure 11 is a schematic structural diagram of a filter unit provided by an embodiment of the present application.
  • Figure 12 is a schematic structural diagram of a compression triangle provided by an embodiment of the present application.
  • Figure 13 is a schematic structural diagram of a compression unit provided by an embodiment of the present application.
  • Figure 14 is a schematic diagram of the processing time of the data processing device provided by the embodiment of the present application.
  • Figure 15 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • Figure 16 is a schematic structural diagram of a data processing system provided by an embodiment of the present application.
  • Figure 17 is a schematic structural diagram of another data processing system provided by an embodiment of the present application.
  • Figure 18 is a schematic diagram illustrating the performance comparison of the data processing system provided by the embodiment of the present application.
  • Figure 19 is a schematic flowchart of a control method for a data processing device provided by an embodiment of the present application.
  • Figure 20 is a schematic structural diagram of a control device of a data processing device provided by an embodiment of the present application.
  • Figure 21 is a schematic structural diagram of a control device of a data processing device provided by an embodiment of the present application.
  • Figure 22 is a schematic structural diagram of a data processing system provided by an embodiment of the present application.
  • A graph mining algorithm is a representative graph processing and data mining algorithm, used to find specific subgraph patterns in a complete graph data structure and to count how often each subgraph pattern occurs.
  • Graph mining algorithms are widely used. Common application cases include community network analysis in social media, protein analysis in bioinformatics, and drug discovery in the field of computational chemistry.
  • Figure 1 is a schematic structural diagram of a graph mining algorithm.
  • Each point on the relationship graph 110 (ie, point 1 to point N) is sequentially regarded as a point v0 in the subgraph model.
  • a relationship graph, also known as a graph or graph data structure, is used to represent whether a relationship exists between nodes. A relationship between two nodes can also be understood as a connection between the two nodes.
  • a relationship graph consists of interconnected vertices and edges.
  • the points in the relationship diagram can also be called nodes and can be used to represent entities.
  • An edge between two points in a relationship graph can be used to indicate that there is a relationship between the two points.
  • An entity refers to something that is distinguishable and exists independently, such as a certain person, a certain city, a certain kind of plant, a certain kind of commodity, a certain piece of equipment, or a certain atom.
  • Relationship diagrams provide the ability to analyze problems from a "relationship" perspective.
  • point v1 is a neighbor of point v0.
  • each neighbor of the point u0 is sequentially regarded as the point v1.
  • point v2 is a common neighbor of point v0 and point v1.
  • when point u1 on the relationship diagram 110 is determined to be the point v1, the points among the common neighbors of the point u0 and the point u1 are sequentially regarded as the point v2.
  • the common neighbor of point u0 and point u1 belongs to the intersection of the neighbor of point u0 and the neighbor of point u1 on the relationship graph 110 .
  • point v3 is a common neighbor of point v0 and point v1, but not a neighbor of point v2.
  • the points that are not neighbors of u2 among the common neighbors of the point u0 and the point u1 are sequentially taken as the point v3.
  • the core of the depth-first search algorithm is a series of nested for loops and set operations:
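  • The nested loops and set operations can be sketched as below for the pattern described above (v1 is a neighbor of v0, v2 is a common neighbor of v0 and v1, v3 is a common neighbor of v0 and v1 that is not a neighbor of v2). This is an illustrative sketch only: the function name and the example graph are hypothetical and are not the relationship graph 110 of Figure 1, and excluding v2 itself from the candidates for v3 is an assumption the text does not state explicitly.

```python
def mine_pattern(adj):
    """Enumerate (v0, v1, v2, v3) assignments with nested loops and set operations.

    adj maps each point to the set of its neighbors.
    """
    matches = []
    for v0 in adj:                                   # every point in turn as v0
        for v1 in adj[v0]:                           # v1: a neighbor of v0
            common = adj[v0] & adj[v1]               # set intersection
            for v2 in common:                        # v2: common neighbor of v0, v1
                # v3: common neighbor of v0, v1 that is not a neighbor of v2
                # (v2 itself is excluded, an assumption of this sketch).
                for v3 in common - adj[v2] - {v2}:
                    matches.append((v0, v1, v2, v3))
    return matches


# Hypothetical 4-point graph with edges 0-1, 0-2, 0-3, 1-2, 1-3.
example_adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1}, 3: {0, 1}}
found = mine_pattern(example_adj)
```

  • The sketch reports every ordered assignment, so the same subgraph is listed once per symmetric ordering; the example graph yields four assignments of a single subgraph. Most of the work is in the intersection `adj[v0] & adj[v1]`, the operation that the rest of the description sets out to accelerate.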
  • the graph mining algorithm processes the relationship graph 110 shown in Figure 1 and can obtain three effective subgraphs.
  • the relationship graph 110 shown in Figure 1 can be stored using compressed sparse row (CSR) format.
  • the vector corresponding to each point can be used to indicate whether it is connected to other points.
  • the number of bits in the vector corresponding to each point can be equal to the number of points in the graph.
  • the vector corresponding to each point in the relationship diagram 110 may include 8 bits, and each bit is used to indicate whether the point is connected to other points.
  • the same bits in different vectors are used to indicate whether the points corresponding to different vectors are connected to the same point. That is, the same bits in different vectors correspond to the same points.
  • the i-th bit is used to indicate whether there is a connection between the certain point and point i, where i ∈ [0,7] and i is an integer.
  • for example, counting bits from 0, there is a connection between point 0 and point 1, so the first bit of the vector corresponding to point 0 is "1"; there is no connection between point 0 and point 2, so the second bit of the vector corresponding to point 0 is "0". Therefore, it can be determined that the vector N(0) corresponding to point 0 is "01010011", and the vector N(1) corresponding to point 1 is "10101110".
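  • A minimal sketch of how such a vector can be produced from a neighbor set, with bit i written at string position i (leftmost is bit 0). The function name is an assumption for the example; the neighbor sets used are the ones implied by N(0) and N(1) above.

```python
def neighbor_vector(neighbors, num_points):
    """Return the bit vector of a point as a string; position i is '1' if point i is a neighbor."""
    return "".join("1" if i in neighbors else "0" for i in range(num_points))


n0 = neighbor_vector({1, 3, 6, 7}, 8)      # neighbors of point 0
n1 = neighbor_vector({0, 2, 4, 5, 6}, 8)   # neighbors of point 1
```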
  • the adjacency matrix can include vectors corresponding to each point in the relationship graph.
  • the vector corresponding to each point in the relationship graph can be used as a row in the adjacency matrix.
  • the order of the points corresponding to each row in the adjacency matrix can be the same as the order of the points corresponding to each bit in each vector.
  • Figure 2 is a schematic diagram of CSR format data.
  • Data in CSR format is obtained by encoding the data as a whole.
  • the data in CSR format includes three parts: row offset, column index and graph data value.
  • the i-th number in the row offset is used to represent the starting position, within the column index, of the first non-zero element in the i-th row of the matrix.
  • the column index is used to represent the column coordinates of the column in which the non-zero element is located in the matrix, and the graph data value is used to represent the specific value of the non-zero element.
  • the column coordinates of the columns where the non-zero elements in row 0 are located are 1, 3, 6, and 7, 4 in total; the column coordinates of the columns where the non-zero elements in row 1 are located are 0, 2, 4, 5, and 6, 5 in total. Therefore, the adjacency matrix can be expressed using the CSR format as: row offset "0, 4, ...", column index "1,3,6,7,0,2,4,5,6", graph data value "1,1,1,1,1,1,1,1,1".
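  • The encoding can be sketched as below for the two rows that are spelled out in the text. The function name is an assumption, and because only rows 0 and 1 are encoded here, the final row offset entry is simply the end marker of this two-row example rather than a value taken from the full relationship graph.

```python
def to_csr(rows):
    """Encode rows given as '0'/'1' strings into (row offset, column index, values)."""
    row_offset, col_index, values = [0], [], []
    for row in rows:
        for col, bit in enumerate(row):
            if bit == "1":
                col_index.append(col)   # column coordinate of the non-zero element
                values.append(1)        # every stored element of this matrix is 1
        # The next row starts where this row's entries end.
        row_offset.append(len(col_index))
    return row_offset, col_index, values


row_offset, col_index, values = to_csr(["01010011", "10101110"])
```

  • Since every stored value is 1, the graph data value part carries no extra information for an unweighted relationship graph.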
  • the part of the column index corresponding to each point can be understood as the neighbor set of the point. Determine the intersection of the neighbor sets of two points, so that the query point that has a relationship with the points corresponding to the two sets can be determined.
  • the positions of the non-zero values in the vector corresponding to point 0 (that is, the neighbor set corresponding to point 0) can be determined from the 0th to 3rd numbers in the column index.
  • the positions of the non-zero values in the vector corresponding to point 1 (that is, the neighbor set corresponding to point 1) can be determined from the 4th to 8th numbers in the column index.
  • comparing the 0th to 3rd numbers in the column index with the 4th to 8th numbers in the column index, that is, comparing the neighbor set of point 0 with the neighbor set of point 1, determines the intersection of the neighbor set of point 0 and the neighbor set of point 1.
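  • The comparison of two column index portions can be sketched as a merge-style intersection of two sorted lists. The function names are assumptions, and the row offset used here covers only points 0 and 1, with its last entry acting as the end marker of this two-point example.

```python
def neighbor_set(point, row_offset, col_index):
    """Slice the column index portion that belongs to one point."""
    return col_index[row_offset[point]:row_offset[point + 1]]


def intersect_sorted(a, b):
    """Intersect two ascending lists by advancing whichever side is smaller."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


row_offset = [0, 4, 9]
col_index = [1, 3, 6, 7, 0, 2, 4, 5, 6]
shared = intersect_sorted(
    neighbor_set(0, row_offset, col_index),
    neighbor_set(1, row_offset, col_index),
)
```

  • For these two points the only shared neighbor is point 6. The merge walks both lists one element at a time, so it is inherently sequential.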
  • the data processing device may be used to compare the column index portions of two points.
  • the data processing device has limited processing capabilities.
  • the data processing device may compare two neighbor sets, each of which contains no more than a preset number of point indices.
  • when one of the two column index portions contains more indices than the preset number, it can be divided into a plurality of arrays, and the data processing device can be used to compare each of the plurality of arrays with the other of the two column index portions, but the amount of calculation required is large and the processing time is long.
  • Figure 19 is a schematic flow chart of a data processing method based on a data processing device provided by an embodiment of the present application.
  • the data processing device includes a processing matrix, the processing matrix includes v × v processing units, and v is a positive integer.
  • Method 2200 includes S2210 to S2220.
  • the first target data group is a first data group among multiple first data groups of the first data set.
  • the second target data group is a second data group among at least one second data group of the second data set. Each data group in each of the first data set and the second data set includes at least one piece of data, each piece of data includes a key value, and the data groups in each data set are arranged in the first order or the second order.
  • when the data groups are arranged in the first order, each key value in any one of the data groups in each data set is smaller than each key value in the data groups located after that data group.
  • when the data groups are arranged in the second order, each key value in any one of the data groups in each data set is greater than each key value in the data groups located after that data group.
  • the first data set and the second data set may be stored in memory.
  • the first data set includes a plurality of first data groups
  • the second data set includes a plurality of second data groups.
  • Obtaining the first target data group and the second target data group may be reading the first target data group and the second target data group from the memory.
  • Each iteration includes inputting the first target data group and the second target data group into the data processing device, the data processing device being used to determine the key values that are equal in the first target data group and the second target data group.
  • Each iteration also includes: in the case where the data groups in each data set are arranged in the first order and the first key value is less than or equal to the second key value, or in the case where the data groups in each data set are arranged in the second order and the third key value is greater than or equal to the fourth key value, obtaining the first data group located after the first target data group in the first data set as the first target data group.
  • the first key value is the largest key value in the first target data group
  • the second key value is the largest key value in the second target data group
  • the third key value is the smallest key value in the first target data group
  • the fourth key value is the smallest key value in the second target data group.
  • each data group in each data set is arranged in order of size of the key value.
  • the number of first data groups in the first data set is multiple.
  • a first data group in which every key value lies outside the key value range of the second target data group no longer needs to be input to the data processing device, that is, it is no longer compared with the second target data group.
  • this reduces the number of first data groups that are compared with the second target data group, thereby reducing the amount of calculation.
  • Figure 10 illustrates an example in which the first order and the second order are the same.
  • the column on the left is the first data set, and the column on the right is the second data set.
  • the first data set includes 5 pieces of data with key values 1, 4, 7, 8, and 10 respectively, among others.
  • the second data set includes 9 pieces of data with key values 1, 2, 3, 4, 5, 6, 7, 8, and 9 respectively, among others.
  • the first data set may include 2 first data groups, wherein the 1st first data group includes 3 pieces of data with key values of 1, 4, and 7 respectively.
  • the 2nd first data group includes 2 pieces of data with key values 8 and 10.
  • the second data set may include three second data groups.
  • the first second data group includes three pieces of data with key values 1, 2, and 3 respectively.
  • the second second data group includes three pieces of data with key values 4, 5, and 6 respectively.
  • the third second data group includes three pieces of data with key values of 7, 8, and 9 respectively.
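  • The grouping in this example can be reproduced with a short sketch that sorts the key values in ascending order (the first order) and cuts them into groups of at most v. The function name and the choice v = 3 are assumptions for the illustration, and only the key values listed explicitly in the example are used.

```python
def split_into_groups(keys, v):
    """Split key values into ascending groups holding at most v keys each."""
    ordered = sorted(keys)
    return [ordered[i:i + v] for i in range(0, len(ordered), v)]


first_groups = split_into_groups([1, 4, 7, 8, 10], 3)
second_groups = split_into_groups(range(1, 10), 3)
```

  • Because the keys are sorted before being cut, every key in one group is smaller than every key in the groups that follow it, as the first order requires.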
  • in the case where the data groups in each data set are arranged in the first order and the first key value is greater than or equal to the second key value, the second data group located after the second target data group in the second data set is obtained as the second target data group.
  • Figure 10 is still used as an example for explanation.
  • the largest key value 7 in the first target data group is greater than the largest key value 3 in the second target data group.
  • the next second data group after the first second data group can therefore be obtained and used as the second target data group.
  • the second second data group includes three pieces of data with key values 4, 5, and 6 respectively.
  • the largest key value 7 in the first target data group is greater than the largest key value 6 in the second target data group.
  • the next second data group after the second second data group can therefore be obtained and used as the second target data group.
  • for the target data group with the smaller key value, the next data group in the data set to which it belongs is used as a target data group in the next iteration, and the target data group with the larger key value is used as the other target data group in the next iteration.
  • this allows a given data group in one data set to be compared with the data groups in the other data set whose key values lie in the range from the minimum key value to the maximum key value of the given data group, and reduces the possibility of comparing the given data group with a data group in the other data set that only includes key values outside that range, thereby improving computing efficiency and reducing the amount of computation.
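  • The group-advance rule above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `match_groups` is hypothetical, set intersection stands in for the processing matrix, and advancing the first data set when its largest key value is smaller is an assumption made by symmetry with the stated "greater than or equal" rule.

```python
from typing import List, Set, Tuple


def match_groups(first_groups: List[List[int]],
                 second_groups: List[List[int]]) -> Tuple[Set[int], List[Tuple[int, int]]]:
    """Compare one pair of target data groups per iteration (hypothetical helper)."""
    i = j = 0
    equal_keys: Set[int] = set()
    compared: List[Tuple[int, int]] = []
    while i < len(first_groups) and j < len(second_groups):
        first, second = first_groups[i], second_groups[j]
        compared.append((i, j))
        # Stand-in for the processing matrix: collect the equal key values.
        equal_keys |= set(first) & set(second)
        # First key value >= second key value: take the next second data group.
        # Otherwise (assumed by symmetry): take the next first data group.
        if max(first) >= max(second):
            j += 1
        else:
            i += 1
    return equal_keys, compared


# Data groups from the Figure 10 example in the text.
first_groups = [[1, 4, 7], [8, 10]]
second_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
equal_keys, compared = match_groups(first_groups, second_groups)
print(sorted(equal_keys))  # [1, 4, 7, 8]
print(compared)            # [(0, 0), (0, 1), (0, 2), (1, 2)]
```

  • With these inputs only four of the six possible group pairs are compared; the pairs that cannot share a key value are skipped.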
  • the first target data group and the second target data group start to be input in the same input period, the first edge and the second edge are adjacent, the first direction is a direction from the second edge toward the inside of the processing matrix and perpendicular to the second edge, and the second direction is a direction from the first edge toward the inside of the processing matrix and perpendicular to the first edge.
  • Each processing unit in the processing matrix is used to determine whether the key value in the first data input to the processing unit is equal to the key value in the second data input to the processing unit in the same input period, where the first data is data belonging to the first target data group and the second data is data belonging to the second target data group.
  • Using a processing matrix to process the first target data group and the second target data group can improve processing efficiency.
  • the processing matrix can be understood as a logical matrix.
  • the embodiment of the present application does not limit whether the actual physical positions of each processing unit in the processing matrix are arranged in rows and columns.
  • the first direction and the second direction can be understood as logical directions in the processing matrix.
  • the processing unit can transmit the first data to the next processing unit along the first direction, and transmit the second data to the next processing unit along the second direction, only when the key value in the first data is not equal to the key value in the second data. Therefore, data transmission can be reduced, and the amount of calculation can be reduced.
  • the data processing device may further include a filtering matrix, the filtering matrix including v filtering units, where the v filtering units are respectively located after the last processing unit along the second direction in each of the v rows of the processing matrix along the first direction.
  • Each processing unit in the processing matrix is further configured to: when the key value in the first data is equal to the key value in the second data, transmit the processing result of the processing unit to the next unit along the second direction in the input cycle following the one in which the first data and the second data were received, where the next unit is a processing unit or a filtering unit, and the processing result includes the equal key value.
  • Method 2200 also includes, if the first key value is greater than or equal to the second key value, controlling the v filtering units along the first direction to sequentially output, according to the input period, the processing result corresponding to the second target data group.
  • the processing results corresponding to the key values in a given second data group can be output together after the last data of that second data group has been input to the data processing device, which improves the flexibility of processing result output.
  • the data processing device may also include a compression triangular matrix.
  • the compression triangular matrix includes v rows of compression units along the first direction, and the number of compression units increases row by row along the first direction.
  • the v rows of compression units may each be located after one filtering unit along the second direction.
  • Each compression unit among the plurality of compression units is configured to receive the processing result output by the filtering unit located before the compression unit along the second direction, or to receive the processing result output by the compression unit in the previous row along the first direction.
  • Each compression unit among the plurality of compression units is configured to transmit the processing result to the compression unit in the next row along the first direction in the input cycle following the one in which the processing result was received.
  • the processing result corresponding to a certain second data group can be output in the same input cycle, thereby improving the flexibility of processing result output.
  • Method 2200 can be applied to graph mining.
  • The processing matrix is also used to output a processing result when the key value in the first data is equal to the key value in the second data, where the processing result indicates a query point in the relationship graph whose relationship with the two target points conforms to the preset situation.
  • the preset situation may be one of four situations: having a relationship with both the first target point and the second target point; having a relationship with the first target point but not with the second target point; having no relationship with the first target point but having a relationship with the second target point; or having no relationship with either the first target point or the second target point.
  • the preset situation can be determined based on the subgraph model in graph mining.
  • a query point with a preset connection status between two target points in the relationship graph can be determined, so that a subgraph with a specific subgraph model structure in the relationship graph can be identified.
  • when the key values are equal, the processing unit may no longer transmit the data of the first target point and the data of the second target point, thereby reducing the amount of computation.
  • each processing unit may be used to, when the key value in the first data is not equal to the key value in the second data, transmit the first data to the next processing unit along the first direction and transmit the second data to the next processing unit along the second direction.
  • the number of points in the point set corresponding to different key values in each data of the target point may be the same or different.
  • the point set corresponding to each key value may include one or more points.
  • the number of points in the point sets corresponding to different key values may all be 1, and then each data of the target point may only include key values.
  • Different key values for a target point may be used to indicate different points in the relationship graph that have a relationship with the target point, or different key values for the target point may be used to indicate different points in the relationship graph that do not have a relationship with the target point.
  • Whether the point indicated by the key value of the target point has a relationship with the target point can be determined, for example, based on the relationship between the query point that needs to be determined and the two target points, which is not limited in this embodiment of the present application.
  • the number of points in the point set corresponding to different key values may be multiple.
  • Each data of the target point may include a key value and a relationship value group of the target point corresponding to the key value.
  • the relationship value group of the target point corresponding to the key value indicates whether the target point has a relationship with each point in the point set corresponding to the key value.
  • the first data also includes a first relationship value group of the first target point corresponding to the key value, and the second data also includes a second relationship value group of the second target point corresponding to the key value.
  • Each processing unit in the processing matrix is also configured to: when the key value in the first data is equal to the key value in the second data, perform a preset operation on each bit of the first relationship value group and the second relationship value group respectively. The same bit position in the first relationship value group and the second relationship value group corresponds to the same point in the point set corresponding to the equal key value, and the result of the preset operation on each bit indicates whether the relationship between the point corresponding to that bit and the two target points meets the preset situation.
  • when the key value in the first data is equal to the key value in the second data, the two key values indicate the same point set. Therefore, the query point can be determined according to the result of the bitwise preset operation on the first relationship value group and the second relationship value group.
  • processing results of the processing unit may also include results of preset operations performed on each bit of the first relational value group and the second relational value group.
  • each filtering unit in the filtering matrix is used to determine whether the query point exists according to the received processing result, and output the processing result that the query point exists.
  • the filter matrix can be used to filter the processing results output by the processing matrix, and determine the processing results for which the query point exists among each processing result output by the processing matrix.
  • when the data groups in each data set are arranged in the first order, the key values in each first data group are arranged from small to large; when the data groups in each data set are arranged in the second order, the key values in each first data group are arranged from large to small.
  • the data in each data set can be arranged in order from small to large or from large to small by key value. Therefore, the data groups in the data collection can be stored in the divided format. Alternatively, a device performing method 2000 may partition the data set.
  • At S2210, at least one first data and at least one second data can be obtained, where the at least one first data is the first target data group and the at least one second data is the second target data group.
  • at least one first data obtained each time can be used as a first target data group, and at least one second data obtained each time is a second target data group.
  • the at least one first data that has not been obtained can be used as one or more first data groups, and the at least one second data that has not been obtained can be used as one or more second data groups.
  • v pieces of first data and/or v pieces of second data can be acquired each time, thereby improving computing processing efficiency.
  • Each data in each data set is arranged in order of key value, making the division of data groups more flexible.
  • At least one data whose number does not exceed v may be obtained from the first data set as the first target data group.
  • each second data group is arranged from small to large along the second order.
  • equal key values among at least one key value of the first target point and at least one key value of the second target point are determined.
  • each point has a relationship with only a small number of points in the relationship diagram. That is to say, if each bit in the relationship vector of a point is used to indicate whether the point has a relationship with each point in the relationship diagram, where "1" means there is a relationship and "0" means there is no relationship, then the relationship vectors can be considered sparse data.
  • the points in the relationship diagram can be divided into multiple point sets, each point set corresponding to a key value.
  • the relationship value group of a point corresponding to a key value can be used to indicate whether each point in the point set corresponding to the key value has a relationship with the point. If a point set contains a point that has a relationship with the point, the key value corresponding to the point set and the relationship value group of the point corresponding to the key value can be used as a key-value pair of the point.
  • Data compression can be achieved by using key-value pairs of each point in the relationship graph to represent the relationship graph.
  • Method 500 uses the key-value pairs of each point in the relationship diagram to determine the equal key values among the key-value pairs of the two target points, and determines, based on the relationship value groups of the two points corresponding to the equal key values, the query point whose relationship with the two target points conforms to the preset situation. While reducing the amount of data used to represent the relationship diagram, this can also reduce the processing time required to determine query points.
  • the point with the same column index in the queue of two points is determined to determine the point that has a relationship with both points.
  • the key value in the key value pair represents a point set that has a relationship with the target point.
  • the number of key-value pairs is generally smaller than the number of column indexes, so comparing key values can reduce the amount of calculation and the calculation time.
  • Point sets corresponding to different key values may or may not include the same points.
  • the point sets corresponding to different key values do not include the same points, which can further improve the degree of data compression and reduce the storage space occupied by storing key-value pairs used to represent each point of the relationship graph.
  • relationship graph data may be obtained, and at least one key-value pair of each of the two target points may be determined based on the relationship graph data.
  • Relationship graph data includes row offset vectors, key value vectors, and relationship value vectors.
  • a key value vector includes at least one key value for each point in the relationship graph.
  • the row offset vector is used to indicate the position of at least one key value for each point in the key value vector.
  • the row offset vector may include multiple pieces of offset information, each piece of offset information being used to represent the starting position, in the key value vector, of the at least one key value of one point.
  • the order of the offset information of each point in the relationship diagram in the row offset vector can be the same as the order of the key values of each point in the key value vector.
  • the offset information can be a sequence number.
  • the key value of each point is continuous in the key value vector.
  • the offset information of a certain point is used to indicate the starting sequence number of at least one key value of the point in the key value vector.
  • the relationship value vector includes the relationship value group of the point corresponding to the key value of each point in the relationship graph.
  • the order of at least one key value of each point in the key value vector is the same as the order of the relationship value group of the point corresponding to the key value of each point in the relationship value vector.
  • using relationship graph data to represent the relationship graph can further improve data compression.
  • when the point sets contain equal numbers of points, the relationship value group corresponding to each key value of the target point can be determined based on the position of the at least one key value of the target point in the key value vector and the number of points in each point set. This makes it easier to determine the key-value pairs of the target point based on the relationship graph data.
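  • As a rough illustration of how the three vectors fit together, the sketch below reads the key-value pairs of one point out of a row offset vector, a key value vector, and a relationship value vector. The variable names and sample values are illustrative, and the extra trailing entry in `row_offsets` (marking the end of the last point's key values) is an assumption borrowed from the usual compressed sparse row layout; the text only states that each offset gives a starting sequence number.

```python
from typing import List, Tuple

# Illustrative relationship graph data for two points (sample values only).
row_offsets = [0, 3, 7]                 # start of each point's key values; last entry is an assumed end marker
key_vector = [0, 1, 3, 0, 1, 2, 3]      # key values of point 0, followed by those of point 1
value_vector = ["01", "01", "11", "10", "10", "11", "10"]  # one relationship value group per key value


def key_value_pairs(point: int) -> List[Tuple[int, str]]:
    """Return the (key value, relationship value group) pairs of one point."""
    start, end = row_offsets[point], row_offsets[point + 1]
    # Key values and groups are stored in the same order, so one slice serves both.
    return list(zip(key_vector[start:end], value_vector[start:end]))


print(key_value_pairs(0))  # [(0, '01'), (1, '01'), (3, '11')]
print(key_value_pairs(1))  # [(0, '10'), (1, '10'), (2, '11'), (3, '10')]
```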
  • preset operations may be performed on each bit of the relationship value group of the first target point and the relationship value group of the second target point corresponding to the equal key value.
  • the same bit corresponds to the same point in the point set corresponding to the key value.
  • the result of the preset operation on each bit is used to indicate whether the relationship between the point corresponding to the bit and the two target points meets the preset situation.
  • a processing matrix may be used to process at least one key-value pair of the first target point and at least one key-value pair of the second target point.
  • the first direction is a direction from the second edge to the inside of the processing matrix and perpendicular to the second edge
  • the second direction is a direction from the first edge to the inside of the processing matrix and perpendicular to the first edge.
  • a relationship graph consists of multiple points that are connected to each other, as well as edges that connect the points.
  • the relationship graph can be represented as the neighbor vectors of each point in the relationship graph.
  • Each neighbor vector contains the same number of bits.
  • the i-th bit in the neighbor vector of each point is used to indicate whether there is a relationship between the point and the i-th point in the relationship graph. That is to say, the i-th position in the neighbor vector corresponds to the i-th point in the relationship graph. You can use "0" to indicate that there is no relationship, and "1" to indicate that there is a relationship.
  • the neighbor vectors can be divided into multiple groups according to a preset method.
  • the neighbor vector can be divided into multiple groups according to the remainder of dividing the sequence number of each bit in the neighbor vector by a certain divisor, and different groups correspond to different remainders.
  • the divisor can be a preset value.
  • each neighbor vector can be divided into groups of a fixed number of bits in order from left to right. If the number of bits in the neighbor vector is not divisible by the number of bits per group, "0" can be appended to the end of the neighbor vector so that every group has the same number of bits.
  • the following is an example of dividing the neighbor vector into groups of 2 bits.
  • the bits corresponding to point 0 and point 1 in each neighbor vector are one group, the bits corresponding to point 2 and point 3 are one group, and so on to complete the division of the neighbor vector.
  • the key value array of point 0 can include 0, 1, and 3
  • the key value array of point 1 can include 0, 1, 2, and 3.
  • each key value can be arranged in a preset order.
  • each key value can be arranged in ascending or descending order. The following takes the key values array in ascending order as an example to illustrate.
  • Relationship graph data includes row offsets, key value vectors, and relationship value vectors.
  • the key value vector contains the key value array of each point in the relationship graph.
  • the jth key value array in the key value vector is the key value array of the jth point in the relationship graph.
  • the relation value vector contains the corresponding groups for each key value array.
  • the order of the groups in the relationship value vector is the same as the order of the key value array. Relational value vectors can also be called value data.
  • the key value vector can also be called key data.
  • the format of the relationship graph data can be called the bitmap with compressed sparse row (BCSR) format.
  • the neighbor matrix consists of multiple rows, each row representing the neighbor vector of a point.
  • compression of the neighbor matrix can be achieved. Especially when the proportion of "0" in the neighbor matrix is high, it has better compression effect.
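  • A minimal sketch of this encoding is given below. It assumes fixed-size groups taken from left to right, uses the group index as the key value, and drops all-zero groups; the function name `encode_bcsr` and the trailing end marker in the row offsets are assumptions. The two sample neighbor vectors are reconstructed from the key values and groups that this text gives for point 0 and point 1, so they are illustrative rather than quoted.

```python
from typing import List, Tuple


def encode_bcsr(neighbor_vectors: List[str],
                group_bits: int = 2) -> Tuple[List[int], List[int], List[str]]:
    """Encode neighbor vectors as (row offsets, key value vector, relationship value vector)."""
    row_offsets, key_vector, value_vector = [0], [], []
    for vector in neighbor_vectors:
        # Append "0" so that the length is a multiple of the group size.
        padded = vector + "0" * (-len(vector) % group_bits)
        for key in range(len(padded) // group_bits):
            group = padded[key * group_bits:(key + 1) * group_bits]
            if "1" in group:            # all-zero groups are not stored
                key_vector.append(key)
                value_vector.append(group)
        row_offsets.append(len(key_vector))  # assumed end marker for this point
    return row_offsets, key_vector, value_vector


# Point 0 is related to points 1, 3, 6, 7; point 1 is related to points 0, 2, 4, 5, 6.
neighbor_vectors = ["01010011", "10101110"]
row_offsets, key_vector, value_vector = encode_bcsr(neighbor_vectors)
print(row_offsets)   # [0, 3, 7]
print(key_vector)    # [0, 1, 3, 0, 1, 2, 3]
print(value_vector)  # ['01', '01', '11', '10', '10', '11', '10']
```

  • The all-zero group of point 0 (bits for points 4 and 5) is simply omitted, which is where the compression comes from when most bits are "0".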
  • the relationship graph data generated by method 600 can be processed using the data processing method shown in Figure 6.
  • Figure 6 is a schematic flow chart of a data processing method provided by an embodiment of the present application.
  • Method 700 includes S701 to S702.
  • Relationship graph data is used to indicate whether there is a relationship between the points in the relationship graph.
  • The relationship graph data includes row offsets, a key value vector, and a relationship value vector.
  • the key-value pairs of each point in the relationship graph can be determined, that is, the key value array of each point and the group corresponding to each key value in the key value array of each point can be determined.
  • the key value array "0, 1, 3" of point 0 and the key value array "0, 1, 2, 3" of point 1 can be determined.
  • for point 0, the group corresponding to key value 0 is 01, the group corresponding to key value 1 is 01, and the group corresponding to key value 3 is 11.
  • for point 1, the group corresponding to key value 0 is 10, the group corresponding to key value 1 is 10, the group corresponding to key value 2 is 11, and the group corresponding to key value 3 is 10.
  • for example, when the preset situation is having a relationship with both points, the preset calculation method is a ∧ b, where a and b represent the values of the two points at a given bit position; when the preset situation is that point A has a relationship and point B does not, the preset calculation method is a ∧ ¬b, where a represents the value of point A at a given bit position and ¬b represents the inversion of the value of point B at that bit position. Therefore, a bit whose calculation result is 1 can be understood as a bit for which the relationship with the two points satisfies the preset situation.
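  • The preset calculation can be sketched as a bitwise operation on two groups of equal width. Only the "both" and "first only" cases follow formulas stated in the text; the "second only" and "neither" cases are assumptions filled in by symmetry, and the function name and situation labels are hypothetical.

```python
def preset_operation(group_a: str, group_b: str, situation: str) -> str:
    """Apply a preset bitwise operation to two equal-width bit groups."""
    width = len(group_a)
    mask = (1 << width) - 1              # keeps the inversion within the group width
    a, b = int(group_a, 2), int(group_b, 2)
    results = {
        "both": a & b,                   # a AND b, as stated in the text
        "first_only": a & ~b & mask,     # a AND (NOT b), as stated in the text
        "second_only": ~a & b & mask,    # assumed by symmetry
        "neither": ~a & ~b & mask,       # assumed by symmetry
    }
    return format(results[situation], f"0{width}b")


print(preset_operation("11", "10", "both"))        # 10
print(preset_operation("11", "10", "first_only"))  # 01
print(preset_operation("01", "10", "both"))        # 00
```

  • A "1" in the returned group marks a bit position whose point satisfies the chosen situation.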
  • the following is an example of determining the common neighbors of two points.
  • for key value 0, the group corresponding to point 0 is 01, and the group corresponding to point 1 is 10. A bitwise AND operation is performed on "01" and "10" to determine the bits whose result is 1.
  • a bitwise AND operation can be performed on the groups corresponding to each equal key value of the two points, and a bit whose calculation result is 1 corresponds to a point that has a relationship with both of the two points.
  • the point set corresponding to the equal key value can be determined. Then, based on the position, within its group, of the bit whose calculation result is 1, the point in the relationship diagram corresponding to that bit can be determined.
  • the way in which the neighbor matrix was divided into groups can also be obtained. According to that division, the key value corresponding to the group that includes the bit whose calculation result is 1, and the position of that bit within its group, the point in the relationship diagram corresponding to the bit can be determined.
  • for example, by comparing the groups of point 0 and point 1 corresponding to key value 0, key value 1, and key value 3, it can be determined that no bit in the groups corresponding to key values 0 and 1 has a calculation result of 1, and that the bit whose calculation result is 1 in the group corresponding to key value 3 is the 0th bit in the group.
  • the corresponding group of the key value 3 in the neighbor matrix includes point 6 and point 7, among which the point corresponding to the 0th position is point 6. Therefore, it can be determined that the common neighbor of point 0 and point 1 is point 6.
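  • The worked example can be reproduced with the short sketch below. It assumes groups of 2 bits with bit 0 on the left, so that the point number is the key value times the group size plus the bit position. A dictionary lookup is used here in place of queue-based set merging or the processing matrix, purely for brevity; the function name is hypothetical.

```python
from typing import List, Tuple


def common_neighbors(pairs_a: List[Tuple[int, str]],
                     pairs_b: List[Tuple[int, str]],
                     group_bits: int = 2) -> List[int]:
    """Return the points related to both target points (hypothetical helper)."""
    groups_b = dict(pairs_b)
    points = []
    for key, group_a in pairs_a:
        group_b = groups_b.get(key)
        if group_b is None:              # key value not shared: nothing to compare
            continue
        for position in range(group_bits):
            # Bitwise AND of the two groups, one bit position at a time.
            if group_a[position] == "1" and group_b[position] == "1":
                points.append(key * group_bits + position)
    return points


point_0 = [(0, "01"), (1, "01"), (3, "11")]
point_1 = [(0, "10"), (1, "10"), (2, "11"), (3, "10")]
print(common_neighbors(point_0, point_1))  # [6]
```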
  • Determining equal key values in the key value arrays corresponding to two points can be achieved by using the queue-based set merging shown in Figure 3. Determining the equal bits of two points in the group corresponding to each equal key value can be achieved by using bitwise comparison.
  • in the bitwise comparison process, multiple bit values can be compared in parallel, so the calculation efficiency is high.
  • determining the equal key values in the key value arrays corresponding to the two points in method 700 can effectively reduce the quantity of numbers that need to be compared and improve processing efficiency.
  • Figure 7 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • the data processing device 800 includes a processing matrix (PM) 810, a filter array (FA) 820, and a compression triangle (CT) 830.
  • the processing matrix 810 includes v × v processing elements (PEs), which are used to process key-value pairs of two points.
  • the key-value pair of each point includes the key value array of the point and the group corresponding to each key value.
  • the leftmost v processing units in the processing matrix 810 can receive multiple key-value pairs of point A in the relationship graph, and different processing units receive different key-value pairs of point A.
  • the uppermost v processing units in the processing matrix 810 can receive multiple key-value pairs of point B in the relationship graph, and different processing units receive different key-value pairs of point B.
  • Point A and point B are different points.
  • among the leftmost v processing units, taken in order from top to bottom, each processing unit receives its key-value pair one clock cycle after the previous processing unit receives its key-value pair.
  • among the uppermost v processing units, taken in order from left to right, each processing unit receives its key-value pair one clock cycle after the previous processing unit receives its key-value pair.
  • the processing unit is used to compare the key values in the received key-value pairs of the two points. It can also be understood that the processing unit is used to match the key values in the received key-value pairs of the two points. If the key values in the received key-value pairs of the two points are equal, the key values of the two points can be considered to be matched successfully.
  • if the key values are equal, the processing unit performs the preset calculation on the groups in the key-value pairs of the two points to determine the bits in the group that satisfy the relationship corresponding to the preset calculation, and outputs the key value and the calculation result.
  • if the key values are not equal, the processing unit does not perform the preset calculation on the key-value pairs of the two points. In the next clock cycle, it transmits the key-value pair received from the adjacent processing unit on the left to the adjacent processing unit on the right, and transmits the key value and corresponding group received from the adjacent processing unit above to the adjacent processing unit below.
  • the key-value pair A0 of point A is input from the left side to the processing unit PE 0 in the first row and first column, and the key-value pair B0 of point B is input from the upper side to PE 0.
  • PE 0 compares the key value in key-value pair A0 with the key value in key-value pair B0. If the key values in key-value pair A0 and key-value pair B0 are equal, PE 0 performs the preset calculation on each bit of the groups in key-value pair A0 and key-value pair B0. PE 0 will no longer transmit key-value pair A0 and key-value pair B0.
  • the processing matrix 810 may output the equal key values and calculation results.
  • if the key values in key-value pair A0 and key-value pair B0 are not equal, PE 0 does not perform the preset calculation on the groups in key-value pair A0 and key-value pair B0.
  • in that case, at time point t1, PE 0 can transmit the key-value pair A0 to PE 1 on the right side, that is, in the first row and second column, and transmit the key-value pair B0 to PE 2 on the lower side, that is, in the second row and first column. Moreover, at time point t1, the key-value pair A1 of point A is input from the left to the processing unit PE 2 in the second row and first column, and the key-value pair B1 of point B is input from the upper side to PE 1.
  • PE 1 can compare the key values in key-value pair A0 and key-value pair B1. If the key values in key-value pair A0 and key-value pair B1 are equal, PE 1 performs the preset calculation on each bit of the groups in key-value pair A0 and key-value pair B1. PE 1 will no longer transmit key-value pair A0 and key-value pair B1.
  • the processing matrix 810 can output the key value and the calculation result.
  • if the key values in key-value pair A0 and key-value pair B1 are not equal, PE 1 does not perform the preset calculation on the groups in key-value pair A0 and key-value pair B1, and at time point t2 can transmit the key-value pair A0 to the PE on the right and the key-value pair B1 to PE 3 on the lower side. If there are no other PEs on the right side of PE 1, PE 1 can delete the key-value pair received from the left side, that is, delete the key-value pair A0.
  • PE 2 can compare the key values in key-value pair A1 and key-value pair B0. If the key values in key-value pair A1 and key-value pair B0 are equal, PE 2 performs a preset calculation method on each bit of the group in key-value pair A1 and key-value pair B0. PE 2 will no longer transmit key-value pair A1 and key-value pair B0.
  • the processing matrix 810 may output the equal key values and the calculation results for each bit.
  • if the key values in key-value pair A1 and key-value pair B0 are not equal, PE 2 does not perform the preset calculation on the groups in key-value pair A1 and key-value pair B0, and at time point t2 can transmit the key-value pair A1 to PE 3 on the right and the key-value pair B0 to the PE on the lower side. If there are no other PEs on the lower side of PE 2, PE 2 can delete the key-value pair received from the upper side, that is, delete the key-value pair B0.
  • PE 3 can compare the key values in key-value pair A1 and key-value pair B1.
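  • The cycle-by-cycle behaviour walked through above can be approximated in software as follows. This is a simplified model under stated assumptions: only key values are carried (groups and the preset calculation are omitted), results are collected in a list rather than routed to filtering units, and the function name and the (cycle, row, column, key) result tuple are hypothetical.

```python
from typing import List, Optional, Tuple

Grid = List[List[Optional[int]]]


def simulate_processing_matrix(a_keys: List[int],
                               b_keys: List[int]) -> List[Tuple[int, int, int, int]]:
    """Return (cycle, row, column, key) for every match found in the matrix."""
    rows, cols = len(a_keys), len(b_keys)
    from_left: Grid = [[None] * cols for _ in range(rows)]
    from_top: Grid = [[None] * cols for _ in range(rows)]
    matches = []
    for cycle in range(rows + cols - 1):
        # Skewed input: row r and column c receive their key one cycle after the previous one.
        if cycle < rows:
            from_left[cycle][0] = a_keys[cycle]
        if cycle < cols:
            from_top[0][cycle] = b_keys[cycle]
        next_left: Grid = [[None] * cols for _ in range(rows)]
        next_top: Grid = [[None] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                a, b = from_left[r][c], from_top[r][c]
                if a is not None and a == b:
                    matches.append((cycle, r, c, a))
                    continue                     # matched pairs are not forwarded
                if a is not None and c + 1 < cols:
                    next_left[r][c + 1] = a      # pass point A's key to the right
                if b is not None and r + 1 < rows:
                    next_top[r + 1][c] = b       # pass point B's key downward
        from_left, from_top = next_left, next_top
    return matches


print(simulate_processing_matrix([1, 4, 7], [7, 8, 9]))  # [(2, 2, 0, 7)]
print(simulate_processing_matrix([1, 4, 7], [1, 2, 3]))  # [(0, 0, 0, 1)]
```

  • Because the key values within one point's key value array are distinct, a key that has matched cannot match again, so dropping it from further transmission loses no results.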
  • Figure 9 is a schematic structural diagram of a processing unit provided by an embodiment of the present application.
  • the processing unit 1000 includes a comparison unit 1010 and a calculation unit 1020.
  • the comparison unit 1010 is used to compare the key value kt input from above with the key value kl input from the left.
  • the comparison unit 1010 can be understood as a key-value comparator.
  • the calculation unit 1020 is configured to perform a preset calculation method on each bit of the group vt input from above and the group vl input from the left when the key value kt is equal to the key value kl.
  • the output data of the processing unit 1000 may include the key value kt and the results of calculations of respective bits of the two groups input to the processing unit 1000 from above and from the left.
  • the output data of the processing unit 1000 can be transmitted to the processing unit located on the right side of the processing unit 1000 .
  • the comparison unit 1010 may also receive valid indication 1 input from the left side.
  • Valid indication 1 is used to indicate whether the key value of point A input to the row of the processing unit located on the left side of the processing unit 1000 successfully matches the key value of point B.
  • when valid indication 1 indicates that the matching has already succeeded, the comparison unit 1010 and the calculation unit 1020 in the processing unit 1000 may no longer perform operations. Thus, the amount of calculation can be reduced.
  • the processing unit 1000 may also output a valid indication 2.
  • when the valid indication 1 indicates that the matching is successful, or the key values kt and kl are equal, the valid indication 2 output by the processing unit 1000 indicates that the row where the processing unit 1000 is located has been successfully matched.
  • the valid indication 2 output by the processing unit 1000 may be transmitted to the processing unit on the right side of the processing unit 1000. For example, if the valid indication 1 and the valid indication 2 are "1", it can mean that the matching is successful; conversely, if the valid indication 1 and the valid indication 2 are "0", it can mean that the matching is not successful.
  • the processing unit 1000 may receive the key value kt and group vt input from above, and the key value kl and group vl input from the left.
  • the comparison unit 1010 is used to compare the key value kt with the key value kl.
  • the valid indication 2 output by the processing unit 1000 indicates that the matching is not successful.
  • the calculation unit 1020 is configured to perform a preset calculation method on each bit of the group vt and the group vl respectively.
  • the processing unit 1000 outputs the valid indication 2 to indicate successful matching, and the processing unit 1000 outputs the key value of successful matching and the calculation result of the group.
  • the processing unit 1000 may receive the key value and the calculation result of the group output by the left processing unit, output valid indication 2 to indicate that the match is successful, and output the key value and the calculation result of the group.
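The behavior of a single processing unit described above can be sketched in Python. This is a minimal illustration, not the circuit itself: the names `PEOutput` and `processing_unit` are hypothetical, the groups are modeled as integers used as bit vectors, and the "preset calculation method" is assumed here to be a bitwise AND.

```python
from typing import NamedTuple

class PEOutput(NamedTuple):
    valid2: int   # valid indication 2, passed to the processing unit on the right
    key: int      # key value passed to the right
    group: int    # group (bit vector) passed to the right

def processing_unit(kt: int, vt: int, kl: int, vl: int, valid1: int = 0) -> PEOutput:
    """One step of a processing unit (hypothetical model).

    kt, vt: key value and group input from above.
    kl, vl: key value and group input from the left.
    valid1: valid indication 1 input from the left (1 = row already matched).
    """
    if valid1 == 1:
        # The row has already matched: skip the comparison and calculation,
        # and forward the matched key value and calculation result unchanged.
        return PEOutput(1, kl, vl)
    if kt == kl:
        # Key values are equal: combine the two groups bit by bit.
        # Bitwise AND is an assumption standing in for the preset calculation.
        return PEOutput(1, kt, vt & vl)
    # No match: the left input keeps travelling to the right unchanged.
    return PEOutput(0, kl, vl)
```

For instance, `processing_unit(4, 0b1100, 4, 0b1010)` reports a successful match on key value 4 with the combined group `0b1000`.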
  • the number of key-value pairs for point A and/or point B may exceed v.
  • key-value pairs whose number exceeds v can be divided into multiple key-value pair sets, and the number of key-value pairs in each key-value pair set does not exceed v.
  • the processing matrix 810 may include 3×3 processing units, the key values of point A include 1, 4, 7, 8, 10, etc., and the key values of point B include 1, 2, 3, 4, 5, 6, 7, 8, 9, etc.
  • each key-value pair set includes 3 key-value pairs.
  • the key values in the two key value sets of point A are 1, 4, 7 and 8, 10 respectively.
  • the key values in the three key value sets of point B are 1-3, 4-6 and 7-9 respectively.
  • a size comparison can be performed on the largest key value in the two sets of key-value pairs input to the processing matrix 810.
  • the key-value pair set to which the larger of the two maximum key values belongs is input into the processing matrix 810 again, and, for the other point, the key-value pair set with the next smallest key values is input into the processing matrix 810.
  • the key value 7 in the key-value pair set of point A is greater than the key value 3 in the key-value pair set of point B, so the key-value pair set of point A including key values 1, 4, and 7 and the key-value pair set of point B with key values 4-6 are input into the processing matrix 810 from the left and upper sides respectively.
  • the key-value pair set of point A including key values 1, 4, and 7, and the key-value pair set of point B including key values 1-3 are input into the processing matrix 810 starting from the same clock cycle.
  • the key-value pair set of point A including key values 1, 4, and 7, and the key-value pair set of point B including key values 4-6 also start to be input to the processing matrix 810 in the same clock cycle.
  • the time difference can be one or more clock cycles.
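The splitting and re-input rule above can be sketched as follows. This is an illustrative model only: `split_into_sets` and `input_schedule` are hypothetical names, only key values are tracked (the groups are omitted), and the handling of equal maximum key values (advance both points) follows the iteration rule stated later in this description.

```python
def split_into_sets(keys, v):
    """Split an ascending list of key values into sets of at most v keys."""
    return [keys[i:i + v] for i in range(0, len(keys), v)]

def input_schedule(a_keys, b_keys, v):
    """Return the sequence of (A set, B set) pairs fed to a v x v processing matrix."""
    a_sets = split_into_sets(sorted(a_keys), v)
    b_sets = split_into_sets(sorted(b_keys), v)
    i = j = 0
    schedule = []
    while i < len(a_sets) and j < len(b_sets):
        schedule.append((a_sets[i], b_sets[j]))
        a_max, b_max = a_sets[i][-1], b_sets[j][-1]
        # The set holding the larger maximum is input again; the other point
        # moves on to its next set. Equal maxima advance both points.
        advance_a = a_max <= b_max
        advance_b = a_max >= b_max
        if advance_a:
            i += 1
        if advance_b:
            j += 1
    return schedule

# Example from the text: v = 3, point A has keys 1, 4, 7, 8, 10; point B has keys 1..9.
schedule = input_schedule([1, 4, 7, 8, 10], list(range(1, 10)), 3)
```

With these inputs the set of point A holding 1, 4, 7 is input three times (against 1-3, 4-6 and 7-9), after which the set holding 8, 10 is input against 7-9.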
  • the filter array 820 can be used to compare a certain key-value pair set with key values in multiple other key-value pair sets. The compared results are merged to output an intermediate result corresponding to the key-value pair.
  • the filter array 820 includes v filter units (FU).
  • each filter unit may correspond to a row in the processing matrix 810.
  • the filter array 820 is used to merge the key value comparison results of the key-value pair set with other key-value pair sets.
  • the filtering unit can also be used to determine whether there are points that meet the preset relationship based on the calculation results of each bit of the group vt and the group vl. Points that meet the preset relationship are the points corresponding to bits whose calculation result is 1.
  • Figure 11 is a schematic structural diagram of a filter unit provided by an embodiment of the present application.
  • the filtering unit 1200 includes a logical processing unit 1210, a valid indication update unit 1220, a valid indication register 1230, and a result register 1240.
  • the logical processing unit 1210 is used to determine whether the calculation result of the group output by the processing unit 1000 in the row of the processing matrix 810 corresponding to the filtering unit 1200 is all "0".
  • the logical processing unit 1210 may output "0"; conversely, when the calculation result for the group is not all "0", the logical processing unit 1210 may output "1".
  • the valid indication update unit 1220 may receive the output of the logical processing unit 1210 and the valid indication 2 output by the last processing unit 1000 in the row of the processing matrix 810 corresponding to the filtering unit 1200.
  • when the last processing unit 1000 outputs a valid indication 2 of "0" (that is, indicating that the row of the processing unit 1000 corresponding to the filtering unit 1200 has not successfully matched the key value), or the logical processing unit 1210 outputs "0", the result output by the valid indication update unit 1220 is "0", that is, the output indicates invalid.
  • the valid indication register 1230 is used to store the result output by the valid indication update unit 1220.
  • the result register 1240 is used to store the calculation result of the group when the output of the logical processing unit 1210 is "1".
  • the filtering unit 1200 can also obtain the signal F and the signal L. Signals F and L can be stored in registers. The initial value of signal F is 1, and the initial value of signal L is 0.
  • the maximum key value in the key-value pair set of point A and the key value of point B can be determined.
  • the key-value pair set input period may be equal to v times the period of each key-value pair input processing matrix in each key-value pair set.
  • signal F is set to "0" in the next key-value pair set input period, and signal L is immediately set to "0"; otherwise, signal F is set to "1" in the next key-value pair set input period, and signal L is immediately set to "1".
  • signal F is set to "0" in the next key-value pair set input period, and signal L is immediately set to "0";
  • the maximum key value corresponding to point A is equal to the maximum key value corresponding to point B
  • signal F is set to "1" in the next key-value pair set input period, and signal L is immediately set to "0"
  • signal F is set to "1" in the next key-value pair set input period, and signal L is immediately set to "1".
  • the valid indication update unit 1220 may update the valid indication 3 based on the output of the logical processing unit 1210, the valid indication 2 output by the last processing unit 1000 in the row of the processing matrix 810 corresponding to the filtering unit 1200, and the result stored in the valid indication register 1230.
  • a new valid indication result can be determined.
  • the new valid indication result may be valid, that is, the valid indication 3 indicates valid; conversely, when either of the output of the logical processing unit 1210 and the valid indication 2 output by the last processing unit 1000 in the row of the processing matrix 810 corresponding to the filtering unit 1200 is "0", the new valid indication result may be invalid, that is, the valid indication 3 indicates invalid.
  • the results stored in the valid indication register 1230 can be understood as historical valid indication results.
  • the result output by the valid indication update unit 1220 may be "1", that is, the indication is valid. That is to say, an AND operation can be performed on the new valid indication result and the historical valid indication result.
  • the valid indication register 1230 is used to store the result output by the valid indication updating unit 1220, and update the output of the valid indication register 1230 (ie, valid indication 3).
  • the filtering unit 1200 writes the calculation result of the new group into the result register 1240.
  • the result register 1240 may output the stored data after the filtering unit 1200 completes updating the calculation result of the group stored in the result register 1240 .
  • the result register 1240 when the signal L is set to "1”, if the valid indication result stored in the valid indication register 1230 is "1", the result register 1240 outputs the stored data. On the contrary, if the valid indication result stored in the valid indication register 1230 is "0", the result register 1240 may not output data.
  • the result register 1240 and the valid indication register 1230 can be cleared of stored data.
  • the key-value pair set at point A of the input processing matrix 810 in this period is non-repeatedly input.
  • the result register 1240 may be initialized when the signal F is "1".
  • initialization may be clearing of data stored in result register 1240.
  • the initialization may be to write the group in the key-value pair of point A input in the row of the processing matrix corresponding to the filtering unit 1200 into the result register 1240. Therefore, when the output of the logical processing unit 1210 is "1", the calculation result of the new group can be ANDed with the data in the result register 1240, thereby updating the calculation result of the group.
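The register updates of a filtering unit described above can be sketched as a small state machine. This is a rough model under stated assumptions: the class and method names are hypothetical, groups are integers used as bit vectors, initialization uses the variant that writes the group of point A into the result register, and the valid indication register is assumed to start at 1 after initialization. The exact timing of signals F and L is not modeled.

```python
class FilterUnit:
    """Hypothetical model of one filtering unit and its two registers."""

    def __init__(self):
        self.valid_reg = 0    # valid indication register
        self.result_reg = 0   # result register

    def initialize(self, a_group: int) -> None:
        """Models signal F == 1: load the group of point A for this row."""
        self.result_reg = a_group
        self.valid_reg = 1    # assumption: history starts out valid

    def update(self, valid2: int, group_result: int) -> None:
        """Merge one comparison result coming from the last processing unit of the row."""
        logic_out = 1 if group_result != 0 else 0   # logical processing unit
        new_valid = logic_out & valid2              # new valid indication result
        self.valid_reg &= new_valid                 # AND with the historical result
        if logic_out == 1:
            self.result_reg &= group_result         # AND with the stored group

    def output(self):
        """Models signal L == 1: emit the stored data only if still valid."""
        return self.result_reg if self.valid_reg == 1 else None
```

A unit initialized with `0b1110` and updated with `(1, 0b0110)` would output `0b0110`; any later update with a failed match or an all-zero group makes the output invalid.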
  • Data processing apparatus 800 may also include a controller (not shown). The controller is used to set signal F and signal L.
  • the compression triangle 830 includes v×(v+1)/2 compression units (CU) forming a right triangle and is used to compress the data output by the filter array 820.
  • each row of the compression triangle 830 corresponds to a filter unit in the filter array 820.
  • the interface 1401 is used to connect the compression unit above the compression unit 1400.
  • the interface 1402 is used to connect the compression unit on the upper left side of the compression unit 1400 .
  • the interface 1403 is used to receive the valid indication 3 output by the filtering unit corresponding to the row where the compression unit 1400 is located.
  • the 1st, 3rd, and 4th filtering units in the filter array 820 output valid data, where the valid data output by the 1st filtering unit includes the key value 1 and the calculation result of the group corresponding to the key value 1.
  • the valid data output by the 3rd filtering unit includes key value 3 and the calculation results of the group corresponding to key value 3.
  • the valid data output by the 4th filtering unit includes key value 9 and the calculation results of the group corresponding to key value 9.
  • Each compression unit in row 2 of compression triangle 830 obtains data from the compression unit located above the filter unit in row 1 of compression triangle 830 .
  • the first compression unit in the second row of the compression triangle 830 acquires the data in the compression unit in the first row; there is no compression unit above the first compression unit in the second row, and no data acquisition is performed.
  • the fourth filter unit outputs valid data.
  • the first compression unit in the fourth row of the compression triangle 830 receives the valid data output by the fourth filter unit.
  • Other compression units in the 4th row of the compression triangle 830 obtain data from the compression unit located at the upper left of the filter unit in the 2nd row of the compression triangle 830, that is, the 2-4 compression units in the 4th row of the compression triangle 830 obtain the data respectively.
  • the compression triangle 830 can output the data corresponding to each row in the same clock cycle.
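The net effect of the compression triangle on the example above can be sketched functionally. The sketch below assumes that "compress" means packing the valid rows together while preserving their order; it does not reproduce the unit-by-unit data movement between compression units. The function name `compress` and the group values in the example are hypothetical.

```python
def compress(rows):
    """Pack the valid filter-unit outputs together, preserving row order.

    rows: one (valid3, key, group) tuple per filtering unit.
    Returns the list of (key, group) pairs whose valid indication 3 is 1.
    """
    return [(key, group) for valid3, key, group in rows if valid3 == 1]

# Four filtering units; the 1st, 3rd and 4th output valid data (keys 1, 3, 9).
rows = [
    (1, 1, 0b0011),
    (0, None, None),
    (1, 3, 0b0101),
    (1, 9, 0b1000),
]
packed = compress(rows)
key_count = len(packed)  # what a counting unit could record for this input
```

Here `packed` holds the entries for key values 1, 3 and 9 with no gap between them, and `key_count` is 3.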
  • the data processing device 800 can be implemented based on a dual inline memory module (DIMM); for example, it can be set in a load reduced dual inline memory module (LRDIMM), thereby forming a near-storage computing architecture.
  • the data processing device 800 may also include a counting unit.
  • the counting unit is used to count subgraphs that meet the requirements.
  • the counting unit may record the number of key values output by the compression triangle 830. Moreover, for different input data, the counting unit can accumulate counts.
  • Figure 15 is a schematic structural diagram of a data processing system provided by an embodiment of the present application.
  • Data processing system 1600 includes a controller 1610 and a processing matrix 1620.
  • the processing matrix 1620 includes v×v processing units, where v is a positive integer greater than 1.
  • the controller 1610 is configured to sequentially input at least one key-value pair of the first target point among the two target points of the relationship graph, along the second direction and according to the input period, into a plurality of processing units located at the first edge of the processing matrix 1620, and to sequentially input at least one key-value pair of the second target point among the two target points, along the first direction and according to the input period, into a plurality of processing units located at the second edge of the processing matrix 1620.
  • the input period can be the clock period or a positive integer multiple of the clock period.
  • At least one key-value pair of the first target point and at least one key-value pair of the second target point are input into the processing matrix starting from the same input period.
  • the first edge is adjacent to the second edge, the first direction is a direction away from the second edge, and the second direction is a direction away from the first edge.
  • the first edge and the second edge can be understood as two adjacent edges of the processing matrix 1620 .
  • Each key-value pair of the target point includes a key value of the target point and a relationship value group of the target point corresponding to the key value.
  • Different key values correspond to different point sets in the relationship graph. The relationship value group of the target point corresponding to a key value of the target point is used to indicate whether the target point has a relationship with each point in the point set corresponding to that key value. In the point set corresponding to the key value in each key-value pair of each target point, there is a point that has a relationship with the target point.
  • Each processing unit is also configured to: according to the input cycle, transmit the key-value pair of the first target point to the next processing unit along the first direction, and transmit the key-value pair of the second target point to the next processing unit along the second direction, the first direction being perpendicular to the first edge, and the second direction being perpendicular to the second edge.
  • the processing matrix 1620 can be understood as a systolic array architecture.
  • in a systolic array structure, data "flows" rhythmically among the array's processing units in a predetermined "streaming" manner. During the flow of data, all processing units process the data flowing through them simultaneously and in parallel, so a very high parallel processing speed can be achieved.
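The systolic flow of key values through the processing matrix can be simulated cycle by cycle. The sketch below is illustrative only: `systolic_equal_keys` is a hypothetical name, only key values are tracked, the first target point enters from the left edge and flows right, the second enters from the top edge and flows down, and the skewed injection (the i-th key enters the i-th edge unit in input period i) is an assumption about how "sequentially according to the input period" plays out. Stopping the transmission of matched pairs is not modeled.

```python
def systolic_equal_keys(a_keys, b_keys):
    """Simulate a v x v systolic matrix and report where equal key values meet.

    Returns a list of (cycle, row, column, key) tuples.
    """
    v = max(len(a_keys), len(b_keys))
    a_at = [[None] * v for _ in range(v)]  # key of the first point held at each unit
    b_at = [[None] * v for _ in range(v)]  # key of the second point held at each unit
    matches = []
    for cycle in range(2 * v - 1):
        new_a = [[None] * v for _ in range(v)]
        new_b = [[None] * v for _ in range(v)]
        for r in range(v):
            for c in range(v):
                if c == 0:
                    # Left edge: the r-th key of the first point enters in cycle r.
                    if cycle == r and r < len(a_keys):
                        new_a[r][c] = a_keys[r]
                else:
                    new_a[r][c] = a_at[r][c - 1]   # flows to the right
                if r == 0:
                    # Top edge: the c-th key of the second point enters in cycle c.
                    if cycle == c and c < len(b_keys):
                        new_b[r][c] = b_keys[c]
                else:
                    new_b[r][c] = b_at[r - 1][c]   # flows downward
        a_at, b_at = new_a, new_b
        for r in range(v):
            for c in range(v):
                if a_at[r][c] is not None and a_at[r][c] == b_at[r][c]:
                    matches.append((cycle, r, c, a_at[r][c]))
    return matches
```

Under this schedule, the key from row r and the key from column c both reach unit (r, c) in cycle r + c, so every pair of keys is compared exactly once within 2v - 1 cycles.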
  • the key values in each key-value pair of a target point are different.
  • the processing unit transmits the key values of the first target point and the second target point.
  • the key value pair is no longer transmitted. Therefore, the equal key value can no longer be compared with other key values, thereby reducing the amount of calculation.
  • the controller 1610 is further configured to determine at least one key-value pair of the first target point and at least one key-value pair of the second target point according to the relationship graph data, where the relationship graph data includes a row offset vector, a key value vector, and a relationship value vector. The key value vector includes at least one key value for each point in the relationship graph, and the row offset vector is used to indicate the at least one key value for each point. The relationship value vector includes the relationship value group of the point corresponding to each key value of each point in the plurality of points. The order of the at least one key value of each point within the key value vector is the same as the order, within the relationship value vector, of the relationship value groups of the point corresponding to those key values.
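The three vectors resemble a compressed sparse row (CSR) layout, and reading the key-value pairs of one point from them can be sketched as below. This assumes, as in conventional CSR, that the row offset vector has one more entry than there are points, so that point p owns the half-open range `row_offset[p]:row_offset[p + 1]`; the source does not spell this out. The function name and the sample data are hypothetical.

```python
def key_value_pairs(point, row_offset, key_vector, relation_vector):
    """Return the (key value, relationship value group) pairs of one point."""
    start, end = row_offset[point], row_offset[point + 1]
    # Key values and relationship value groups are stored in the same order,
    # so the same slice of both vectors can be zipped together.
    return list(zip(key_vector[start:end], relation_vector[start:end]))

# Hypothetical data for a graph with three points.
row_offset = [0, 2, 3, 5]
key_vector = [0, 2, 1, 0, 1]
relation_vector = [0b0110, 0b0001, 0b1000, 0b0011, 0b0100]

pairs_of_point_0 = key_value_pairs(0, row_offset, key_vector, relation_vector)
pairs_of_point_2 = key_value_pairs(2, row_offset, key_vector, relation_vector)
```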
  • the controller 1610 is further configured to,
  • the controller 1610 is further configured to perform multiple iterations until the maximum key value in the third key-value pair group is greater than the maximum key value in the fourth key-value pair group.
  • the controller is specifically configured to, before performing the multiple iterations and after each iteration, sequentially input multiple key-value pairs of the third key-value pair group, according to the input cycle, into the plurality of processing units located at the first edge of the processing matrix, and sequentially input the plurality of key-value pairs of the fourth key-value pair group, according to the input period, into the plurality of processing units located at the second edge of the processing matrix.
  • the device 1600 may also include the filter array 820 and the compression triangle 830 shown in FIG. 7 .
  • Data processing system 1700 includes device 800 and memory 1710.
  • the memory is used to store relationship graph data and processing results obtained by the device 800 processing the relationship graph data.
  • Device 800 may be an ASIC.
  • the memory 1710 can be understood as an off-chip memory of the device 800 .
  • the data processing system 1700 can be understood as a computing system in which storage and computing are separated.
  • the address index module 1810 may be a registering clock driver (RCD).
  • the storage module 1830 may include a KVP data area, a KVP address index area, and a result data area.
  • the KVP data area is used to store the KVP of each point in the relationship diagram.
  • the KVP of each point in the relationship diagram may be determined based on the relationship diagram data.
  • the relationship graph data includes row offset vectors, key value vectors and relationship value vectors.
  • the KVP address index area is used to store row offsets (row data).
  • the row offset stored in the KVP address index area may be a row offset vector in the relationship graph data. That is to say, each number in the row offset is stored in the KVP address index area in the form of an index address.
  • Each point in the relationship diagram corresponds to an offset information in the row offset. According to the offset information of each point, the index address of the KVP of the point in the KVP data area can be determined.
  • the result data area is used to store the intermediate calculation results, final calculation results, etc. of NMC 1820.
  • the result data area may include the valid indication register 1230 and the result register 1240 of each filtering unit 1200, the register 1404 of each compression unit 1400, and the like.
  • the address index module 1810 is used to determine the index address of the KVP of each target point based on the sequence number in the row offset.
  • Each number in the row offset corresponds to a point in the relationship diagram.
  • the address index module 1810 may determine the index address of the point based on the base address + the sequence number of the target point × offset.
  • the index address may include a chip select (CS) signal and a command/address (C/A) signal.
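The index address formula above is simple arithmetic and can be written out directly. The function name and the numeric values below are hypothetical, and the sketch leaves out how the resulting address is encoded into CS and C/A signals.

```python
def kvp_index_address(base_address: int, sequence_number: int, offset: int) -> int:
    """Index address = base address + sequence number of the target point x offset."""
    return base_address + sequence_number * offset

# Hypothetical values: base address 0x1000, fourth point (sequence number 3), offset 8.
address = kvp_index_address(0x1000, 3, 8)
```

With these values the address is `0x1018`, that is, 24 address units past the base.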
  • NMC 1820 can determine the location where the KVP of this point is stored in the KVP data area based on the index address. NMC 1820 can execute method 500 and method 700. NMC 1820 may include data processing apparatus 800, or NMC 1820 may be data processing system 1600. NMC 1820 can be called DIMMining.
  • Data processing system 1800 may be an LRDIMM.
  • the data processing system 1800 is implemented without changing the traditional storage function of the LRDIMM and without modifying the internal circuit of the dynamic random access memory (DRAM) chip.
  • the NMC 1820 includes a controller, a data forwarding unit, and a data processing device 800.
  • the storage module 1830 may also include a data buffer (DB), a cache, etc.
  • KVPs of frequently accessed points can be stored in the cache. For example, when the access frequency of a certain point's KVP is higher than the first preset value, the KVP of that point can be stored in the cache; when the access frequency of a certain point's KVP is lower than the second preset value, the KVP of that point can be deleted from the cache.
  • the second preset value may be less than or equal to the first preset value.
  • the access frequency of KVP at a certain point can be determined based on the number of accesses within a period of time. The length of the period of time may be preset.
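The two-threshold cache policy above can be sketched as follows. The class name `KvpCache`, its method, and the threshold values in the example are hypothetical; the sketch tracks only which points are cached and takes the access count within the preset period as an input rather than measuring it.

```python
class KvpCache:
    """Hypothetical model of the KVP cache admission and eviction rule."""

    def __init__(self, first_preset: int, second_preset: int):
        # The second preset value may be less than or equal to the first.
        if second_preset > first_preset:
            raise ValueError("second preset value must not exceed the first")
        self.first_preset = first_preset
        self.second_preset = second_preset
        self.cached_points = set()

    def observe(self, point: int, accesses_in_period: int) -> bool:
        """Apply the policy for one point; return whether its KVP is cached."""
        if accesses_in_period > self.first_preset:
            self.cached_points.add(point)        # frequently accessed: cache it
        elif accesses_in_period < self.second_preset:
            self.cached_points.discard(point)    # rarely accessed: delete it
        # Between the two preset values the current state is left unchanged.
        return point in self.cached_points
```

Choosing a second preset value below the first gives hysteresis: a point whose access frequency hovers between the two values is neither repeatedly admitted nor repeatedly evicted.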
  • the data forwarding unit is used to obtain data in the DRAM chip, the DB or the cache, and input the data into the processing matrix 810 of the data processing device 800.
  • the data processing system 1800 provided by the embodiment of this application is based on the near-memory graph mining computing architecture of DIMM and can realize memory rank-level parallel computing. Multiple DIMM ranks can read and calculate in parallel, which can improve computing efficiency. Moreover, compared with the data processing system 1700, the data processing system 1800 avoids frequent data transfer between the CPU and the memory, and multiple ranks do not need to compete for the right to use the memory bus. Therefore, the near-storage computing architecture can achieve significant performance improvements compared to the traditional CPU+memory separation of storage and calculation architecture.
  • Figure 18 is a schematic diagram illustrating the performance comparison of the data processing system provided by the embodiment of the present application.
  • System 1800 avoids frequent data transfers between the CPU and memory, reduces communication time, and can achieve significant performance improvements.
  • Figure 20 is a schematic structural diagram of a control device of a data processing device provided by an embodiment of the present application.
  • the acquisition module 2010 is configured to acquire a first target data group and a second target data group.
  • the first target data group is the first data group among multiple first data groups of the first data set.
  • the second target data group is the first data group among at least one second data group of the second data set. Each data group in each of the first data set and the second data set includes at least one piece of data, and each piece of data includes a key value. The data groups in each data set are arranged in a first order or a second order.
  • in the case of being arranged in the first order, each key value in any data group of a data set is smaller than each key value in the data groups located after that data group.
  • in the case of being arranged in the second order, each key value in any data group of a data set is greater than each key value in the data groups located after that data group.
  • the processing module 2020 is configured to perform multiple iterations.
  • Each iteration includes inputting the first target data group and the second target data group into the data processing device, the data processing device being used to determine the key values that are equal in the first target data group and the second target data group.
  • the number of at least one second data group is multiple.
  • the data processing device includes a processing matrix, the processing matrix includes v×v processing units, v is a positive integer, and the number of pieces of data in each of the first target data group and the second target data group is less than or equal to v.
  • each processing unit in the processing matrix is also configured to, in the input cycle following receipt of the first data and the second data, transmit the first data to the next processing unit along the first direction and transmit the second data to the next processing unit along the second direction.
  • the processing module 2020 is also configured to, when the first key value is greater than or equal to the second key value, control the v filtering units along the first direction to sequentially output the processing result corresponding to the second target data group.
  • the data processing device further includes a compression triangular matrix, the compression triangular matrix includes v rows of compression units along the first direction, and the number of the compression units increases row by row along the first direction.
  • the key values in each first data group are arranged from small to large; in each data set, when the data groups are arranged in the second order, the key values in each first data group are arranged from large to small.
  • the device 2000 may also be a data processing device.
  • the processing module 2020 is configured to determine an equal key value among at least one key value of the first target point and at least one key value of the second target point;
  • the acquisition module 2010 is configured to acquire the relationship graph data, which includes a row offset vector, a key value vector, and a relationship value vector, and to determine at least one key-value pair for each of the two target points according to the relationship graph data. The key value vector includes at least one key value for each point in the relationship graph, and the row offset vector is used to indicate the at least one key-value pair for each point in the relationship graph.
  • the processing module 2020 is further configured to perform preset operations on each bit of the relationship value group of the first target point and the relationship value group of the second target point corresponding to the equal key value.
  • for each key value, the same bit in the relationship value groups of different points corresponds to the same point in the point set corresponding to that key value, and the result of the preset operation on each bit is used to indicate whether the relationship between the point corresponding to the bit and the two target points meets the preset situation.
  • the processing module 2020 includes a controller and a processing matrix.
  • Each processing unit in the plurality of processing units is configured to, when the key value of the first target point is equal to the key value of the second target point, determine a query point in the relationship graph according to the relationship value group of the first target point and the relationship value group of the second target point input to the processing unit, where the relationship between the query point and the two target points matches the preset situation.
  • Each processing unit in the plurality of processing units is also configured to transmit the key-value pair of the first target point to the next processing unit along the first direction according to the input cycle, and transmit the key-value pair of the second target point to the next processing unit along the second direction.
  • each processing unit in the plurality of processing units is specifically configured to: when the key value of the first target point is not equal to the key value of the second target point, transmit the key-value pair of the first target point to the next processing unit along the first direction, and transmit the key-value pair of the second target point to the next processing unit along the second direction.
  • the number of at least one key-value pair of the first target point is greater than v.
  • the controller is also configured to divide the at least one key-value pair of the first target point into a plurality of first key-value pair groups according to the size order of the key values in the key-value pairs, and to divide the at least one key-value pair of the second target point into at least one second key-value pair group according to the size order of the key values in the key-value pairs.
  • the iteration includes: sequentially inputting at least one key-value pair in the third key-value pair group, according to the input cycle, into the plurality of processing units located at the first edge among the plurality of processing units, and sequentially inputting at least one key-value pair in the fourth key-value pair group, according to the input cycle, into the plurality of processing units located at the second edge among the plurality of processing units; and, in the case where the largest key value in the third key-value pair group is smaller than the largest key value in the fourth key-value pair group, taking the first key-value pair group that follows the third key-value pair group, among the plurality of first key-value pair groups arranged in ascending order of key values, as the third key-value pair group.
  • the controller is further configured to, before performing the multiple iterations and after each iteration, sequentially input multiple key-value pairs of the third key-value pair group, according to the input cycle, into the plurality of processing units located at the first edge of the processing matrix, and sequentially input the plurality of key-value pairs of the fourth key-value pair group, according to the input period, into the plurality of processing units located at the second edge of the processing matrix.
  • Figure 21 is a schematic structural diagram of a control device of a data processing device provided by an embodiment of the present application.
  • the control device 3000 includes a memory 3010 and at least one processor 3020.
  • Memory 3010 is used to store program instructions.
  • the processor 3020 is used to execute the program instructions to implement each step or method or operation or function performed by the data processing apparatus mentioned above.
  • the processor 3020 is configured to obtain a first target data group and a second target data group, where the first target data group is the first data group, along the first order, among the plurality of first data groups in the first data set, and the second target data group is the first data group among at least one second data group of the second data set.
  • Each data group in each of the first data set and the second data set includes at least one piece of data, and each piece of data includes a key value.
  • the data groups in each data set are arranged in the first order or the second order. In the case of the first order, each key value in any data group of a data set is smaller than each key value in the data groups located after that data group.
  • In the case of the second order, each key value in any data group of a data set is greater than each key value in the data groups located after that data group.
  • the processor 3020 is also configured to perform multiple iterations.
  • Each iteration includes inputting the first target data group and the second target data group into the data processing device, where the data processing device is used to determine the key values that are equal in the first target data group and the second target data group.
  • Each iteration also includes: when the data groups in each data set are arranged in the first order and the first key value is less than or equal to the second key value, or when the data groups in each data set are arranged in the second order and the third key value is greater than or equal to the fourth key value, obtaining the first data group located after the first target data group in the first data set as the first target data group.
  • The first key value is the largest key value in the first target data group, and the second key value is the largest key value in the second target data group.
  • The third key value is the smallest key value in the first target data group, and the fourth key value is the smallest key value in the second target data group.
  • The second data set contains more than one second data group.
  • When the data groups in each data set are arranged in the first order and the first key value is greater than or equal to the second key value, or when the data groups in each data set are arranged in the second order and the third key value is less than or equal to the fourth key value, the second data group located after the second target data group in the second data set is obtained as the second target data group.
  • The data processing device includes a processing matrix, the processing matrix includes v × v processing units, v is a positive integer, and the number of data items in each of the first target data group and the second target data group is less than or equal to v.
  • Inputting the first target data group and the second target data group into the data processing device includes inputting them into the processing matrix according to an input rule. Under the input rule, the data in the second target data group are sequentially input, according to the input cycle, to the processing units located at the second edge along the first direction, the data in the first target data group are sequentially input, according to the input cycle, to the processing units located at the first edge along the second direction, and the first target data group and the second target data group start to be input in the same input cycle.
  • The first edge is adjacent to the second edge, the first direction is a direction away from the second edge, and the second direction is a direction away from the first edge.
  • Each processing unit in the processing matrix is used to determine whether the key value in the first data and the key value in the second data that are input to the processing unit in the same input cycle are equal, where the first data is data belonging to the first target data group and the second data is data belonging to the second target data group.
  • Each processing unit in the processing matrix is also configured to, in the input cycle following the one in which it receives the first data and the second data, transmit the first data to the next processing unit along the first direction and transmit the second data to the next processing unit along the second direction.
  • Each processing unit in the processing matrix is used to, when the key value in the first data is not equal to the key value in the second data, transmit the first data to the next processing unit along the first direction and transmit the second data to the next processing unit along the second direction.
  • The data processing device further includes a filtering matrix. The filtering matrix includes v filtering units, which are respectively located after the last processing unit, along the second direction, of each of the v rows of the processing matrix arranged along the first direction.
  • Each processing unit in the processing matrix is further configured to: when the key value in the first data is equal to the key value in the second data, transmit its processing result to the next unit along the second direction in the input cycle following the one in which it receives the first data and the second data, where the unit is a processing unit or a filtering unit and the processing result includes the equal key value; or, in the input cycle following the one in which it receives a processing result, transmit that processing result to the next unit along the second direction.
  • The processor 3020 is also configured to, when the first key value is greater than or equal to the second key value, control the v filtering units along the first direction to sequentially output, according to the input cycle, the processing results corresponding to the second target data group.
  • the data processing device further includes a compression triangular matrix, the compression triangular matrix includes v rows of compression units along the first direction, and the number of the compression units increases row by row along the first direction.
  • Each compression unit in the plurality of compression units is configured to: receive the processing result output by the filtering unit that precedes the compression unit along the second direction, or receive the processing result output by the compression unit in the previous row along the first direction.
  • Each compression unit in the plurality of compression units is configured to, in the next input cycle in which the processing result is received, transmit the processing result to the compression unit in the next row along the first direction.
  • Different key values correspond to different point sets in the relationship graph.
  • The first data indicates whether a relationship exists between a first target point in the relationship graph and at least one point in the point set corresponding to the key value in the first data.
  • The second data indicates whether a relationship exists between a second target point in the relationship graph and at least one point in the point set corresponding to the key value in the second data.
  • Each processing matrix is also used to output a processing result when the key value in the first data is equal to the key value in the second data. The processing result indicates a query point in the relationship graph whose relationship with the two target points meets a preset condition.
  • The first data also includes a first relationship value group of the first target point corresponding to the key value, and the second data also includes a second relationship value group of the second target point corresponding to the key value.
  • Each processing unit in the processing matrix is also configured to, when the key value in the first data is equal to the key value in the second data, perform a preset operation on each bit of the first relationship value group and the second relationship value group. The same bit in the first and second relationship value groups corresponding to an equal key value corresponds to the same point in the point set corresponding to that key value, and the result of the preset operation on each bit indicates whether the relationship between the point corresponding to that bit and the two target points meets the preset condition.
  • When the data groups in each data set are arranged in the first order, the key values in each first data group are arranged from small to large; when the data groups in each data set are arranged in the second order, the key values in each first data group are arranged from large to small.
  • each unit in the above device may be integrated together in whole or in part, or may be implemented independently.
  • these units are integrated together and implemented as a system-on-a-chip (SOC).
  • SOC may include at least one processor for implementing any of the above methods or implementing the functions of each unit of the device.
  • the at least one processor may be of different types, such as a CPU and an FPGA, or a CPU and an artificial intelligence processor.
  • An embodiment of the present application also provides a computer program storage medium, which is characterized in that the computer program storage medium has program instructions, and when the program instructions are executed, the above method is executed.
  • An embodiment of the present application also provides a chip system, which is characterized in that the chip system includes at least one processor, and when the program instructions are executed in the at least one processor, the above method is executed.
  • An embodiment of the present application also provides a program product.
  • the computer program product includes program instructions. When the program instructions are executed in a computer device, the foregoing data processing method is executed.
  • An embodiment of the present application also provides a data processing system, including the aforementioned data processing device and a control device of the data processing device.
  • the processor in the embodiment of the present application can be a central processing unit (CPU).
  • The processor can also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • The memory in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • Non-volatile memory can be read-only memory (ROM), programmable ROM (PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM) or flash memory.
  • Volatile memory can be random access memory (RAM), which is used as an external cache.
  • Many forms of RAM are available, such as static random access memory (static RAM, SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (synchlink DRAM, SLDRAM) and direct memory bus random access memory (direct rambus RAM, DR RAM).
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any other combination.
  • the above-described embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on the computer, the processes or functions described in the embodiments of the present application are generated in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server or data center via a wireline (e.g.
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or a data center that contains one or more sets of available media.
  • The usable media may be magnetic media (e.g., floppy disk, hard disk, tape), optical media (e.g., DVD), or semiconductor media.
  • the semiconductor medium may be a solid state drive.
  • “At least one” refers to one or more, and “plurality” refers to two or more.
  • “At least one of the following” or similar expressions refer to any combination of the listed items, including a single item or any combination of multiple items.
  • For example, at least one of a, b, or c can mean: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c can each be single or multiple.
  • Prefixes such as “first” and “second” are used in the embodiments of this application only to distinguish different described objects, and do not limit the position, order, priority, quantity or content of the described objects. For example, if the described object is a “key value”, then the ordinal words before “key value” in “first key value” and “second key value” do not limit the position, order or priority of the “key values”; for another example, if the described object is a “direction”, then the ordinal words before “direction” in “first direction” and “second direction” do not limit the position, order or priority of the “directions”.
  • The sequence numbers of the above processes do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not constitute any limitation on the implementation of the embodiments of the present application.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
  • The aforementioned storage media include: USB flash drive, removable hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, optical disc and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Complex Calculations (AREA)

Abstract

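This application discloses a control method and apparatus for a data processing device, which can reduce processing time. The method includes: obtaining two target data groups, which are respectively the first data group in each of two data sets, where the first data set contains a plurality of first data groups, and every key value in any data group of a data set is smaller than each key value in the data groups located after that data group; and, when the largest key value of the first target data group is less than or equal to the largest key value of the second target data group, taking the first data group that follows the first target data group as the first target data group. In this way, the amount of computation is reduced and computing efficiency is improved.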

Description

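Control method and apparatus for a data processing device
This application claims priority to Chinese patent application No. 202210447052.8, filed with the Chinese Patent Office on April 26, 2022 and entitled "Control method and apparatus for a data processing device" (数据处理装置的控制方法与装置), the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of data processing, and more specifically, to a control method and apparatus for a data processing device.
Background
Graph mining is a representative class of graph processing and data mining algorithms. It searches a complete graph data structure for a specific subgraph pattern and counts how often that pattern occurs. Graph mining is widely used; common applications include community network analysis in social media, protein analysis in bioinformatics, and drug discovery in computational chemistry.
The neighbor set of a point can be used to represent whether that point has a relationship with other points in a relationship graph. The neighbor set of the point includes multiple numbers, each of which is the sequence number of a point that has a relationship with the point. During graph mining, a data processing device can compare the neighbor sets of two points in the relationship graph to determine the sequence numbers they have in common. Those common sequence numbers are the sequence numbers of the points that have a relationship with both of the two points.
The processing capability of a data processing device is limited. In general, the device can compare two column indexes, each of which contains no more than a preset number of values. When one of the two column indexes contains more values than the preset number, the values in that column index need to be divided into multiple arrays. The data processing device can then compare each of those arrays with the other column index.
Comparing each of the arrays with the other column index requires a large amount of computation and a long processing time.
Summary
This application provides a control method and apparatus for a data processing device, which can reduce the amount of computation and shorten data processing time.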
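According to a first aspect, a control method for a data processing device is provided. The method includes: obtaining a first target data group and a second target data group, where the first target data group is the first data group among a plurality of first data groups of a first data set, the second target data group is the first data group among at least one second data group of a second data set, each data group in each of the first data set and the second data set includes at least one data item, each data item includes a key value, and the data groups in each data set are arranged in a first order or a second order; when arranged in the first order, each key value in any data group of a data set is smaller than each key value in the data groups located after that data group, and when arranged in the second order, each key value in any data group of a data set is greater than each key value in the data groups located after that data group; and performing multiple iterations, where each iteration includes: inputting the first target data group and the second target data group into the data processing device, the data processing device being configured to determine the key values that are equal in the first target data group and the second target data group; when the data groups in each data set are arranged in the first order and a first key value is less than or equal to a second key value, obtaining the first data group located after the first target data group in the first data set as the first target data group, where the first key value is the largest key value in the first target data group and the second key value is the largest key value in the second target data group; and when the data groups in each data set are arranged in the second order and a third key value is greater than or equal to a fourth key value, obtaining the first data group located after the first target data group in the first data set as the first target data group, where the third key value is the smallest key value in the first target data group and the fourth key value is the smallest key value in the second target data group.
Consider two data sets in which the data groups of each set are arranged in order of key value, and in which the first data set contains multiple first data groups. When the data processing device is used to process them, the iterations ensure that a first data group whose key values all lie beyond the key-value range of the second target data group is no longer input into the data processing device, that is, it is no longer compared with the second target data group. This reduces the number of first data groups compared with the second target data group and thus the amount of computation.
When the key values in the plurality of first data groups are arranged from small to large, the key-value range of the second target data group can be taken as the range less than or equal to the largest key value in the second target data group. That is, when the largest key value in the first target data group is greater than or equal to the largest key value in the second target data group, subsequent first data groups no longer need to be taken as the first target data group and compared with this second target data group.
When the key values in the plurality of first data groups are arranged from large to small, the key-value range of the second target data group can be taken as the range greater than or equal to the smallest key value in the second target data group. That is, when the smallest key value in the first target data group is less than or equal to the smallest key value in the second target data group, subsequent first data groups no longer need to be taken as the first target data group and compared with this second target data group.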
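With reference to the first aspect, in some possible implementations, there are multiple second data groups. When the data groups in each data set are arranged in the first order and the first key value is greater than or equal to the second key value, the second data group located after the second target data group in the second data set is obtained as the second target data group; when the data groups in each data set are arranged in the second order and the third key value is less than or equal to the fourth key value, the second data group located after the second target data group in the second data set is obtained as the second target data group.
As a result, the data group of the other data set that is compared with a given data group of one data set contains key values lying between the smallest and the largest key value of that given data group (inclusive), and the given data group is less likely to be compared with data groups of the other data set that contain only key values outside that range. This improves computing efficiency and reduces the amount of computation.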
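The group-advancing iteration of the first aspect can be sketched in Python for the ascending (first) order. This is only an illustration under stated assumptions: `device_compare` is a hypothetical software stand-in for the data processing device, `intersect_grouped` is a hypothetical name, and key values are assumed to be unique within each data set.

```python
from typing import List, Sequence, Tuple


def device_compare(first: Sequence[int], second: Sequence[int]) -> List[int]:
    """Hypothetical stand-in for the device: key values present in both groups."""
    second_keys = set(second)
    return [key for key in first if key in second_keys]


def intersect_grouped(
    first_groups: List[List[int]], second_groups: List[List[int]]
) -> Tuple[List[int], int]:
    """Compare ascending data groups, advancing whichever group is exhausted.

    Returns the equal key values and the number of device invocations.
    """
    i = j = 0
    equal_keys: List[int] = []
    invocations = 0
    while i < len(first_groups) and j < len(second_groups):
        first_target, second_target = first_groups[i], second_groups[j]
        equal_keys.extend(device_compare(first_target, second_target))
        invocations += 1
        first_key, second_key = max(first_target), max(second_target)
        if first_key <= second_key:
            i += 1  # later second groups cannot match this first group
        if first_key >= second_key:
            j += 1  # later first groups cannot match this second group
    return equal_keys, invocations


first_set = [[1, 3, 5], [7, 9, 11], [13, 15, 17]]
second_set = [[3, 4, 7], [9, 13, 20]]
keys, count = intersect_grouped(first_set, second_set)
print(keys, count)  # [3, 7, 9, 13] 4
```

In this example the device is invoked 4 times, whereas comparing every first data group with every second data group would take 3 × 2 = 6 invocations.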
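With reference to the first aspect, in some possible implementations, the data processing device includes a processing matrix, the processing matrix includes v×v processing units, v is a positive integer, and the number of data items in each of the first target data group and the second target data group is less than or equal to v. The i-th first data in the first target data group is input, in the j-th input cycle of the iteration, into the j-th processing unit along a second direction among the v processing units located at a first edge; the p-th second data in the second target data group is input, in the q-th input cycle of the iteration, into the q-th processing unit along a first direction among the v processing units located at a second edge. The first edge is adjacent to the second edge, and different data items of each target data group are input into different processing units. The first direction points from the second edge into the processing matrix and is perpendicular to the second edge; the second direction points from the first edge into the processing matrix and is perpendicular to the first edge. Each processing unit in the processing matrix is configured to determine whether the key value in the first data and the key value in the second data that are input to the processing unit in the same input cycle are equal. When v is greater than 1, each processing unit in the processing matrix is further configured to, in the input cycle following the one in which it receives the first data and the second data, transmit the first data to the next processing unit along the first direction and transmit the second data to the next processing unit along the second direction.
Processing the first target data group and the second target data group with a processing matrix can improve processing efficiency.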
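With reference to the first aspect, in some possible implementations, different data items in each data set have different key values, and each processing unit in the processing matrix is configured to, when the key value in the first data is not equal to the key value in the second data, transmit the first data to the next processing unit along the first direction and transmit the second data to the next processing unit along the second direction.
When different data items in each data set have different key values, a key value that is equal in the first data and the second data cannot be equal to the key value of any other data item. A processing unit therefore needs to transmit the first data to the next processing unit along the first direction, and the second data to the next processing unit along the second direction, only when the two key values are not equal. This reduces data transmission and the amount of computation.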
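The data flow through the processing matrix can be illustrated with a small cycle-by-cycle simulation. This is a sketch under assumptions, not the circuit described in the application: the orientation (first data moving down columns, second data moving right along rows), the name `systolic_intersect`, and the choice of 2v − 1 simulated cycles are all illustrative. Inputs are skewed so that the i-th item of each group enters in the i-th cycle, and a matched pair is not forwarded, as in the implementation above.

```python
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, int]  # (row, column) of a processing unit


def systolic_intersect(
    first: Sequence[int], second: Sequence[int], v: int
) -> List[Tuple[int, int, int, int]]:
    """Simulate a v x v processing matrix; return (cycle, row, col, key) per match."""
    if len(first) > v or len(second) > v:
        raise ValueError("each target data group may hold at most v items")
    down: Dict[Cell, int] = {}   # first data arriving at each unit this cycle
    right: Dict[Cell, int] = {}  # second data arriving at each unit this cycle
    hits: List[Tuple[int, int, int, int]] = []
    for cycle in range(2 * v - 1):
        # Skewed injection: item number `cycle` enters at the matrix edge.
        if cycle < len(first):
            down[(0, cycle)] = first[cycle]
        if cycle < len(second):
            right[(cycle, 0)] = second[cycle]
        next_down: Dict[Cell, int] = {}
        next_right: Dict[Cell, int] = {}
        for row, col in sorted(set(down) | set(right)):
            a = down.get((row, col))
            b = right.get((row, col))
            if a is not None and a == b:
                hits.append((cycle, row, col, a))
                continue  # equal keys are unique, so neither item is forwarded
            if a is not None and row + 1 < v:
                next_down[(row + 1, col)] = a
            if b is not None and col + 1 < v:
                next_right[(row, col + 1)] = b
        down, right = next_down, next_right
    return hits


result = systolic_intersect([1, 4, 6, 9], [2, 4, 9, 10], v=4)
print(result)  # [(2, 1, 1, 4), (5, 2, 3, 9)]
```

With this skew, the i-th first item and the j-th second item meet exactly once, at unit (j, i) in cycle i + j, so every pair is compared without any unit holding more than one pair per cycle.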
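With reference to the first aspect, in some possible implementations, the data processing device further includes a filtering matrix, the filtering matrix includes v filtering units, and the v filtering units are respectively located after the last processing unit, along the second direction, of each of the v rows of the processing matrix arranged along the first direction. Each processing unit in the processing matrix is further configured to: when the key value in the first data is equal to the key value in the second data, transmit the processing result of the processing unit along the second direction to the next unit in the input cycle following the one in which it receives the first data and the second data, where the unit is a processing unit or a filtering unit and the processing result includes the equal key value; or, in the input cycle following the one in which a processing result is received, transmit that processing result along the second direction to the next unit. The method further includes: when the first key value is greater than or equal to the second key value, controlling the v filtering units along the first direction to sequentially output, according to the input cycle, the processing results corresponding to the second target data group.
The processing results corresponding to the key values of a given second data group can thus be output together after that second data group has been input into the data processing device for the last time, which makes the output of processing results more flexible.
With reference to the first aspect, in some possible implementations, the data processing device further includes a compression triangular matrix, the compression triangular matrix includes v rows of compression units along the first direction, and the number of compression units increases row by row along the first direction. Each compression unit in the plurality of compression units is configured to: receive the processing result output by the filtering unit that precedes the compression unit along the second direction, or receive the processing result output by the compression unit in the previous row along the first direction; and, in the input cycle following the one in which the processing result is received, transmit the processing result to the compression unit in the next row along the first direction.
By providing a compression triangular matrix in the data processing device, the processing results corresponding to a given second data group can be output within the same input cycle, which makes the output of processing results more flexible.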
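With reference to the first aspect, in some possible implementations, different key values correspond to different point sets in a relationship graph. The first data indicates whether a relationship exists between a first target point in the relationship graph and at least one point in the point set corresponding to the key value in the first data, and the second data indicates whether a relationship exists between a second target point in the relationship graph and at least one point in the point set corresponding to the key value in the second data. Each processing matrix is further configured to output a processing result when the key value in the first data is equal to the key value in the second data, where the processing result indicates a query point in the relationship graph whose relationship with the two target points meets a preset condition.
A query point whose connections to the two target points meet the preset condition can thus be determined, so that subgraphs with a specific subgraph pattern can be identified in the relationship graph, implementing graph mining.
With reference to the first aspect, in some possible implementations, the first data further includes a first relationship value group of the first target point corresponding to the key value, and the second data further includes a second relationship value group of the second target point corresponding to the key value. Each processing unit in the processing matrix is further configured to, when the key value in the first data is equal to the key value in the second data, perform a preset operation on each bit of the first relationship value group and the second relationship value group. The same bit in the first and second relationship value groups corresponding to an equal key value corresponds to the same point in the point set corresponding to that key value, and the result of the preset operation on each bit indicates whether the relationship between the point corresponding to that bit and the two target points meets the preset condition.
When the key value in the first data equals the key value in the second data, the two key values indicate the same point set, so the query point can be determined from the result of the bitwise preset operation on the first relationship value group and the second relationship value group.
Using the key-value pairs of the points in the relationship graph, the equal key values in the key-value pairs of the two target points are determined, and the query point whose relationship with the two target points meets the preset condition is determined from the relationship value groups of the two points corresponding to the equal key values. This reduces the amount of data needed to represent the relationship graph and, at the same time, the processing time needed to determine the query point.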
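A minimal Python sketch of the bitwise step follows. The concrete encoding is an assumption made for illustration: key value k is taken to cover the points k·block_size through k·block_size + block_size − 1, bit b of a relationship value group stands for point k·block_size + b, and the preset operation is bitwise AND, so a query point is a common neighbor of both target points. The function name `query_points` is hypothetical.

```python
from typing import Dict, List


def query_points(
    first_pairs: Dict[int, int], second_pairs: Dict[int, int], block_size: int
) -> List[int]:
    """Return points related to both target points (preset operation: AND).

    Each mapping goes from key value to relationship value group (a bitmask).
    """
    found: List[int] = []
    for key in sorted(first_pairs.keys() & second_pairs.keys()):
        both = first_pairs[key] & second_pairs[key]  # one operation per bit, in parallel
        for bit in range(block_size):
            if (both >> bit) & 1:
                found.append(key * block_size + bit)
    return found


# First target point is related to points 1, 2, 8 and 11; second to 2, 4 and 11.
first_target = {0: 0b0110, 2: 0b1001}
second_target = {0: 0b0100, 1: 0b0001, 2: 0b1000}
common = query_points(first_target, second_target, block_size=4)
print(common)  # [2, 11]
```

Other preset conditions would use a different per-bit operation; for instance, `first & ~second` would select points related to the first target point only.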
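With reference to the first aspect, in some possible implementations, when the data groups in each data set are arranged in the first order, the key values in each first data group are arranged from small to large; when the data groups in each data set are arranged in the second order, the key values in each first data group are arranged from large to small.
The key values in the first data set are arranged from small to large or from large to small, so at least one data item can be taken from the first data set as the first target data group, and each first target data group obtained in this way is a first data group. This makes the division into first data groups more flexible.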
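According to a second aspect, a control apparatus for a data processing device is provided, including an obtaining module and a processing module. The obtaining module is configured to obtain a first target data group and a second target data group, where the first target data group is the first data group among a plurality of first data groups of a first data set, the second target data group is the first data group among at least one second data group of a second data set, each data group in each of the first data set and the second data set includes at least one data item, and the data groups in each data set are arranged in a first order or a second order; when arranged in the first order, each key value in any data group of a data set is smaller than each key value in the data groups located after that data group, and when arranged in the second order, each key value in any data group of a data set is greater than each key value in the data groups located after that data group. The processing module is configured to perform multiple iterations. Each iteration includes inputting the first target data group and the second target data group into the data processing device, the data processing device being configured to determine the key values that are equal in the first target data group and the second target data group. Each iteration further includes, when the data groups in each data set are arranged in the first order and a first key value is less than or equal to a second key value, obtaining the first data group located after the first target data group in the first data set as the first target data group, where the first key value is the largest key value in the first target data group and the second key value is the largest key value in the second target data group. Each iteration further includes, when the data groups in each data set are arranged in the second order and a third key value is greater than or equal to a fourth key value, obtaining the first data group located after the first target data group in the first data set as the first target data group, where the third key value is the smallest key value in the first target data group and the fourth key value is the smallest key value in the second target data group.
With reference to the second aspect, in some possible implementations, there are multiple second data groups. When the data groups in each data set are arranged in the first order and the first key value is greater than or equal to the second key value, the second data group located after the second target data group in the second data set is obtained as the second target data group. When the data groups in each data set are arranged in the second order and the third key value is less than or equal to the fourth key value, the second data group located after the second target data group in the second data set is obtained as the second target data group.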
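With reference to the second aspect, in some possible implementations, the data processing device includes a processing matrix, the processing matrix includes v×v processing units, v is a positive integer, and the number of data items in each of the first target data group and the second target data group is less than or equal to v. The i-th first data in the first target data group is input, in the j-th input cycle of the iteration, into the j-th processing unit along a second direction among the v processing units located at a first edge; the p-th second data in the second target data group is input, in the q-th input cycle of the iteration, into the q-th processing unit along a first direction among the v processing units located at a second edge. The first edge is adjacent to the second edge, and different data items of each target data group are input into different processing units. The first direction points from the second edge into the processing matrix and is perpendicular to the second edge; the second direction points from the first edge into the processing matrix and is perpendicular to the first edge. Each processing unit in the processing matrix is configured to determine whether the key value in the first data and the key value in the second data that are input to the processing unit in the same input cycle are equal. When v is greater than 1, each processing unit in the processing matrix is further configured to, in the input cycle following the one in which it receives the first data and the second data, transmit the first data to the next processing unit along the first direction and transmit the second data to the next processing unit along the second direction.
With reference to the second aspect, in some possible implementations, different data items in each data set have different key values, and each processing unit in the processing matrix is configured to, when the key value in the first data is not equal to the key value in the second data, transmit the first data to the next processing unit along the first direction and transmit the second data to the next processing unit along the second direction.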
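With reference to the second aspect, in some possible implementations, the data processing device further includes a filtering matrix, the filtering matrix includes v filtering units, and the v filtering units are respectively located after the last processing unit, along the second direction, of each of the v rows of the processing matrix arranged along the first direction. Each processing unit in the processing matrix is further configured to: when the key value in the first data is equal to the key value in the second data, transmit the processing result of the processing unit along the second direction to the next unit in the input cycle following the one in which it receives the first data and the second data, where the unit is a processing unit or a filtering unit and the processing result includes the equal key value; or, in the input cycle following the one in which a processing result is received, transmit that processing result along the second direction to the next unit. The processing module is further configured to, when the first key value is greater than or equal to the second key value, control the v filtering units along the first direction to sequentially output, according to the input cycle, the processing results corresponding to the second target data group.
With reference to the second aspect, in some possible implementations, the data processing device further includes a compression triangular matrix, the compression triangular matrix includes v rows of compression units along the first direction, and the number of compression units increases row by row along the first direction. Each compression unit in the plurality of compression units is configured to receive the processing result output by the filtering unit that precedes the compression unit along the second direction, or receive the processing result output by the compression unit in the previous row along the first direction. Each compression unit is further configured to, in the input cycle following the one in which the processing result is received, transmit the processing result to the compression unit in the next row along the first direction.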
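With reference to the second aspect, in some possible implementations, different key values correspond to different point sets in a relationship graph. The first data indicates whether a relationship exists between a first target point in the relationship graph and at least one point in the point set corresponding to the key value in the first data, and the second data indicates whether a relationship exists between a second target point in the relationship graph and at least one point in the point set corresponding to the key value in the second data. Each processing matrix is further configured to output a processing result when the key value in the first data is equal to the key value in the second data, where the processing result indicates a query point in the relationship graph whose relationship with the two target points meets a preset condition.
With reference to the second aspect, in some possible implementations, the first data further includes a first relationship value group of the first target point corresponding to the key value, and the second data further includes a second relationship value group of the second target point corresponding to the key value. Each processing unit in the processing matrix is further configured to, when the key value in the first data is equal to the key value in the second data, perform a preset operation on each bit of the first relationship value group and the second relationship value group. The same bit in the first and second relationship value groups corresponding to an equal key value corresponds to the same point in the point set corresponding to that key value, and the result of the preset operation on each bit indicates whether the relationship between the point corresponding to that bit and the two target points meets the preset condition.
With reference to the second aspect, in some possible implementations, when the data groups in each data set are arranged in the first order, the key values in each first data group are arranged from small to large; when the data groups in each data set are arranged in the second order, the key values in each first data group are arranged from large to small.
According to a third aspect, a control apparatus for a data processing device is provided, including a memory and at least one processor. The memory is configured to store a program, and when the program is executed in the at least one processor, the control apparatus is configured to perform the method according to the first aspect.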
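According to a fourth aspect, a data processing method is provided, including: obtaining at least one key-value pair of a first target point and at least one key-value pair of a second target point, the first target point and the second target point being two target points in a relationship graph, where each key-value pair of a target point includes a key value of the target point and a relationship value group of the target point corresponding to that key value, different key values correspond to different point sets in the relationship graph, the relationship value group of the target point corresponding to a key value indicates whether the target point has a relationship with each point in the point set corresponding to that key value, and the point set corresponding to the key value in each key-value pair of the target point contains a point that has a relationship with the target point; determining the equal key values among the at least one key value of the first target point and the at least one key value of the second target point; and determining a query point in the relationship graph according to the relationship value group of the first target point and the relationship value group of the second target point that correspond to the equal key value, where the relationship between the query point and the two target points meets a preset condition.
The point set corresponding to the key value in each key-value pair of the target point contains a point that has a relationship with the target point. That is, for each target point, the point set corresponding to the key value in each of its key-value pairs contains at least one point that has a relationship with that target point.
Using the key-value pairs of the points in the relationship graph, the equal key values in the key-value pairs of the two target points are determined, and the query point whose relationship with the two target points meets the preset condition is determined from the relationship value groups of the two points corresponding to the equal key values. This reduces the amount of data needed to represent the relationship graph and, at the same time, the processing time needed to determine the query point.
With reference to the fourth aspect, in some possible implementations, obtaining at least one key-value pair of each of the two target points in the relationship graph includes: obtaining relationship graph data, where the relationship graph data includes a row offset vector, a key value vector and a relationship value vector; and determining, according to the relationship graph data, at least one key-value pair of each of the two target points. The key value vector includes at least one key value of each of a plurality of points in the relationship graph, the row offset vector indicates the position of the at least one key value of each point in the key value vector, the relationship value vector includes the relationship value group corresponding to each key value of each of the plurality of points, and the order of the key values of each point in the key value vector is the same as the order of the corresponding relationship value groups in the relationship value vector.
The key-value pairs of the points in the relationship graph can be determined from the relationship graph data. Storing the relationship graph in the format of the relationship graph data reduces storage space.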
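The three vectors resemble a compressed sparse row layout applied to fixed-size blocks of points. The following Python sketch shows one way such data could be built and read back; the block encoding (key value = point index // block_size, one bit per point in the block) and the names `build_graph_data` and `key_value_pairs` are assumptions for illustration, not the format defined in the application.

```python
from typing import Dict, List, Sequence, Tuple

GraphData = Tuple[List[int], List[int], List[int]]


def build_graph_data(adjacency: Sequence[Sequence[int]], block_size: int) -> GraphData:
    """Build (row offset vector, key value vector, relationship value vector)."""
    row_offsets: List[int] = [0]
    key_values: List[int] = []
    relation_values: List[int] = []
    for neighbors in adjacency:
        blocks: Dict[int, int] = {}
        for point in neighbors:
            key = point // block_size
            blocks[key] = blocks.get(key, 0) | (1 << (point % block_size))
        for key in sorted(blocks):  # only blocks with at least one related point
            key_values.append(key)
            relation_values.append(blocks[key])
        row_offsets.append(len(key_values))
    return row_offsets, key_values, relation_values


def key_value_pairs(graph_data: GraphData, point: int) -> List[Tuple[int, int]]:
    """Read the key-value pairs of one point using the row offset vector."""
    row_offsets, key_values, relation_values = graph_data
    start, end = row_offsets[point], row_offsets[point + 1]
    return list(zip(key_values[start:end], relation_values[start:end]))


# Ten points; point 0 is related to 1, 2 and 9, point 9 is related to 0.
adjacency = [[1, 2, 9], [0, 2], [0, 1]] + [[] for _ in range(6)] + [[0]]
graph_data = build_graph_data(adjacency, block_size=4)
print(key_value_pairs(graph_data, 0))  # [(0, 6), (2, 2)]
```

Because only blocks containing at least one related point are stored, every stored key value satisfies the condition that its point set contains a point related to the target point.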
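With reference to the fourth aspect, in some possible implementations, the point sets corresponding to different key values contain the same number of points.
Because the point sets corresponding to different key values contain the same number of points, determining the key-value pairs of a target point from the relationship graph data becomes simpler.
With reference to the fourth aspect, in some possible implementations, determining the query point in the relationship graph according to the relationship value group of the first target point and the relationship value group of the second target point that correspond to the equal key value includes: performing a preset operation on each bit of the relationship value group of the first target point and the relationship value group of the second target point that correspond to the equal key value, where the same bit in the relationship value groups of different points corresponding to a given key value corresponds to the same point in the point set corresponding to that key value, and the result of the preset operation on each bit indicates whether the relationship between the point corresponding to that bit and the two target points meets the preset condition.
The preset operation is performed separately on each bit of the relationship value groups of the two target points that correspond to their equal key value. These operations can run in parallel, which shortens the computation time.
With reference to the fourth aspect, in some possible implementations, determining the equal key values among the at least one key value of the first target point and the at least one key value of the second target point includes: sequentially inputting, according to an input cycle, the at least one key-value pair of the first target point along a second direction into the processing units located at a first edge of a processing matrix, and sequentially inputting, according to the input cycle, the at least one key-value pair of the second target point along a first direction into the processing units located at a second edge of the processing matrix, so as to determine the equal key values, where the processing matrix includes v×v processing units, v is a positive integer greater than 1, the at least one key-value pair of the first target point and the at least one key-value pair of the second target point start to be input into the processing matrix in the same input cycle, the first edge is adjacent to the second edge, the first direction is a direction away from the second edge, the second direction is a direction away from the first edge, and each processing unit is configured to determine whether the key value of the first target point and the key value of the second target point that are input to the processing unit are equal. Determining the query point in the relationship graph according to the relationship value group of the first target point and the relationship value group of the second target point that correspond to the equal key value includes: processing, with the processing matrix, the relationship value group of the first target point and the relationship value group of the second target point that correspond to the equal key value, so as to determine the query point, where each of the plurality of processing units is configured to, when the key value of the first target point is equal to the key value of the second target point, determine the query point in the relationship graph according to the relationship value group of the first target point and the relationship value group of the second target point that are input to the processing unit, the relationship between the query point and the two target points meeting the preset condition; and each of the plurality of processing units is further configured to, according to the input cycle, transmit the key-value pair of the first target point to the next processing unit along the first direction and transmit the key-value pair of the second target point to the next processing unit along the second direction.
The processing matrix can be understood as a systolic array: the key-value pairs of the target points "flow" rhythmically, in pipeline fashion, between the processing units of the processing matrix, and all processing units process the data flowing through them in parallel, which increases processing speed and reduces processing time.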
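With reference to the fourth aspect, in some possible implementations, each of the plurality of processing units is specifically configured to: when the key value of the first target point is not equal to the key value of the second target point, transmit the key-value pair of the first target point to the next processing unit along the first direction and transmit the key-value pair of the second target point to the next processing unit along the second direction.
The key values in the key-value pairs of one target point are all different. When the key value of the first target point is not equal to the key value of the second target point, the processing unit forwards the key-value pairs of the first and second target points. When the key value of the first target point is equal to the key value of the second target point, the key-value pairs are no longer forwarded. The equal key value is therefore not compared with any other key value, which reduces the amount of computation.
With reference to the fourth aspect, in some possible implementations, the number of key-value pairs of the first target point is greater than v, and the method further includes: dividing, in order of the key values in the key-value pairs, the at least one key-value pair of the first target point into a plurality of first key-value pair groups, and dividing, in order of the key values in the key-value pairs, the at least one key-value pair of the second target point into at least one second key-value pair group, where the number of key-value pairs in each of the plurality of first key-value pair groups and the at least one second key-value pair group is less than or equal to v, the key-value pair group with the smallest key values among the plurality of first key-value pair groups is a third key-value pair group, and the key-value pair group with the smallest key values among the second key-value pair groups is a fourth key-value pair group; and performing multiple iterations until the largest key value in the third key-value pair group is greater than the largest key value in the fourth key-value pair group, where the iteration includes: sequentially inputting, according to an input cycle, at least one key-value pair of the third key-value pair group into the processing units located at the first edge among the plurality of processing units, and sequentially inputting, according to the input cycle, at least one key-value pair of the fourth key-value pair group into the processing units located at the second edge among the plurality of processing units; and, when the largest key value in the third key-value pair group is smaller than the largest key value in the fourth key-value pair group, taking the first key-value pair group that follows the third key-value pair group, among the plurality of first key-value pair groups arranged in ascending order of key value, as the third key-value pair group. Sequentially inputting the at least one key-value pair of the first target point along the second direction into the processing units located at the first edge of the processing matrix, and sequentially inputting the at least one key-value pair of the second target point along the first direction into the processing units located at the second edge of the processing matrix, includes: before performing the multiple iterations and after each iteration, sequentially inputting, according to the input cycle, the key-value pairs of the third key-value pair group into the processing units located at the first edge of the processing matrix, and sequentially inputting, according to the input cycle, the key-value pairs of the fourth key-value pair group into the processing units located at the second edge of the processing matrix.
The plurality of first key-value pair groups are compared with a second key-value pair group one after another, in ascending order of key value. When the largest key value in a first key-value pair group is smaller than the largest key value in the second key-value pair group, the next first key-value pair group is input into the processing matrix and compared with the second key-value pair group. When the largest key value in a first key-value pair group is greater than or equal to the largest key value in the second key-value pair group, no further comparison with that second key-value pair group is needed, which reduces the amount of computation.
In particular, when there are multiple first key-value pair groups and multiple second key-value pair groups, the method provided in this application can significantly reduce the amount of computation and the computation time compared with comparing every first key-value pair group with every second key-value pair group.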
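The size of the saving can be made concrete with a short Python sketch. It assumes that both target points have at least one key-value pair, that groups are formed by cutting the sorted key-value pairs into chunks of at most v, and that at least one of the two groups is advanced after every comparison; `split_into_groups` and `comparison_bounds` are hypothetical helper names.

```python
from math import ceil
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # (key value, relationship value group)


def split_into_groups(pairs: Sequence[Pair], v: int) -> List[List[Pair]]:
    """Sort key-value pairs by key value and cut them into groups of at most v."""
    ordered = sorted(pairs)
    return [ordered[i:i + v] for i in range(0, len(ordered), v)]


def comparison_bounds(n_first: int, n_second: int, v: int) -> Tuple[int, int]:
    """Return (all-pairs group comparisons, worst case with group advancing)."""
    m = ceil(n_first / v)   # number of first key-value pair groups
    n = ceil(n_second / v)  # number of second key-value pair groups
    # Every comparison advances at least one group, so at most m + n - 1 occur.
    return m * n, m + n - 1


groups = split_into_groups([(5, 1), (1, 2), (9, 3), (3, 4), (7, 5)], v=2)
print(groups)  # [[(1, 2), (3, 4)], [(5, 1), (7, 5)], [(9, 3)]]
print(comparison_bounds(10, 7, v=3))  # (12, 6)
```

The all-pairs count grows with the product of the group counts, while the advancing scheme grows with their sum, which is why the benefit is most visible when both target points have many key-value pair groups.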
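According to a fifth aspect, a data processing apparatus is provided, including an obtaining module and a processing module. The obtaining module is configured to obtain at least one key-value pair of a first target point and at least one key-value pair of a second target point, the first target point and the second target point being two target points in a relationship graph, where each key-value pair of a target point includes a key value of the target point and a relationship value group of the target point corresponding to that key value, different key values correspond to different point sets in the relationship graph, the relationship value group of the target point corresponding to a key value indicates whether the target point has a relationship with each point in the point set corresponding to that key value, and the point set corresponding to the key value in each key-value pair of the target point contains a point that has a relationship with the target point. The processing module is configured to determine the equal key values among the at least one key value of the first target point and the at least one key value of the second target point. The processing module is further configured to determine a query point in the relationship graph according to the relationship value group of the first target point and the relationship value group of the second target point that correspond to the equal key value, where the relationship between the query point and the two target points meets a preset condition.
With reference to the fifth aspect, in some possible implementations, the obtaining module is specifically configured to: obtain relationship graph data, where the relationship graph data includes a row offset vector, a key value vector and a relationship value vector; and determine, according to the relationship graph data, at least one key-value pair of each of the two target points. The key value vector includes at least one key value of each of a plurality of points in the relationship graph, the row offset vector indicates the position of the at least one key value of each point in the key value vector, the relationship value vector includes the relationship value group corresponding to each key value of each of the plurality of points, and the order of the key values of each point in the key value vector is the same as the order of the corresponding relationship value groups in the relationship value vector.
With reference to the fifth aspect, in some possible implementations, the point sets corresponding to different key values contain the same number of points.
With reference to the fifth aspect, in some possible implementations, the processing module is specifically configured to perform a preset operation on each bit of the relationship value group of the first target point and the relationship value group of the second target point that correspond to the equal key value, where the same bit in the relationship value groups of different points corresponding to a given key value corresponds to the same point in the point set corresponding to that key value, and the result of the preset operation on each bit indicates whether the relationship between the point corresponding to that bit and the two target points meets the preset condition.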
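With reference to the fifth aspect, in some possible implementations, the processing module includes a controller and a processing matrix, the processing matrix includes v×v processing units, and v is a positive integer greater than 1. The controller is configured to sequentially input, according to an input cycle, at least one key-value pair of the first target point of the two target points of the relationship graph along a second direction into the processing units located at a first edge of the processing matrix, and sequentially input, according to the input cycle, at least one key-value pair of the second target point of the two target points along a first direction into the processing units located at a second edge of the processing matrix, where the at least one key-value pair of the first target point and the at least one key-value pair of the second target point start to be input into the processing matrix in the same input cycle, the first edge is adjacent to the second edge, the first direction is a direction away from the second edge, the second direction is a direction away from the first edge, each key-value pair of a target point includes a key value of the target point and a relationship value group of the target point corresponding to that key value, different key values correspond to different point sets in the relationship graph, the relationship value group of the target point corresponding to a key value indicates whether the target point has a relationship with each point in the point set corresponding to that key value, and the point set corresponding to the key value in each key-value pair of the target point contains a point that has a relationship with the target point. Each of the plurality of processing units is configured to: determine whether the key value of the first target point and the key value of the second target point that are input to the processing unit are equal; when they are equal, determine a query point in the relationship graph according to the relationship value group of the first target point and the relationship value group of the second target point that are input to the processing unit, the relationship between the query point and the two target points meeting a preset condition; and, according to the input cycle, transmit the key-value pair of the first target point to the next processing unit along the first direction and transmit the key-value pair of the second target point to the next processing unit along the second direction.
With reference to the fifth aspect, in some possible implementations, each of the plurality of processing units is specifically configured to: when the key value of the first target point is not equal to the key value of the second target point, transmit the key-value pair of the first target point to the next processing unit along the first direction and transmit the key-value pair of the second target point to the next processing unit along the second direction.
With reference to the fifth aspect, in some possible implementations, the number of key-value pairs of the first target point is greater than v, and the controller is further configured to: divide, in order of the key values in the key-value pairs, the at least one key-value pair of the first target point into a plurality of first key-value pair groups, and divide, in order of the key values in the key-value pairs, the at least one key-value pair of the second target point into at least one second key-value pair group, where the number of key-value pairs in each of the plurality of first key-value pair groups and the at least one second key-value pair group is less than or equal to v, the key-value pair group with the smallest key values among the plurality of first key-value pair groups is a third key-value pair group, and the key-value pair group with the smallest key values among the second key-value pair groups is a fourth key-value pair group; and perform multiple iterations until the largest key value in the third key-value pair group is greater than the largest key value in the fourth key-value pair group, where the iteration includes: sequentially inputting, according to an input cycle, at least one key-value pair of the third key-value pair group into the processing units located at the first edge among the plurality of processing units, and sequentially inputting, according to the input cycle, at least one key-value pair of the fourth key-value pair group into the processing units located at the second edge among the plurality of processing units; and, when the largest key value in the third key-value pair group is smaller than the largest key value in the fourth key-value pair group, taking the first key-value pair group that follows the third key-value pair group, among the plurality of first key-value pair groups arranged in ascending order of key value, as the third key-value pair group. The controller is specifically configured to, before performing the multiple iterations and after each iteration, sequentially input, according to the input cycle, the key-value pairs of the third key-value pair group into the processing units located at the first edge of the processing matrix, and sequentially input, according to the input cycle, the key-value pairs of the fourth key-value pair group into the processing units located at the second edge of the processing matrix.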
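According to a sixth aspect, a data processing apparatus is provided, including a controller and a processing matrix, where the processing matrix includes v×v processing units and v is a positive integer greater than 1. The controller is configured to sequentially input, according to an input cycle, at least one key-value pair of the first target point of two target points of a relationship graph along a second direction into the processing units located at a first edge of the processing matrix, and sequentially input, according to the input cycle, at least one key-value pair of the second target point of the two target points along a first direction into the processing units located at a second edge of the processing matrix, where the at least one key-value pair of the first target point and the at least one key-value pair of the second target point start to be input into the processing matrix in the same input cycle, the first edge is adjacent to the second edge, the first direction is a direction away from the second edge, the second direction is a direction away from the first edge, each key-value pair of a target point includes a key value of the target point and a relationship value group of the target point corresponding to that key value, different key values correspond to different point sets in the relationship graph, the relationship value group of the target point corresponding to a key value indicates whether the target point has a relationship with each point in the point set corresponding to that key value, and the point set corresponding to the key value in each key-value pair of the target point contains a point that has a relationship with the target point. Each of the plurality of processing units is configured to: determine whether the key value of the first target point and the key value of the second target point that are input to the processing unit are equal; when they are equal, determine a query point in the relationship graph according to the relationship value group of the first target point and the relationship value group of the second target point that are input to the processing unit, the relationship between the query point and the two target points meeting a preset condition; and, according to the input cycle, transmit the key-value pair of the first target point to the next processing unit along the first direction and transmit the key-value pair of the second target point to the next processing unit along the second direction, where the first direction is perpendicular to the first edge and the second direction is perpendicular to the second edge.
With reference to the sixth aspect, in some possible implementations, each of the plurality of processing units is specifically configured to: when the key value of the first target point is not equal to the key value of the second target point, transmit the key-value pair of the first target point to the next processing unit along the first direction and transmit the key-value pair of the second target point to the next processing unit along the second direction.
The key values in the key-value pairs of one target point are all different. When the key value of the first target point is not equal to the key value of the second target point, the processing unit forwards the key-value pairs of the first and second target points. When the key value of the first target point is equal to the key value of the second target point, the key-value pairs are no longer forwarded. The equal key value is therefore not compared with any other key value, which reduces the amount of computation.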
结合第六方面,在一些可能的实现方式中,所述控制器还用于,根据关系图数据,确定所述第一目标点的至少一个键值对和第二目标点的至少一个键值对,所述关系图数据包括行偏移向量、关键值向量、关系值向量,所述关键值向量包括所述关系图中多个点中每个点的至少一个关键值,所述行偏移向量用于指示每个点的至少一个关键值在所述关键值向量中的位置,所述关系值向量包括所述多个点中每个点的关键值对应的点的关系值组,每个点的至少一个关键值在所述关键值向量中的顺序与所述每个点的关键值对应的点的关系值组在所述关系值向量中的顺序相同。
关系图中各个点的键值对可以根据关系图数据确定。以关系图数据的格式存储关系图,可以降低存储空间。
示例性地,控制器还用于,根据存储的关系图数据,确定所述第一目标点的至少一个键值对和第二目标点的至少一个键值对。
结合第六方面,在一些可能的实现方式中,不同的所述关键值对应的所述点集中点的 数量相等。
不同的关键值对应的点集中点的数量相等,从而使得根据关系图数据确定目标点的键值对的方式更为简便。
结合第六方面,在一些可能的实现方式中,所述多个处理单元中的每个处理单元用于:在所述第一目标点的所述关键值与所述第二目标点的的所述关键值相等的情况下,对所述相等关键值对应的所述第一目标点的所述关系值组和所述第二目标点的关系值组的各个位分别进行预设运算,每个关键值对应的不同的所述点的所述关系值组中相同的位对应于所述关键值对应的所述点集中相同的所述点,每个位的所述预设运算的结果用于指示所述位对应的所述点与所述两个目标点之间的关系情况是否符合预设情况。
对两个目标点相等的关键值对应的该两个目标点各自的关系值组中各个位分别进行预设运算,运算可以并行进行,可以缩短运算时间。
结合第六方面,在一些可能的实现方式中,所述第一目标点的至少一个键值对的数量大于v,所述控制器具体用于:按照所述键值对中关键值的大小顺序,将所述第一目标点的至少一个键值对划分为多个第一键值对组,并将所述第二目标点的至少一个键值对划分为至少一个第二键值对组,所述多个第一键值对组和所述至少一个第二键值对组中的每个键值对组中所述键值对的数量小于或等于v,其中,所述多个第一键值对组中关键值最小的键值对组为第三键值对组,所述多个第二键值对组中关键值最小的键值对组为所述第四键值对组;进行迭代,直到所述第三键值对组中的最大关键值大于所述第四键值对组中的最大关键值,所述迭代包括:将所述第三键值对组中的多个所述键值对按照输入周期依次输入所述多个处理单元中的位于第一边缘的多个所述处理单元,将所述第四键值对组中的多个所述键值对按照输入周期依次输入所述多个处理单元中的位于第二边缘的多个所述处理单元;在所述第三键值对组中最大的关键值小于所述第四键值对组中最大的关键值的情况下,将按照关键值从小到大顺序排列的所述多个所述第一键值对组中所述第三键值对组的下一个第一键值对组作为所述第三键值对组;所述控制器具体用于,在进行所述多次迭代之前以及每次迭代之后,将所述第三键值对组的多个键值对按照所述输入周期依次输入位于所述处理矩阵的第一边缘的多个所述处理单元,将所述第四键值对组的多个键值对按照所述输入周期依次输入所述多个处理单元中的位于所述处理矩阵的第二边缘的多个所述处理单元。
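When the number of key-value pairs of the first target point is greater than v, the key-value pairs of the first target point are divided, in ascending order of key value, into multiple first key-value pair groups, which are compared with the at least one second key-value pair group of the second target point.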
按照关键值从小到大的顺序,依次将多个第一键值对组与第二键值对组进行比较。在第一键值对组中最大的关键值小于第二键值对组中最大的关键值的情况下,将下一个第一键值对组输入处理矩阵与第二键值对组进行比较。在第一键值对组中最大的关键值大于或等于第二键值对组中最大的关键值的情况下,可以不再与该第二键值对组进行比较,降低运算量。
特别地,在第一键值对组、第二键值对组均为多个的情况下,与每个第一键值对组与各个第二键值对组进行比较相比,本申请提供的方法可以明显降低运算量,降低运算时间。
第七方面,提供一种数据处理装置,包括存储器和至少一个处理器,所述存储器用于 存储程序,当所述程序在所述至少一个处理器中执行时,所述处理器用于执行第一方面中任意一种实现方式中的方法。
第八方面,提供一种计算机可读介质,该计算机可读介质存储用于设备执行的程序代码,该程序代码包括用于执行第一方面或第四方面中的任意一种实现方式中的方法。
第九方面,提供一种包含指令的计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行上述第一方面或第四方面中的任意一种实现方式中的方法。
第十方面,提供一种芯片,所述芯片包括处理器与数据接口,所述处理器通过所述数据接口读取存储器上存储的指令,执行上述第一方面、第四方面中的任意一种实现方式中的方法。
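An eleventh aspect provides a data processing system, including the control apparatus for a data processing apparatus according to the first aspect and a data processing apparatus.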
可选地,作为一种实现方式,所述芯片还可以包括存储器,所述存储器中存储有指令,所述处理器用于执行所述存储器上存储的指令,当所述指令被执行时,所述处理器用于执行第一方面或第四方面中的任意一种实现方式中的方法。
上述芯片具体可以是现场可编程门阵列(field-programmable gate array,FPGA)或者专用集成电路(application-specific integrated circuit,ASIC)。
应理解,本申请中,第一方面的方法具体可以是指第一方面以及第一方面中各种实现方式中的任意一种实现方式中的方法。
附图说明
图1是一种图挖掘算法的示意性结构图。
图2是一种数据格式的示意图。
图3是一种基于队列的集合合并的示意图。
图4是本申请实施例提供的一种数据处理方法的示意性结构图。
图5是本申请实施例提供的一种生成关系图数据的方法的示意性流程图。
图6是本申请实施例提供的另一种数据处理方法的示意性流程图。
图7是本申请实施例提供的一种数据处理装置的示意性结构图。
图8是本申请实施例提供的又一种数据处理方法的示意性流程图。
图9是本申请实施例提供的一种处理单元的示意性结构图。
图10是本申请实施例提供的一种键值对集合的示意图。
图11是本申请实施例提供的一种过滤单元的示意性结构图。
图12是本申请实施例提供的一种压缩三角的示意性结构图。
图13是本申请实施例提供的一种压缩单元的示意性结构图。
图14是本申请实施例提供的数据处理装置的处理时间的示意图。
图15是本申请实施例提供的一种数据处理装置的示意性结构图。
图16是本申请实施例提供的一种数据处理系统的示意性结构图。
图17是本申请实施例提供的另一种数据处理系统的示意性结构图。
图18是本申请实施例提供的数据处理系统的性能对比的示意图。
图19是本申请实施例提供的一种数据处理装置的控制方法的示意性流程图。
图20是本申请实施例提供的一种数据处理装置的控制装置的示意性结构图。
图21是本申请实施例提供的一种数据处理装置的控制装置的示意性结构图。
图22是本申请实施例提供的一种数据处理系统的示意性结构图。
具体实施方式
下面将结合附图,对本申请中的技术方案进行描述。
图挖掘算法是一种代表性的图处理算法与数据挖掘算法,用于在完整的图数据结构中查找特定子图模式并统计该子图模式的出现频次。图挖掘算法应用广泛,常见的应用案例包括社交媒体中社区网络分析、生物信息学中蛋白质分析、计算化学领域中的药物发现等。
图1是一种图挖掘算法的示意性结构图。
将关系图110上的各个点(即点1至点N)依次作为子图模型中的点v0。
关系图也可以称为图谱或图数据结构,用于表示节点之间是否存在关系。两个节点之间存在关系,也可以理解为该两个节点之间具有关系。
关系图包括相互连接的点和边。关系图中的点也可以称为节点,可以用于表示实体。关系图中两点之间的边可以用于表示该两点之间存在关系。
实体指的是具有可区别性且独立存在的某种事物。如某一个人、某一个城市、某一种植物、某一种商品、某一个设备、某一个原子等。
在某两个点之间存在关系的情况下,该两个点通过“边”连接。在某两个点之间不存在关系的情况下,该两个点之间不存在“边”。通过“边”连接两个点互为邻居。
通过关系图,可以将不同种类的信息连接在一起而得到的一个关系网络。关系图提供了从“关系”的角度去分析问题的能力。
在子图模型中,点v1是点v0的邻居。在确定关系图110上某个点u0为点v0的情况下,将点u0的各个邻居依次作为点v1。
在子图模型中,点v2是点v0和点v1的共同邻居。在关系图110上确定某个点u1为点v1的情况下,将点u0和点u1的共同邻居中的点依次作为点v2。点u0和点u1的共同邻居属于关系图110上的点u0的邻居和点u1邻居的交集。
子图模型中,点v3是点v0和点v1的共同邻居,但不是点v2的邻居。关系图110上确定某个点u2为点v2的情况下,将点u0和点u1的共同邻居中不属于u2的邻居的点依次作为点v3。
依次将点u0和点u1的共同邻居中不属于u2的邻居的点作为点u3。从而,可以确定有效的子图。
深度优先搜索算法的核心是一系列嵌套的for循环与集合运算:
其中,集合A为关系图110中的点的集合,N(ui)表示点ui的邻居的集合,i∈{0,1,2,3},“∩”表示交集。N(u0)∩N(u1)-N(u2)表示点u0的邻居的集合和点u1邻居的集合的交集,再减去点u2的邻居的集合中的各个点。
每当图挖掘算法的for循环到达最内层,存在点u3的情况下,即存在有效的子图。
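The nested loops described above can be sketched as follows. This is only an illustrative sketch, not the listing from the original filing: the function name `count_pattern`, the dictionary-of-sets graph representation, and the toy graph are assumptions, and excluding `u2` itself from the candidates for `u3` is an added assumption so that the four matched points are distinct.

```python
from typing import Dict, Set


def count_pattern(adj: Dict[int, Set[int]]) -> int:
    """Count ordered matches (u0, u1, u2, u3) of the four-point pattern.

    adj maps each point to the set N(point) of its neighbors.
    """
    count = 0
    for u0 in adj:                          # v0: any point of the graph
        for u1 in adj[u0]:                  # v1: a neighbor of v0
            common = adj[u0] & adj[u1]      # N(u0) ∩ N(u1)
            for u2 in common:               # v2: a common neighbor
                # N(u0) ∩ N(u1) - N(u2); removing u2 itself is an assumption
                candidates = common - adj[u2] - {u2}
                for _u3 in candidates:      # v3: reaching here means a valid subgraph
                    count += 1
    return count


# Hypothetical toy graph (not graph 110 of Figure 1).
toy_graph = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1}, 3: {0, 1}}
matches = count_pattern(toy_graph)
```

Almost all of the work sits in the two set operations inside the loops, which is why the rest of the description concentrates on making set intersection cheap.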
图挖掘算法对图1所示的关系图110进行处理,可以得到3个有效的子图。
可以利用压缩稀疏行(compressed sparse row,CSR)格式对图1所示的关系图110进行存储。
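For a relationship graph, a vector corresponding to each point can indicate whether that point is connected to the other points. The number of bits in each point's vector can equal the number of points in the relationship graph. For example, the relationship graph 110 shown in Figure 1 includes 8 points, so the vector for each point can include 8 bits, each bit indicating whether the point is connected to one particular point. The same bit position in different vectors indicates whether the corresponding points are connected to the same point; that is, the same bit position in different vectors corresponds to the same point.
For example, in the vector for a given point of the relationship graph 110 shown in Figure 1, bit i indicates whether that point is connected to point i, where i∈[0,7] and i is an integer. There is no edge between point 0 and itself, so bit 0 of the vector for point 0 is “0”; there is an edge between point 0 and point 1, so bit 1 is “1”; point 0 and point 2 are not connected, so bit 2 is “0”. In this way, the vector for point 0 is N(0) = “01010011” and the vector for point 1 is N(1) = “10101110”.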
邻接矩阵可以包括关系图中各个点对应的向量。关系图中每个点对应的向量可以作为邻接矩阵中的一行。邻接矩阵中每一行对应的点的顺序可以与每个向量中每一位对应的点的顺序相同。
可以利用CSR格式对邻接矩阵进行存储,即可以利用CSR格式的数据表示关系图。
图2是CSR格式数据的示意图。
CSR格式的数据是通过对数据进行整体编码的方式得到的。CSR格式的数据包括三部分:行偏移,列索引以及图数据值。行偏移中第i个数用于表示矩阵第i行中第一个非零元素的起始位置。列索引用于表示矩阵中非零元素所在列的列坐标,图数据值用于表示非零元素的具体值。
在图1所示的关系图110的邻接矩阵中,第0行是N(0)=01010011,第1行是N(1)=10101110。第0行中非零元素所在列的列坐标分别为1、3、6、7,共4位;第1行中非零元素所在列的列坐标分别为0、2、4、5、6,共5位。因此,邻接矩阵利用CSR格式可以表示为:行偏移(row)“0,4……”,列索引“1,3,6,7,0,2,4,5,6……”,图数据值“1,1,1,1,1,1,1,1,1……”
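As a small sketch of the CSR encoding just described, the following converts adjacency rows written as bit strings into the three CSR arrays. The function name `to_csr` and the use of strings for rows are illustrative assumptions; only the two rows N(0) and N(1) given in the text are used.

```python
from typing import List, Tuple


def to_csr(bit_rows: List[str]) -> Tuple[List[int], List[int], List[int]]:
    """Encode adjacency rows (strings of '0'/'1') as CSR arrays."""
    row_offsets = [0]        # start of each row in col_indices
    col_indices = []         # column of every non-zero element
    values = []              # value of every non-zero element (always 1 here)
    for bits in bit_rows:
        for col, bit in enumerate(bits):
            if bit == "1":
                col_indices.append(col)
                values.append(1)
        row_offsets.append(len(col_indices))
    return row_offsets, col_indices, values


row_offsets, col_indices, values = to_csr(["01010011", "10101110"])
```

With only two rows, the offsets array ends after the third entry; each further row of the adjacency matrix would append one more offset.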
利用CSR格式表示的关系图,每个点对应的列索引的部分可以理解为该点的邻居集合。确定两个点的邻居集合的交集,从而可以确定与该两个集合对应的点均具有关系的查询点。
对CSR格式存储的邻接矩阵中各个点的邻居集合进行集合合并运算时,首先需要根据点的序号及相应的行偏移值取出对应部分的列索引,然后对两组列索引的部分进行比较,筛选出相同的序号。该相同的序号指示的点即为查询点。
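For example, based on the row offsets “0, 4, 9, ...”, the positions of the non-zero values in the vector for point 0 (that is, the neighbor set of point 0) are given by the 0th to 3rd numbers in the column index, and the positions of the non-zero values in the vector for point 1 (that is, the neighbor set of point 1) are given by the 4th to 8th numbers in the column index. Then, as shown in Figure 3, the 0th to 3rd numbers in the column index are compared with the 4th to 8th numbers, that is, the neighbor set of point 0 is compared with the neighbor set of point 1, to determine the intersection of the two neighbor sets.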
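A queue-based set merge of this kind can be sketched with two cursors over sorted column-index slices. The helper names `neighbor_slice` and `merge_intersect` are assumptions for illustration, and the CSR arrays below hold only the two rows given in the text.

```python
from typing import List


def neighbor_slice(row_offsets: List[int], col_indices: List[int], point: int) -> List[int]:
    """Return the column-index segment (neighbor set) of one point."""
    return col_indices[row_offsets[point]:row_offsets[point + 1]]


def merge_intersect(a: List[int], b: List[int]) -> List[int]:
    """Intersect two ascending lists by advancing two cursors."""
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1           # a[i] cannot appear later in b
        else:
            j += 1           # b[j] cannot appear later in a
    return out


row_offsets = [0, 4, 9]
col_indices = [1, 3, 6, 7, 0, 2, 4, 5, 6]
common = merge_intersect(
    neighbor_slice(row_offsets, col_indices, 0),
    neighbor_slice(row_offsets, col_indices, 1),
)
```

The merge needs one comparison per cursor step, so its cost grows with the total number of column indices of the two points.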
数据处理装置可以用于对两个点的列索引部分进行比较。但是,数据处理装置的处理能力有限。数据处理装置可以对包括序号的数量均不超过预设值的两个邻居集合进行比较。
在两个列索引中的某个列索引中数的数量超过预设值的情况下,需要对该某个列索引中的数进行分组以得到多个数组。之后,可以利用数据处理装置对该多个数组分别与该两个列索引中的另一个列索引进行比较,但是所需运算量较大,所需的处理时间较长。
为了解决上述问题,本申请实施例提供了一种数据处理方法。
图19是本申请实施例提供的一种基于数据处理装置的数据处理方法的示意性流程图。
数据处理装置包括处理矩阵,所述处理矩阵包括v×v个处理单元,v为正整数。
方法2200包括S2210至S2220。
在S2210,获取第一目标数据组和第二目标数据组,所述第一目标数据组为第一数据集合的多个第一数据组中的第一个数据组,所述第二目标数据组为第二数据集合的至少一个第二数据组中的第一个数据组,所述第一数据集合和所述第二数据集合的每个数据集合中的每个数据组包括至少一个数据,每个数据包括关键值,每个数据集合中的所述数据组是按照第一顺序或第二顺序排列的,在按照所述第一顺序排列的情况下每个数据集合中任一个所述数据组中的每个关键值均小于位于所述任一个数据组之后的数据组中每个关键值,在按照所述第二顺序排列的情况下每个数据集合中任一个所述数据组中的每个关键值均大于位于所述任一个数据组之后的数据组中每个关键值。
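The first data set and the second data set can be stored in a memory. The first data set includes multiple first data groups, and the second data set includes at least one second data group.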
获取第一目标数据组和第二目标数据组,可以是从存储器中读取第一目标数据组和第二目标数据组。
在S2220,进行多次迭代。
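Each iteration includes inputting the first target data group and the second target data group into the data processing apparatus, where the data processing apparatus is configured to determine the key values that are equal between the first target data group and the second target data group.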
每次迭代还包括,在每个数据集合中的所述数据组按照所述第一顺序排列,且第一关键值小于或等于第二关键值的情况下,或者,在每个数据集合中的所述数据组按照所述第二顺序排列,且第三关键值大于或等于第四关键值的情况下,获取所述第一数据集合中位于所述第一目标数据组之后的第一数据组作为所述第一目标数据组,所述第一关键值为所述第一目标数据组中最大的关键值,所述第二关键值为所述第二目标数据组中最大的关键值,所述第三关键值为所述第一目标数据组中最小的关键值,所述第四关键值为所述第二目标数据组中最小的关键值。
对于两个数据集合,每个数据集合中各个数据组是按照关键值的大小顺序排列的,在利用数据处理装置进行数据处理时,在其中第一数据集合中第一数据组的数量为多个的情况下,通过迭代,对于各个关键值均超过第二目标数据组中关键值范围的第一数据组,可以不再输入数据处理装置,即不再与第二目标数据组进行比较,降低与第二目标数据组进行比较的第一数据组的数量,从而降低运算量。
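Figure 10 illustrates the case in which the first order and the second order are the same. The left column is the first data set and the right column is the second data set. The first data set includes 5 data items with key values 1, 4, 7, 8, 10, and the second data set includes 9 data items with key values 1, 2, 3, 4, 5, 6, 7, 8, 9. With v=3, the first data set can include 2 first data groups: the 1st first data group contains 3 data items with key values 1, 4, 7, and the 2nd first data group contains 2 data items with key values 8, 10. The second data set can include 3 second data groups: the 1st contains 3 data items with key values 1, 2, 3, the 2nd contains 3 data items with key values 4, 5, 6, and the 3rd contains 3 data items with key values 7, 8, 9.
Through S2210, the first data group containing key values 1, 4, 7 and the second data group containing key values 1, 2, 3 are obtained as the first target data group and the second target data group, respectively. During S2220, because the largest key value in the first target data group, 7, is greater than the largest key value in the second target data group, 3, the next first data group is not taken as the first target data group for comparison with the second target data group containing key values 1, 2, 3. This reduces the number of first data groups that are compared with a given second data group and therefore reduces the amount of computation.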
另外,进行方法2200的过程中,不需要提前获取全部的第一数据组、第二数据组,降低对执行方法2200的装置的处理能力和存储能力的需求,提高方法的灵活性和适应性。
如果第二数据集合中第二数据组的数量为多个,则在每个数据集合中的所述数据组按照所述第一顺序排列,且所述第一关键值大于或等于所述第二关键值的情况下,获取所述第二数据集合中位于所述第二目标数据组之后的第二数据组作为第二目标数据组。
如果第二数据集合中第二数据组的数量为多个,则在每个数据集合中的所述数据组按照所述第二顺序排列,且所述第三关键值小于或等于所述第四关键值的情况下,获取所述第二数据集合中位于所述第二目标数据组之后的第二数据组作为第二目标数据组。
仍然以图10为例进行说明。在进行S2220的过程中,第一目标数据组中最大的关键值7大于第二目标数据组中最大的关键值3,可以获取第1个第二数据组的下一个第二数据组并作为第二目标数据组。第2个第二数据组包括关键值分别为4、5、6的3个数据。
进行第二次迭代时,第一目标数据组中最大的关键值7大于第二目标数据组中最大的关键值6,可以获取第2个第二数据组的下一个第二数据组并作为第二目标数据组。
进行第三次迭代时,第一目标数据组中最大的关键值7小于第二目标数据组中最大的关键值9,可以获取当前的第一目标数据组(即第1个第一数据组)的下一个第一数据组并作为新的第一目标数据组。
在第一顺序与第二顺序相同,且第一数据组的数量与第二数据组的数量均为多个的情况下,将包括的关键值较小的目标数据组的所属的数据集合中的下一个目标数据组作为下一次迭代中一个目标数据组,而关键值较大的目标数据组作为下一次迭代中的另一个目标数据组,可以使得与某一个数据集合中某个数据组进行比较的另一个数据集合中的数据组包括大小在大于或等于该某个数据组中关键值最小值且小于或等于该某个数据组中关键值最大值的范围内的关键值,并且使得该某个数据组与该另一个数据集合中仅包括该范围之外的关键值的数据组进行比较的可能性降低,提高运算效率,降低运算量。
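The group-advancing rule of method 2200 can be sketched in software as follows, for the ascending (first-order) case. The names `chunk` and `grouped_intersection` are assumptions, and the membership test inside the loop only stands in for the work that the data processing apparatus performs on one pair of groups.

```python
from typing import List, Tuple


def chunk(keys: List[int], v: int) -> List[List[int]]:
    """Split an ascending key list into groups of at most v keys."""
    return [keys[i:i + v] for i in range(0, len(keys), v)]


def grouped_intersection(keys_a: List[int], keys_b: List[int], v: int
                         ) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Return the equal keys and the (group_a, group_b) pairs actually compared."""
    groups_a, groups_b = chunk(sorted(keys_a), v), chunk(sorted(keys_b), v)
    i, j = 0, 0
    matches, compared = [], []
    while i < len(groups_a) and j < len(groups_b):
        group_a, group_b = groups_a[i], groups_b[j]
        compared.append((i, j))
        # Stand-in for one pass through the data processing apparatus.
        matches.extend(key for key in group_a if key in group_b)
        max_a, max_b = group_a[-1], group_b[-1]
        if max_a <= max_b:   # first target group is exhausted: fetch the next one
            i += 1
        if max_a >= max_b:   # second target group is exhausted: fetch the next one
            j += 1
    return matches, compared


matches, compared = grouped_intersection(
    [1, 4, 7, 8, 10], [1, 2, 3, 4, 5, 6, 7, 8, 9], v=3)
```

For the key values of Figure 10 this visits four group pairs, whereas comparing every first group with every second group would visit six.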
所述数据处理装置可以包括处理矩阵,所述处理矩阵包括v×v个处理单元,v为正整数,第一目标数据组和第二目标数据组中每个目标数据组中至少一个数据的数量小于或等于v。
在进行每次迭代的过程中,可以将所述第一目标数据组和所述第二目标数据组按照输 入规则输入处理矩阵,所述输入规则使得所述第二目标数据组中的第p个第二数据是在所述迭代的第q个输入周期输入位于第二边缘的v个所述处理单元中沿第一方向的第q个所述处理单元的,所述第一目标数据组第i个第一数据是在所述迭代的第j个输入周期输入位于第一边缘的v个所述处理单元中沿第二方向的第j个所述处理单元的,所述第一目标数据组与所述第二数据组是在同一个所述输入周期开始输入的,所述第一边缘与所述第二边缘相邻,所述第一方向为从所述第二边缘指向所述处理矩阵内部且垂直所述第二边缘的方向,所述第二方向为从所述第一边缘指向所述处理矩阵内部且垂直所述第一边缘的方向。
所述处理矩阵中的每个处理单元用于,确定在同一个所述输入周期输入所述处理单元的第一数据中的所述关键值与第二数据中的所述关键值是否相等,所述第一数据为属于所述第一目标数据组的所述数据,所述第二数据为属于所述第二目标数据组的所述数据。
在v大于1的情况下,所述处理矩阵中的每个处理单元还用于,在接收所述第一数据和所述第二数据的下一个输入周期,将所述第一数据传输至沿所述第一方向的下一个所述处理单元,将所述第二数据传输至沿所述第二方向的下一个所述处理单元。
利用处理矩阵对第一目标数据组和第二目标数据组进行处理,可以提高处理效率。
处理矩阵可以理解为逻辑上的矩阵,处理矩阵中各个处理单元实际的物理位置是否按照行列的方式排列,本申请实施例不做限制。第一方向、第二方向可以理解为处理矩阵中的逻辑方向。
在每个数据集合中不同的数据中的所述关键值不同的情况下,处理矩阵中的每个处理单元用于,在所述第一数据中的所述关键值与所述第二数据中的所述关键值不相等的情况下,将所述第一数据传输至沿所述第一方向的下一个所述处理单元,并将所述第二数据传输至沿所述第二方向的下一个所述处理单元。
在各个数据集合中不同的数据中的所述关键值不同的情况下,在第一数据中的关键值与第二数据中的关键值相等的情况下,该相等的关键值不会再与其他数据中的关键值相等。因此,处理单元可以仅在第一数据中的关键值与第二数据中的关键值不相等的情况下,将第一数据传输至沿第一方向的下一个所述处理单元,并将第二数据传输至沿第二方向的下一个所述处理单元。从而,可以减少数据的传输,并且可以降低运算量。
数据处理装置还可以包括过滤矩阵,所述过滤矩阵包括v个过滤单元,所述v个过滤单元分别位于所述处理矩阵沿所述第一方向的v行中每一行沿所述第二方向的最后一个处理单元之后。
所述处理矩阵中的每个处理单元还用于,在所述第一数据中的所述关键值与所述第二数据中的所述关键值相等的情况下,在接收所述第一数据和所述第二数据的下一个所述输入周期,将所述处理单元的处理结果沿所述第二方向传输至下一个单元,所述单元为所述处理单元或所述过滤单元,所述处理结果包括相等的所述关键值。
所述处理矩阵中的每个处理单元还用于,在接收所述处理结果的下一个所述输入周期,将所述处理结果沿所述第二方向传输至下一个单元。
方法2200还包括,在所述第一关键值大于或等于所述第二关键值的情况下,控制沿所述第一方向的所述v个过滤单元按照所述输入周期依次输出所述第二目标数据组对应的所述处理结果。
从而,可以将对应于某个第二数据组中各个关键值的处理结果在该第二数据组最后一 次输入数据处理装置后统一输出,提高处理结果输出的灵活度。
数据处理装置还可以包括压缩三角矩阵。所述压缩三角矩阵包括沿所述第一方向的v行压缩单元,沿所述第一方向所述压缩单元的数量逐行增加。
Each of the v rows of compaction units can be located after one filter unit along the second direction.
所述多个压缩单元中的每个压缩单元用于,接收沿所述第二方向所述压缩单元之前的所述过滤单元输出的所述处理结果,或者,接收沿所述第一方向上一行的所述压缩单元输出的所述处理结果。
所述多个压缩单元中的每个压缩单元用于,在接收所述处理结果的下一个所述输入周期,向沿所述第一方向下一行的所述压缩单元传输所述处理结果。
通过在数据处理装置中设置压缩三角矩阵,可以使得某个第二数据组对应的处理结果在同一个输入周期输出,提高处理结果输出的灵活度。
方法2200可以应用于图挖掘。
不同的所述关键值对应于关系图中不同的点集,所述第一数据用于表示关系图中第一目标点与所述第一数据中的所述关键值对应的所述点集中的至少一个点之间是否具有关系,所述第二数据用于表示关系图中第二目标点与所述第二数据中的所述关键值对应的所述点集中的至少一个点之间是否具有关系。
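Each processing unit in the processing matrix is further configured to output a processing result when the key value in the first data equals the key value in the second data, where the processing result indicates a query point in the relationship graph, and the relationship between the query point and the two target points matches a preset condition.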
预设情况可以是与第一目标点、第二目标点均具有关系,与第一目标点具有关系而与第二目标点不具有关系,与第一目标点不具有关系而与第二目标点具有关系,与第一目标点、第二目标点均不具有关系这四种情况中的一种。预设情况可以是根据图挖掘中的子图模型确定的。
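With method 2200, query points whose connections to the two target points match the preset condition can be determined in the relationship graph, so that subgraphs with a specific subgraph pattern can be identified in the relationship graph.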
对于两个目标点,各个数据中相同的关键值至多只有一个。因此,在已经确定第一目标点的某个数据与第二目标点的某个数据中的关键值相等的情况下,处理单元可以不再对第一目标点的该数据与第二目标点的该数据进行传输,从而降低运算量。
也就是说,在进行图挖掘的过程中,每个处理单元可以用于,在所述第一数据中的所述关键值与所述第二数据中的所述关键值不相等的情况下,将所述第一数据传输至沿所述第一方向的下一个所述处理单元,并将所述第二数据传输至沿所述第二方向的下一个所述处理单元。
应当理解,目标点的每个数据中不同关键值对应的点集中的点的数量可以相同或不同。各个关键值对应的点集中可以包括一个或多个点。
在一些实施例中,不同的关键值对应的点集中的点的数量可以均为1,则目标点的每个数据可以仅包括关键值。目标点的不同关键值可以用于指示关系图中与该目标点具有关系的不同点,或者,目标点的不同关键值可以用于指示关系图中与该目标点不具有关系的不同点。目标点的关键值指示的点是否与目标点具有关系,例如可以根据需要确定的查询点与两个目标点之间的关系情况确定,本申请实施例对此不作限定。
在另一些实施例中,不同的关键值对应的点集中的点的数量可以为多个。目标点的每个数据可以包括关键值和该关键值对应的目标点的关系值组。该关键值对应的目标点的关系值组表示所述目标点与所述关键值对应的所述点集中的各个点是否具有关系。
也就是说,所述第一数据还包括所述关键值对应的所述第一目标点的第一关系值组,所述第二数据还包括所述关键值对应的所述第二目标点的第二关系值组。
所述处理矩阵中的每个处理单元还用于,在所述第一数据中的所述关键值与所述第二数据中的所述关键值相等的情况下,对所述第一关系值组和所述第二关系值组的各个位分别进行预设运算,相等的所述关键值对应的所述第一关系值组和第二关系值组中相同的位对应于所述相等的关键值对应的所述点集中相同的所述点,每个位的所述预设运算的结果用于指示所述位对应的所述点与所述两个目标点之间的关系情况是否符合所述预设情况。
第一数据中的关键值与第二数据中的关键值相等,即第一数据中的关键值与第二数据中关键值指示相同的点集,从而,根据对第一关系值组与第二关系值组的按位预设运算的结果,可以确定查询点。
应当理解,该处理单元的处理结果还可以包括对第一关系值组和第二关系值组的各个位进行预设运算的结果。
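For example, each filter unit in the filter matrix is configured to determine, based on the received processing result, whether a query point exists, and to output the processing results for which a query point exists.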
也就是说,过滤矩阵可以用于对处理矩阵输出的处理结果进行筛选,在处理矩阵输出的各个处理结果中,确定存在查询点的处理结果。
过滤矩阵具体可以参见图10和图11的说明。压缩三角矩阵具体可以参见图12和图13的说明。
在每个数据集合中的所述数据组按照所述第一顺序排列的情况下,每个第一数据组中的所述关键值从小到大排列;在每个数据集合中的所述数据组按照所述第二顺序排列的情况下,每个第一数据组中的所述关键值从大到小排列。
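The data in each data set can be arranged in ascending or descending order of key value. The data groups of a data set can therefore be stored in an already partitioned format; alternatively, the apparatus that performs method 2200 can partition the data set.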
示例性地,在每个数据集合中的所述数据组按照所述第一顺序排列的情况下,在S2210,可以获取至少一个第一数据和至少一个第二数据,该至少一个第一数据即为第一目标数据组,该至少一个第二数据即为第二目标数据组。在进行S2220的过程中,每次获取的至少一个第一数据可以作为一个第一目标数据组,每次获取的至少一个第二数据即为第二目标数据组。在迭代结束后,可以将未曾获取的至少一个第一数据作为一个或多个第一数据组,将未曾获取的至少一个第二数据作为一个或多个第二数据组。
在利用包括处理矩阵的数据处理装置进行数据处理的情况下,每次可以获取v个第一数据和/或v个第二数据,从而能够提高运算处理效率。
每个数据集合中各个数据按照关键值的大小顺序排列,使得数据组的划分更加灵活。
示例性地,在迭代过程中,可以将从第一数据集合中获取数量不超过v的至少一个数据作为第一目标数据组。
类似的,每个第二数据组中的所述关键值沿所述第二顺序从小到大排列。
图4是本申请实施例提供的一种数据处理方法的示意性结构图。数据处理方法500包括S510至S530。
在S510,获取关系图中两个目标点中第一目标点的至少一个键值对和第二目标点的至少一个键值对,所述目标点的每个键值对包括所述目标点的关键值和所述关键值对应的所述目标点的关系值组,不同的所述关键值对应于所述关系图中不同的点集,所述目标点的关键值对应的所述目标点的关系值组用于指示所述目标点与所述关键值对应的点集中的每个点是否具有关系,在所述目标点的每个键值对中所述关键值对应的点集中存在与所述目标点具有关系的点。
在S520,确定所述第一目标点的至少一个所述关键值与所述第二目标点的的至少一个所述关键值中的相等关键值。
在S530,根据所述相等关键值对应的所述第一目标点的所述关系值组和所述第二目标点的关系值组,确定所述关系图中的查询点,所述查询点与所述两个目标点之间的关系情况符合预设情况。
一般情况下,在关系图中,每个点仅与关系图中少量的点具有关系。也就是说,如果利用每个点的关系向量中的各个位表示该点与关系图中各个点是否具有关系,其中,“1”表示具有关系,“0”表示不具有关系,则各个点的关系向量可以认为是稀疏数据。
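The points in the relationship graph are divided into multiple point sets, each corresponding to one key value. For each point, the relationship value group associated with a key value indicates whether each point in that key value's point set has a relationship with the point. If a point set contains a point that has a relationship with the point, the key value of that point set, together with the point's relationship value group for that key value, can serve as one key-value pair of the point.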
利用关系图中各个点的键值对表示关系图,可以实现数据压缩。
方法500利用关系图中各个点的键值对,通过确定关系图中两个目标点的键值对中相等的关键值,并根据相等的关键值对应的该两个点的关系值组确定与该两个目标点之间的关系情况符合预设情况的查询点。在降低用于表示关系图的数据的数据量的同时,可以降低确定查询点所需的处理时间。
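The method shown in Figure 3 uses queue-based set merging: it finds the column indices that are common to the column-index queues of two points in order to determine the points related to both. In contrast, method 500 uses the key value of each key-value pair to denote a point set that contains points related to the target point. Because the number of key-value pairs is generally smaller than the number of column indices, comparing key values reduces both the amount of computation and the computation time.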
不同关键值对应的点集可以包括或不包括相同的点。不同关键值对应的点集不包括相同的点,可以进一步提高数据压缩的程度,降低存储用于表示关系图的各个点的键值对占用的存储空间。
在S510,可以获取关系图数据,并根据关系图数据,确定所述两个目标点中每个目标点的至少一个键值对。
关系图的各个点的键值对可以利用关系图数据表示。关系图数据包括行偏移向量、关键值向量、关系值向量。
关键值向量包括关系图中每个点的至少一个关键值。
行偏移向量用于指示每个点的至少一个关键值在关键值向量中的位置。
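For example, the row offset vector can include multiple pieces of offset information, each indicating the starting position in the key value vector of the at least one key value of one point. The order of the points' offset information in the row offset vector can be the same as the order of their key values in the key value vector.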
偏移信息可以是序号。每个点的关键值在关键值向量中连续。某个点的偏移信息用于指示该点的至少一个关键值在关键值向量中的起始的序号。
关系值向量包括关系图中每个点的关键值对应的点的关系值组。每个点的至少一个关键值在关键值向量中的顺序与每个点的关键值对应的点的关系值组在所述关系值向量中的顺序相同。
利用关系图数据表示关系图可以进一步提高数据压缩程度。
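The numbers of points in the point sets corresponding to different key values can be equal or unequal. For example, odd and even key values can correspond to point sets with different numbers of points.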
可以根据关键值向量中位于目标点的至少一个关键值之前的各个关键值对应的点集中点的数量,确定目标点的各个关键值对应的关系值组在关系值向量的起始位置。之后,以该起始位置为起点,根据目标点的各个关键值对应的点集中点的数量,确定目标点的各个关键值对应的关系值组。
在不同的关键值对应的点集中点的数量相等的情况下,根据目标点的至少一个关键值在关键值向量中的位置,以及点集中相等的点的数量,即可确定目标点的各个关键值对应的关系值组。从而使得根据关系图数据确定目标点的键值对的方式更为简便。
在S530,可以对所述相等关键值对应的所述第一目标点的所述关系值组和所述第二目标点的关系值组的各个位分别进行预设运算。
相同的关键值对应的不同点的关系值组中,相同的位对应于该关键值对应的点集中相同的点。
相同的关键值对应的关系值组中,不同的位可以对应于该关键值对应的点集中不同的点,从而能够降低计算量。
相同的关键值对应的关系值组中,对每个位的预设运算的结果用于指示该位对应的点与两个目标点之间的关系情况是否符合预设情况。
对相同的关键值对应的两个目标点的关系值组中的各个位分别进行的预设运算,可以并行进行。从而,可以提高计算效率。
进一步地,可以利用处理矩阵对第一目标点的至少一个键值对和第二目标点的至少一个键值对进行处理。
在S520,可以将所述第一目标点的至少一个键值对依次输入位于处理矩阵的第一边缘的多个所述处理单元,将所述第二目标点的至少一个键值对依次输入位于所述处理矩阵的第二边缘的多个所述处理单元,以确定所述相等关键值,所述处理矩阵包括v×v个处理单元,v为大于1的正整数,第一目标数据组中的第i个键值对在第j个输入周期输入位于第一边缘的至少一个所述处理单元中沿第二方向的第j个所述处理单元的,所述第二目标数据中的第p个键值对是在第q个输入周期输入位于第二边缘的至少一个所述处理单元中沿第一方向的第q个所述处理单元的,所述第一边缘与所述第二边缘相邻,所述第一方向为从所述第二边缘指向所述处理矩阵内部且垂直所述第二边缘的方向,所述第二方向为从所述第一边缘指向所述处理矩阵内部且垂直所述第一边缘的方向,每个处理单元用于确定输入所述处理单元的所述第一目标点的所述关键值与所述第二目标点的的所述关键 值是否相等。
在S530,可以利用所述处理矩阵对所述相等关键值对应的所述第一目标点的所述关系值组和所述第二目标点的关系值组进行处理,以确定所述查询点,所述多个处理单元中的每个处理单元用于,在所述第一目标点的所述关键值与所述第二目标点的的所述关键值相等的情况下,根据输入所述处理单元的所述第一目标点的所述关系值组和所述第二目标点的关系值组,确定所述关系图中的查询点,所述查询点与所述两个目标点之间的关系情况符合预设情况。
所述多个处理单元中的每个处理单元还用于,按照所述输入周期,将所述第一目标点的所述键值对传输至沿所述第一方向的下一个所述处理单元,将第二目标点的所述键值对传输至沿所述第二方向的下一个所述处理单元。
处理矩阵可以理解为脉动阵列(systolic array),各个目标点的键值对按“流水”方式在处理矩阵的处理单元间有节奏地“流动”,所有的处理单元并行地对流经的数据进行处理可以提高处理速度,降低处理时间。
处理矩阵的具体结构可以参见图7至图15的说明。
每个处理单元可以具体用于:在所述第一目标点的所述关键值与所述第二目标点的的所述关键值不相等的情况下,将所述第一目标点的所述键值对传输至沿所述第一方向的下一个所述处理单元,将第二目标点的所述键值对传输至沿所述第二方向的下一个所述处理单元。
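The key values in the key-value pairs of a single target point are all different. When the key value of the first target point is not equal to the key value of the second target point, the processing unit forwards the key-value pairs of both target points. When the two key values are equal, the key-value pairs are not forwarded any further, so the matched key value is not compared with other key values, which reduces the amount of computation.
When the number of key-value pairs of the first target point is greater than v, the processing matrix can still be used to process the key-value pairs of the first target point and the second target point. For details, see the descriptions of Figure 10 and Figure 15.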
图5是本申请实施例提供的一种生成关系图数据的方法的示意性流程图。生成关系图数据的方法600包括S601至S603。
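A relationship graph includes multiple interconnected points and the edges that connect them. It can be represented by the neighbor vectors of its points, all of which have the same number of bits. Bit i of a point's neighbor vector indicates whether that point has a relationship with the i-th point of the relationship graph; that is, bit i of a neighbor vector corresponds to the i-th point. “0” can denote that no relationship exists and “1” that a relationship exists.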
对于关系图110,点0对应的邻居向量为N(0)=01010011,点1对应的邻居向量为N(1)=10101110。
在S601,可以按照预设方式将邻居向量划分为多个组。
每个组包括的位数可以是相同的。
例如,可以按照邻居向量中各个位的序号除以某个除数的余数,将邻居向量划分为多个组,不同的组对应于不同的余数。该除数可以是预设值。
或者,可以按照从左到右的顺序,按照固定的位数对各个邻居向量进行划分。对于邻居向量的位数不能整除组的位数的情况,可以在邻居向量的最后补“0”,从而使得各个组的位数相同。
下面以按照位数2对邻居向量进行划分为例进行说明。
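In each neighbor vector, the bits for point 0 and point 1 form one group, the bits for point 2 and point 3 form another group, and so on until the neighbor vector is fully partitioned.
After partitioning, neighbor vector N(0) consists of 4 groups: 01, 01, 00, 11. Neighbor vector N(1) consists of 4 groups: 10, 10, 11, 10.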
在S602,为每个组分配关键值(key)。
一个组的关键值也可以理解为该组的标识。不同的邻居向量划分后的组中,包括邻居向量中相同位的组,关键值相同。示例性地,每个组的关键值可以为该组在邻居向量划分后的多个组中的序号。
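Partitioning N(0) gives 4 groups, 01, 01, 00, 11, whose key values are 0, 1, 2, 3, respectively. Partitioning N(1) gives 4 groups, 10, 10, 11, 10, whose key values are 0, 1, 2, 3, respectively.
The key value array of each point can include multiple key values of that point, and the group corresponding to every key value in the array is not all zeros. In other words, when the key value array is formed, the key values of groups whose bits are all 0 are discarded.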
点0的关键值数组可以包括0、1、3,点1的关键值数组可以包括0、1、2、3。
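The key values in a key value array can be arranged in a preset order, for example ascending or descending. The following description assumes that the key values in each key value array are in ascending order.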
每个点对应的关键值数组中的一个关键值和该关键值对应的组可以理解为一个键值对(<key,value>pair,KVP)。也就是说,每个点的键值对可以为一个或多个。
In S603, the relationship graph data is generated.
关系图数据包括行偏移、关键值向量、关系值向量。
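The key value vector includes the key value arrays of the points in the relationship graph. The j-th key value array in the key value vector is the key value array of the j-th point of the relationship graph.
The row offset indicates the starting position of each point's key value array in the key value vector. The row offset can also be called row data. The j-th number of the row offset indicates the starting position in the key value vector of the key value array of the j-th point.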
关系值向量包括各个关键值数组对应的组。关系值向量中组的顺序与关键值数组的顺序相同。关系值向量也可以称为value数据。
关键值向量也可以称为key数据。
关系图数据的数据格式可以称为使用压缩稀疏行的位图(bitmap with compressed sparse row,BCSR)格式。
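Steps S601 to S603 can be sketched as below for the fixed-width, left-to-right partitioning used in the example. The function name `to_bcsr` and the string representation of groups are assumptions made for illustration; the filing does not prescribe a software interface.

```python
from typing import List, Tuple


def to_bcsr(bit_rows: List[str], width: int) -> Tuple[List[int], List[int], List[str]]:
    """Build (row offsets, key vector, value vector) from neighbor vectors."""
    row_offsets = [0]
    keys: List[int] = []
    values: List[str] = []
    for bits in bit_rows:
        # S601: pad with '0' so that the vector splits into equal-width groups.
        padded = bits + "0" * (-len(bits) % width)
        for key in range(len(padded) // width):
            group = padded[key * width:(key + 1) * width]
            # S602: the key is the group's index; all-zero groups are dropped.
            if "1" in group:
                keys.append(key)
                values.append(group)
        # S603: record where the next point's key array starts.
        row_offsets.append(len(keys))
    return row_offsets, keys, values


row_offsets, keys, values = to_bcsr(["01010011", "10101110"], width=2)
```

Each retained key and its group together form one key-value pair; for point 0 the group with key 2 is all zeros, so only three pairs are stored.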
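The neighbor matrix includes multiple rows, each representing the neighbor vector of one point. Method 600 compresses the neighbor matrix, and the compression is particularly effective when the proportion of “0” entries in the neighbor matrix is high.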
在图挖掘的过程中,需要计算两个点邻居集合的交集。对于通过方法600生成的关系图数据,可以利用图6所示的数据处理方法进行处理。
图6是本申请实施例提供的一种数据处理方法的示意性流程图。方法700包括S701至S702。
在S701之前,可以获取关系图数据。
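The relationship graph data indicates whether relationships exist between the points in the relationship graph. It includes the row offset, the key value vector, and the relationship value vector.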
在S701,根据关系图数据,确定两个点相等的关键值。
该两个点可以是根据子图模型确定的。
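From the relationship graph data, the key-value pairs of each point in the relationship graph can be determined, that is, each point's key value array and the group corresponding to each key value in that array.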
对于关系图110的关系图数据,可以确定点0的关键值数组“0、1、3”,点1的关键值数组“0、1、2、3”。对于点0,关键值0对应的组为01,关键值1对应的组为01,关键值3对应的组为11。对于点1,关键值0对应的组为10,关键值1对应的组为10,关键值2对应的组为11,关键值3对应的组为10。
从而,对于点0和点1,相等的关键值包括0、1、3。
在S702,确定该两个点在每个相等的关键值对应的组中按照预设计算方式得到的计算结果中值为1的位。
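Different preset operations correspond to different relationships with the two points. For example, for the common neighbors of two points, that is, points related to both, the preset operation is $a \cdot b$, where $a$ and $b$ are the values of the two points at a given bit. For points that are related to point A but not to point B, the preset operation is $a \cdot \bar{b}$, where $a$ is the value of point A at a given bit and $\bar{b}$ is the negation of the value of point B at that bit. A bit whose result is 1 therefore corresponds to a point whose relationship with the two points matches the preset condition. The following description uses the common neighbors of two points as an example.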
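The preset operations can be sketched as bitwise operations on two groups held as integers. The function name `preset_op` and the mode labels are assumptions; the four modes mirror the four relationship cases the description lists for the preset condition.

```python
def preset_op(a: int, b: int, width: int, mode: str) -> int:
    """Apply one preset operation to two groups of `width` bits each."""
    mask = (1 << width) - 1          # keep only the bits of the group
    if mode == "both":               # related to A and to B
        result = a & b
    elif mode == "a_only":           # related to A, not to B
        result = a & ~b
    elif mode == "b_only":           # related to B, not to A
        result = ~a & b
    elif mode == "neither":          # related to neither A nor B
        result = ~a & ~b
    else:
        raise ValueError(f"unknown mode: {mode}")
    return result & mask


both = preset_op(0b11, 0b10, width=2, mode="both")
a_only = preset_op(0b11, 0b10, width=2, mode="a_only")
```

Because every bit is computed independently, the hardware can evaluate all bits of a group in the same cycle.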
对于关键值0,点0对应的组为01,点1对应的组为10。对“01”和“10”进行按位比较,确定按位与运算结果为1的位。
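For example, a bitwise AND can be applied to the two points' groups for each equal key value; the bits whose result is 1 correspond to the points that have a relationship with both of the two points.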
根据相等的关键值,可以确定该相等的关键值对应的点集。再根据该计算结果为1的位在所属的组中的位置,可以确定该位对应的关系图中的点。
示例性地,还可以获取对邻居矩阵的组的划分方式。根据对邻居矩阵的组的划分方式,以及包括计算结果为1的位的组对应的关键值,以及该计算结果为1的位在所属的组中的位置,确定该位对应的关系图中的点。
例如,对点0和点1在关键值为0对应的组、关键值为1对应的组、关键值为3对应的组分别进行比较,可以确定关键值为0和1对应的组不存在计算结果为1的位,关键值为3的组中计算结果为1的位为组中的第0位。关键值3对应的组在邻居矩阵中对应的组包括点6和点7,其中,第0位对应的点为点6。从而,可以确定点0和点1的共同邻居为点6。
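The worked example above can be reproduced with a short sketch of method 700. The dictionary representation of key-value pairs and the name `common_neighbors` are assumptions; the sketch also assumes the contiguous partitioning of the example, where key k covers points k×width to k×width+width−1 and the leftmost bit of a group is its bit 0.

```python
from typing import Dict, List


def common_neighbors(kvps_a: Dict[int, int], kvps_b: Dict[int, int], width: int) -> List[int]:
    """Return the points related to both target points.

    kvps_* maps a key to its group, stored as an integer whose most
    significant bit (of `width` bits) is bit 0 of the group.
    """
    points = []
    for key in sorted(kvps_a.keys() & kvps_b.keys()):    # S701: equal keys
        hit = kvps_a[key] & kvps_b[key]                  # S702: bitwise AND
        for pos in range(width):
            if hit & (1 << (width - 1 - pos)):
                points.append(key * width + pos)         # decode bit to point
    return points


point0 = {0: 0b01, 1: 0b01, 3: 0b11}
point1 = {0: 0b10, 1: 0b10, 2: 0b11, 3: 0b10}
shared = common_neighbors(point0, point1, width=2)
```

Only three key comparisons succeed here, compared with the nine column indices that the CSR-based merge has to walk through for the same two points.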
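With method 700, the common neighbors of two points are determined by finding the equal key values in the two points' key value arrays and then finding, in the groups for each equal key value, the bits that are 1 for both points.
Finding the equal key values in the two points' key value arrays can be done with the queue-based set merging shown in Figure 3. Finding the bits that are 1 for both points in the groups for each equal key value can be done with a bitwise comparison, for example a bitwise AND.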
按位比较过程中,对多个位的值的比较可以并行进行,计算效率较高。
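Compared with comparing the different column-index segments of two points in the CSR-format data of Figure 2, finding the equal key values in the two points' key value arrays in method 700 effectively reduces the number of values that must be compared and improves processing efficiency.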
上文结合图1至图6的描述了本申请实施例的方法实施例,下面结合图7至图10,描述本申请实施例的装置实施例。应理解,方法实施例的描述与装置实施例的描述相互对应,因此,未详细描述的部分可以参见前面方法实施例。
图7是本申请实施例提供的一种数据处理装置的示意性结构图。
数据处理装置800包括处理矩阵(processing matrix,PM)810、过滤阵列(filter array,FA)820、压缩三角(compact triangle,CT)830。
处理矩阵810包括v×v个处理单元(processing element,PE),用于对两个点的键值对进行处理。
每个点的键值对包括该点的关键值数组,以及各个关键值对应的组。处理矩阵810中最左侧的v个处理单元可以接收关系图中点A的多个键值对,不同的处理单元用于接收A的键值对中不同的键值对。处理矩阵810中最上侧的v个处理单元可以接收关系图中点B的多个键值对,不同的处理单元用于接收B的键值对中不同的键值对。点A和点B是不同的点。
最左侧的v个处理单元沿从上至下的顺序,在上一个处理单元接收键值对之后一个时钟周期后,下一个处理单元接收键值对。最上侧的v个处理单元沿从左至右的顺序,在上一个处理单元接收键值对之后一个时钟周期后,下一个处理单元接收键值对。
处理单元用于对接收的两个点的键值对中的关键值进行比较。也可以理解为,处理单元用于对接收的两个点的键值对中的关键值进行匹配。如果接收的两个点的键值对中的关键值相等,则可以认为该两个点的关键值匹配成功。
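If the key values are equal, the processing unit applies the preset operation to the groups in the two points' key-value pairs to determine the bits that match the relationship corresponding to that operation, and outputs the key value together with the result.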
如果关键值不相等,则处理单元对两个点的键值对中的组不再进行预设计算方式的计算比较,并在下一个时钟周期将从左侧相邻的处理单元接收的键值对传输至右侧的相邻的处理单元,将从上侧相邻的处理单元接收的关键值以及该关键值对应的组传输至下侧的相邻的处理单元。
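As shown in (a) of Figure 8, at time t0, key-value pair A0 of point A is input from the left into PE 0, the processing unit in the first row and first column, and key-value pair B0 of point B is input into PE 0 from the top.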
PE 0对键值对A0与键值对B0中的关键值进行比较。如果键值对A0与键值对B0中的关键值相等,PE 0对键值对A0与键值对B0中的组的各个位进行预设计算方式的计算。PE 0对键值对A0和键值对B0不再进行传输。处理矩阵810可以将该相等的关键值以及计算结果输出。
如果键值对A0与键值对B0中的关键值不相等,PE 0可以不再对键值对A0与键值对B0中的组中进行预设计算方式的计算。
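As shown in (b) of Figure 8, at time t1 PE 0 can pass key-value pair A0 to PE 1 on its right (first row, second column) and key-value pair B0 to PE 2 below it (second row, first column). Also at time t1, key-value pair A1 of point A is input from the left into PE 2 (second row, first column), and key-value pair B1 of point B is input into PE 1 from the top.
PE 1 can then compare the key values in key-value pairs A0 and B1. If they are equal, PE 1 performs the preset operation on each bit of the groups in A0 and B1 and does not forward A0 or B1 any further. Processing matrix 810 can output the key value and the result.
If the key values in A0 and B1 are not equal, PE 1 can skip the preset operation on the groups in A0 and B1, and at time t2 can pass A0 to the PE on its right and B1 to PE 3 below it. If there is no PE to the right of PE 1, PE 1 can discard the key-value pair received from the left, that is, A0.
PE 2 can compare the key values in key-value pairs A1 and B0. If they are equal, PE 2 performs the preset operation on each bit of the groups in A1 and B0 and does not forward A1 or B0 any further. Processing matrix 810 can output the equal key value and the per-bit results.
If the key values in A1 and B0 are not equal, PE 2 can skip the preset operation on the groups in A1 and B0, and at time t2 can pass A1 to PE 3 on its right and B0 to the PE below it. If there is no PE below PE 2, PE 2 can discard the key-value pair received from above, that is, B0.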
如果键值对A0与键值对B1、键值对A1与键值对B0中的关键值均不相等,则在时间点t2,键值对A1与键值对B1传输至PE 3,从而,PE 3可以对键值对A1与键值对B1中的关键值进行比较。
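The data flow through the processing matrix can be illustrated with a cycle-by-cycle software model. This is a simplified sketch under stated assumptions: the name `systolic_match` is hypothetical, the preset operation is fixed to a bitwise AND, and matched results are collected in a list instead of being passed rightward to the filter units as in the hardware.

```python
from typing import List, Optional, Tuple

Pair = Tuple[int, int]   # (key, group as integer)


def systolic_match(a_pairs: List[Pair], b_pairs: List[Pair], v: int) -> List[Tuple[int, int]]:
    """Simulate a v x v processing matrix; return (key, a_group & b_group) per match."""
    def empty() -> List[List[Optional[Pair]]]:
        return [[None] * v for _ in range(v)]

    a_in, b_in = empty(), empty()       # pairs arriving at each PE this cycle
    results = []
    for t in range(2 * v - 1):          # the last PE is reached in cycle 2v-2
        # Skewed input: the t-th pair enters row t (from the left) and
        # column t (from the top) in cycle t.
        if t < len(a_pairs):
            a_in[t][0] = a_pairs[t]
        if t < len(b_pairs):
            b_in[0][t] = b_pairs[t]
        a_next, b_next = empty(), empty()
        for r in range(v):
            for c in range(v):
                a, b = a_in[r][c], b_in[r][c]
                if a is not None and b is not None and a[0] == b[0]:
                    results.append((a[0], a[1] & b[1]))
                    continue            # matched pairs are not forwarded
                if a is not None and c + 1 < v:
                    a_next[r][c + 1] = a    # pass A's pair to the right
                if b is not None and r + 1 < v:
                    b_next[r + 1][c] = b    # pass B's pair downward
        a_in, b_in = a_next, b_next
    return results


point_a = [(0, 0b01), (1, 0b01), (3, 0b11)]
point_b = [(0, 0b10), (1, 0b10), (2, 0b11), (3, 0b10)]
found = systolic_match(point_a, point_b, v=4)
```

In this model the i-th pair of point A and the j-th pair of point B meet at the PE in row i, column j during cycle i+j, so every pair combination is compared exactly once unless an earlier match has already consumed one of the pairs.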
图9是本申请实施例提供的一种处理单元的示意性结构图。
处理单元1000包括比较单元1010和计算单元1020。
比较单元1010用于比较从上方输入的关键值kt与从左侧输入的关键值kl。比较单元1010可以理解为键值比较器。
计算单元1020用于,在关键值kt与关键值kl相等的情况下,对从上方输入的组vt与从左侧输入的组vl的各个位分别进行预设计算方式的计算。
在关键值kt与关键值kl相等的情况下,处理单元1000的输出数据可以包括关键值kt以及对从上方和从左侧输入处理单元1000的两个组的各个位分别进行计算的结果。处理单元1000的输出数据可以传输至位于处理单元1000右侧的处理单元。
比较单元1010还可以接收从左侧输入的有效指示1。有效指示1用于指示处理单元1000所在行位于处理单元1000左侧的处理单元对输入该行的点A的关键值是否与点B的关键值匹配成功。
在处理单元1000接收的有效指示1指示匹配成功的情况下,处理单元1000中的比较单元1010、计算单元1020可以不再进行运算。从而,可以降低计算量。
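Processing unit 1000 can also output validity indication 2. When validity indication 1 indicates a successful match, or when key value kt equals key value kl, validity indication 2 output by processing unit 1000 indicates that the row containing processing unit 1000 has already matched successfully. Validity indication 2 output by processing unit 1000 can be passed to the processing unit on its right. For example, a value of “1” for validity indication 1 or validity indication 2 can indicate a successful match, and a value of “0” can indicate that no match has been made.
That is, when validity indication 1 indicates that no match has been made, processing unit 1000 can receive key value kt and group vt from above, and key value kl and group vl from the left. Comparison unit 1010 compares key value kt with key value kl.
When key value kt is not equal to key value kl, validity indication 2 output by processing unit 1000 indicates that no match has been made.
When key value kt equals key value kl, computation unit 1020 performs the preset operation on each bit of group vt and group vl. In this case, validity indication 2 output by processing unit 1000 indicates a successful match, and processing unit 1000 outputs the matched key value and the result of the group operation.
When validity indication 1 indicates a successful match, processing unit 1000 can receive the key value and the group result output by the processing unit on its left, output validity indication 2 to indicate a successful match, and output that key value and group result.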
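The number of key-value pairs of point A and/or point B may exceed v. When one or both of point A and point B have more than v key-value pairs, those key-value pairs can be divided into multiple key-value pair sets, each containing no more than v key-value pairs.
As a result, some key-value pair sets need to be input into processing matrix 810 multiple times, and a key-value pair set may need to be compared with the key values of several other key-value pair sets, as shown in Figure 10.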
处理矩阵810可以包括3×3个处理单元,点A的关键值包括1、4、7、8、10等,点B的关键值包括1、2、3、4、5、6、7、8、9等。
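The key-value pairs of point A and the key-value pairs of point B are each sorted in ascending order of key value and then divided so that each key-value pair set contains 3 key-value pairs. The 2 key-value pair sets of point A contain key values 1, 4, 7 and 8, 10; the 3 key-value pair sets of point B contain key values 1-3, 4-6, and 7-9.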
首先,利用处理矩阵810,对点A的多个键值对集合中关键值最小的关键值集合、点B的多个键值对集合中关键值最小的键值对集合进行处理,也就是将包括关键值1、4、7的点A的键值对集合、包括关键值1-3的点B的键值对集合分别从左侧和上侧输入处理矩阵810,如图10所示。
之后,可以对输入处理矩阵810的两个键值对集合中最大的关键值进行大小比较。
分别属于两个键值对集合的两个最大关键值大小不相等的情况下,将两个最大关键值中较大的关键值所属的键值对集合再次输入处理矩阵810,并将另一个点的下一个最小关键值的键值对集合输入处理矩阵810。
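As shown in Figure 10, key value 7 in the key-value pair set of point A is greater than key value 3 in the key-value pair set of point B, so the key-value pair set of point A containing key values 1, 4, 7 and the key-value pair set of point B containing key values 4-6 are input into processing matrix 810 from the left and from the top, respectively.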
包括关键值1、4、7的点A的键值对集合、包括关键值1-3的点B的键值对集合在同一个时钟周期开始输入处理矩阵810。包括关键值1、4、7的点A的键值对集合、包括关键值4-6的点B的键值对集合也是在同一个时钟周期开始输入处理矩阵810。
应当理解,从包括关键值1-3的点B的键值对集合开始输入处理矩阵810的时间,到包括关键值4-6的点B的键值对集合开始输入处理矩阵810的时间,之间的时间差可以是一个或多个时钟周期。
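Filter array 820 can merge the results of comparing one key-value pair set with the key values of multiple other key-value pair sets, and output an intermediate result for that key-value pair set.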
过滤阵列820包括v个过滤单元(filter unit,FU)。示例性地,每个过滤单元可以对应于处理矩阵810中的一行。在处理矩阵810左侧重复输入某个键值对集合的情况下,过滤阵列820用于将该键值对集合与其他多个键值对集合进行关键值比较的结果进行合并。
另外,过滤单元还可以用于根据组vt与组vl的各个位的计算结果,确定是否存在符合预设关系情况的点。符合预设关系情况的点,即计算结果为1的位对应的点。
图11是本申请实施例提供的一种过滤单元的示意性结构图。
过滤单元1200包括逻辑处理单元1210、有效指示更新单元1220、有效指示寄存器1230、结果寄存器1240。
逻辑处理单元1210用于确定过滤单元1200对应的处理矩阵810的行中处理单元1000输出的对组的计算结果是否为全“0”。
在确定对组的计算结果为全“0”的情况下,可以认为该行的数据无效。在对组的计算结果为全“0”的情况下,逻辑处理单元1210可以输出“0”;反之,在对组的计算结果为不全“0”的情况下,逻辑处理单元1210可以输出“1”。
有效指示更新单元1220可以接收逻辑处理单元1210的输出,以及过滤单元1200对应的处理矩阵810的行中最后一个处理单元1000输出的有效指示2。
在该逻辑处理单元1210对应的处理矩阵810的行中最后一个处理单元1000输出有效指示2为“0”(即指示过滤单元1200对应的处理单元1000的行对关键值未匹配成功的),或逻辑处理单元1210输出“0”的情况下,有效指示更新单元1220输出的结果为“0”,即指示输出无效。
有效指示寄存器1230用于存储有效指示更新单元1220输出的结果。
结果寄存器1240用于在逻辑处理单元1210输出为“1”的情况下,存储组的计算结果。
过滤单元1200还可以获取信号F与信号L。信号F与信号L可以存储在寄存器中。信号F的初始值为1,信号L的初始值为0。
在将点A的键值对集合从左侧输入处理矩阵810,将点B的键值对集合从上方输入处理矩阵810之后,可以确定点A的键值对集合中最大关键值与点B的键值对集合中最大关键值之间的大小关系。
示例性地,键值对集合输入周期可以等于每个键值对集合中各个键值对输入处理矩阵的周期的v倍。在点A对应的最大关键值大于点B对应的最大关键值的情况下,信号F被设置为在下一个键值对集合输入周期值为“0”,信号L被立刻设置为“0”;反之,信号F被设置为在下一个键值对集合输入周期值为“1”,信号L被立刻设置为“1”。
或者,在点A对应的最大关键值大于点B对应的最大关键值的情况下,信号F被设置为在下一个键值对集合输入周期值为“0”,信号L被立刻设置为“0”;在点A对应的最大关键值等于点B对应的最大关键值的情况下,信号F被设置为在下一个键值对集合输入周期值为“1”,信号L被立刻设置为“0”;在点A对应的最大关键值小于点B对应的最大关键值的情况下,信号F被设置为在下一个键值对集合输入周期值为“1”,信号L被立刻设置为“1”。
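A key-value pair set of point A that is input into processing matrix 810 while signal F is “0” is a repeated input. Validity indication update unit 1220 can update validity indication 3 based on the output of logic processing unit 1210, validity indication 2 output by the last processing unit 1000 in the row of processing matrix 810 corresponding to filter unit 1200, and the result stored in validity indication register 1230.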
根据逻辑处理单元1210的输出、过滤单元1200对应的处理矩阵810的行中最后一个处理单元1000输出的有效指示2,可以确定新的有效指示结果。在逻辑处理单元1210的输出、过滤单元1200对应的处理矩阵810的行中最后一个处理单元1000输出的有效指示2均为“1”的情况下,新的有效指示结果可以是有效,即有效指示3指示有效;反之,在逻辑处理单元1210的输出、过滤单元1200对应的处理矩阵810的行中最后一个处理单元1000输出的有效指示2中的任一个为“0”的情况下,新的有效指示结果可以是无效,即有效指示3指示无效。
在有效指示寄存器1230中存储的结果可以理解为历史有效指示结果。
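When either the new validity result or the historical validity result is valid, the output of validity indication update unit 1220 can be “1”, indicating valid. In other words, an OR operation can be performed on the new validity result and the historical validity result.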
有效指示寄存器1230用于存储有效指示更新单元1220输出的结果,实现对有效指示寄存器1230的输出(即有效指示3)的更新。
在逻辑处理单元1210输出为“1”的情况下,过滤单元1200将新的组的计算结果写入结果寄存器1240。
信号L设置为“1”的情况下,输入处理矩阵810的点A的键值对集合是最后一次输入处理矩阵810。因此,在信号L设置为“1”的情况下,可以在过滤单元1200完成对结果寄存器1240中存储的组的计算结果的更新之后,结果寄存器1240可以输出存储的数据。
示例性地,在信号L设置为“1”的情况下,如果有效指示寄存器1230中存储的有效指示结果为“1”,则结果寄存器1240输出存储的数据。反之,如果有效指示寄存器1230中存储的有效指示结果为“0”,则结果寄存器1240可以不进行数据输出。
之后,结果寄存器1240和有效指示寄存器1230可以清空存储的数据。
在下一个键值对集合输入周期,如果信号F为“1”,则在该周期输入处理矩阵810的点A的键值对集合是非重复输入的。结果寄存器1240可以在信号F为“1”的情况下可以进行初始化。
在一些实施例中,初始化可以是结果寄存器1240存储的数据清空。
在另一些实施例中,初始化可以是在该过滤单元1200对应的处理矩阵的行输入的点A的键值对中的组写入结果寄存器1240。从而,在逻辑处理单元1210输出为“1”的情况下,可以将新的组的计算结果与结果寄存器1240中的数据进行与运算,从而实现对组的计算结果的更新。
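When signal L is set to “0”, the key-value pair set of point A that is being input into processing matrix 810 still needs to be input into processing matrix 810 again, and the output of filter unit 1200 is in the invalid state.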
数据处理装置800还可以包括控制器(未示出)。控制器用于设置信号F和信号L。
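Compact triangle 830 includes v×(v+1)/2 compaction units (CUs) arranged as a right triangle, and is used to compact the data output by filter array 820.
In compact triangle 830, the first row contains 1 compaction unit, and the number of compaction units per row increases from top to bottom, with adjacent rows differing by one compaction unit.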
如图12所示,压缩三角830的每一行对应于过滤矩阵820中的一个过滤单元。
对于压缩三角830的每一行,在对应的过滤单元没有输出有效数据的情况下,压缩三角830的该行中的每个压缩单元接收该压缩单元上方的压缩单元中的数据;在对应的过滤单元输出有效数据的情况下,压缩三角830的该行中最左侧的压缩单元用于接收该过滤单元输出的点A和点B相等的关键值以及该关键值对应的组的计算结果,其他压缩单元接收该压缩单元左上方的压缩单元中的数据。
在过滤单元输出的有效指示3为“1”的情况下,可以理解为过滤单元输出有效数据。
图13是本申请实施例提供的一种压缩单元的示意性结构图。
压缩单元1400包括接口1401、接口1402、接口1403、寄存器1404。
接口1401用于连接压缩单元1400上方的压缩单元。接口1402用于连接压缩单元1400左上方的压缩单元。接口1403用于接收压缩单元1400所在行对应的过滤单元输出的有效指示3。
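When validity indication 3 is “0”, interface 1401 stores the data from the compaction unit above compaction unit 1400 into register 1404. When validity indication 3 is “1”, interface 1402 stores the data from the compaction unit to the upper left of compaction unit 1400 into register 1404.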
示例性地,过滤矩阵820中第1、3、4个过滤单元输出有效数据,其中,第1个过滤单元输出的有效数据包括关键值1以及关键值1对应的组的计算结果,第3个过滤单元输出的有效数据包括关键值3以及关键值3对应的组的计算结果,第4个过滤单元输出的有效数据包括关键值9以及关键值9对应的组的计算结果.
压缩三角830中的第1、3、4行分别对应于第1、3、4个过滤单元。压缩三角830中的第1、2、3、4行中压缩单元的数量分别为1、2、3、4。
第1个过滤单元输出有效数据。压缩三角830中第1行的压缩单元接收第1个过滤单元输出的有效数据。
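The 2nd filter unit does not output valid data. Each compaction unit in row 2 of compact triangle 830 obtains data from the compaction unit directly above it in row 1. The 1st compaction unit in row 2 obtains the data held by the compaction unit in row 1; the 2nd compaction unit in row 2 has no compaction unit above it and obtains no data.
The 3rd filter unit outputs valid data. The 1st (leftmost) compaction unit in row 3 of compact triangle 830 receives the valid data output by the 3rd filter unit. The other compaction units in row 3 obtain data from the compaction units to their upper left in row 2; that is, the 2nd and 3rd compaction units in row 3 obtain the data held by the 2 compaction units in row 2.
The 4th filter unit outputs valid data. The 1st compaction unit in row 4 of compact triangle 830 receives the valid data output by the 4th filter unit. The other compaction units in row 4 obtain data from the compaction units to their upper left in row 3; that is, the 2nd to 4th compaction units in row 4 obtain the data held by the 3 compaction units in row 3.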
由此方式,压缩三角830最下方一行的压缩单元可以将获取的数据输出,从而,输出结果即为压缩后的数据。
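The row-by-row behaviour of the compact triangle can be modelled with a few lines of code. The function name `compact` and the use of `None` for "no valid data" are assumptions for illustration; timing is ignored and only the routing rule (take from above when the row's filter output is invalid, shift from the upper left when it is valid) is reproduced.

```python
from typing import List, Optional


def compact(filter_outputs: List[Optional[str]]) -> List[Optional[str]]:
    """Return the contents of the last row of the compact triangle.

    filter_outputs[i] is the valid payload of filter unit i, or None
    when that filter unit outputs no valid data.
    """
    previous_row: List[Optional[str]] = []
    for payload in filter_outputs:
        if payload is not None:
            # Leftmost unit takes the filter output; the others take the
            # data of the unit to their upper left.
            row = [payload] + previous_row
        else:
            # Every unit takes the data of the unit directly above; the
            # last unit has no unit above it and stays empty.
            row = previous_row + [None]
        previous_row = row
    return previous_row


# Filter units 1, 3 and 4 output valid data (keys 1, 3 and 9 in the text).
last_row = compact(["key1", None, "key3", "key9"])
```

All valid entries end up contiguous at the left of the last row, which is the compaction effect the triangle is meant to achieve.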
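As shown in Figure 14, for any row of processing matrix 810, v clock cycles (CCs) elapse from the time the processing unit in column 1 receives data (the key-value pairs of point A and point B) to the time the processing unit in column v receives data (which can be the key-value pairs of point A and point B, or a key value and the result for its group). Likewise, v clock cycles elapse from data input at the processing unit in row 1, column 1 to data input at the processing unit in row v, column 1.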
过滤矩阵820中,各个过滤单元均利用一个时钟周期对该过滤单元对应的处理矩阵810的行输出的数据进行过滤。
压缩三角830中,从第一行接收数据到最后一行输出数据需要经过v个时钟周期,从最后一行接收数据到最后一行输出数据需要经过1个时钟周期。
从而,压缩三角830可以将各个行对应的数据在同一时钟周期输出。
Data processing apparatus 800 can be implemented on a dual inline memory module (DIMM); for example, it can be placed in a load reduced dual inline memory module (LRDIMM), forming a near-memory computing architecture.
数据处理装置800还可以包括计数单元。计数单元用于对符合要求的子图进行计数。
也就是说,计数单元可以记录压缩三角830输出的关键值的数量。并且,对于不同的输入数据,计数单元可以累计计数。
计数单元可以包括加法器和寄存器。
图15是本申请实施例提供的一种数据处理系统的示意性结构图。
数据处理系统1600包括控制器1610和处理矩阵1620。
处理矩阵1620包括v×v个处理单元,v为大于1的正整数。
控制器1610用于,将关系图的两个目标点中第一目标点的至少一个键值对沿第二方向按照输入周期依次输入位于所述处理矩阵1620的第一边缘的多个所述处理单元,将所述两个目标点中第二目标点的至少一个键值对沿第一方向按照所述输入周期依次输入位于所述处理矩阵1620的第二边缘的多个所述处理单元。
输入周期可以是时钟周期,也可以是时钟周期的正整数倍。
所述第一目标点的至少一个键值对与所述第二目标点的至少一个键值对是在同一个所述输入周期开始输入所述处理矩阵的。
所述第一边缘与所述第二边缘相邻,所述第一方向为远离所述第二边缘的方向,所述第二方向为远离所述第一边缘的方向。第一边缘与第二边缘可以理解为处理矩阵1620的两个相邻的边。
所述目标点的每个键值对包括所述目标点的关键值和所述关键值对应的所述目标点的关系值组,不同的所述关键值对应于所述关系图中不同的点集,所述目标点的关键值对应的所述目标点的关系值组用于指示所述目标点与所述关键值对应的点集中的每个点是否具有关系,在所述目标点的每个键值对中所述关键值对应的点集中存在与所述目标点具有关系的点。
所述多个处理单元中的每个处理单元用于:确定输入所述处理单元的所述第一目标点的所述关键值与所述第二目标点的的所述关键值是否相等。
每个处理单元还用于:在所述第一目标点的所述关键值与所述第二目标点的的所述关键值相等的情况下,根据输入所述处理单元的所述第一目标点的所述关系值组和所述第二目标点的关系值组,确定所述关系图中的查询点,所述查询点与所述两个目标点之间的关系情况符合预设情况。
每个处理单元还用于:按照所述输入周期,将所述第一目标点的所述键值对传输至沿第一方向的下一个所述处理单元,将第二目标点的所述键值对传输至沿第二方向的下一个所述处理单元,所述第一方向与所述第一边缘垂直,所述第二方向与所述第二边缘垂直。
处理矩阵1620可以理解为脉动阵列结构(systolic array architecture)。在脉动阵列结构中,数据按预先确定的“流水”方式在阵列的处理单元间有节奏地“流动”。在数据流动的过程中,所有的处理单元同时并行地对流经它的数据进行处理,因而它可以达到很高的并行处理速度。
可选地,所述多个处理单元中的每个处理单元具体用于:在所述第一目标点的所述关键值与所述第二目标点的的所述关键值不相等的情况下,将所述第一目标点的所述键值对传输至沿所述第一方向的下一个所述处理单元,将第二目标点的所述键值对传输至沿所述第二方向的下一个所述处理单元。
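The key values in the key-value pairs of a single target point are all different. When the key value of the first target point is not equal to the key value of the second target point, the processing unit forwards the key-value pairs of both target points. When the two key values are equal, the key-value pairs are not forwarded any further, so the matched key value is not compared with other key values, which reduces the amount of computation.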
可选地,控制器1610还用于,根据关系图数据,确定所述第一目标点的至少一个键值对和第二目标点的至少一个键值对,所述关系图数据包括行偏移向量、关键值向量、关系值向量,所述关键值向量包括所述关系图中多个点中每个点的至少一个关键值,所述行偏移向量用于指示每个点的至少一个关键值在所述关键值向量中的位置,所述关系值向量包括所述多个点中每个点的关键值对应的点的关系值组,每个点的至少一个关键值在所述关键值向量中的顺序与所述每个点的关键值对应的点的关系值组在所述关系值向量中的顺序相同。
可选地,不同的所述关键值对应的所述点集中点的数量相等。
可选地,所述多个处理单元中的每个处理单元用于:在所述第一目标点的所述关键值与所述第二目标点的的所述关键值相等的情况下,对所述相等关键值对应的所述第一目标点的所述关系值组和所述第二目标点的关系值组的各个位分别进行预设运算,每个关键值对应的不同的所述点的所述关系值组中相同的位对应于所述关键值对应的所述点集中相同的所述点,每个位的所述预设运算的结果用于指示所述位对应的所述点与所述两个目标点之间的关系情况是否符合预设情况。
可选地,所述第一目标点的至少一个键值对的数量大于v,控制器1610还用于,
按照所述键值对中关键值的大小顺序,将所述第一目标点的至少一个键值对划分为多个第一键值对组,并将所述第二目标点的至少一个键值对划分为至少一个第二键值对组,所述多个第一键值对组和所述至少一个第二键值对组中的每个键值对组中所述键值对的数量小于或等于v,其中,所述多个第一键值对组中关键值最小的键值对组为第三键值对组,所述多个第二键值对组中关键值最小的键值对组为所述第四键值对组。
控制器1610还用于,进行多次迭代,直到所述第三键值对组中的最大关键值大于所述第四键值对组中的最大关键值。
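Each iteration includes: inputting the key-value pairs of the third key-value pair group, one after another according to the input period, into the processing units located on the first edge, and inputting the key-value pairs of the fourth key-value pair group, one after another according to the input period, into the processing units located on the second edge; and, when the largest key value in the third key-value pair group is smaller than the largest key value in the fourth key-value pair group, taking the first key-value pair group that follows the third key-value pair group, in ascending order of key value, as the new third key-value pair group.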
所述控制器具体用于,在进行所述多次迭代之前以及每次迭代之后,将所述第三键值对组的多个键值对按照所述输入周期依次输入位于所述处理矩阵的第一边缘的多个所述处理单元,将所述第四键值对组的多个键值对按照所述输入周期依次输入所述多个处理单元中的位于所述处理矩阵的第二边缘的多个所述处理单元。
在某个目标点的键值对的数量大于处理矩阵中一个边缘的键值对数量的情况下,可以按照该某个目标点的键值对中关键值的大小,对该目标点的键值对进行分组。之后,可以按照关键值从小到大的顺序,选取关键值最小的键值对组输入处理矩阵。在该某个目标点输入处理矩阵的键值对组中的最大关键值小于另一个目标点输入处理矩阵的键值对组的最大关键值的情况下,将该某个目标点的下一个键值对组输入处理矩阵,与该另一个目标点再次输入处理矩阵的键值对组进行比较。
从而,在目标点的键值对的数量大于处理矩阵中一个边缘的键值对数量的情况下,降低运算量。
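For the structure of processing matrix 1620 in data processing system 1600, see the descriptions of Figures 7 to 9. Data processing system 1600 can also include filter array 820 and compact triangle 830 shown in Figure 7.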
图16是本申请实施例提供的一种数据处理系统的示意性结构图。
数据处理系统1700包括装置800和存储器1710。其中,存储器用于存储关系图数据和装置800对关系图数据进行处理得到的处理结果。
装置800可以是ASIC。存储器1710可以理解为装置800的片外存储器。示例性地,存储器1710可以是装置800的片外内存。
因此,数据处理系统1700可以理解为存算分离的计算系统。
图17是本申请实施例提供的一种数据处理系统的示意性结构图。
Data processing system 1800 includes an address index module 1810, two rank-level near-memory computing (NMC) units 1820, and a storage module 1830.
存储模块1830可以包括多个动态随机存取存储器(dynamic random access memory,DRAM)芯片(chip)1831。
在采用8字节(x8)DRAM chip的情况下,存储模块1830中x8 DRAM chip的数量可以是8个,半个rank为一个NMC提供数据(32字节(byte))。也就是说,一个rank的64byte数据被拆分为两个32byte,分别为一个NMC提供数据。
地址索引模块1810可以是注册时钟驱动器(registering clock driver,RCD)。
存储模块1830可以包括KVP数据区、KVP地址索引区和结果数据区。
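The KVP data area stores the KVPs of the points in the relationship graph. These KVPs can be determined from the relationship graph data, which includes the row offset vector, the key value vector, and the relationship value vector.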
示例性地,各个点的KVP可以存储为32位(bit),其中,关键值可以位于该32位中较低的位,关键值对应的关系值组可以位于该32位中较高的位。32位即4字节,一个x8 DRAM chip可以同时输出两个节点的键值对。
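A 32-bit KVP layout of this kind can be sketched as follows. The split between key bits and relationship-group bits is not given in the text, so the value `KEY_BITS = 24` (leaving 8 bits for the group) is purely an assumption, as are the function names.

```python
KEY_BITS = 24                     # assumed width of the key field (low bits)
GROUP_BITS = 32 - KEY_BITS        # remaining high bits hold the relationship group


def pack_kvp(key: int, group: int) -> int:
    """Pack a key (low bits) and a relationship group (high bits) into 32 bits."""
    if not 0 <= key < (1 << KEY_BITS):
        raise ValueError("key does not fit in the key field")
    if not 0 <= group < (1 << GROUP_BITS):
        raise ValueError("group does not fit in the group field")
    return (group << KEY_BITS) | key


def unpack_kvp(word: int) -> tuple:
    """Split a 32-bit word back into (key, group)."""
    return word & ((1 << KEY_BITS) - 1), word >> KEY_BITS


word = pack_kvp(3, 0b10)
key, group = unpack_kvp(word)
```

Keeping the key in the low bits lets a comparator read it with a single mask, without first shifting the word.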
KVP地址索引区用于存储行偏移(row数据)。KVP地址索引区中存储的行偏移可以是关系图数据中的行偏移向量。也就是说,行偏移中的各个数以索引地址的形式存在KVP地址索引区中。关系图中的每个点对应于行偏移中的一个偏移信息。根据每个点的偏移信息,可以确定该点的KVP在KVP数据区的索引地址。
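The result data area stores the intermediate and final computation results of NMC 1820. For example, the result data area can include validity indication register 1230 and result register 1240 of each filter unit 1200, and register 1404 of each compaction unit 1400.
When a query point that has a certain relationship with two target points needs to be determined in the relationship graph, address index module 1810 determines the index address of each target point's KVPs based on the sequence number of that target point in the row offset.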
行偏移中的各个数可以是连续存储的。行偏移中的每个数对应于关系图中的一个点。地址索引模块1810可以基于基地址+目标点的序号×偏移量的方式确定该点的索引地址。
索引地址可以包括芯片选择(chip select,CS)信号与指令地址信号(command/address,C/A)。
NMC 1820可以根据索引地址确定该点的KVP在KVP数据区中存储的位置。NMC 1820可以执行方法500、方法700。NMC 1820可以包括数据处理装置800,或者,NMC 1820可以是数据处理系统1600。NMC 1820可以称为DIMMining。
数据处理系统1800可以是LRDIMM。
数据处理系统1800是在不改变LRDIMM传统存储功能、不修改动态随机存取存储器(dynamic random access memory,DRAM)芯片(chip)内部电路的前提下实现的。
由于多个rank之间、同一个rank内不同NMC完全并行且不占用外部带宽,理论情况下数据处理系统1800的有效带宽利用率可以达到一个通道中rank数量的2倍。
示例性地,NMC 1820包括控制器、数据转发单元和数据处理装置800。
存储模块1830还可以包括数据缓存(databuffer,DB)、缓存(cache)等。
示例性地,关系图中各个点的KVP可以存储在DRAM chip中,在计算过程中,关系图中各个点的KVP可以加载到DB中。
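Because the relationship graph is irregular, the data of some points is accessed frequently, so the KVPs of frequently accessed points can be stored in the cache. For example, when the access frequency of a point's KVPs rises above a first preset value, the KVPs can be stored in the cache; when it falls below a second preset value, the KVPs can be removed from the cache. The second preset value can be less than or equal to the first preset value. The access frequency of a point's KVPs can be determined from the number of accesses within a period of time, and the length of that period can be preset.
The data forwarding unit obtains data from the DRAM chips, the DB, or the cache, and inputs the data into processing matrix 810 of data processing apparatus 800.
Data processing system 1800 provided in this embodiment of the application is a DIMM-based near-memory graph mining architecture that enables parallel computation at the memory rank level. Multiple DIMM ranks can read data and compute in parallel, which improves efficiency. Compared with data processing system 1700, data processing system 1800 avoids frequent data movement between the CPU and memory, and the ranks do not need to compete for the memory bus. The near-memory computing architecture can therefore achieve a significant performance improvement over the conventional CPU-plus-memory architecture in which storage and computation are separated.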
图18是本申请实施例提供的数据处理系统的性能对比的示意图。
为分别确定关系图中子图模型为三节点全连接(clique finding,CF)、四节点全连接(4CF)、五节点全连接(5CF)和3节点链(motifcounting,MF)的子图,利用系统1700和系统1800分别对关系图中各个点的键值对进行处理。以利用系统1700进行处理所需的总时间为1,利用系统1800进行处理所需的总时间如图18所示。关系图1和关系图2为不同的关系图数据。
可以看出,系统1700、系统1800在运行过程中,进行DRAM访问、缓存访问以及运算所需的时间几乎相同。与系统1700相比,系统1800避免了CPU和内存之间频繁的数据搬运,降低了通信的时间,可以取得显著的性能提升。
图20是本申请实施例提供的一种数据处理装置的控制装置的示意性结构图。
数据处理装置的控制装置2000可以包括获取模块2010和处理模块2020。
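The obtaining module 2010 is configured to obtain a first target data group and a second target data group. The first target data group is the first data group among a plurality of first data groups of a first data set, and the second target data group is the first data group among at least one second data group of a second data set. Each data group in each of the first data set and the second data set includes at least one piece of data, and each piece of data includes a key value. The data groups in each data set are arranged in a first order or a second order. When they are arranged in the first order, every key value in any data group of a data set is less than every key value in the data groups located after that data group; when they are arranged in the second order, every key value in any data group of a data set is greater than every key value in the data groups located after that data group.
The processing module 2020 is configured to perform a plurality of iterations.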
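Each iteration includes inputting the first target data group and the second target data group into the data processing apparatus, where the data processing apparatus is configured to determine the key values that are equal between the first target data group and the second target data group.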
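Each iteration further includes: when the data groups in each data set are arranged in the first order and a first key value is less than or equal to a second key value, or when the data groups in each data set are arranged in the second order and a third key value is greater than or equal to a fourth key value, obtaining a first data group located after the first target data group in the first data set as the first target data group. The first key value is the largest key value in the first target data group, the second key value is the largest key value in the second target data group, the third key value is the smallest key value in the first target data group, and the fourth key value is the smallest key value in the second target data group.
Optionally, there are a plurality of second data groups.
When the data groups in each data set are arranged in the first order and the first key value is greater than or equal to the second key value, or when the data groups in each data set are arranged in the second order and the third key value is less than or equal to the fourth key value, a second data group located after the second target data group in the second data set is obtained as the second target data group.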
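The group-advancing rule described above can be sketched in Python for the ascending (first-order) case. This is a software illustration only, under stated assumptions: `split_into_groups`, `intersect_group`, and `blockwise_intersection` are hypothetical names, key values are assumed unique within each data set, and the call to `intersect_group` merely stands in for the hardware data processing apparatus.

```python
def split_into_groups(sorted_keys, v):
    """Split an ascending key list into consecutive groups of at most v keys."""
    return [sorted_keys[i:i + v] for i in range(0, len(sorted_keys), v)]


def intersect_group(group_a, group_b):
    """Stand-in for the data processing apparatus: equal keys of two groups."""
    return sorted(set(group_a) & set(group_b))


def blockwise_intersection(first_keys, second_keys, v):
    first = split_into_groups(sorted(first_keys), v)
    second = split_into_groups(sorted(second_keys), v)
    i = j = 0
    matches = []
    while i < len(first) and j < len(second):
        a, b = first[i], second[j]
        matches.extend(intersect_group(a, b))
        max_a, max_b = a[-1], b[-1]   # "first" and "second" key values
        if max_a <= max_b:            # advance the first target data group
            i += 1
        if max_a >= max_b:            # advance the second target data group
            j += 1
    return matches
```

When the two largest key values are equal, both groups advance, because no later group on either side can contain a key that matches the current group on the other side.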
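Optionally, the data processing apparatus includes a processing matrix, the processing matrix includes v×v processing units, v is a positive integer, and the number of pieces of data in each of the first target data group and the second target data group is less than or equal to v.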
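Inputting the first target data group and the second target data group into the data processing apparatus includes: inputting the first target data group and the second target data group into the processing matrix according to an input rule. Under the input rule, the at least one piece of data in the second target data group is input, along a first direction and in successive input cycles, into the processing units located at a second edge; the at least one piece of data in the first target data group is input, along a second direction and in successive input cycles, into the processing units located at a first edge; input of the first target data group and of the second target data group starts in the same input cycle; the first edge is adjacent to the second edge; the first direction is a direction away from the second edge; and the second direction is a direction away from the first edge.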
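Each processing unit in the processing matrix is configured to determine whether the key value in first data and the key value in second data that are input into the processing unit in the same input cycle are equal, where the first data is data belonging to the first target data group and the second data is data belonging to the second target data group.
When v is greater than 1, each processing unit in the processing matrix is further configured to, in the input cycle following the one in which it receives the first data and the second data, transmit the first data to the next processing unit along the first direction and transmit the second data to the next processing unit along the second direction.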
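The data flow through the processing matrix can be modeled with a short cycle-by-cycle simulation. The sketch below is an assumption-laden software model, not the circuit: it uses a conventional skewed systolic schedule in which element `c` of the first group enters column `c` at cycle `c` and moves one row per cycle, while element `r` of the second group enters row `r` at cycle `r` and moves one column per cycle. The mapping of the source's "first direction" and "second direction" onto rows and columns, and the function name `systolic_key_match`, are illustrative choices.

```python
def systolic_key_match(first_group, second_group, v):
    """Simulate a v x v comparison array; return (cycle, row, col, key) matches.

    Under the assumed schedule, first_group[col] and second_group[row]
    meet in the unit at (row, col) during cycle row + col, so every pair
    of elements is compared exactly once within 2 * v - 1 cycles.
    """
    if len(first_group) > v or len(second_group) > v:
        raise ValueError("each group may hold at most v pieces of data")
    matches = []
    for cycle in range(2 * v - 1):
        for row in range(v):
            col = cycle - row              # units active in this cycle
            if not 0 <= col < v:
                continue
            if col < len(first_group) and row < len(second_group):
                if first_group[col] == second_group[row]:
                    matches.append((cycle, row, col, first_group[col]))
    return matches
```

For example, with `first_group = [2, 5, 8]`, `second_group = [5, 8, 9]`, and `v = 3`, the model reports key 5 in unit (0, 1) during cycle 1 and key 8 in unit (1, 2) during cycle 3.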
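Optionally, different pieces of data in each data set have different key values, and each processing unit in the processing matrix is configured to, when the key value in the first data is not equal to the key value in the second data, transmit the first data to the next processing unit along the first direction and transmit the second data to the next processing unit along the second direction.
Optionally, the data processing apparatus further includes a filter matrix, the filter matrix includes v filter units, and the v filter units are respectively located after the last processing unit, along the second direction, of each of the v rows of the processing matrix arranged along the first direction.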
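Each processing unit in the processing matrix is further configured to: when the key value in the first data is equal to the key value in the second data, in the input cycle following the one in which it receives the first data and the second data, transmit the processing result of the processing unit along the second direction to the next unit, where the unit is a processing unit or a filter unit and the processing result includes the equal key value; or, in the input cycle following the one in which it receives a processing result, transmit the processing result along the second direction to the next unit.
The processing module 2020 is further configured to, when the first key value is greater than or equal to the second key value, control the v filter units along the first direction to output, in successive input cycles, the processing results corresponding to the second target data group.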
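Optionally, the data processing apparatus further includes a compression triangular matrix, the compression triangular matrix includes v rows of compression units along the first direction, and the number of compression units increases row by row along the first direction.
Each of the plurality of compression units is configured to receive the processing result output by the filter unit that precedes the compression unit along the second direction, or to receive the processing result output by a compression unit in the previous row along the first direction.
Each of the plurality of compression units is further configured to, in the input cycle following the one in which it receives the processing result, transmit the processing result to a compression unit in the next row along the first direction.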
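Optionally, different key values correspond to different point sets in a relation graph. The first data indicates whether a first target point in the relation graph has a relationship with at least one point in the point set corresponding to the key value in the first data, and the second data indicates whether a second target point in the relation graph has a relationship with at least one point in the point set corresponding to the key value in the second data.
Each processing matrix is further configured to output a processing result when the key value in the first data is equal to the key value in the second data, where the processing result indicates a query point in the relation graph, and the relationship status between the query point and the two target points meets a preset condition.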
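Optionally, the first data further includes a first relation value group of the first target point corresponding to the key value, and the second data further includes a second relation value group of the second target point corresponding to the key value.
Each processing unit in the processing matrix is further configured to, when the key value in the first data is equal to the key value in the second data, perform a preset operation on each pair of corresponding bits of the first relation value group and the second relation value group. The same bit position in the first relation value group and the second relation value group corresponding to the equal key value corresponds to the same point in the point set corresponding to the equal key value, and the result of the preset operation for each bit indicates whether the relationship status between the point corresponding to that bit and the two target points meets the preset condition.
Optionally, when the data groups in each data set are arranged in the first order, the key values in each first data group are arranged in ascending order; when the data groups in each data set are arranged in the second order, the key values in each first data group are arranged in descending order.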
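In some other embodiments, the apparatus 2000 may alternatively be a data processing apparatus.
The obtaining module 2010 is configured to obtain at least one key-value pair of a first target point and at least one key-value pair of a second target point, the first target point and the second target point being two target points in a relation graph. Each key-value pair of a target point includes a key value of the target point and the relation value group of the target point corresponding to that key value. Different key values correspond to different point sets in the relation graph. The relation value group of a target point corresponding to a key value indicates whether the target point has a relationship with each point in the point set corresponding to the key value. For each key-value pair of a target point, the point set corresponding to the key value contains a point that has a relationship with the target point.
The processing module 2020 is configured to determine the equal key values between the at least one key value of the first target point and the at least one key value of the second target point.
The processing module 2020 is further configured to determine a query point in the relation graph according to the relation value group of the first target point and the relation value group of the second target point that correspond to the equal key value, where the relationship status between the query point and the two target points meets a preset condition.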
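Optionally, the obtaining module 2010 is configured to obtain the relation graph data, where the relation graph data includes a row offset vector, a key value vector, and a relation value vector, and to determine, according to the relation graph data, the at least one key-value pair of each of the two target points. The key value vector includes at least one key value of each of a plurality of points in the relation graph. The row offset vector indicates the position, in the key value vector, of the at least one key value of each point. The relation value vector includes the relation value groups corresponding to the key values of each of the plurality of points. The order of the at least one key value of each point in the key value vector is the same as the order of the corresponding relation value groups in the relation value vector.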
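The three-vector layout described above resembles a compressed sparse row (CSR) structure, and extracting the key-value pairs of one point can be sketched as follows. The sketch assumes, as in ordinary CSR, that the row offset vector has one more entry than there are points so that entry `p + 1` marks the end of point `p`; the source states only that the row offsets indicate positions in the key value vector. The function name and the sample data are hypothetical.

```python
def kvps_of_vertex(row_offsets, keys, relation_values, vertex):
    """Return the (key value, relation value group) pairs of one point.

    Assumption: row_offsets[vertex] and row_offsets[vertex + 1] delimit
    the point's slice in both `keys` and `relation_values`, which are
    stored in the same order.
    """
    start, end = row_offsets[vertex], row_offsets[vertex + 1]
    return list(zip(keys[start:end], relation_values[start:end]))


# Made-up example: 3 points; relation value groups are 4-bit bitmaps.
row_offsets = [0, 2, 3, 3]
keys = [0, 2, 1]
relation_values = [0b0110, 0b0001, 0b1000]
```

In this made-up data, point 0 has two key-value pairs, point 1 has one, and point 2 has none.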
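Optionally, the point sets corresponding to different key values contain the same number of points.
Optionally, the processing module 2020 is further configured to perform a preset operation on each pair of corresponding bits of the relation value group of the first target point and the relation value group of the second target point that correspond to the equal key value. For each key value, the same bit position in the relation value groups of different points corresponds to the same point in the point set corresponding to that key value, and the result of the preset operation for each bit indicates whether the relationship status between the point corresponding to that bit and the two target points meets the preset condition.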
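As an illustration of the bitwise preset operation, the following sketch finds the points related to both target points. Several details are assumptions made only for this example: the preset operation is taken to be a bitwise AND (the preset condition being "related to both target points"), key value `k` is taken to cover the points `k * set_size` to `k * set_size + set_size - 1`, and bit `b` of a relation value group is taken to refer to point `k * set_size + b`. The function name `common_neighbors` is hypothetical.

```python
def common_neighbors(first_kvps, second_kvps, set_size):
    """Return the points related to both target points.

    first_kvps / second_kvps: lists of (key value, relation bitmap) pairs.
    """
    second = dict(second_kvps)
    result = []
    for key, bits_a in first_kvps:
        bits_b = second.get(key)
        if bits_b is None:                 # key value not shared: skip
            continue
        both = bits_a & bits_b             # assumed preset operation: AND
        for bit in range(set_size):
            if (both >> bit) & 1:
                result.append(key * set_size + bit)
    return result
```

Other preset conditions would swap the AND for a different bitwise operation, for example AND with the complement of one bitmap to find points related to the first target point but not the second.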
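Optionally, the processing module 2020 includes a controller and a processing matrix.
The controller is configured to input the at least one key-value pair of the first target point, along a second direction and in successive input cycles, into a plurality of processing units located at a first edge of the processing matrix, and to input the at least one key-value pair of the second target point, along a first direction and in successive input cycles, into a plurality of processing units located at a second edge of the processing matrix, so as to determine the equal key values.
The processing matrix includes v×v processing units, where v is a positive integer greater than 1. Input of the at least one key-value pair of the first target point and of the at least one key-value pair of the second target point into the processing matrix starts in the same input cycle. The first edge is adjacent to the second edge, the first direction is a direction away from the second edge, and the second direction is a direction away from the first edge. Each processing unit is configured to determine whether the key value of the first target point and the key value of the second target point that are input into the processing unit are equal.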
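Each of the plurality of processing units is configured to, when the key value of the first target point is equal to the key value of the second target point, determine a query point in the relation graph according to the relation value group of the first target point and the relation value group of the second target point that are input into the processing unit, where the relationship status between the query point and the two target points meets a preset condition.
Each of the plurality of processing units is further configured to, in successive input cycles, transmit the key-value pair of the first target point to the next processing unit along the first direction and transmit the key-value pair of the second target point to the next processing unit along the second direction.
Optionally, each of the plurality of processing units is specifically configured to, when the key value of the first target point is not equal to the key value of the second target point, transmit the key-value pair of the first target point to the next processing unit along the first direction and transmit the key-value pair of the second target point to the next processing unit along the second direction.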
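Optionally, the number of key-value pairs of the first target point is greater than v. The controller is further configured to divide the at least one key-value pair of the first target point into a plurality of first key-value pair groups in order of key value, and to divide the at least one key-value pair of the second target point into at least one second key-value pair group in order of key value, where the number of key-value pairs in each of the plurality of first key-value pair groups and the at least one second key-value pair group is less than or equal to v. The first key-value pair group with the smallest key values is a third key-value pair group, and the second key-value pair group with the smallest key values is a fourth key-value pair group.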
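The controller is further configured to perform a plurality of iterations until the largest key value in the third key-value pair group is greater than the largest key value in the fourth key-value pair group.
The iteration includes: inputting the at least one key-value pair in the third key-value pair group, in successive input cycles, into the plurality of processing units located at the first edge; inputting the at least one key-value pair in the fourth key-value pair group, in successive input cycles, into the plurality of processing units located at the second edge; and, when the largest key value in the third key-value pair group is less than the largest key value in the fourth key-value pair group, taking, as the third key-value pair group, the first key-value pair group that follows the third key-value pair group among the plurality of first key-value pair groups arranged in ascending order of key value.
The controller is further configured to, before the plurality of iterations and after each iteration, input the key-value pairs of the third key-value pair group, in successive input cycles, into the plurality of processing units located at the first edge of the processing matrix, and input the key-value pairs of the fourth key-value pair group, in successive input cycles, into the plurality of processing units located at the second edge of the processing matrix.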
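FIG. 21 is a schematic structural diagram of a control apparatus for a data processing apparatus according to an embodiment of this application.
The control apparatus 3000 includes a memory 3010 and at least one processor 3020.
The memory 3010 is configured to store program instructions. The processor 3020 is configured to execute the program instructions to implement the steps, methods, operations, or functions performed by the data processing apparatus described above.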
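For example, the processor 3020 is configured to obtain a first target data group and a second target data group. The first target data group is the first data group, in a first order, among a plurality of first data groups of a first data set, and the second target data group is the first data group among at least one second data group of a second data set. Each data group in each of the first data set and the second data set includes at least one piece of data, and each piece of data includes a key value. The data groups in each data set are arranged in the first order or a second order. When they are arranged in the first order, every key value in any data group of a data set is less than every key value in the data groups located after that data group; when they are arranged in the second order, every key value in any data group of a data set is greater than every key value in the data groups located after that data group.
The processor 3020 is further configured to perform a plurality of iterations.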
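Each iteration includes inputting the first target data group and the second target data group into the data processing apparatus, where the data processing apparatus is configured to determine the key values that are equal between the first target data group and the second target data group.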
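Each iteration further includes: when the data groups in each data set are arranged in the first order and a first key value is less than or equal to a second key value, or when the data groups in each data set are arranged in the second order and a third key value is greater than or equal to a fourth key value, obtaining a first data group located after the first target data group in the first data set as the first target data group. The first key value is the largest key value in the first target data group, the second key value is the largest key value in the second target data group, the third key value is the smallest key value in the first target data group, and the fourth key value is the smallest key value in the second target data group.
Optionally, there are a plurality of second data groups.
When the data groups in each data set are arranged in the first order and the first key value is greater than or equal to the second key value, or when the data groups in each data set are arranged in the second order and the third key value is less than or equal to the fourth key value, a second data group located after the second target data group in the second data set is obtained as the second target data group.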
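Optionally, the data processing apparatus includes a processing matrix, the processing matrix includes v×v processing units, v is a positive integer, and the number of pieces of data in each of the first target data group and the second target data group is less than or equal to v.
Inputting the first target data group and the second target data group into the data processing apparatus includes: inputting the first target data group and the second target data group into the processing matrix according to an input rule. Under the input rule, the at least one piece of data in the second target data group is input, along a first direction and in successive input cycles, into the processing units located at a second edge; the at least one piece of data in the first target data group is input, along a second direction and in successive input cycles, into the processing units located at a first edge; input of the first target data group and of the second target data group starts in the same input cycle; the first edge is adjacent to the second edge; the first direction is a direction away from the second edge; and the second direction is a direction away from the first edge.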
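Each processing unit in the processing matrix is configured to determine whether the key value in first data and the key value in second data that are input into the processing unit in the same input cycle are equal, where the first data is data belonging to the first target data group and the second data is data belonging to the second target data group.
When v is greater than 1, each processing unit in the processing matrix is further configured to, in the input cycle following the one in which it receives the first data and the second data, transmit the first data to the next processing unit along the first direction and transmit the second data to the next processing unit along the second direction.
Optionally, different pieces of data in each data set have different key values, and each processing unit in the processing matrix is configured to, when the key value in the first data is not equal to the key value in the second data, transmit the first data to the next processing unit along the first direction and transmit the second data to the next processing unit along the second direction.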
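Optionally, the data processing apparatus further includes a filter matrix, the filter matrix includes v filter units, and the v filter units are respectively located after the last processing unit, along the second direction, of each of the v rows of the processing matrix arranged along the first direction.
Each processing unit in the processing matrix is further configured to: when the key value in the first data is equal to the key value in the second data, in the input cycle following the one in which it receives the first data and the second data, transmit the processing result of the processing unit along the second direction to the next unit, where the unit is a processing unit or a filter unit and the processing result includes the equal key value; or, in the input cycle following the one in which it receives a processing result, transmit the processing result along the second direction to the next unit.
The processor 3020 is further configured to, when the first key value is greater than or equal to the second key value, control the v filter units along the first direction to output, in successive input cycles, the processing results corresponding to the second target data group.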
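Optionally, the data processing apparatus further includes a compression triangular matrix, the compression triangular matrix includes v rows of compression units along the first direction, and the number of compression units increases row by row along the first direction.
Each of the plurality of compression units is configured to receive the processing result output by the filter unit that precedes the compression unit along the second direction, or to receive the processing result output by a compression unit in the previous row along the first direction.
Each of the plurality of compression units is further configured to, in the input cycle following the one in which it receives the processing result, transmit the processing result to a compression unit in the next row along the first direction.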
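Optionally, different key values correspond to different point sets in a relation graph. The first data indicates whether a first target point in the relation graph has a relationship with at least one point in the point set corresponding to the key value in the first data, and the second data indicates whether a second target point in the relation graph has a relationship with at least one point in the point set corresponding to the key value in the second data.
Each processing matrix is further configured to output a processing result when the key value in the first data is equal to the key value in the second data, where the processing result indicates a query point in the relation graph, and the relationship status between the query point and the two target points meets a preset condition.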
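Optionally, the first data further includes a first relation value group of the first target point corresponding to the key value, and the second data further includes a second relation value group of the second target point corresponding to the key value.
Each processing unit in the processing matrix is further configured to, when the key value in the first data is equal to the key value in the second data, perform a preset operation on each pair of corresponding bits of the first relation value group and the second relation value group. The same bit position in the first relation value group and the second relation value group corresponding to the equal key value corresponds to the same point in the point set corresponding to the equal key value, and the result of the preset operation for each bit indicates whether the relationship status between the point corresponding to that bit and the two target points meets the preset condition.
Optionally, when the data groups in each data set are arranged in the first order, the key values in each first data group are arranged in ascending order; when the data groups in each data set are arranged in the second order, the key values in each first data group are arranged in descending order.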
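In addition, all or some of the units in the foregoing apparatuses may be integrated together, or may be implemented independently. In one implementation, the units are integrated and implemented as a system-on-a-chip (SOC). The SOC may include at least one processor configured to implement any one of the foregoing methods or the functions of the units of the apparatus. The at least one processor may be of different types, for example, a CPU and an FPGA, a CPU and an artificial intelligence processor, or a CPU and a graphics processing unit (GPU).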
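An embodiment of this application further provides a computer program storage medium. The computer program storage medium has program instructions, and when the program instructions are executed, the foregoing method is performed.
An embodiment of this application further provides a chip system. The chip system includes at least one processor, and when program instructions are executed in the at least one processor, the foregoing method is performed.
An embodiment of this application further provides a computer program product. The computer program product includes program instructions, and when the program instructions are executed in a computer device, the foregoing data processing method is performed.
An embodiment of this application further provides a data processing system, including the data processing apparatus described above and the control apparatus for the data processing apparatus.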
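It should be understood that the processor in the embodiments of this application may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
It should be further understood that the memory in the embodiments of this application may be a volatile memory or a non-volatile memory, or may include both a volatile memory and a non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically EPROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, for example, a static RAM (SRAM), a dynamic random access memory (DRAM), a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced SDRAM (ESDRAM), a synchlink DRAM (SLDRAM), and a direct rambus RAM (DR RAM).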
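All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, all or some of the foregoing embodiments may be implemented in the form of a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the procedures or functions according to the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, that contains one or more sets of usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium. The semiconductor medium may be a solid-state drive.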
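It should be understood that the term "and/or" in this document merely describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate the following three cases: only A exists, both A and B exist, and only B exists, where A and B may be singular or plural. In addition, the character "/" in this document generally indicates an "or" relationship between the associated objects, but may also indicate an "and/or" relationship; refer to the context for the specific meaning.
In this application, "at least one" means one or more, and "a plurality of" means two or more. "At least one of the following items" or a similar expression means any combination of these items, including any combination of a single item or a plurality of items. For example, at least one of a, b, or c may indicate: a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b, and c may be single or multiple.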
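Prefixes such as "first" and "second" are used in the embodiments of this application only to distinguish between different described objects, and do not limit the position, order, priority, quantity, or content of the described objects. For example, if the described object is a "key value", the ordinals before "key value" in "first key value" and "second key value" do not limit the position, order, or priority of the key values; likewise, if the described object is a "direction", the ordinals before "direction" in "first direction" and "second direction" do not limit the position, order, or priority of the directions.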
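It should be understood that, in the embodiments of this application, the sequence numbers of the foregoing processes do not imply an order of execution. The order of execution of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such an implementation should not be considered to go beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.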
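In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. For example, the division into units is merely a division by logical function, and other divisions are possible in an actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.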
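If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.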

Claims (23)

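  1. A control method for a data processing apparatus, wherein the method comprises:
    obtaining a first target data group and a second target data group, wherein the first target data group is the first data group among a plurality of first data groups of a first data set, the second target data group is the first data group among at least one second data group of a second data set, each data group in each of the first data set and the second data set comprises at least one piece of data, each piece of data comprises a key value, the data groups in each data set are arranged in a first order or a second order, when arranged in the first order every key value in any data group of a data set is less than every key value in the data groups located after that data group, and when arranged in the second order every key value in any data group of a data set is greater than every key value in the data groups located after that data group; and
    performing a plurality of iterations, wherein each iteration comprises:
    inputting the first target data group and the second target data group into the data processing apparatus, wherein the data processing apparatus is configured to determine the key values that are equal between the first target data group and the second target data group; and
    when the data groups in each data set are arranged in the first order and a first key value is less than or equal to a second key value, or when the data groups in each data set are arranged in the second order and a third key value is greater than or equal to a fourth key value, obtaining a first data group located after the first target data group in the first data set as the first target data group, wherein the first key value is the largest key value in the first target data group, the second key value is the largest key value in the second target data group, the third key value is the smallest key value in the first target data group, and the fourth key value is the smallest key value in the second target data group.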
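  2. The method according to claim 1, wherein there are a plurality of second data groups, and
    when the data groups in each data set are arranged in the first order and the first key value is greater than or equal to the second key value, or when the data groups in each data set are arranged in the second order and the third key value is less than or equal to the fourth key value, a second data group located after the second target data group in the second data set is obtained as the second target data group.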
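  3. The method according to claim 1 or 2, wherein the data processing apparatus comprises a processing matrix, the processing matrix comprises v×v processing units, v is a positive integer, and the number of pieces of data in each of the first target data group and the second target data group is less than or equal to v,
    an i-th piece of first data in the first target data group is input, in a j-th input cycle of the iteration, into the j-th processing unit along a second direction among the v processing units located at a first edge; a p-th piece of second data in the second target data group is input, in a q-th input cycle of the iteration, into the q-th processing unit along a first direction among the v processing units located at a second edge; the first edge is adjacent to the second edge; different pieces of data in each target data group are input into different processing units; the first direction is a direction that points from the second edge into the processing matrix and is perpendicular to the second edge; the second direction is a direction that points from the first edge into the processing matrix and is perpendicular to the first edge; and i, j, p, and q are all positive integers;
    each processing unit in the processing matrix is configured to determine whether the key value in first data and the key value in second data that are input into the processing unit in the same input cycle are equal; and
    when v is greater than 1, each processing unit in the processing matrix is further configured to, in the input cycle following the one in which it receives the first data and the second data, transmit the first data to the next processing unit along the first direction and transmit the second data to the next processing unit along the second direction.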
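  4. The method according to claim 3, wherein different pieces of data in each data set have different key values, and each processing unit in the processing matrix is configured to, when the key value in the first data is not equal to the key value in the second data, transmit the first data to the next processing unit along the first direction and transmit the second data to the next processing unit along the second direction.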
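  5. The method according to claim 3 or 4, wherein the data processing apparatus further comprises a filter matrix, the filter matrix comprises v filter units, and the v filter units are respectively located after the last processing unit, along the second direction, of each of the v rows of the processing matrix arranged along the first direction,
    each processing unit in the processing matrix is further configured to:
    when the key value in the first data is equal to the key value in the second data, in the input cycle following the one in which it receives the first data and the second data, transmit the processing result of the processing unit along the second direction to the next unit, wherein the unit is a processing unit or a filter unit and the processing result comprises the equal key value; or
    in the input cycle following the one in which it receives a processing result, transmit the processing result along the second direction to the next unit; and
    the method further comprises: when the first key value is greater than or equal to the second key value, controlling the v filter units along the first direction to output, in successive input cycles, the processing results corresponding to the second target data group.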
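  6. The method according to claim 5, wherein the data processing apparatus further comprises a compression triangular matrix, the compression triangular matrix comprises v rows of compression units along the first direction, and the number of compression units increases row by row along the first direction,
    each of the plurality of compression units is configured to:
    receive the processing result output by the filter unit that precedes the compression unit along the second direction, or receive the processing result output by a compression unit in the previous row along the first direction; and
    in the input cycle following the one in which it receives the processing result, transmit the processing result to a compression unit in the next row along the first direction.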
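  7. The method according to any one of claims 3 to 6, wherein different key values correspond to different point sets in a relation graph, the first data indicates whether a first target point in the relation graph has a relationship with at least one point in the point set corresponding to the key value in the first data, and the second data indicates whether a second target point in the relation graph has a relationship with at least one point in the point set corresponding to the key value in the second data,
    each processing matrix is further configured to output a processing result when the key value in the first data is equal to the key value in the second data, wherein the processing result indicates a query point in the relation graph, and the relationship status between the query point and the two target points meets a preset condition.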
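  8. The method according to any one of claims 3 to 7, wherein the first data further comprises a first relation value group of the first target point corresponding to the key value, and the second data further comprises a second relation value group of the second target point corresponding to the key value,
    each processing unit in the processing matrix is further configured to, when the key value in the first data is equal to the key value in the second data, perform a preset operation on each pair of corresponding bits of the first relation value group and the second relation value group, wherein the same bit position in the first relation value group and the second relation value group corresponding to the equal key value corresponds to the same point in the point set corresponding to the equal key value, and the result of the preset operation for each bit indicates whether the relationship status between the point corresponding to that bit and the two target points meets the preset condition.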
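  9. The method according to any one of claims 1 to 8, wherein when the data groups in each data set are arranged in the first order, the key values in each first data group are arranged in ascending order; and when the data groups in each data set are arranged in the second order, the key values in each first data group are arranged in descending order.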
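  10. A control apparatus for a data processing apparatus, comprising an obtaining module and a processing module, wherein
    the obtaining module is configured to obtain a first target data group and a second target data group, wherein the first target data group is the first data group among a plurality of first data groups of a first data set, the second target data group is the first data group among at least one second data group of a second data set, each data group in each of the first data set and the second data set comprises at least one piece of data, each piece of data comprises a key value, the data groups in each data set are arranged in a first order or a second order, when arranged in the first order every key value in any data group of a data set is less than every key value in the data groups located after that data group, and when arranged in the second order every key value in any data group of a data set is greater than every key value in the data groups located after that data group; and
    the processing module is configured to perform a plurality of iterations, wherein each iteration comprises:
    inputting the first target data group and the second target data group into the data processing apparatus, wherein the data processing apparatus is configured to determine the key values that are equal between the first target data group and the second target data group; and
    when the data groups in each data set are arranged in the first order and a first key value is less than or equal to a second key value, or when the data groups in each data set are arranged in the second order and a third key value is greater than or equal to a fourth key value, obtaining a first data group located after the first target data group in the first data set as the first target data group, wherein the first key value is the largest key value in the first target data group, the second key value is the largest key value in the second target data group, the third key value is the smallest key value in the first target data group, and the fourth key value is the smallest key value in the second target data group.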
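  11. The apparatus according to claim 10, wherein there are a plurality of second data groups, and
    when the data groups in each data set are arranged in the first order and the first key value is greater than or equal to the second key value, or when the data groups in each data set are arranged in the second order and the third key value is less than or equal to the fourth key value, a second data group located after the second target data group in the second data set is obtained as the second target data group.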
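  12. The apparatus according to claim 10 or 11, wherein the data processing apparatus comprises a processing matrix, the processing matrix comprises v×v processing units, v is a positive integer, and the number of pieces of data in each of the first target data group and the second target data group is less than or equal to v,
    an i-th piece of first data in the first target data group is input, in a j-th input cycle of the iteration, into the j-th processing unit along a second direction among the v processing units located at a first edge; a p-th piece of second data in the second target data group is input, in a q-th input cycle of the iteration, into the q-th processing unit along a first direction among the v processing units located at a second edge; the first edge is adjacent to the second edge; different pieces of data in each target data group are input into different processing units; the first direction is a direction that points from the second edge into the processing matrix and is perpendicular to the second edge; the second direction is a direction that points from the first edge into the processing matrix and is perpendicular to the first edge; and i, j, p, and q are all positive integers;
    each processing unit in the processing matrix is configured to determine whether the key value in first data and the key value in second data that are input into the processing unit in the same input cycle are equal; and
    when v is greater than 1, each processing unit in the processing matrix is further configured to, in the input cycle following the one in which it receives the first data and the second data, transmit the first data to the next processing unit along the first direction and transmit the second data to the next processing unit along the second direction.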
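  13. The apparatus according to claim 12, wherein different pieces of data in each data set have different key values, and each processing unit in the processing matrix is configured to, when the key value in the first data is not equal to the key value in the second data, transmit the first data to the next processing unit along the first direction and transmit the second data to the next processing unit along the second direction.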
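  14. The apparatus according to claim 12 or 13, wherein the data processing apparatus further comprises a filter matrix, the filter matrix comprises v filter units, and the v filter units are respectively located after the last processing unit, along the second direction, of each of the v rows of the processing matrix arranged along the first direction,
    each processing unit in the processing matrix is further configured to: when the key value in the first data is equal to the key value in the second data, in the input cycle following the one in which it receives the first data and the second data, transmit the processing result of the processing unit along the second direction to the next unit, wherein the unit is a processing unit or a filter unit and the processing result comprises the equal key value; or, in the input cycle following the one in which it receives a processing result, transmit the processing result along the second direction to the next unit; and
    the processing module is further configured to, when the first key value is greater than or equal to the second key value, control the v filter units along the first direction to output, in successive input cycles, the processing results corresponding to the second target data group.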
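  15. The apparatus according to claim 14, wherein the data processing apparatus further comprises a compression triangular matrix, the compression triangular matrix comprises v rows of compression units along the first direction, and the number of compression units increases row by row along the first direction,
    each of the plurality of compression units is configured to:
    receive the processing result output by the filter unit that precedes the compression unit along the second direction, or receive the processing result output by a compression unit in the previous row along the first direction; and
    in the input cycle following the one in which it receives the processing result, transmit the processing result to a compression unit in the next row along the first direction.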
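  16. The apparatus according to any one of claims 12 to 15, wherein different key values correspond to different point sets in a relation graph, the first data indicates whether a first target point in the relation graph has a relationship with at least one point in the point set corresponding to the key value in the first data, and the second data indicates whether a second target point in the relation graph has a relationship with at least one point in the point set corresponding to the key value in the second data,
    each processing matrix is further configured to output a processing result when the key value in the first data is equal to the key value in the second data, wherein the processing result indicates a query point in the relation graph, and the relationship status between the query point and the two target points meets a preset condition.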
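  17. The apparatus according to any one of claims 12 to 16, wherein the first data further comprises a first relation value group of the first target point corresponding to the key value, and the second data further comprises a second relation value group of the second target point corresponding to the key value,
    each processing unit in the processing matrix is further configured to, when the key value in the first data is equal to the key value in the second data, perform a preset operation on each pair of corresponding bits of the first relation value group and the second relation value group, wherein the same bit position in the first relation value group and the second relation value group corresponding to the equal key value corresponds to the same point in the point set corresponding to the equal key value, and the result of the preset operation for each bit indicates whether the relationship status between the point corresponding to that bit and the two target points meets the preset condition.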
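  18. The apparatus according to any one of claims 10 to 17, wherein when the data groups in each data set are arranged in the first order, the key values in each first data group are arranged in ascending order; and when the data groups in each data set are arranged in the second order, the key values in each first data group are arranged in descending order.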
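  19. A control apparatus for a data processing apparatus, comprising a memory and at least one processor, wherein the memory is configured to store a program, and when the program is executed in the at least one processor, the processor is configured to perform the method according to any one of claims 1 to 9.
  20. A computer program product, comprising program instructions, wherein when the program instructions are executed, the method according to any one of claims 1 to 9 is performed.
  21. A computer-readable storage medium, wherein the computer-readable medium stores program code to be executed by a device, and when the program instructions are executed, the method according to any one of claims 1 to 9 is performed.
  22. A chip, wherein the chip comprises at least one processor, and when program instructions are executed in the at least one processor, the method according to any one of claims 1 to 9 is performed.
  23. A data processing system, comprising the control apparatus for a data processing apparatus according to any one of claims 10 to 19 and the data processing apparatus.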
PCT/CN2023/090020 2022-04-26 2023-04-23 Control method and apparatus for data processing apparatus WO2023207832A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210447052.8 2022-04-26
CN202210447052.8A CN116991910A (zh) 2022-04-26 2022-04-26 Control method and apparatus for data processing apparatus

Publications (1)

Publication Number Publication Date
WO2023207832A1 (zh) 2023-11-02

Family

ID=88517679

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/090020 WO2023207832A1 (zh) 2022-04-26 2023-04-23 Control method and apparatus for data processing apparatus

Country Status (2)

Country Link
CN (1) CN116991910A (zh)
WO (1) WO2023207832A1 (zh)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB1323230A (en) * 1971-01-15 1973-07-11 Ibm Data processing apparatus
CA2070035A1 (en) * 1991-05-29 1992-11-30 Shingo Ishihara Arrangement and method of ascertaining data word number of maximum or minimum in a plurality of data words
JP2001306300A (ja) * 2000-04-25 2001-11-02 Nec Microcomputer Technology Ltd ソート処理方法及び送受信データの順序決定方法
WO2003091872A1 (fr) * 2002-04-26 2003-11-06 Nihon University School Juridical Person Dispositif de tri par fusion en parallele, procede et programme y relatifs
CN103294702A (zh) * 2012-02-27 2013-09-11 上海淼云文化传播有限公司 一种数据处理方法、装置及系统
CN111259012A (zh) * 2020-01-20 2020-06-09 中国平安人寿保险股份有限公司 数据均匀化方法、装置、计算机设备及存储介质
CN113850395A (zh) * 2021-09-24 2021-12-28 北京九章云极科技有限公司 一种数据处理方法及系统

Also Published As

Publication number Publication date
CN116991910A (zh) 2023-11-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23795265

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023795265

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2023795265

Country of ref document: EP

Effective date: 20240403