CN113448739A - Data processing method and device

Data processing method and device

Info

Publication number: CN113448739A
Authority: CN (China)
Prior art keywords: vector, memory, feature, identification, tensor
Legal status: Granted
Application number: CN202111015203.4A
Other languages: Chinese (zh)
Other versions: CN113448739B (en)
Inventors: 袁满, 陈浪石, 张杰, 李永
Current assignee: Alibaba China Co Ltd; Alibaba Cloud Computing Ltd
Original assignee: Alibaba China Co Ltd; Alibaba Cloud Computing Ltd
Application filed by Alibaba China Co Ltd and Alibaba Cloud Computing Ltd
Priority to CN202111015203.4A
Publication of CN113448739A
Application granted; publication of CN113448739B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Image Analysis (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The application provides a data processing method and device. In the application, at least one obtaining operator can perform the operation of obtaining at least one feature tensor in a distributed system according to at least one identification tensor, so that a plurality of feature vectors can be obtained using fewer obtaining operators (for example, a single obtaining operator, or in any case fewer operators than there are vector identifiers). Although an identification tensor includes the vector identifiers of many feature vectors, all of those identifiers are processed by the same obtaining operator, so only one "starting operator" step and one "scheduling operator" step are needed. Reducing the number of these steps saves the time required to obtain the feature vectors, improves the efficiency of obtaining them, and saves hardware resources.

Description

Data processing method and device
Technical Field
The present application relates to the field of new generation information technology, and in particular, to a data processing method and apparatus.
Background
A sparse model is a machine learning model whose input samples include discrete (sparse) input features; it is widely applied in search, advertising, recommendation, and similar fields. Because a sparse model has a very large number of parameters, its training can only be completed by a distributed computing cluster, and such training is currently based mainly on Central Processing Unit (CPU) clusters. However, the inventors found that training a sparse model often takes a long time, resulting in low training efficiency, and occupies a large amount of system resources during training.
Disclosure of Invention
The application discloses a data processing method and device.
In a first aspect, the present application shows a data processing method based on a distributed system, where the distributed system includes a plurality of electronic devices and each electronic device stores a part of the feature vectors in a model. The method is applied to a first electronic device among the plurality of electronic devices and includes: obtaining vector identifiers of a plurality of feature vectors in the model; generating at least one identification tensor according to the vector identifiers of the plurality of feature vectors; obtaining, based on at least one obtaining operator, at least one feature tensor in the distributed system according to the at least one identification tensor, where the at least one feature tensor includes the plurality of feature vectors; and splitting the at least one feature tensor to obtain the plurality of feature vectors.
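To make the four steps of the first aspect concrete, the following is a minimal single-process sketch in Python/NumPy (an illustration only, not the patented implementation; the table, the identifiers, and the name gather_features are invented for the example). The vector identifiers are packed into one identification tensor, a single gather fetches all requested rows at once, and the resulting feature tensor is split back into individual feature vectors:

    import numpy as np

    # Hypothetical parameter table: row i holds the feature vector whose identifier is i.
    embedding_table = np.random.rand(1000, 8).astype(np.float32)

    def gather_features(id_tensor, table):
        # One fused "obtaining operator": fetch every requested row in a single call.
        return table[id_tensor]

    vector_ids = [17, 42, 42, 993]              # vector identifiers of the needed vectors
    id_tensor = np.asarray(vector_ids)          # generate one identification tensor
    feature_tensor = gather_features(id_tensor, embedding_table)   # one operator call
    feature_vectors = list(feature_tensor)      # split the feature tensor into vectors
    print(len(feature_vectors), feature_vectors[0].shape)          # 4 (8,)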
In an optional implementation, the obtaining at least one feature tensor in the distributed system according to the at least one identification tensor based on at least one obtaining operator includes: based on the at least one obtaining operator, searching, among the feature vectors stored in the first electronic device, for the feature vectors corresponding to the vector identifiers in the identification tensor; and/or sending the vector identifiers in the identification tensor to the second electronic devices in the distributed system where the corresponding feature vectors are located, and receiving the feature vectors that the second electronic devices find according to the received vector identifiers and return, where the second electronic devices include electronic devices in the distributed system other than the first electronic device; and, based on the at least one obtaining operator, stitching the received feature vectors and/or the feature vectors found in the first electronic device to obtain the feature tensor.
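A hedged sketch of this optional implementation follows, assuming for illustration that device k owns the feature vectors whose identifier satisfies id % n_devices == k (the patent does not prescribe this sharding rule), and simulating the remote devices as in-process dictionaries. The identifiers are routed to their owners, the local and "remote" lookups run, and the results are stitched back in the original identifier order:

    import numpy as np

    # Assumed sharding rule (illustrative): device k owns ids with id % n_devices == k.
    n_devices = 4
    shards = {k: {i: np.full(8, i, dtype=np.float32)
                  for i in range(k, 1000, n_devices)} for k in range(n_devices)}

    def lookup(device, ids):
        # Stand-in for the search on one electronic device (local or remote).
        return np.stack([shards[device][i] for i in ids])

    def fused_gather(id_tensor):
        parts, positions = [], []
        for dev in range(n_devices):                    # route identifiers to their owners
            mask = id_tensor % n_devices == dev
            if mask.any():
                parts.append(lookup(dev, id_tensor[mask]))  # local search or send/receive
                positions.append(np.flatnonzero(mask))
        stitched = np.concatenate(parts)                # stitch the returned vectors
        result = np.empty_like(stitched)
        result[np.concatenate(positions)] = stitched    # restore the identifier order
        return result

    print(fused_gather(np.array([3, 8, 5, 2])).shape)   # (4, 8)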
In an optional implementation, the searching, among the feature vectors stored in the first electronic device, for the feature vectors corresponding to the vector identifiers in the identification tensor includes: searching whether a feature vector corresponding to a vector identifier in the identification tensor exists among the feature vectors stored in a first memory in the first electronic device; if it exists, acquiring that feature vector from the first memory, where it was previously acquired from a second memory in the first electronic device and cached in the first memory, the data access rate of the first memory being greater than that of the second memory; and if it does not exist, searching for the feature vector corresponding to the vector identifier among the feature vectors stored in the second memory.
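The two-level search can be pictured with a few lines of Python; the dictionaries standing in for the first and second memories are illustrative only (in practice the first memory might be GPU memory and the second memory CPU main memory, as described later in this application):

    import numpy as np

    second_memory = {i: np.random.rand(8) for i in range(1000)}   # e.g. CPU main memory
    first_memory = {7: second_memory[7]}      # faster cache, e.g. GPU memory

    def find_vector(vector_id):
        vec = first_memory.get(vector_id)     # search the fast first memory
        if vec is None:
            vec = second_memory[vector_id]    # miss: fall back to the second memory
        return vec

    assert find_vector(7) is first_memory[7]  # hit served from the first memory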
In an optional implementation, the method further includes: acquiring the frequency at which each feature vector in the model stored in the second memory is accessed; selecting at least part of the feature vectors from the feature vectors in the model stored in the second memory in descending order of accessed frequency; and buffering the at least part of the feature vectors in the first memory.
In an optional implementation, the method further includes: during one round of training of the model, acquiring from the second memory the feature vectors in the model that are required in at least one round of training after that round; and caching those feature vectors in the first memory.
In an optional implementation, the generating at least one identification tensor according to the vector identifiers of the plurality of feature vectors includes: using a generating operator to deduplicate the vector identifiers of the plurality of feature vectors to obtain deduplicated vector identifiers, determining the number of electronic devices where the feature vectors corresponding to the deduplicated vector identifiers are located, segmenting the deduplicated vector identifiers according to that number to obtain segmented vector identifiers, and generating at least two identification tensors according to the segmented vector identifiers.
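A minimal sketch of this generating operator, under the same assumed id % n sharding rule as above (the real segmentation rule is not specified here), is:

    import numpy as np

    def build_id_tensors(vector_ids, n_devices):
        # Deduplicate, then segment by owning device; one identification tensor
        # per non-empty segment (id % n_devices is an assumed sharding rule).
        unique_ids = np.unique(np.asarray(vector_ids))
        segments = [unique_ids[unique_ids % n_devices == d] for d in range(n_devices)]
        return [seg for seg in segments if seg.size]

    print(build_id_tensors([5, 5, 2, 9, 14], n_devices=3))  # [array([9]), array([ 2,  5, 14])]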
In an optional implementation, the method further includes: obtaining the types of hardware resources in the first electronic device that are required in the process of obtaining the plurality of feature vectors in the model; and grouping the feature vectors in the model according to the number of types and a load-balancing principle to obtain a plurality of feature vector groups.
In an optional implementation, the model includes a plurality of feature matrices, each feature matrix including at least two feature vectors; the grouping the feature vectors in the model according to the number of types and the load-balancing principle to obtain a plurality of feature vector groups includes: dividing the plurality of feature matrices into that number of feature matrix groups, where the difference between the numbers of feature vectors in any two feature matrix groups is smaller than a preset difference.
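One common way to satisfy such a load-balancing constraint is greedy longest-processing-time assignment; the sketch below uses it purely as an illustration (the patent does not name a particular balancing algorithm):

    import heapq

    def balance_matrices(matrix_sizes, n_groups):
        # Greedy split: place the next-largest matrix into the currently lightest
        # group, keeping per-group feature-vector counts close to each other.
        heap = [(0, g, []) for g in range(n_groups)]  # (total vectors, group id, members)
        heapq.heapify(heap)
        for idx, size in sorted(enumerate(matrix_sizes), key=lambda x: -x[1]):
            total, g, members = heapq.heappop(heap)
            members.append(idx)
            heapq.heappush(heap, (total + size, g, members))
        return sorted(heap, key=lambda x: x[1])

    # Five feature matrices holding 9, 2, 5, 4, 6 feature vectors, split across
    # two groups (one per hardware-resource type): totals come out 13 and 13.
    print(balance_matrices([9, 2, 5, 4, 6], n_groups=2))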
In an optional implementation, the generating at least one identification tensor according to the vector identifiers of the plurality of feature vectors includes: grouping the vector identifiers of the plurality of feature vectors according to the feature vector group where the corresponding feature vector is located, to obtain at least two vector identifier groups; and generating an identification tensor from the vector identifiers in each vector identifier group, to obtain at least two identification tensors.
In an optional implementation, the obtaining at least one feature tensor in the distributed system according to the at least one identification tensor based on at least one obtaining operator includes: sequentially calling multiple types of hardware resources in the first electronic device according to a first identification tensor of the at least two identification tensors, so as to obtain, through the called hardware resources, the feature vectors corresponding to the vector identifiers in the first identification tensor from a feature vector group; sequentially calling the multiple types of hardware resources according to a second identification tensor of the at least two identification tensors, so as to obtain, through the called hardware resources, the feature vectors corresponding to the vector identifiers in the second identification tensor from a feature vector group, where the second identification tensor includes an identification tensor of the at least two identification tensors other than the first identification tensor; in the process of sequentially calling the multiple types of hardware resources according to the second identification tensor, when a hardware resource of a target type among the multiple types needs to be called according to the second identification tensor, judging whether that hardware resource is being used; and when the hardware resource of the target type is not being used, calling it according to the second identification tensor.
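The calling discipline described above, i.e. letting the second identification tensor occupy a hardware resource as soon as it is free while the first tensor has moved on to the next resource, is essentially a two-stage pipeline. A toy illustration using Python threads and locks as stand-ins for hardware-resource types (the stage names are invented):

    import threading, time

    # Locks stand in for exclusive hardware resources of two invented types.
    resources = {"copy_engine": threading.Lock(), "compute": threading.Lock()}
    stages = ["copy_engine", "compute"]        # assumed calling order of resource types

    def process(tensor_name):
        for stage in stages:
            lock = resources[stage]
            while not lock.acquire(blocking=False):   # "is the target resource in use?"
                time.sleep(0.001)                     # wait until it becomes free
            try:
                print(tensor_name, "on", stage)
                time.sleep(0.01)                      # stand-in for the real work
            finally:
                lock.release()

    # While one identification tensor occupies "compute", the other can already
    # use "copy_engine", so the two tensors move through the stages as a pipeline.
    threads = [threading.Thread(target=process, args=(name,))
               for name in ("id_tensor_1", "id_tensor_2")]
    for t in threads: t.start()
    for t in threads: t.join()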
In a second aspect, the present application shows a data processing method, comprising:
in the process of training a sparse model, or in the process of processing data using the trained sparse model, if a target feature vector among a plurality of feature vectors in the sparse model needs to be acquired, searching for the target feature vector in a first memory; and
responding to the target feature vector when it is found in the first memory, where the target feature vector was selected in advance from the plurality of feature vectors, in descending order of the access frequencies of the plurality of feature vectors stored in a second memory, and cached in the first memory, and the data access rate of the first memory is greater than that of the second memory.
In a third aspect, the present application shows a data processing method, comprising:
when a next round of multi-round training of a sparse model is to be performed, acquiring from a first memory the feature vectors in the sparse model required for that round; and performing the next round of training on the sparse model using those feature vectors, where the feature vectors were acquired from a second memory and cached in the first memory in advance, during the previous round of the multi-round training, and the data access rate of the first memory is greater than that of the second memory.
In a fourth aspect, the present application shows a data processing apparatus based on a distributed system, where the distributed system includes a plurality of electronic devices, each electronic device stores a part of the feature vectors in a model, and the apparatus is applied to a first electronic device among the plurality of electronic devices. The apparatus includes: a first acquisition module, configured to acquire vector identifiers of a plurality of feature vectors in the model; a generating module, configured to generate at least one identification tensor according to the vector identifiers of the plurality of feature vectors; a second acquisition module, configured to acquire, based on at least one obtaining operator, at least one feature tensor in the distributed system according to the at least one identification tensor, where the at least one feature tensor includes the plurality of feature vectors; and a splitting module, configured to split the at least one feature tensor to obtain the plurality of feature vectors.
In an optional implementation, the second acquisition module includes: a searching unit, configured to search, based on the at least one obtaining operator, among the feature vectors stored in the first electronic device for the feature vectors corresponding to the vector identifiers in the identification tensor; and/or a sending unit, configured to send the vector identifiers in the identification tensor to the second electronic devices in the distributed system where the corresponding feature vectors are located, and a receiving unit, configured to receive the feature vectors that the second electronic devices find according to the received vector identifiers and return, where the second electronic devices include electronic devices in the distributed system other than the first electronic device; and a stitching unit, configured to stitch, based on the at least one obtaining operator, the received feature vectors and/or the feature vectors found in the first electronic device to obtain the feature tensor.
In an optional implementation, the searching unit includes: a first searching subunit, configured to search whether a feature vector corresponding to a vector identifier in the identification tensor exists among the feature vectors stored in a first memory in the first electronic device; a first acquiring subunit, configured to, when such a feature vector exists, acquire it from the first memory, where it was previously acquired from a second memory in the first electronic device and cached in the first memory, the data access rate of the first memory being greater than that of the second memory; and a second searching subunit, configured to, when no such feature vector exists, search for the feature vector corresponding to the vector identifier among the feature vectors stored in the second memory.
In an optional implementation, the searching unit further includes: a second acquiring subunit, configured to acquire the frequency at which each feature vector in the model stored in the second memory is accessed; a selecting subunit, configured to select at least part of the feature vectors from the feature vectors in the model stored in the second memory in descending order of accessed frequency; and a first caching subunit, configured to cache the at least part of the feature vectors in the first memory.
In an optional implementation, the searching unit further includes: a third acquiring subunit, configured to acquire from the second memory, during one round of training of the model, the feature vectors in the model that are required in at least one round of training after that round; and a second caching subunit, configured to cache those feature vectors in the first memory.
In an optional implementation, the generating module is specifically configured to: use a generating operator to deduplicate the vector identifiers of the plurality of feature vectors to obtain deduplicated vector identifiers, determine the number of electronic devices where the feature vectors corresponding to the deduplicated vector identifiers are located, segment the deduplicated vector identifiers according to that number to obtain segmented vector identifiers, and generate at least two identification tensors according to the segmented vector identifiers.
In an optional implementation, the apparatus further comprises: a third acquisition module, configured to obtain the types of hardware resources in the first electronic device that are required in the process of obtaining the plurality of feature vectors in the model; and a grouping module, configured to group the feature vectors in the model according to the number of types and a load-balancing principle to obtain a plurality of feature vector groups.
In an optional implementation, the model includes a plurality of feature matrices, each feature matrix including at least two feature vectors; the grouping module is specifically configured to: divide the plurality of feature matrices into that number of feature matrix groups, where the difference between the numbers of feature vectors in any two feature matrix groups is smaller than a preset difference.
In an optional implementation, the generating module includes: a grouping unit, configured to group the vector identifiers of the plurality of feature vectors according to the feature vector group where the corresponding feature vector is located, to obtain at least two vector identifier groups; and a generating unit, configured to generate an identification tensor from the vector identifiers in each vector identifier group, to obtain at least two identification tensors.
In an optional implementation, the second acquisition module includes: a first calling unit, configured to sequentially call multiple types of hardware resources in the first electronic device according to a first identification tensor of the at least two identification tensors, so as to obtain, through the called hardware resources, the feature vectors corresponding to the vector identifiers in the first identification tensor from a feature vector group; a second calling unit, configured to sequentially call the multiple types of hardware resources according to a second identification tensor of the at least two identification tensors, so as to obtain, through the called hardware resources, the feature vectors corresponding to the vector identifiers in the second identification tensor from a feature vector group, where the second identification tensor includes an identification tensor of the at least two identification tensors other than the first identification tensor; a judging unit, configured to judge, in the process of sequentially calling the multiple types of hardware resources according to the second identification tensor, whether a hardware resource of a target type among the multiple types is being used when that resource needs to be called according to the second identification tensor; and a third calling unit, configured to call the hardware resource of the target type according to the second identification tensor when it is not being used.
In a fifth aspect, the present application shows a data processing apparatus, comprising: a searching module, configured to search for a target feature vector in a first memory if the target feature vector, among a plurality of feature vectors in a sparse model, needs to be acquired in the process of training the sparse model or of processing data using the trained sparse model; and a responding module, configured to respond to the target feature vector when it is found in the first memory, where the target feature vector was selected in advance from the plurality of feature vectors, in descending order of the access frequencies of the plurality of feature vectors stored in a second memory, and cached in the first memory, and the data access rate of the first memory is greater than that of the second memory.
In a sixth aspect, the present application shows a data processing apparatus, comprising: a fourth acquisition module, configured to acquire from a first memory, when a next round of multi-round training of a sparse model is to be performed, the feature vectors in the sparse model required for that round; and a training module, configured to perform the next round of training on the sparse model using those feature vectors, where the feature vectors were acquired from a second memory and cached in the first memory in advance, during the previous round of the multi-round training, and the data access rate of the first memory is greater than that of the second memory.
In a seventh aspect, the present application shows an electronic device comprising: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to perform the method of the first aspect.
In an eighth aspect, the present application illustrates a non-transitory computer readable storage medium having instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the method of the first aspect.
In a ninth aspect, the present application shows a computer program product, wherein the instructions of the computer program product, when executed by a processor of an electronic device, enable the electronic device to perform the method according to the first aspect.
In a tenth aspect, the present application shows an electronic device comprising: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to perform the method of the second aspect.
In an eleventh aspect, the present application illustrates a non-transitory computer-readable storage medium having instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the method of the second aspect.
In a twelfth aspect, the present application shows a computer program product, wherein the instructions of the computer program product, when executed by a processor of an electronic device, enable the electronic device to perform the method according to the second aspect.
In a thirteenth aspect, the present application shows an electronic device comprising: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to perform the method according to the third aspect.
In a fourteenth aspect, the present application illustrates a non-transitory computer-readable storage medium having instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the method of the third aspect.
In a fifteenth aspect, the present application shows a computer program product, wherein the instructions of the computer program product, when executed by a processor of an electronic device, enable the electronic device to perform the method according to the third aspect.
Compared with the prior art, the embodiment of the application has the following advantages:
In one approach, the operations of obtaining the feature vectors corresponding to the respective vector identifiers are independent of each other. That is, for each of the vector identifiers of the plurality of feature vectors, the corresponding feature vector is acquired separately; in one implementation, a separate acquisition operator is used for each vector identifier, and the same holds for every other vector identifier of the plurality of feature vectors.
As can be seen, in the above manner, one acquisition operator needs to be used for each vector identifier.
For an acquisition operator to obtain the feature vector corresponding to a vector identifier, the operator must first be started, then scheduled onto the object it operates on (for example, the vector identifier), and only then can it perform the operation of "obtaining the feature vector corresponding to the vector identifier".
The above process therefore involves the flow "start operator" - "schedule operator" - "execute operation", and only the "execute operation" step actually obtains the feature vector. The "start operator" and "schedule operator" steps take time and occupy hardware resources without performing the actual operation.
When there are more vector identifiers (more feature vectors to be acquired separately), more time is consumed by starting and scheduling operators, and more hardware resources are wasted on them.
In the present application, at least one obtaining operator may perform the operation of "obtaining at least one feature tensor in a distributed system according to at least one identification tensor", so that a plurality of feature vectors can be obtained using fewer obtaining operators (for example, a single obtaining operator, or in any case fewer operators than there are vector identifiers). Although an identification tensor includes the vector identifiers of many feature vectors, all of those identifiers are processed by the same obtaining operator, so only one "starting operator" step and one "scheduling operator" step are needed. Reducing the number of these steps saves the time required to obtain the feature vectors, improves the efficiency of obtaining them, and saves hardware resources.
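The effect can be imitated in plain Python/NumPy: looking rows up one at a time pays a fixed per-call overhead for every identifier, analogous to the per-operator start and scheduling cost, while a single fused gather pays it once. The numbers below are illustrative, not from the patent:

    import time
    import numpy as np

    table = np.random.rand(100_000, 64).astype(np.float32)
    ids = np.random.randint(0, 100_000, size=10_000)

    t0 = time.perf_counter()
    one_by_one = [table[i] for i in ids]   # one lookup per identifier: each call
    t1 = time.perf_counter()               # pays its own start/schedule-like overhead
    fused = table[ids]                     # one gather over one identification tensor
    t2 = time.perf_counter()

    print(f"per-id: {t1 - t0:.4f}s, fused: {t2 - t1:.4f}s")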
Drawings
Fig. 1 is a schematic flow chart of a data processing method shown in the present application.
Fig. 2 is a schematic flow chart of a data processing method shown in the present application.
Fig. 3 is a schematic flow chart of a data processing method shown in the present application.
Fig. 4 is a block diagram of a distributed system shown in the present application.
Fig. 5 is a schematic flow chart of a data processing method based on a distributed system according to the present application.
Fig. 6 is a block diagram illustrating a distributed system-based data processing apparatus according to the present application.
Fig. 7 is a block diagram showing a structure of a data processing apparatus according to the present application.
Fig. 8 is a block diagram showing a structure of a data processing apparatus according to the present application.
Fig. 9 is a schematic diagram of the structure of an apparatus shown in the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
Referring to fig. 1, a flow chart of a data processing method of the present application is shown, where the method is applied to an electronic device, and the method may include:
in step S101, in a case where a target feature vector of a plurality of feature vectors in a model needs to be acquired, the target feature vector is looked up in a first memory in an electronic device.
In the present application, in the process of training a model or in the process of processing online data using a model after the model is trained, it is necessary to obtain a feature vector in the model. The model includes a sparse model, etc., and of course, other types of models may be included, and the type of model is not limited in the present application.
The model comprises a plurality of feature vectors, and the feature vectors that need to be used differ between the different stages of training the model, and between the different stages of processing online data using the model. When a part of the feature vectors in the model needs to be used, the electronic device can acquire that part of the feature vectors.
In one embodiment of the present application, a plurality of feature vectors in the model are each stored in a second memory, the second memory comprising: main memory in a CPU in an electronic device, and the like.
At least a part of the feature vectors in the model stored in the second memory may be cached in the first memory in advance, the data access rate of the first memory being greater than that of the second memory. In this way, when that part of the feature vectors is needed later, it can be acquired directly from the first memory rather than from the second memory, improving the rate at which those feature vectors are acquired and, in turn, the efficiency of training the model or of processing online data based on the model.
The first memory includes: a memory in a GPU (Graphics Processing Unit) in the electronic device, and the like.
The specific process of caching at least a part of feature vectors in the plurality of feature vectors in the model stored in the second memory in the first memory may refer to a process shown later, and is not described in detail herein.
In the case where the target feature vector is found in the first memory, in step S102, the target feature vector is responded to.
In the present application, the target feature vector in the model stored in the second memory may or may not have been previously cached in the first memory.
In the case where the target feature vector has been previously cached in the first memory, the target feature vector may be found in the first memory and then may be responded to.
In this application, responding to the target feature vector includes training the model using the target feature vector, processing online data using the target feature vector, or the like.
In the case that the target feature vector is not found in the first memory, in step S103, the target feature vector is found in a second memory in the electronic device, and the target feature vector is responded to.
Wherein the data access rate of the first memory is greater than the data access rate of the second memory.
If the target feature vector is not cached in the first memory in advance, the target feature vector cannot be found in the first memory, and in order to obtain the target feature vector, the target feature vector may be found in a second memory in the electronic device, and then the target feature vector may be responded to.
In the present application, in a case where a target feature vector of a plurality of feature vectors in a model needs to be obtained, the target feature vector is searched in a first memory in an electronic device. And responding to the target characteristic vector under the condition that the target characteristic vector is found in the first memory. And under the condition that the target characteristic vector is not found in the first memory, finding the target characteristic vector in a second memory in the electronic equipment, and responding to the target characteristic vector, wherein the data access rate of the first memory is greater than that of the second memory.
In this way, in a case where a target feature vector of a plurality of feature vectors in a model needs to be obtained, the electronic device may directly obtain the target feature vector in the first memory, and then may respond to the target feature vector, for example, training the model based on the target feature vector or processing online data based on the target feature vector, and the like.
Second, the target feature vector may be one of the more frequently accessed feature vectors in the model, i.e., a hotspot feature vector (similar to an "elephant flow"); that is, it may be a feature vector that is accessed frequently during the time period containing the current time. The frequency with which the target feature vector will be accessed in this time period is often greater than the frequency with which the other feature vectors in the model stored in the second memory will be accessed in the same period, so when training the model or processing online data with it, the target feature vector is used more often than the other feature vectors in the model.
In one embodiment of the present application, the number of target feature vectors to be acquired is multiple. Wherein, the plurality of target feature vectors have an arrangement order. When a plurality of target feature vectors are used to train a model or a plurality of target feature vectors are used to process online data, the plurality of target feature vectors need to be combined according to the arrangement order, and then the combined target feature vectors are used to train the model or the combined target feature vectors are used to process online data.
Wherein a portion of the plurality of target feature vectors is located in the first memory and another portion of the plurality of target feature vectors is located in the second memory.
Since the data access rate of the first memory is greater than that of the second memory, in order to improve the efficiency of responding to the target feature vectors, based on the embodiment shown in fig. 1, the process of responding to the target feature vectors includes: transferring the other part of the target feature vectors, found in the second memory, to the first memory, and acquiring the arrangement order among the target feature vectors. The plurality of target feature vectors can then be combined in the first memory according to that arrangement order, and the combined target feature vectors can be responded to.
In an embodiment of the present application, searching for the target feature vector in the first memory or the second memory means searching according to the vector identifier of the target feature vector. In one example, in the correspondence between vector identifiers and feature vectors stored in the first memory or the second memory, the feature vector corresponding to the vector identifier of the target feature vector is looked up and used as the target feature vector.
When a plurality of target feature vectors among the feature vectors in the model initially need to be obtained, the vector identifiers of the target feature vectors are obtained, and the arrangement order among those vector identifiers is also known; that order can therefore be used as the arrangement order among the target feature vectors.
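A small sketch of this lookup-and-combine step (dictionaries again stand in for the two memories, and the identifiers are invented): misses are transferred from the second memory into the first, and the vectors are then combined in the arrangement order of their identifiers:

    import numpy as np

    first_memory = {11: np.ones(4)}                        # one target already cached
    second_memory = {23: np.full(4, 2.0), 5: np.full(4, 5.0)}

    ordered_ids = [23, 11, 5]                              # arrangement order of targets

    def combine_targets(ids):
        rows = []
        for vid in ids:
            vec = first_memory.get(vid)                    # look up by vector identifier
            if vec is None:
                vec = second_memory[vid]                   # found in the second memory
                first_memory[vid] = vec                    # transfer into the first memory
            rows.append(vec)
        return np.stack(rows)                              # combine in arrangement order

    print(combine_targets(ordered_ids))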
In another embodiment of the present application, referring to fig. 2, the method further comprises:
in step S201, the frequency at which each feature vector in the model stored in the second memory is accessed, respectively, is acquired.
In the present application, the second memory stores the feature vectors and the like in the model, and the feature vectors and the like in the model stored in the second memory may be accessed by a processor (for example, a CPU and a GPU) and the like in the electronic device.
In one scenario, in the process of training a model by an electronic device, it is often necessary to acquire feature vectors in the model stored in the second memory, and then perform a correlation calculation according to the training data and the acquired feature vectors in the model, where a specific calculation manner depends on the model, and is not described in detail herein.
In the process of training a model by an electronic device, feature vectors in the model need to be used at different times or at different stages.
As such, after the electronic device has trained the model for a period of time, the respective feature vectors in the model stored in the second memory may be accessed differently, e.g., the frequency with which the respective feature vectors are accessed over a period of time may be different, etc.
The frequency with which a feature vector is accessed includes the frequency with which it was accessed during a recent time period ending at the current time.
In one example, that time period is a window whose end time is the current time, for example a window lasting 2 seconds, 3 seconds, or 5 seconds; windows lasting 10 seconds, 20 seconds, or 50 seconds may of course also be used. The specific duration is not limited in the present application.
In this application, for any one feature vector in the model stored in the second memory, each time the feature vector is accessed once, the number of times of accesses corresponding to the vector identification of the feature vector may be increased in the correspondence between the vector identification stored in the second memory and the number of times of accesses, and the same is true for each of the other feature vectors in the model stored in the second memory.
Therefore, in this step, the frequency of accessing each feature vector in the model stored in the second memory may be obtained according to the correspondence between the vector identifier stored in the second memory and the number of times of accessing.
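This bookkeeping amounts to a counter keyed by vector identifier; a minimal sketch (the access sequence is invented):

    from collections import Counter

    access_counts = Counter()            # vector identifier -> number of accesses

    def record_access(vector_id):
        access_counts[vector_id] += 1    # incremented on every access

    for vid in [3, 7, 3, 9, 3, 7]:       # simulated accesses in a recent time window
        record_access(vid)

    print(access_counts.most_common())   # frequency ranking used in step S201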
In step S202, at least a part of the feature vectors is selected from the feature vectors in the model stored in the second memory in order of the frequency of access from high to low.
In the present application, the second memory stores a plurality of feature vectors in the model. The feature vectors in the model need to be used both during training of the model and during processing of the on-line data using the model, and in the case of needing to use the feature vectors in the model, in one way, the feature vectors that need to be used may be retrieved in the second memory.
However, in the present application, the interface bandwidth of the second memory is low, so its data access rate is low; the electronic device therefore acquires the feature vectors in the model from the second memory slowly, which can make training the model, or processing online data using the model, inefficient.
Therefore, in order to improve the efficiency of the electronic device in training the model or processing the online data using the model, in the present application, the rate at which the electronic device acquires the feature vectors in the model may be improved.
In order to increase the rate at which the electronic device obtains the feature vectors in the model, at least one first memory may be provided, the data access rate of the first memory being greater than the data access rate of the second memory, i.e., the rate at which the electronic device accesses data in the first memory is greater than the rate at which the electronic device accesses data in the second memory.
Further, a part of the feature vectors of the model stored in the second memory may be cached in the first memory, so that the electronic device may directly obtain the part of the feature vectors in the first memory when the part of the feature vectors in the model needs to be obtained later.
Because the speed of the electronic equipment accessing data in the first memory is greater than the speed of the electronic equipment accessing data in the second memory, the speed of the electronic equipment acquiring the part of the feature vectors can be improved, and further the efficiency of the electronic equipment training a model or the efficiency of processing online data by using the model is improved.
In this application, the feature vectors in the model stored in the second memory differ in how they are used: the feature vectors that the electronic device needs to acquire differ between stages of training the model and between stages of processing online data with it. Some feature vectors are accessed many times in the current time period and are involved in many stages of training or online processing; others are accessed few times and are involved in few stages.
Therefore, to improve as much as possible the efficiency of training the model or of processing online data using it, in another embodiment of the present application, the more frequently accessed feature vectors among the plurality of feature vectors stored in the second memory may be cached in the first memory. Specifically, the top-N most frequently accessed feature vectors may be cached in the first memory, where N is a positive integer such as 5, 10, 15, or 20; the specific value may be determined according to the actual situation and is not limited in the present application.
In one example, at least part of the feature vectors may be selected from the feature vectors in the model stored in the second memory in descending order of accessed frequency; the selection may comprise one feature vector or at least two feature vectors, and the specific number may be determined according to the actual situation.
In step S203, the at least part of the feature vector is buffered in a first memory.
In this application, for any one of the at least some feature vectors, the feature vector in the second memory may be copied, and the copied feature vector may be cached in the first memory, so that the feature vector is available in both the first memory and the second memory. Or, the feature vector may be directly cut from the second memory to the first memory, so that the feature vector no longer exists in the second memory, thereby saving the storage space of the second memory, and then the feature vector may be cut from the first memory to the second memory again when necessary. Wherein if the first memory has cached the feature vector, the caching of the feature vector in the first memory need not be repeated.
The same is true for each of the other at least partial feature vectors.
In this application, a higher access frequency for the at least part of the feature vectors tends to indicate that they will be accessed more often in the near term than the other feature vectors in the second memory, and that they are involved in more stages of training the model or of processing online data with it. Therefore, caching the at least part of the feature vectors in the first memory can improve the efficiency of training the model or of processing online data using the model.
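Steps S202 and S203 can be sketched as a top-N selection followed by a copy into the faster memory; N and the dictionary contents below are illustrative:

    import heapq
    import numpy as np

    second_memory = {i: np.random.rand(8) for i in range(100)}
    access_counts = {i: np.random.randint(0, 50) for i in second_memory}
    first_memory = {}

    N = 10   # cache the top-N most frequently accessed feature vectors
    hot_ids = heapq.nlargest(N, access_counts, key=access_counts.get)
    for vid in hot_ids:
        if vid not in first_memory:                        # skip already-cached vectors
            first_memory[vid] = second_memory[vid].copy()  # "copy" variant: kept in both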
In an embodiment of the present application, the process of caching at least part of the feature vectors with high access frequency stored in the second memory in the first memory by using the embodiment shown in fig. 2 may be periodic, that is, the embodiment shown in fig. 2 may be executed at intervals to improve the accuracy and timeliness of the feature vectors with high access frequency cached in the first memory.
In this application, in the case that the at least part of the feature vector needs to be cached in the first memory, sometimes the free storage space in the first memory is greater than or equal to the occupied space of the at least part of the feature vector, so that the first memory can accommodate the at least part of the feature vector, and therefore, the at least part of the feature vector can be directly cached in the first memory.
However, sometimes the free storage space in the first memory is smaller than the space occupied by the at least part of the feature vectors, because other feature vectors of the model are already stored there. In that case the first memory cannot accommodate the at least part of the feature vectors, they cannot be cached directly, and the electronic device cannot acquire them from the first memory; the rate of acquiring them cannot be increased, and neither can the efficiency of training the model or of processing online data using the model.
Therefore, in order to solve the above problem, on the basis of the embodiment shown in fig. 2, in an embodiment of the present application, before caching the at least part of the feature vector in the first memory, referring to fig. 3, the method further includes:
in step S301, it is determined whether the free storage space of the first memory is smaller than the space occupied by the at least part of the feature vector.
In case the free storage space of the first memory is larger than or equal to the occupied space of the at least part of the feature vector, the at least part of the feature vector may be directly cached in the first memory.
When the free storage space of the first memory is smaller than the space occupied by the at least part of the feature vectors, steps S302 and S303 may be executed so that the at least part of the feature vectors can be cached in the first memory and later obtained from it, improving the rate of acquiring them and, in turn, the efficiency of training the model or of processing online data using the model.
In the case that the free storage space of the first memory is smaller than the occupied space of the at least part of the feature vectors, in step S302, the frequency at which the feature vectors in the model stored in the first memory are respectively accessed is obtained.
In step S303, according to the sequence from low to high of the frequency of accessing each feature vector in the model stored in the first memory, deleting at least one feature vector in the model stored in the first memory so that the free storage space of the first memory is greater than or equal to the space occupied by at least part of the feature vectors, and then executing step S203: the at least part of the feature vector is buffered in a first memory.
The first memory can accommodate the at least part of the feature vector, that is, the at least part of the feature vector can be cached in the first memory, and then the electronic device can acquire the at least part of the feature vector in the first memory, so that the speed of acquiring the data of the at least part of the feature vector is increased, and the efficiency of training the model or the efficiency of processing the data on the line by using the model is increased.
The feature vector with the lowest access frequency can be deleted from the first memory first; if the free storage space of the first memory is still smaller than the space occupied by the at least part of the feature vectors, the feature vector with the lowest access frequency among the remaining feature vectors in the first memory is deleted, and so on, until the free storage space is greater than or equal to the space occupied by the at least part of the feature vectors.
Within the first memory, a frequently accessed feature vector will typically be accessed more times in the near term than an infrequently accessed one, and will be involved in more stages of training the model or of processing online data with it. Therefore, deleting the least frequently accessed feature vectors first and retaining the most frequently accessed ones improves, as much as possible, the efficiency of training the model or of processing online data using the model.
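Steps S302 and S303 amount to least-frequently-accessed eviction. A toy sketch, counting capacity in vector slots rather than bytes for simplicity (the real implementation would compare storage space):

    first_memory = {1: "vec1", 2: "vec2", 3: "vec3"}   # cached vectors (values elided)
    access_counts = {1: 40, 2: 3, 3: 17}               # accesses per vector identifier
    CAPACITY = 3                                       # illustrative capacity in slots

    def make_room(needed_slots):
        # Evict least-frequently-accessed vectors until enough slots are free.
        for vid in sorted(first_memory, key=lambda v: access_counts.get(v, 0)):
            if CAPACITY - len(first_memory) >= needed_slots:
                break
            del first_memory[vid]          # coldest vector goes first

    make_room(2)
    print(first_memory)                    # {1: 'vec1'}: only the hottest remains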
The present application is illustrated here by way of example, without limiting its scope. In the process of training the sparse model, or of processing data using the trained sparse model, if a target feature vector among a plurality of feature vectors in the sparse model needs to be acquired, the target feature vector is searched for in a first memory; when it is found there, it is responded to. The target feature vector was selected in advance from the plurality of feature vectors, in descending order of the access frequencies of the plurality of feature vectors stored in a second memory, and cached in the first memory; the data access rate of the first memory is greater than that of the second memory.
In one embodiment of the present application, in the process of training the model, multiple rounds of training can be performed on the model, and the feature vectors in the model involved in each round of training are different. That is, the feature vectors in the model that need to be obtained when one round of training is performed are different from the feature vectors in the model that need to be obtained when the next round is performed.
For any round of training, in order to be able to acquire the feature vectors required for that round from the first memory and thereby improve the rate of acquiring them, those feature vectors can be cached in the first memory before the round starts. When the round is later performed, the feature vectors it requires can then be obtained directly from the first memory, improving the rate of acquiring them and, in turn, the efficiency of training the model.
Further, in order to save the storage space of the first memory, in another embodiment of the present application, after the round of training is finished, the feature vectors used by the round of training can be deleted in the first memory.
The same is true for each of the other rounds of training.
In one example, during one round of the multi-round training, the feature vectors in the model required in at least one subsequent round may be acquired from the second memory and buffered in the first memory, so that during that subsequent round they can be obtained directly from the first memory, improving the efficiency of training the model.
Further, in order to save the storage space of the first memory, in another embodiment of the present application, after the at least one round of training is finished, the feature vectors in the model required to be used in the at least one round of training may be deleted in the first memory.
The following is offered by way of illustration and does not limit the scope of the present application. When a next round of the multi-round training of the sparse model is to be performed, the feature vectors in the sparse model needed for that round are obtained from the first memory, and the next round of training is performed on the sparse model using them. The feature vectors were obtained from the second memory and cached in the first memory in advance, during the preceding round of training; the data access rate of the first memory is greater than that of the second memory.
In the multi-round training, the preceding round and the next round may be adjacent rounds, with the preceding round occurring before the next round.
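A minimal sketch of this round-ahead caching follows, with hypothetical helper names (`ids_for_round`, `train_round`); the prefetch is shown sequentially, whereas a real system would overlap it with the current round's computation.

```python
def train_with_prefetch(num_rounds, slow_store, ids_for_round, train_round):
    """Round-ahead caching sketch. ids_for_round(k) lists the vector ids
    round k needs; train_round(k, vectors) runs one round of training."""
    fast_cache = {vid: slow_store[vid] for vid in ids_for_round(0)}
    for k in range(num_rounds):
        # Cache the vectors the next round will need before it starts.
        if k + 1 < num_rounds:
            for vid in ids_for_round(k + 1):
                fast_cache.setdefault(vid, slow_store[vid])
        train_round(k, {vid: fast_cache[vid] for vid in ids_for_round(k)})
        # Delete the finished round's vectors from the fast cache,
        # unless the next round also needs them.
        keep = set(ids_for_round(k + 1)) if k + 1 < num_rounds else set()
        for vid in ids_for_round(k):
            if vid not in keep:
                del fast_cache[vid]
```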
In an embodiment of the application, the electronic device may determine, according to the actual usage of the first memory, the cache space in the first memory that is used to cache the feature vectors in the model; this cache space may be offered to the user as a selectable option, and may also be adjusted adaptively according to actual conditions.
Alternatively, in another embodiment of the present application, the user may instruct the electronic device to display an adjustment interface containing an adjustment control for the cache space in the first memory that is used to cache the feature vectors in the model; by operating this control, the user adjusts that cache space.
In yet another embodiment of the present application, the user may control the electronic device to display a setting interface, where the setting interface includes a setting control for setting a policy for caching feature vectors in the model in the first memory.
The policies include at least the following two:
1. Select a part of the feature vectors from the plurality of feature vectors, in descending order of the access frequencies of the feature vectors in the model stored in the second memory, and cache that part in the first memory.
2. In the case of a preceding round of the multi-round training of the sparse model, obtain from the second memory the feature vectors in the sparse model needed for the next round of training, and cache them in the first memory.
By operating the setting control, the user sets which policy is used to cache the feature vectors in the model in the first memory.
Referring to fig. 4, a block diagram of a distributed system according to the present application is shown, where the distributed system includes a plurality of electronic devices, and the plurality of electronic devices are connected in a pairwise communication manner.
The electronic device may include a front-end device and may also include a back-end device.
The front-end device includes devices that can be directly controlled by a wide range of users, such as a mobile phone, a tablet computer, or a notebook computer.
The backend device may include a server or the like.
In the application, the model includes a plurality of feature vectors, each electronic device in the distributed system stores a part of the feature vectors in the model, and the feature vectors in the models stored in the electronic devices are different or do not overlap. Each feature vector has a respective vector identifier, and the vector identifiers of different feature vectors are different.
Wherein the model may comprise a sparse model or the like.
Referring to fig. 5, a flowchart illustrating a data processing method according to the present application is shown, where the method is applied to the first electronic device shown in fig. 4, where the first electronic device is one of the electronic devices in the distributed system shown in fig. 4, and the method may include:
in step S401, vector identifications of a plurality of feature vectors in a model are obtained.
In the present application, during the process of training the model or during the process of processing the on-line data by using the model, the feature vectors in the model need to be acquired.
The model includes a large number of feature vectors, and different stages of training the model, or of processing online data with the model, need different feature vectors; the first electronic device therefore acquires a given part of the feature vectors only when that part needs to be used.
When a plurality of feature vectors in the model need to be acquired at one stage, the first electronic device first obtains vector identifiers of the plurality of feature vectors, and then acquires the plurality of feature vectors according to the vector identifiers of the plurality of feature vectors.
The vector identifications of the plurality of feature vectors may, for example, be part of the output of the stage immediately preceding the current one.
In step S402, at least one identification tensor is generated from the vector identification of the plurality of eigenvectors.
In one embodiment of the present application, the vector identifications of the plurality of feature vectors may be combined into one identification tensor, for example by merging them with a concat operation.
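For illustration, a NumPy equivalent of this merge might look as follows (the identification values are made up):

```python
import numpy as np

# Vector identifications arriving from the previous stage (made-up values).
ids_a = np.array([3, 17, 42], dtype=np.int64)
ids_b = np.array([8, 17], dtype=np.int64)

# Merge all vector identifications into a single identification tensor,
# analogous to the concat operation mentioned above.
identification_tensor = np.concatenate([ids_a, ids_b])  # -> [ 3 17 42  8 17]
```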
Alternatively, in another embodiment of the present application, the vector identifications of the plurality of feature vectors may be split into at least two parts, and an identification tensor generated from the vector identifications in each part, yielding at least two identification tensors whose vector identifications do not overlap.
In step S403, at least one feature tensor is acquired in the distributed system according to the at least one identification tensor based on the at least one acquisition operator, where the at least one feature tensor includes a plurality of feature vectors.
In step S404, at least one feature tensor is split to obtain a plurality of feature vectors.
In an embodiment of the present application, if one identification tensor is obtained in step S402, then in step S403 one feature tensor can be obtained in the distributed system according to that identification tensor. The feature tensor includes the plurality of feature vectors, ordered the same way as the vector identifications in the identification tensor. In step S404, the feature tensor can then be split, taking the feature vectors corresponding to the vector identifications in order, so as to obtain the plurality of feature vectors.
In another embodiment of the present application, if at least two identification tensors are obtained in step S402, then in step S403 one feature tensor can be obtained in the distributed system according to each identification tensor, giving at least two feature tensors in total. Each feature tensor includes at least two feature vectors, ordered the same way as the corresponding vector identifications in the identification tensor to which it corresponds. In step S404, each feature tensor can then be split separately, taking the feature vectors corresponding to the vector identifications in order, so as to obtain the plurality of feature vectors.
In one approach, the operations of obtaining the feature vectors corresponding to the individual vector identifications are independent of each other. That is, for any one of the vector identifications, the corresponding feature vector is acquired separately, and in one embodiment a separate acquisition operator is used for it; the same applies to every other vector identification.
As can be seen, in the above manner, one acquisition operator needs to be used for each vector identifier.
For an acquisition operator to obtain the feature vector corresponding to a vector identification, the operator must first be started, then scheduled onto the object it operates on (for example, the vector identification), and only then can it perform the operation of obtaining the corresponding feature vector.
The process thus follows the flow "start operator" - "schedule operator" - "execute operation", and only the "execute operation" step actually obtains the feature vector corresponding to the vector identification. The "start operator" and "schedule operator" steps consume time and occupy hardware resources without doing the actual work.
When there are many vector identifications (that is, many feature vectors must be acquired separately), the "start operator" and "schedule operator" steps consume correspondingly more time and waste more hardware resources.
In the present application, at least one acquisition operator may be used to perform the whole operation of "obtaining at least one feature tensor in the distributed system according to at least one identification tensor", so that the plurality of feature vectors are obtained with few acquisition operators (for example a single one; in any case fewer than the number of vector identifications). Although an identification tensor contains the vector identifications of many feature vectors, all of them are handled by the same acquisition operator, so only one "start operator" and one "schedule operator" step are needed. Reducing the number of these steps saves the time required to obtain the plurality of feature vectors, improves the efficiency of obtaining them, and saves hardware resources.
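The contrast can be illustrated with a NumPy sketch (table size and identifications are made up): the per-identification loop stands for launching one acquisition operator per vector identification, while the single batched gather stands for the fused acquisition operator.

```python
import numpy as np

embedding_table = np.random.rand(1000, 16)       # 1000 feature vectors, width 16
identification_tensor = np.array([3, 17, 42, 8])

# Per-identification lookup: conceptually one operator started and
# scheduled for every single vector identification.
vectors_one_by_one = [embedding_table[i] for i in identification_tensor]

# Fused lookup: one batched gather over the whole identification tensor,
# so only one operator start and one scheduling step are needed.
feature_tensor = embedding_table[identification_tensor]  # shape (4, 16)

assert np.array_equal(np.stack(vectors_one_by_one), feature_tensor)
```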
In one embodiment of the present application, since a part of feature vectors in a model is stored in each electronic device in a distributed system, a plurality of feature vectors may be distributed in a plurality of electronic devices.
As such, for the first electronic device, a part of the feature vectors may be stored locally by the first electronic device, or none of the feature vectors may be stored locally by the first electronic device, and a part of the feature vectors may be stored in each of a plurality of second electronic devices in the distributed system except the first electronic device.
Thus, when the first electronic device needs to acquire at least one feature tensor in the distributed system according to at least one identification tensor based on at least one acquisition operator, it may, based on that operator, send each vector identification in the identification tensor to the second electronic device in the distributed system where the corresponding feature vector is located, and receive back the feature vectors that those devices find for the received vector identifications.
For example, for any vector identifier in the identifier tensor, the first electronic device may determine a second electronic device where the feature vector corresponding to the vector identifier is located, and then send the vector identifier to the second electronic device where the feature vector corresponding to the vector identifier is located, so that the second electronic device searches for the feature vector corresponding to the vector identifier, and returns the feature vector corresponding to the vector identifier to the first electronic device. The first electronic device may then receive the feature vector corresponding to the vector identification returned by the second electronic device. The same is true for every other vector identification in the identification tensor.
The second electronic device comprises an electronic device other than the first electronic device in the distributed system.
It should be noted that, the operation of the first electronic device obtaining the feature vectors corresponding to the vector identifications from the second electronic device is performed based on one obtaining operator.
In addition, if the first electronic device stores part of the feature vectors, the first electronic device may further search the feature vectors stored in the first electronic device for the feature vector corresponding to the vector identifier in the identifier tensor based on the at least one obtaining operator.
Then, based on the at least one acquisition operator, the received feature vectors and/or the feature vectors found locally in the first electronic device may be stitched to obtain the feature tensor.
In one approach, one communication operator performs the operation of "sending each vector identification in the identification tensor to the second electronic device where the corresponding feature vector is located", another communication operator performs "receiving the feature vectors returned by the second electronic devices and found according to the received vector identifications", a search operator performs "searching the feature vectors stored in the first electronic device for those corresponding to the vector identifications in the identification tensor", and a stitching operator performs "stitching the received feature vectors and/or the feature vectors found locally in the first electronic device into the feature tensor".
These four operations thus require four separate operators, and each operator must be started and scheduled onto the object it operates on (for example, a feature vector or a vector identification) before the operation itself can run. The "start operator" and "schedule operator" steps again consume time and occupy hardware resources without doing the actual work.
When several operations must execute in sequence, each with a different operator, still more time goes to starting and scheduling operators, and still more hardware resources are wasted on them.
In the present application, a single acquisition operator is used to perform all of these operations: sending each vector identification in the identification tensor to the second electronic device where the corresponding feature vector is located and receiving the feature vectors found and returned for those identifications, and/or searching the feature vectors stored in the first electronic device for those corresponding to the vector identifications in the identification tensor, and stitching the received and/or locally found feature vectors into the feature tensor.
Although this sequence involves four different operations, all four are executed by one operator, so only one "start operator" and one "schedule operator" step are needed. Reducing the number of these steps saves time, improves efficiency, and saves hardware resources.
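A minimal Python sketch of such a fused acquisition operator follows; all names (`fused_acquire`, `owner_of`, `remote_stores`) are hypothetical, and the network send/receive is simulated by plain dictionary lookups.

```python
from collections import defaultdict
import numpy as np

def fused_acquire(identification_tensor, local_store, remote_stores, owner_of):
    """Stand-in for the single acquisition operator. owner_of(vid) names the
    device storing vid's feature vector ('local' for this device)."""
    # 1) Route each vector identification to the device holding its vector.
    by_device = defaultdict(list)
    for pos, vid in enumerate(identification_tensor):
        by_device[owner_of(vid)].append((pos, vid))
    results = [None] * len(identification_tensor)
    # 2) Local identifications are searched in this device's own store ...
    for pos, vid in by_device.pop("local", []):
        results[pos] = local_store[vid]
    # 3) ... remote ones are "sent" per device and the vectors "received".
    for device, entries in by_device.items():
        for pos, vid in entries:
            results[pos] = remote_stores[device][vid]
    # 4) Stitch everything into one feature tensor, ordered exactly like
    #    the identification tensor.
    return np.stack(results)
```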
When searching the feature vectors stored in the first electronic device for the feature vector corresponding to a vector identification in the identification tensor, the search may proceed as follows: first check whether that feature vector exists among the feature vectors stored in the first memory of the first electronic device; if it does, obtain it from the first memory, where it was cached in advance after being obtained from the second memory of the first electronic device; if it does not, search for it among the feature vectors stored in the second memory.
Because the data access rate of the first memory is greater than that of the second memory, this speeds up the search for the feature vectors corresponding to the vector identifications in the identification tensor, and thereby improves the efficiency of training the model, or of processing online data, based on those feature vectors.
To have the feature vectors corresponding to the vector identifications in the identification tensor already stored in the first memory, in one embodiment of the present application the frequency with which each feature vector in the model stored in the second memory is accessed may be obtained in advance; at least part of the feature vectors are then selected in descending order of access frequency and cached in the first memory. For details, reference may be made to the embodiments described above, which are not repeated here.
Alternatively, in another embodiment of the present application, during one round of the multi-round training, the feature vectors in the model needed by at least one later round are obtained from the second memory and cached in the first memory. For details, reference may be made to the embodiments described above, which are not repeated here.
In another embodiment of the present application, when generating at least one identification tensor from the vector identifications of the plurality of feature vectors, the vector identifications may first be deduplicated; the number of electronic devices holding the feature vectors that correspond to the deduplicated vector identifications is then determined; the deduplicated vector identifications are segmented according to that number; and at least two identification tensors are generated from the segmented vector identifications.
It should be noted that the whole operation of "deduplicating the vector identifications to obtain the deduplicated vector identifications, determining the number of electronic devices holding the corresponding feature vectors, segmenting the deduplicated vector identifications according to that number, and generating at least two identification tensors from the segmented vector identifications" may be performed based on a single generation operator.
In one approach, a deduplication operator performs "deduplicating the vector identifications of the plurality of feature vectors", a determination operator performs "determining the number of electronic devices holding the corresponding feature vectors", a segmentation operator performs "segmenting the deduplicated vector identifications according to that number", and a generation operator performs "generating at least two identification tensors from the segmented vector identifications". These four operations thus require four different operators, and each operator must be started and scheduled onto the object it operates on (for example, a feature vector) before the operation itself can run; the "start operator" and "schedule operator" steps consume time and occupy hardware resources without doing the actual work.
When several operations must execute in sequence, each with a different operator, still more time goes to starting and scheduling operators, and still more hardware resources are wasted on them.
In the present application, a single generation operator is used to perform all of these operations: deduplicating the vector identifications of the plurality of feature vectors, determining the number of electronic devices holding the corresponding feature vectors, segmenting the deduplicated vector identifications according to that number, and generating at least two identification tensors from the segmented vector identifications.
Although this process involves four different operations in sequence, all four are executed by one operator, so only one "start operator" and one "schedule operator" step are needed. Reducing the number of these steps saves time, improves efficiency, and saves hardware resources.
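For illustration, a minimal sketch of such a fused generation operator (all names hypothetical; `owner_of` stands for whatever mapping assigns a vector identification to the device storing its feature vector):

```python
import numpy as np

def fused_generate(ids, owner_of):
    """Stand-in for the single generation operator: deduplicate the vector
    identifications, shard them by the device holding each corresponding
    feature vector, and emit one identification tensor per shard."""
    deduped = list(dict.fromkeys(ids))            # deduplicate, keep order
    shards = {}
    for vid in deduped:                           # segment by owning device
        shards.setdefault(owner_of(vid), []).append(vid)
    # len(shards) is the number of devices involved; one tensor per device.
    return {dev: np.asarray(vids, dtype=np.int64)
            for dev, vids in shards.items()}

tensors = fused_generate([3, 17, 3, 42, 8, 17], owner_of=lambda vid: vid % 2)
# -> {1: array([ 3, 17]), 0: array([42,  8])}
```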
Building on the foregoing embodiment, to improve the efficiency with which the first electronic device acquires the plurality of feature vectors in the distributed system according to their vector identifications, in another embodiment of the present application the feature vectors stored across the distributed system may be grouped in advance. When the plurality of feature vectors are then acquired, their vector identifications are grouped according to the groups of the corresponding feature vectors, yielding at least two vector identification groups; the corresponding feature vectors are acquired in the distributed system for each vector identification group, and the processing of the different groups is staggered so that they overlap in time. This raises the degree of parallelism in acquiring the feature vectors and thus the efficiency of acquiring them.
Specifically, in one embodiment, the feature vectors of the model stored across the distributed system may be grouped by the following process:
11) Obtain the types of hardware resources in the first electronic device of the distributed system that need to be used in the process of obtaining the feature vectors in the model.
In the process of "obtaining a plurality of feature vectors in a distributed system according to vector identifiers of a plurality of feature vectors", the first electronic device often uses a plurality of types of hardware resources in the first electronic device, for example, the hardware resources include CPU resources, GPU resources, network bandwidth, and the like.
12) Group the feature vectors in the model according to the number of those types and a load-balancing principle, to obtain a plurality of feature vector groups.
The feature vectors in the model may be divided into a number of groups equal to the number of types, or into an integer multiple of that number of groups.
For example, assuming that the number of types of hardware resources is 3, the plurality of feature vectors in the model may be divided into 3 feature vector groups, 6 feature vector groups, 9 feature vector groups, or the like, and the specific division into several feature vector groups may be determined according to actual situations, which is not limited in the present application.
In one embodiment of the present application, the feature vectors in the model are often in the form of a feature matrix, where the model includes a plurality of feature matrices, and each feature matrix includes at least two feature vectors.
Thus, when the feature vectors in the model are grouped according to the number of types and the load-balancing principle, the plurality of feature matrices can be divided into that number of feature matrix groups, such that the difference between the numbers of feature vectors in any two groups is smaller than a preset difference.
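One simple way to realize such balanced grouping is a greedy assignment; the sketch below (hypothetical names, illustrative counts) places each feature matrix into the currently lightest group:

```python
def group_matrices(matrix_row_counts, num_groups):
    """Greedy balanced grouping: assign each feature matrix (identified by
    its id, sized by its number of feature vectors) to the group that is
    currently lightest, so group sizes stay close together."""
    groups = [[] for _ in range(num_groups)]
    loads = [0] * num_groups
    # Placing the largest matrices first tightens the balance.
    for matrix_id, rows in sorted(matrix_row_counts.items(),
                                  key=lambda kv: kv[1], reverse=True):
        lightest = loads.index(min(loads))
        groups[lightest].append(matrix_id)
        loads[lightest] += rows
    return groups, loads

# Three types of hardware resources (e.g. GPU, bandwidth, CPU) -> 3 groups.
groups, loads = group_matrices(
    {"m0": 500, "m1": 300, "m2": 250, "m3": 240}, num_groups=3)
# groups -> [['m0'], ['m1'], ['m2', 'm3']], loads -> [500, 300, 490]
```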
For any one feature vector group, the feature vectors in it may come from at least two different feature matrices. Therefore, to keep the vector identifications within the group unique, the vector identification of each feature vector may be generated from the ID of the feature matrix in which it is located together with the ID of the feature vector within that matrix.
The IDs of different feature matrices differ, and the IDs of different feature vectors within the same feature matrix differ, so the vector identifications of the feature vectors in each feature vector group are all distinct.
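For illustration, one hypothetical way to compose such unique identifications is to pack the matrix ID and the in-matrix index into one integer (the 32-bit width is an assumption, not from the application):

```python
ROW_BITS = 32  # assumed width reserved for the in-matrix index

def make_vector_id(matrix_id, row_index):
    """Compose a globally unique vector identification from the feature
    matrix's ID and the feature vector's index inside that matrix."""
    return (matrix_id << ROW_BITS) | row_index

def split_vector_id(vector_id):
    """Recover (matrix id, in-matrix index) from a vector identification."""
    return vector_id >> ROW_BITS, vector_id & ((1 << ROW_BITS) - 1)

vid = make_vector_id(matrix_id=2, row_index=17)  # unique across all matrices
assert split_vector_id(vid) == (2, 17)
```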
Building on the foregoing embodiment, when at least one identification tensor is generated from the vector identifications of the plurality of feature vectors in step S402, the vector identifications may be grouped according to the feature vector group in which the corresponding feature vector lies, yielding at least two vector identification groups. For example, for the vector identification of any one feature vector, the first electronic device may determine from that identification which feature vector group the feature vector belongs to, and place the identification into the corresponding vector identification group; the same is done for every other vector identification. An identification tensor is then generated from the identifications in each group, yielding at least two identification tensors.
Accordingly, when at least one feature tensor is acquired in the distributed system according to at least one identification tensor based on at least one acquisition operator in step S403, multiple kinds of hardware resources in the first electronic device may be sequentially invoked according to a first identification tensor of the at least two identification tensors, so as to acquire, from the feature vector group, feature vectors corresponding to vector identifications in the first identification tensor through the invoked hardware resources.
And sequentially calling a plurality of types of hardware resources in the first electronic device according to a second identification tensor of the at least two identification tensors, so as to obtain the eigenvector corresponding to the vector identification in the second identification tensor in the eigenvector group through the called hardware resources. The second identification tensor comprises an identification tensor of the at least two identification tensors other than the first identification tensor.
And, in the process of sequentially calling the hardware resources of the plurality of types according to the second identification tensor, in the case where the hardware resource of the target type among the plurality of types needs to be called according to the second identification tensor, it is determined whether the hardware resource of the target type is being used.
And when the hardware resource of the target category is not used, calling the hardware resource of the target category according to the second identification tensor.
When the hardware resource of the target category is being used, the first electronic device waits until that resource is released (no longer in use) and then calls it according to the second identification tensor.
When the first electronic device acquires the plurality of feature vectors in the distributed system according to their vector identifications, in one example it often needs to first use GPU resources to deduplicate and segment the vector identifications, then use bandwidth resources to send one part of the identifications to the second electronic devices in the distributed system, and use CPU resources to search the part of the model stored locally for the feature vectors of the other part of the identifications. After the second electronic devices find the feature vectors for the identifications they received, they return them, and the first electronic device uses bandwidth resources to receive them.
It can be seen that in the above example, the first electronic device needs to use GPU resources-bandwidth resources-CPU resources-bandwidth resources in turn.
If the feature vectors were acquired in the distributed system all at once according to their vector identifications, every operation stage would have to process the vector identifications of all the feature vectors (that is, the full set of vector identifications).
In that case, the bandwidth resources can be called for all the vector identifications only after the GPU resources have finished with all of them; the CPU resources only after the bandwidth resources have finished with all of them; and the bandwidth resources again only after the CPU resources have finished with all of them.
Each vector identification therefore waits a long time at each operation stage (each of which requires one kind of hardware resource), making the acquisition of the plurality of feature vectors in the distributed system inefficient.
Moreover, while one operation stage runs, the hardware resources needed by the other stages sit idle, so their utilization is low.
In the present application, the vector identifications of the plurality of feature vectors are instead divided into at least two identification tensors according to the feature vector group in which the corresponding feature vector lies.
In this way, when at least one feature tensor is acquired in the distributed system according to at least one identification tensor based on at least one acquisition operator, the multiple types of hardware resources in the first electronic device are called in sequence for one identification tensor, acquiring through each called resource the feature vectors corresponding to the vector identifications in that tensor.
As soon as the first type of hardware resource has finished its work for that identification tensor, it becomes free, and the sequence of hardware resources can begin to be called for another identification tensor, acquiring its corresponding feature vectors in the same way; and so on, by analogy, for each remaining identification tensor.
In the course of calling the types of hardware resources in sequence for an identification tensor, when the hardware resource of a target type needs to be called, it is first determined whether that resource is being used; if it is not, it is called for that identification tensor.
Therefore, once the GPU resources have finished with one part of the vector identifications, the bandwidth resources can immediately be called for that part without waiting for the GPU to finish the remaining parts; meanwhile the GPU resources are called for the next part, and once the GPU finishes with it, the bandwidth resources are called for that part in turn, and so on, so that the stages overlap across the parts.
In this way, at every operation stage (each requiring one kind of hardware resource) of acquiring the plurality of feature vectors in the distributed system, the hardware resources are kept as busy as possible rather than idle, which improves their utilization.
It also reduces the waiting time of each vector identification at each operation stage, raises the degree of parallelism in acquiring the feature vectors, and thus further improves the efficiency of acquiring the plurality of feature vectors in the distributed system according to their vector identifications.
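The staggering can be sketched with one lock per resource type, so that a group waits only when the specific resource it needs is busy while other groups proceed on other resources (a simplified model; group and stage names are hypothetical):

```python
import threading

# One lock per hardware resource type: each resource serves one
# identification group at a time, but different groups can occupy
# different resources simultaneously.
resource_locks = {"gpu": threading.Lock(),
                  "bandwidth": threading.Lock(),
                  "cpu": threading.Lock()}

def process_group(group, stages):
    """Run one identification group through its ordered stages, e.g.
    GPU -> bandwidth -> CPU -> bandwidth as in the example above."""
    for resource, work in stages:
        with resource_locks[resource]:  # wait only if this resource is busy
            work(group)

def run_staggered(groups, stages):
    threads = [threading.Thread(target=process_group, args=(g, stages))
               for g in groups]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# While group "A" holds the bandwidth, group "B" can already use the GPU.
stages = [(r, lambda group: None)  # placeholder for the real stage work
          for r in ("gpu", "bandwidth", "cpu", "bandwidth")]
run_staggered(["A", "B", "C"], stages)
```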
Moreover, the better the load balance among the feature vector groups, the greater the improvement in hardware resource utilization and in the efficiency of acquiring the feature vectors.
Here a feature vector group contains a plurality of feature vectors, each feature vector contains a plurality of data elements, and the load of a feature vector group can be understood as the number of data elements it contains.
It is noted that, for simplicity of description, the method embodiments are described as a series of action combinations, but those skilled in the art will appreciate that the present application is not limited by the order of the actions described, since according to the present application some steps may be performed in other orders or concurrently. Further, those skilled in the art will also appreciate that the embodiments described in the specification are preferred embodiments, and that the actions involved are not necessarily required by the present application.
Referring to fig. 6, a block diagram of an embodiment of a data processing apparatus based on a distributed system according to the present application is shown, where the distributed system includes a plurality of electronic devices, each of the electronic devices respectively stores a part of feature vectors in a model, and the apparatus is applied to a first electronic device in the plurality of electronic devices, and the apparatus includes:
a first obtaining module 11, configured to obtain vector identifiers of a plurality of feature vectors in the model; a generating module 12, configured to generate at least one identification tensor according to the vector identification of the feature vectors; a second obtaining module 13, configured to obtain at least one feature tensor in the distributed system according to the at least one identification tensor based on at least one obtaining operator, where the at least one feature tensor includes the plurality of feature vectors; a splitting module 14, configured to split the at least one feature tensor to obtain the plurality of feature vectors.
In an optional implementation manner, the second obtaining module includes: the searching unit is used for searching the eigenvector corresponding to the vector identifier in the identifier tensor in the eigenvector stored in the first electronic equipment based on the at least one obtaining operator; and/or the sending unit is used for sending the vector identifiers in the identifier tensor to second electronic equipment in a distributed system where the corresponding eigenvectors are located respectively, and the receiving unit is used for receiving the eigenvectors returned by the second electronic equipment and searched according to the received vector identifiers; the second electronic device comprises an electronic device in the distributed system other than the first electronic device; and the stitching unit is configured to stitch the received eigenvector and/or the eigenvector found in the first electronic device based on the at least one obtaining operator to obtain the feature tensor.
In an optional implementation manner, the search unit includes: a first searching subunit, configured to search the feature vectors stored in a first memory of the first electronic device for the feature vector corresponding to a vector identification in the identification tensor; a first obtaining subunit, configured to, when the feature vector corresponding to the vector identification in the identification tensor exists in the first memory, obtain it from the first memory, the feature vector having been obtained in advance from a second memory of the first electronic device and cached in the first memory, the data access rate of the first memory being greater than that of the second memory; and a second searching subunit, configured to, when the feature vector corresponding to the vector identification in the identification tensor does not exist in the first memory, search for it among the feature vectors stored in the second memory.
In an optional implementation manner, the search unit further includes: a second obtaining subunit, configured to obtain frequencies at which feature vectors in the model stored in the second memory are respectively accessed; a selecting subunit, configured to select at least part of feature vectors from feature vectors in the model stored in the second memory in an order from high to low in the accessed frequency; a first caching subunit configured to cache the at least part of the feature vector in the first memory.
In an optional implementation manner, the search unit further includes: a third obtaining subunit, configured to obtain, in the second memory, feature vectors in the model required to be used in at least one of the rounds of training after the one round of training in a process of one of the rounds of training on the model; a second buffer subunit, configured to buffer, in the first memory, the feature vectors in the model required to be used for the at least one round of training.
In an optional implementation manner, the generating module is specifically configured to: use a generation operator to deduplicate the vector identifications of the plurality of feature vectors to obtain the deduplicated vector identifications, determine the number of electronic devices where the feature vectors corresponding to the deduplicated vector identifications are located, segment the deduplicated vector identifications according to that number to obtain the segmented vector identifications, and generate at least two identification tensors from the segmented vector identifications.
In an optional implementation, the apparatus further comprises: a third obtaining module, configured to obtain a type of a hardware resource that needs to be used in the first electronic device in a process of obtaining the plurality of feature vectors in the model; and the grouping module is used for grouping the feature vectors in the model according to the number of the types and the load balancing principle to obtain a plurality of feature vector groups.
In an optional implementation manner, the model includes a plurality of feature matrices, and each feature matrix includes at least two feature vectors; the grouping module is specifically configured to: and dividing the plurality of feature matrixes into the number of feature matrix groups, wherein the difference between the number of feature vectors in each feature matrix group is smaller than a preset difference.
In an optional implementation manner, the generating module includes: the grouping unit is used for grouping the vector identifications of the plurality of characteristic vectors according to the characteristic vector group where the corresponding characteristic vector is located to obtain at least two vector identification groups; and the generating unit is used for respectively generating an identification tensor according to the vector identifications in each vector identification group to obtain at least two identification tensors.
In an optional implementation manner, the second obtaining module includes: the first invoking unit is used for sequentially invoking multiple types of hardware resources in the first electronic device according to a first identification tensor of the at least two identification tensors so as to acquire an eigenvector corresponding to the vector identification in the first identification tensor in an eigenvector group through the invoked hardware resources; the second calling unit is used for sequentially calling the hardware resources of the multiple types according to a second identification tensor of the at least two identification tensors so as to obtain an eigenvector corresponding to the vector identification in the second identification tensor in an eigenvector group through the called hardware resources; the second identified tensor comprises an identified tensor of the at least two identified tensors other than the first identified tensor; a determining unit, configured to determine, in a process of sequentially retrieving the hardware resources of the multiple categories according to the second identification tensor, whether the hardware resource of a target category in the multiple categories is being used when the hardware resource of the target category needs to be retrieved according to the second identification tensor; and a third calling unit, configured to, when the hardware resource of the target category is not being used, call the hardware resource of the target category according to the second identification tensor.
Referring to fig. 7, a block diagram of a data processing apparatus according to an embodiment of the present application is shown, which may specifically include the following modules:
the searching module 21 is configured to, in a process of training the sparse model or in a process of processing data using the trained sparse model, if a target feature vector of a plurality of feature vectors in the sparse model needs to be acquired, search the target feature vector in the first memory; a response module 22, configured to respond to the target feature vector if the target feature vector is found in the first memory; the target eigenvector is selected from the plurality of eigenvectors and cached in the first memory in advance according to the sequence of the access frequencies of the plurality of eigenvectors stored in the second memory from high to low, and the data access rate of the first memory is greater than that of the second memory.
Because the data access rate of the first memory is greater than that of the second memory, the method and the device can improve the rate of obtaining the feature vectors in the model, and therefore can improve the efficiency of training the model based on the feature vectors, the efficiency of processing online data based on the feature vectors and the like.
Referring to fig. 8, a block diagram of a data processing apparatus according to an embodiment of the present application is shown, which may specifically include the following modules:
a fourth obtaining module 31, configured to, in a case of a next round of training that requires multiple rounds of training on a sparse model, obtain, in a first memory, feature vectors in the sparse model that need to be used for the next round of training; a training module 32 for performing a next round of training of the multiple rounds of training on the sparse model using the feature vectors; the feature vector is obtained in the second memory and cached in the first memory under the condition of the previous round of training of multiple rounds of training on the sparse model in advance, and the data access rate of the first memory is greater than that of the second memory.
Because the data access rate of the first memory is greater than that of the second memory, the method and the device can improve the rate of obtaining the feature vectors in the model, and therefore can improve the efficiency of training the model based on the feature vectors, the efficiency of processing online data based on the feature vectors and the like.
The present application further provides a non-transitory, readable storage medium, where one or more modules (programs) are stored, and when the one or more modules are applied to a device, the device may execute instructions (instructions) of method steps in this application.
Embodiments of the present application provide one or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause an electronic device to perform the methods as described in one or more of the above embodiments. In the embodiment of the application, the electronic device comprises a server, a gateway, a sub-device and the like, wherein the sub-device is a device such as an internet of things device.
Embodiments of the present disclosure may be implemented, using any suitable hardware, firmware, software, or any combination thereof in a desired configuration, as an apparatus that may include electronic devices such as servers (clusters) and terminal devices such as IoT devices.
Fig. 9 schematically illustrates an example apparatus 1300 that can be used to implement various embodiments described herein.
For one embodiment, fig. 9 illustrates an example apparatus 1300 having one or more processors 1302, a control module (chipset) 1304 coupled to at least one of the processor(s) 1302, memory 1306 coupled to the control module 1304, non-volatile memory (NVM)/storage 1308 coupled to the control module 1304, one or more input/output devices 1310 coupled to the control module 1304, and a network interface 1312 coupled to the control module 1304.
Processor 1302 may include one or more single-core or multi-core processors, and processor 1302 may include any combination of general-purpose or special-purpose processors (e.g., graphics processors, application processors, baseband processors, etc.). In some embodiments, the apparatus 1300 can be a server device such as a gateway described in the embodiments of the present application.
In some embodiments, apparatus 1300 may include one or more computer-readable media (e.g., memory 1306 or NVM/storage 1308) having instructions 1314 and one or more processors 1302, which in combination with the one or more computer-readable media, are configured to execute instructions 1314 to implement modules to perform actions described in this disclosure.
For one embodiment, control module 1304 may include any suitable interface controllers to provide any suitable interface to at least one of the processor(s) 1302 and/or any suitable device or component in communication with control module 1304.
The control module 1304 may include a memory controller module to provide an interface to the memory 1306. The memory controller module may be a hardware module, a software module, and/or a firmware module.
Memory 1306 may be used, for example, to load and store data and/or instructions 1314 for device 1300. For one embodiment, memory 1306 may comprise any suitable volatile memory, such as suitable DRAM. In some embodiments, the memory 1306 may comprise a double data rate type four synchronous dynamic random access memory (DDR4 SDRAM).
For one embodiment, control module 1304 may include one or more input/output controllers to provide an interface to NVM/storage 1308 and input/output device(s) 1310.
For example, NVM/storage 1308 may be used to store data and/or instructions 1314. NVM/storage 1308 may include any suitable non-volatile memory (e.g., flash memory) and/or may include any suitable non-volatile storage device(s) (e.g., one or more Hard Disk Drives (HDDs), one or more Compact Disc (CD) drives, and/or one or more Digital Versatile Disc (DVD) drives).
NVM/storage 1308 may include storage resources that are physically part of the device on which apparatus 1300 is installed, or it may be accessible by the device and need not be part of the device. For example, NVM/storage 1308 may be accessible over a network via input/output device(s) 1310.
Input/output device(s) 1310 may provide an interface for apparatus 1300 to communicate with any other suitable device; input/output device(s) 1310 may include a communication component, an audio component, a sensor component, and so forth. The network interface 1312 may provide an interface for the device 1300 to communicate over one or more networks, and the device 1300 may communicate wirelessly with one or more components of a wireless network in accordance with any of one or more wireless network standards and/or protocols, for example by accessing a wireless network based on a communication standard such as WiFi, 2G, 3G, 4G, 5G, etc., or a combination thereof.
For one embodiment, at least one of the processor(s) 1302 may be packaged together with logic for one or more controllers (e.g., memory controller modules) of the control module 1304. For one embodiment, at least one of the processor(s) 1302 may be packaged together with logic for one or more controllers of the control module 1304 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 1302 may be integrated on the same die with logic for one or more controller(s) of the control module 1304. For one embodiment, at least one of the processor(s) 1302 may be integrated on the same die with logic of one or more controllers of the control module 1304 to form a system on chip (SoC).
In various embodiments, apparatus 1300 may be, but is not limited to being: a server, a desktop computing device, or a mobile computing device (e.g., a laptop computing device, a handheld computing device, a tablet, a netbook, etc.), among other terminal devices. In various embodiments, apparatus 1300 may have more or fewer components and/or different architectures. For example, in some embodiments, device 1300 may include one or more cameras, a keyboard, a liquid crystal display (LCD) screen (including a touch screen display), a non-volatile memory port, multiple antennas, a graphics chip, an application specific integrated circuit (ASIC), and speakers.
An embodiment of the present application provides an electronic device, including: one or more processors; and one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the electronic device to perform a method as described in one or more embodiments of the present application.
Since the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, reference may be made to the corresponding description of the method embodiments.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts among the embodiments, reference may be made to one another.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, those skilled in the art may make additional alterations and modifications to these embodiments once they become aware of the basic inventive concept. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all alterations and modifications that fall within the scope of the embodiments of the present application.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The data processing method and apparatus provided in the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the above embodiments is merely intended to help understand the method and its core idea. Meanwhile, those skilled in the art may, according to the idea of the present application, make changes to the specific implementations and the application scope. In summary, the content of this specification should not be construed as limiting the present application.

Claims (14)

1. A data processing method based on a distributed system, characterized in that the distributed system comprises a plurality of electronic devices, each electronic device stores a respective portion of feature vectors in a model, the method is applied to a first electronic device of the plurality of electronic devices, and the method comprises the following steps:
obtaining vector identifications of a plurality of feature vectors in the model;
generating at least one identification tensor according to the vector identifications of the plurality of feature vectors;
obtaining at least one feature tensor in the distributed system according to the at least one identification tensor based on at least one obtaining operator, wherein the at least one feature tensor comprises the plurality of feature vectors;
and splitting the at least one feature tensor to obtain the plurality of feature vectors.
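
The four steps of claim 1 map naturally onto a small gather routine. The following Python sketch is illustrative only and not part of the claims; all names (gather_feature_vectors, shard_tables) are hypothetical, and the shards are modeled as plain dictionaries:

    import numpy as np

    def gather_feature_vectors(vector_ids, shard_tables):
        # shard_tables: list of dicts, one per electronic device, mapping
        # vector id -> 1-D np.ndarray (the stored feature vector).
        num_devices = len(shard_tables)
        unique_ids = list(dict.fromkeys(vector_ids))   # order-preserving dedup
        # one identification tensor per device, sharded here by id modulo
        id_tensors = [[i for i in unique_ids if i % num_devices == d]
                      for d in range(num_devices)]
        # "obtaining operator": look up each shard and stitch the results
        # into a single feature tensor
        rows, order = [], []
        for d, ids in enumerate(id_tensors):
            for i in ids:
                rows.append(shard_tables[d][i])
                order.append(i)
        feature_tensor = np.stack(rows)
        # split the feature tensor back into per-id feature vectors, in the
        # originally requested order (duplicate ids share one row)
        row_of = {i: r for r, i in enumerate(order)}
        return [feature_tensor[row_of[i]] for i in vector_ids]

Sharding by id modulo the device count is only one possible placement policy; the claim itself leaves the mapping from vector identifications to electronic devices open.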
2. The method of claim 1, wherein the obtaining at least one feature tensor in the distributed system from the at least one identification tensor based on at least one obtaining operator comprises:
based on the at least one obtaining operator, searching, among the feature vectors stored in the first electronic device, for feature vectors corresponding to the vector identifications in the identification tensor; and/or respectively sending the vector identifications in the identification tensor to a second electronic device in the distributed system where the corresponding feature vectors are located, and receiving the feature vectors that are found according to the received vector identifications and returned by the second electronic device; the second electronic device comprises an electronic device in the distributed system other than the first electronic device;
and based on the at least one obtaining operator, stitching the received feature vectors and/or the feature vectors found in the first electronic device to obtain the feature tensor.
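
As a hedged illustration of the local-lookup/remote-fetch split in claim 2 (not part of the claims), the sketch below uses a hypothetical remote_lookup stub standing in for whatever RPC mechanism the distributed system actually uses:

    import numpy as np

    def obtain_feature_tensor(id_tensor, local_table, remote_lookup, local_dev):
        # id_tensor: list of (vector_id, device_index) pairs; local_table maps
        # id -> 1-D np.ndarray on this device; remote_lookup(dev, ids) is a
        # hypothetical stub returning the vectors found on device `dev`.
        rows, remote_ids = {}, {}
        for vid, dev in id_tensor:
            if dev == local_dev:
                rows[vid] = local_table[vid]            # search locally
            else:
                remote_ids.setdefault(dev, []).append(vid)
        for dev, ids in remote_ids.items():             # send ids, get vectors
            for vid, vec in zip(ids, remote_lookup(dev, ids)):
                rows[vid] = vec
        # stitch received and locally found vectors into one feature tensor
        return np.stack([rows[vid] for vid, _ in id_tensor])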
3. The method of claim 2, wherein the searching, among the feature vectors stored in the first electronic device, for the feature vectors corresponding to the vector identifications in the identification tensor comprises:
searching whether a feature vector corresponding to the vector identification in the identification tensor exists among the feature vectors stored in a first memory in the first electronic device;
in a case where the feature vector corresponding to the vector identification in the identification tensor exists, acquiring the feature vector corresponding to the vector identification in the identification tensor stored in the first memory, wherein the feature vector stored in the first memory has been obtained in advance from a second memory in the first electronic device and cached in the first memory; the data access rate of the first memory is greater than the data access rate of the second memory;
in a case where the feature vector corresponding to the vector identification in the identification tensor does not exist, searching for the feature vector corresponding to the vector identification in the identification tensor among the feature vectors stored in the second memory.
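
Read as code, claim 3 is a two-tier lookup: try the fast first memory, fall back to the slower second memory on a miss. A minimal sketch, assuming both tiers behave like dictionaries (names hypothetical, not part of the claims):

    def lookup_two_tier(vector_id, first_memory, second_memory):
        # first_memory: fast cache (e.g. device memory), pre-populated from
        # second_memory; second_memory: slower store holding all vectors.
        vec = first_memory.get(vector_id)
        if vec is not None:
            return vec                       # hit: serve from the fast memory
        return second_memory[vector_id]      # miss: fall back to slow memory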
4. The method of claim 3, further comprising:
acquiring frequencies at which the feature vectors in the model stored in the second memory are respectively accessed;
selecting at least some of the feature vectors from the feature vectors in the model stored in the second memory in descending order of the accessed frequencies;
caching the at least some feature vectors in the first memory.
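
Claim 4 describes a frequency-driven warm-up of the first memory. One way to sketch it in Python (illustrative only; the access log of vector identifications is an assumed input, and Counter.most_common supplies the high-to-low ordering):

    from collections import Counter

    def warm_cache_by_frequency(access_log, second_memory, first_memory, capacity):
        # Keep the `capacity` most frequently accessed vectors in fast memory.
        freq = Counter(access_log)
        first_memory.clear()
        for vector_id, _count in freq.most_common(capacity):
            first_memory[vector_id] = second_memory[vector_id]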
5. The method of claim 3, further comprising:
during one round of training of the model, acquiring, from the second memory, feature vectors in the model required to be used in at least one round of training after the one round of training;
caching, in the first memory, the feature vectors in the model required to be used in the at least one round of training.
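
Claim 5 overlaps the cache fill with training: while one round runs, the vectors the next round will need are copied into the first memory. A minimal sketch using a background thread (threading is one possible mechanism among others; the claim does not prescribe one, and all names are hypothetical):

    import threading

    def prefetch_next_round(next_round_ids, second_memory, first_memory):
        # Copy the vectors needed by the upcoming round into fast memory while
        # the current round is still computing; join() at the round boundary.
        def _copy():
            for vector_id in next_round_ids:
                first_memory[vector_id] = second_memory[vector_id]
        t = threading.Thread(target=_copy, daemon=True)
        t.start()
        return t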
6. The method of claim 1, wherein generating at least one identification tensor according to the vector identifications of the plurality of feature vectors comprises:
using a generating operator to deduplicate the vector identifications of the plurality of feature vectors to obtain deduplicated vector identifications, determine the number of electronic devices on which the feature vectors corresponding to the deduplicated vector identifications are located, segment the deduplicated vector identifications according to the number to obtain segmented vector identifications, and generate at least two identification tensors according to the segmented vector identifications.
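
The generating operator of claim 6 reduces to deduplication followed by an even split. A compact, purely illustrative sketch (numpy's array_split tolerates lengths that do not divide evenly across devices):

    import numpy as np

    def generate_id_tensors(vector_ids, num_devices):
        # Order-preserving dedup, then one identification tensor per device.
        unique_ids = np.array(list(dict.fromkeys(vector_ids)))
        return np.array_split(unique_ids, num_devices)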
7. The method of claim 1, further comprising:
obtaining types of hardware resources in the first electronic device required to be used in the process of obtaining the plurality of feature vectors in the model;
and grouping the feature vectors in the model according to the number of the types and a load balancing principle, to obtain a plurality of feature vector groups.
8. The method of claim 7, wherein the model comprises a plurality of feature matrices, each feature matrix comprising at least two feature vectors;
the grouping the feature vectors in the model according to the number of the types and the load balancing principle to obtain a plurality of feature vector groups comprises:
dividing the plurality of feature matrices into the number of feature matrix groups, wherein differences among the numbers of feature vectors in the feature matrix groups are smaller than a preset difference.
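
The balancing rule in claim 8 is a classic multiway number-partitioning problem. A greedy longest-processing-time sketch (illustrative only; the claim merely requires the per-group vector counts to stay within a preset difference, and any partitioning scheme meeting that bound would do):

    import heapq

    def balance_matrices(matrix_sizes, num_groups):
        # matrix_sizes: list of (matrix_id, num_feature_vectors) pairs.
        heap = [(0, g, []) for g in range(num_groups)]  # (load, group, members)
        heapq.heapify(heap)
        for mid, size in sorted(matrix_sizes, key=lambda x: -x[1]):
            load, g, members = heapq.heappop(heap)      # lightest group so far
            members.append(mid)
            heapq.heappush(heap, (load + size, g, members))
        return {g: members for _, g, members in heap}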
9. The method of claim 7, wherein generating at least one identification tensor according to the vector identifications of the plurality of feature vectors comprises:
grouping the vector identifications of the plurality of feature vectors according to the feature vector groups in which the corresponding feature vectors are located, to obtain at least two vector identification groups;
and generating an identification tensor according to the vector identifications in each vector identification group respectively, to obtain at least two identification tensors.
10. The method according to any one of claims 7-9, wherein said obtaining at least one feature tensor in the distributed system from the at least one identification tensor based on at least one obtaining operator comprises:
sequentially calling a plurality of types of hardware resources in the first electronic device according to a first identification tensor of the at least two identification tensors, so as to obtain, through the called hardware resources, the feature vectors corresponding to the vector identifications in the first identification tensor in a feature vector group;
sequentially calling the plurality of types of hardware resources according to a second identification tensor of the at least two identification tensors, so as to obtain, through the called hardware resources, the feature vectors corresponding to the vector identifications in the second identification tensor in a feature vector group; the second identification tensor comprises an identification tensor of the at least two identification tensors other than the first identification tensor;
in the process of sequentially calling the plurality of types of hardware resources according to the second identification tensor, in a case where a hardware resource of a target type among the plurality of types needs to be called according to the second identification tensor, judging whether the hardware resource of the target type is in use; and in a case where the hardware resource of the target type is not in use, calling the hardware resource of the target type according to the second identification tensor.
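
Claims 7-10 together describe a software pipeline: each identification tensor passes through the same ordered set of hardware resource types, and a tensor may enter a stage only once the previous tensor has released it. The unit-cost schedule below is a toy model of that overlap, illustrative only and not the patented implementation:

    def pipeline_schedule(num_tensors, num_stages):
        # finish[(t, s)] = time at which tensor t leaves stage s, assuming each
        # stage takes one time unit and serves one tensor at a time.
        finish = {}
        for t in range(num_tensors):
            for s in range(num_stages):
                ready = finish.get((t, s - 1), 0)   # own previous stage done
                free = finish.get((t - 1, s), 0)    # resource no longer in use
                finish[(t, s)] = max(ready, free) + 1
        return finish

With two tensors and three stages this yields finish times 1, 2, 3 for the first tensor and 2, 3, 4 for the second: the second tensor's first-stage work overlaps the first tensor's later stages, which is exactly the call-only-when-not-in-use condition of claim 10.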
11. A method of data processing, the method comprising:
in the process of training a sparse model, or in the process of processing data by using a trained sparse model, if a target feature vector among a plurality of feature vectors in the sparse model needs to be acquired, searching for the target feature vector in a first memory;
returning the target feature vector in a case where the target feature vector is found in the first memory; wherein the target feature vector is selected from the plurality of feature vectors and cached in the first memory in advance in descending order of the access frequencies of the plurality of feature vectors stored in a second memory, and the data access rate of the first memory is greater than that of the second memory.
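
At serving time, claim 11 is essentially the hit path of the two-tier lookup sketched under claim 3, with the hot set chosen by access frequency beforehand. A small illustrative variant that also tracks the hit rate (all names hypothetical):

    def serve_target_vector(target_id, first_memory, second_memory, stats):
        # Hot vectors were cached in first_memory ahead of time by access
        # frequency, so most requests should be answered from the fast tier.
        if target_id in first_memory:
            stats["hits"] = stats.get("hits", 0) + 1
            return first_memory[target_id]
        stats["misses"] = stats.get("misses", 0) + 1
        return second_memory[target_id]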
12. A method of data processing, the method comprising:
in a case of performing a next round of multi-round training of a sparse model, acquiring, from a first memory, feature vectors in the sparse model required to be used in the next round of training;
performing the next round of the multi-round training on the sparse model by using the feature vectors; wherein the feature vectors are obtained from a second memory and cached in the first memory in advance during a previous round of the multi-round training of the sparse model, and the data access rate of the first memory is greater than that of the second memory.
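
Claim 12 is the consumer side of the claim-5 prefetch: each round reads the vectors the previous round staged into fast memory, then immediately stages the following round's. A sketch of the round loop, reusing the prefetch_next_round helper sketched under claim 5 (train_one_round is a caller-supplied callable, purely hypothetical):

    def train_rounds(rounds, second_memory, train_one_round):
        # rounds: list of (ids_this_round, ids_next_round_or_None) pairs.
        first_memory, handle = {}, None
        for ids_now, ids_next in rounds:
            if handle is not None:
                handle.join()              # last round's prefetch has landed
            vectors = [first_memory[i] if i in first_memory else second_memory[i]
                       for i in ids_now]
            if ids_next:                   # stage the following round's vectors
                handle = prefetch_next_round(ids_next, second_memory, first_memory)
            train_one_round(vectors)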
13. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method of any one of claims 1-12.
14. A non-transitory computer readable storage medium having instructions therein which, when executed by a processor of an electronic device, enable the electronic device to perform the method of any of claims 1-12.
CN202111015203.4A 2021-08-31 2021-08-31 Data processing method and device Active CN113448739B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111015203.4A CN113448739B (en) 2021-08-31 2021-08-31 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111015203.4A CN113448739B (en) 2021-08-31 2021-08-31 Data processing method and device

Publications (2)

Publication Number Publication Date
CN113448739A (en) 2021-09-28
CN113448739B CN113448739B (en) 2022-02-11

Family

ID=77819352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111015203.4A Active CN113448739B (en) 2021-08-31 2021-08-31 Data processing method and device

Country Status (1)

Country Link
CN (1) CN113448739B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115658146A (en) * 2022-12-14 2023-01-31 成都登临科技有限公司 AI chip, tensor processing method and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169061A (en) * 2017-05-02 2017-09-15 广东工业大学 A kind of text multi-tag sorting technique for merging double information sources
CN111095302A (en) * 2017-09-21 2020-05-01 高通股份有限公司 Compression of sparse deep convolutional network weights
CN111506262A (en) * 2020-03-25 2020-08-07 华为技术有限公司 Storage system, file storage and reading method and terminal equipment
CN112214652A (en) * 2020-10-19 2021-01-12 支付宝(杭州)信息技术有限公司 Message generation method, device and equipment
CN112307352A (en) * 2020-11-26 2021-02-02 腾讯科技(深圳)有限公司 Content recommendation method, system, device and storage medium
CN112800466A (en) * 2021-02-10 2021-05-14 支付宝(杭州)信息技术有限公司 Data processing method and device based on privacy protection and server
WO2021123790A1 (en) * 2019-12-19 2021-06-24 Sita Information Networking Computing Uk Limited Image processing system and method


Also Published As

Publication number Publication date
CN113448739B (en) 2022-02-11

Similar Documents

Publication Publication Date Title
US10719260B2 (en) Techniques for storing and retrieving data from a computing device
US20140215170A1 (en) Block Compression in a Key/Value Store
CN110347651B (en) Cloud storage-based data synchronization method, device, equipment and storage medium
CN102857578B (en) A kind of file uploading method of network hard disc, system and net dish client
KR20170123336A (en) File manipulation method and apparatus
US10909086B2 (en) File lookup in a distributed file system
US20170201566A1 (en) File downloading method, apparatus, and terminal device
US9483493B2 (en) Method and system for accessing a distributed file system
US10771358B2 (en) Data acquisition device, data acquisition method and storage medium
US10417192B2 (en) File classification in a distributed file system
WO2023051228A1 (en) Method and apparatus for sample data processing, and device and storage medium
CN106657182B (en) Cloud file processing method and device
CN113448739B (en) Data processing method and device
US20170286439A1 (en) System and method for duplicating files on client device for cloud storage
US11055223B2 (en) Efficient cache warm up based on user requests
CN115396422A (en) Data transmission method and device
CN114553762A (en) Method and device for processing flow table items in flow table
CN111625600B (en) Data storage processing method, system, computer equipment and storage medium
CN112596820A (en) Resource loading method, device, equipment and storage medium
CN113849524B (en) Data processing method and device
CN113296977B (en) Message processing method and device
CN113542422B (en) Data storage method and device, storage medium and electronic device
CN114996307A (en) Federal processing method and device for data
CN115129789A (en) Bucket index storage method, device and medium of distributed object storage system
EP3555767A1 (en) Partial storage of large files in distinct storage systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant