CN110135465B - Model parameter representation space size estimation method and device and recommendation method - Google Patents


Info

Publication number
CN110135465B
Authority
CN
China
Prior art keywords: target, model, threshold, parameter, representation space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910325428.6A
Other languages
Chinese (zh)
Other versions
CN110135465A (en)
Inventor
王涌壮
徐宇辉
毛志成
袁镱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910325428.6A priority Critical patent/CN110135465B/en
Publication of CN110135465A publication Critical patent/CN110135465A/en
Application granted granted Critical
Publication of CN110135465B publication Critical patent/CN110135465B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Abstract

Embodiments of the present disclosure provide a method and an apparatus for estimating the size of a model parameter representation space, a recommendation method, a computer-readable medium, and an electronic device, belonging to the field of computer technology. The estimation method includes the following steps: acquiring target parameter vectors of a target model; counting the distribution of distances among the target parameter vectors; determining, according to the distance distribution, a target threshold for clustering the target parameter vectors; and estimating the size of the representation space of the parameter vectors of the target model according to the target threshold and the distances among the target parameter vectors. The technical solution of the embodiments provides a clustering-based technique for estimating the size of the parameter space, giving a quantitative criterion for quantization compression of model parameters and making the parameter-tuning process more accurate, efficient, and rapid.

Description

Model parameter representation space size estimation method and device and recommendation method
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method for estimating a size of a model parametric representation space, an apparatus for estimating a size of a model parametric representation space, a recommendation method, a computer-readable medium, and an electronic device.
Background
In large-scale machine learning scenarios, the number of model parameters can reach hundreds of millions and the model size can reach hundreds of gigabytes, which poses great challenges for storing, transmitting, and using the model. Quantization compression of model parameters is one of the mainstream model compression methods, but the compression ratio, that is, the number of quantization bits, can only be chosen by running experiments under different hyper-parameters (such as the learning rate, the number of training iterations, the number of neural network layers, the number of neurons per layer, the activation function, and so on) and picking a value that happens to work.
In other words, in the related art, the number of quantization bits is often chosen from experience, or a reasonable value is found by repeatedly tuning parameters. The tuning process is complex, the verification period is long, the amount of computation is large, and there is no technical means or theoretical basis for quickly finding a suitable value.
Therefore, a new model parametric representation space size estimation method and apparatus, a recommendation method, a computer readable medium and an electronic device are needed.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
Embodiments of the present disclosure provide a method and an apparatus for estimating the size of a model parameter representation space, a recommendation method, a computer-readable medium, and an electronic device, which can provide a quantitative criterion for quantization compression of model parameters, making the parameter-tuning process more accurate, rapid, and efficient.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the present disclosure, there is provided a method for estimating the size of a model parameter representation space, including: acquiring target parameter vectors of a target model; counting the distribution of distances among the target parameter vectors; determining, according to the distance distribution, a target threshold for clustering the target parameter vectors; and estimating the size of the representation space of the parameter vectors of the target model according to the target threshold and the distances among the target parameter vectors.
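As a concrete reading of these four steps, the following sketch is our own illustration, not code from the patent: the function names are hypothetical, and the 1/p estimator rests on our assumption that if the vectors effectively take K equally likely distinct values, a random pair coincides with probability 1/K.

```python
import numpy as np

def sampled_pair_distances(vecs, n_pairs=20000, seed=0):
    # Step 2: sample random pairs of parameter vectors and record the
    # distribution of their Euclidean distances.
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(vecs), n_pairs)
    j = rng.integers(0, len(vecs), n_pairs)
    keep = i != j
    return np.linalg.norm(vecs[i[keep]] - vecs[j[keep]], axis=1)

def estimate_space_size(vecs, target_threshold):
    # Step 4: p = proportion of sampled pairs closer than the target
    # threshold; under the uniform-coincidence assumption, K ~= 1/p.
    d = sampled_pair_distances(vecs)
    p = float(np.mean(d < target_threshold))
    return 1.0 / p

# Toy check: vectors drawn from 4 well-separated centers should give K close to 4.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
vecs = centers[rng.integers(0, 4, 4000)] + rng.normal(0.0, 0.01, (4000, 2))
K = estimate_space_size(vecs, target_threshold=1.0)
```

Step 3 (choosing the target threshold from the distance distribution itself) is elaborated in the embodiments that follow; here a fixed threshold stands in for it.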
In an exemplary embodiment of the present disclosure, obtaining the target parameter vectors of the target model includes: training the target model in parallel on a plurality of servers until it reaches a stable state, with each server separately storing one model fragment; sampling one model fragment from the plurality of servers; and taking the parameter vectors of the sampled model fragment as the target parameter vectors.
In an exemplary embodiment of the present disclosure, counting the distribution of distances among the target parameter vectors includes: counting a first distribution of the distances among the target parameter vectors; clustering the target parameter vectors under different thresholds; and, for each of the different thresholds, counting a second distribution of the distances among the clustered target parameter vectors.
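The clustering step can be illustrated with a simple greedy "leader" clustering. This is our own hypothetical stand-in; the patent does not fix a specific clustering algorithm in this passage. Each vector joins the first existing center closer than the threshold, otherwise it opens a new cluster.

```python
import numpy as np

def threshold_cluster(vecs, threshold):
    # Greedy leader clustering: centers grow as vectors arrive.
    centers, labels = [], []
    for v in vecs:
        for k, c in enumerate(centers):
            if np.linalg.norm(v - c) < threshold:
                labels.append(k)
                break
        else:  # no center is close enough: v becomes a new center
            centers.append(v)
            labels.append(len(centers) - 1)
    return np.array(centers), np.array(labels)

vecs = np.array([[0.0], [0.1], [5.0], [5.2], [9.9]])
centers, labels = threshold_cluster(vecs, threshold=1.0)
# The "second distribution" is then the distance distribution of the
# clustered vectors, i.e. of centers[labels].
clustered = centers[labels]
```

Raising the threshold merges more vectors into fewer centers, which is why the second distribution drifts away from the first as the threshold grows.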
In an exemplary embodiment of the present disclosure, determining the target threshold for clustering the target parameter vectors according to the distance distribution includes: computing the relative entropy between the first distribution and the second distribution under each of the different thresholds to obtain a curve of relative entropy as a function of the threshold; and determining the target threshold according to this relative-entropy-versus-threshold curve.
In an exemplary embodiment of the present disclosure, determining the target threshold according to the relative-entropy-versus-threshold curve includes: obtaining the first-derivative curve of the relative-entropy-versus-threshold curve; locating a smooth (plateau) stage of the first-derivative curve; and selecting a threshold corresponding to that smooth stage as the target threshold.
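One plausible implementation of this rule is sketched below; it is our own reading, and the tolerance `tol` is an assumed parameter the patent does not specify. The curve is differentiated numerically and the first threshold at which the derivative has flattened out is taken.

```python
import numpy as np

def pick_plateau_threshold(thresholds, kl_values, tol=0.1):
    # First derivative of the KL-vs-threshold curve.
    d1 = np.gradient(kl_values, thresholds)
    # "Smoothing stage": the derivative magnitude has dropped below a small
    # fraction (tol) of its peak magnitude.
    flat = np.abs(d1) < tol * np.max(np.abs(d1))
    return thresholds[np.argmax(flat)]  # first threshold in the flat region

# Toy curve: KL rises linearly, then saturates once the threshold passes 2.
thresholds = np.linspace(0.0, 5.0, 51)
kl = np.minimum(thresholds, 2.0)
t_star = pick_plateau_threshold(thresholds, kl)
```

On this toy curve the derivative is 1 below a threshold of 2 and 0 above it, so the first flat point lands just past 2, which is the knee of the curve.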
In an exemplary embodiment of the present disclosure, estimating the size of the representation space of the parameter vectors of the target model according to the target threshold and the distances among the target parameter vectors includes: counting the proportion of distances among the target parameter vectors that are smaller than the target threshold; and estimating the size of the representation space according to that proportion.
In an exemplary embodiment of the present disclosure, the representation space size K is estimated according to the following formula:
K = 1/p
in the above formula, p is the proportion of distances among the target parameter vectors that are smaller than the target threshold.
In an exemplary embodiment of the present disclosure, the target model includes an embedding layer and a deep neural network, and the method further includes: acquiring the length of the embedding layer; and determining the number of quantization bits for the parameter vectors of the embedding layer according to the length of the embedding layer and the size of the representation space.
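The patent does not spell out the bit-number formula at this point, but one natural reading (our assumption, with hypothetical function names) is: if an embedding vector of length L effectively takes only K distinct values, log2(K) bits identify one value, i.e. about log2(K)/L bits per component.

```python
import math

def embedding_quantization_bits(space_size, embedding_len):
    # Bits needed to index one of `space_size` codewords, spread over the
    # `embedding_len` components of the embedding vector.
    return math.ceil(math.log2(space_size) / embedding_len)

bits_whole = math.ceil(math.log2(1000))                  # ~10 bits for K = 1000 codewords
bits_per_dim = embedding_quantization_bits(2 ** 32, 8)   # 4 bits per component
```

Compared with 32-bit floats, 4 bits per component in this toy setting would be an 8x reduction in embedding-layer storage.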
According to an aspect of the present disclosure, there is provided a recommendation method, including: acquiring current user features and current information features of a target client; and processing the current user features and the current information features through a deep-learning-based target model to obtain recommendation information to be sent to the target client; wherein the parameter vectors of the target model are quantization-compressed according to an estimated representation space size, the representation space size of the parameter vectors of the target model being estimated using the method of any one of the above embodiments.
According to an aspect of the present disclosure, there is provided an apparatus for estimating the size of a model parameter representation space, including: a target parameter obtaining module configured to obtain target parameter vectors of a target model; a distance distribution statistics module configured to count the distribution of distances among the target parameter vectors; a target threshold determination module configured to determine, according to the distance distribution, a target threshold for clustering the target parameter vectors; and a space size estimation module configured to estimate the size of the representation space of the parameter vectors of the target model according to the target threshold and the distances among the target parameter vectors.
According to an aspect of the present disclosure, there is provided a recommendation apparatus, including: a feature data acquisition module configured to acquire current user features and current information features of a target client; and a recommendation information obtaining module configured to process the current user features and the current information features through a deep-learning-based target model, obtain recommendation information, and send it to the target client; wherein the parameter vectors of the target model are quantization-compressed according to an estimated representation space size, the representation space size of the parameter vectors of the target model being estimated using the method of any one of the above embodiments.
According to an aspect of the embodiments of the present disclosure, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the model parametric representation space size estimation method as described in the above embodiments.
According to an aspect of the embodiments of the present disclosure, there is provided an electronic device including: one or more processors; a storage device configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the model parametric representation space size estimation method as described in the above embodiments.
According to an aspect of embodiments of the present disclosure, there is provided a computer-readable medium, on which a computer program is stored, which when executed by a processor implements the recommendation method as described in the above embodiments.
According to an aspect of the embodiments of the present disclosure, there is provided an electronic device including: one or more processors; a storage device configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the recommendation method as described in the embodiments above.
In the technical solutions provided by some embodiments of the present disclosure, the distribution of distances among target parameter vectors of a target model is counted, a target threshold for clustering the target parameter vectors is then determined from this distribution, and the size of the representation space of the parameter vectors of the target model can thus be estimated according to the target threshold and the distances among the target parameter vectors.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
FIG. 1 illustrates a schematic diagram of an exemplary system architecture of a model parametric representation space size estimation method or a model parametric representation space size estimation apparatus to which embodiments of the present disclosure may be applied;
FIG. 2 illustrates a schematic structural diagram of a computer system suitable for use with the electronic device used to implement embodiments of the present disclosure;
FIG. 3 schematically illustrates a flow diagram of a model parametric representation spatial magnitude estimation method according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a basic structural diagram of an object model according to an embodiment of the present disclosure;
FIG. 5 is a diagram illustrating a processing procedure of step S310 shown in FIG. 3 in one embodiment;
FIG. 6 is a schematic diagram showing AUC curves of models for different numbers of quantization bits;
FIG. 7 is a diagram illustrating a processing procedure of step S320 shown in FIG. 3 in one embodiment;
FIG. 8 schematically shows a diagram of P (x) distributions at different thresholds according to an embodiment of the present disclosure;
FIG. 9 is a diagram illustrating a processing procedure of step S330 shown in FIG. 3 in one embodiment;
FIG. 10 is a diagram illustrating a processing procedure of step S332 illustrated in FIG. 9 in one embodiment;
FIG. 11 shows a plot of KL distance versus threshold and a plot of the first derivative of KL distance versus threshold;
FIG. 12 is a diagram illustrating a processing procedure of step S340 illustrated in FIG. 3 in one embodiment;
FIG. 13 schematically illustrates a flow chart of a model parametric representation spatial magnitude estimation method according to yet another embodiment of the present disclosure;
FIG. 14 schematically shows a flow diagram of a model parametric representation spatial size estimation method according to another embodiment of the present disclosure;
FIG. 15 is a diagram illustrating a processing procedure of step S1430 shown in FIG. 14 in one embodiment;
FIG. 16 schematically shows a flow diagram of a recommendation method according to an embodiment of the present disclosure;
fig. 17 schematically shows a block diagram of a model parametric representation spatial size estimation apparatus according to an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Fig. 1 illustrates a schematic diagram of an exemplary system architecture 100 to which a model parametric representation space size estimation method or a model parametric representation space size estimation apparatus of an embodiment of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. Network 104 is the medium used to provide communication links between terminal devices 101, 102, 103 and server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may be various electronic devices having display screens including, but not limited to, smart phones, tablets, portable and desktop computers, digital cinema projectors, and the like.
The server 105 may be a server that provides various services. For example, a user sends an image-text display request to the server 105 using the terminal device 103 (or terminal device 101 or 102). Based on the related information carried in the image-text display request, the server 105 may retrieve search results matching the user from a database and feed them back to the terminal device 103, for example any one or more of corresponding video information, text information, picture information, audio information, commodity information, and the like, so that the user can view the corresponding image-text information in the content displayed on the terminal device 103.
As another example, the terminal device 103 (or terminal device 101 or 102) may be a smart television, a VR (Virtual Reality)/AR (Augmented Reality) head-mounted display, or a mobile terminal such as a smartphone or tablet computer on which an instant messaging application (APP) or a video APP is installed. The user may send an image-text display request to the server 105 through the smart television, the VR/AR head-mounted display, or the instant messaging or video APP. Based on the image-text display request, the server 105 may retrieve matching image-text information from the database and return it to the smart television, the VR/AR head-mounted display, or the instant messaging or video APP, which then displays and/or plays the returned image-text information.
FIG. 2 illustrates a schematic structural diagram of a computer system suitable for use with the electronic device implementing an embodiment of the present disclosure.
It should be noted that the computer system 200 of the electronic device shown in fig. 2 is only an example, and should not bring any limitation to the functions and the application scope of the embodiment of the present disclosure.
As shown in fig. 2, the computer system 200 includes a Central Processing Unit (CPU) 201 that can perform various appropriate actions and processes in accordance with a program stored in a Read-Only Memory (ROM) 202 or a program loaded from a storage section 208 into a Random Access Memory (RAM) 203. In the RAM 203, various programs and data necessary for system operation are also stored. The CPU201, ROM 202, and RAM 203 are connected to each other via a bus 204. An input/output (I/O) interface 205 is also connected to bus 204.
The following components are connected to the I/O interface 205: an input portion 206 including a keyboard, a mouse, and the like; an output section 207 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage section 208 including a hard disk and the like; and a communication section 209 including a Network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 209 performs communication processing via a network such as the internet. A drive 210 is also connected to the I/O interface 205 as needed. A removable medium 211 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 210 as necessary, so that a computer program read out therefrom is mounted into the storage section 208 as necessary.
In particular, the processes described below with reference to the flowcharts may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 209 and/or installed from the removable medium 211. The computer program, when executed by a Central Processing Unit (CPU) 201, performs various functions defined in the methods and/or apparatus of the present application.
It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM) or flash Memory), an optical fiber, a portable compact disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF (Radio Frequency), etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods, apparatus and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules and/or units and/or sub-units described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware, and the described modules and/or units and/or sub-units may also be disposed in a processor. Wherein the names of such modules and/or units and/or sub-units in some cases do not constitute a limitation on the modules and/or units and/or sub-units themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by one of the electronic devices, cause the electronic device to implement the method as described in the embodiments below. For example, the electronic device may implement the steps shown in fig. 3, or fig. 5, or fig. 7, or fig. 9, or fig. 10, or fig. 12, or fig. 13, or fig. 14, or fig. 15, or fig. 16.
Some terms referred to in the embodiments of the present disclosure are first defined herein.
Quantization: the process of converting high-precision values represented with many bits into a finite number of low-precision values.
Area Under Curve (AUC): an evaluation metric for models. Unless otherwise specified, this metric is used in the embodiments of the disclosure to evaluate model performance; the higher the AUC, the better the performance.
KL distance (Kullback-Leibler divergence): also known as relative entropy, it is the expectation of the logarithmic difference between an original distribution P(x) and an approximating distribution Q(x). Equivalently, the KL distance is the average number of extra bits needed to encode samples drawn from P(x) using a code optimized for Q(x).
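For discrete distributions, the definition above can be computed directly. The following sketch is a minimal illustration of ours (using the natural log; base-2 log gives the "extra bits" reading):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    # KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)); eps guards against log(0).
    total_p, total_q = sum(p), sum(q)
    return sum(
        (pi / total_p) * math.log((pi / total_p + eps) / (qi / total_q + eps))
        for pi, qi in zip(p, q)
        if pi > 0  # terms with P(x) = 0 contribute 0
    )

same = kl_divergence([1, 1], [1, 1])   # identical distributions: KL = 0
skew = kl_divergence([1, 0], [1, 1])   # P concentrated, Q uniform: KL = log 2
```

Note the asymmetry: KL(P || Q) generally differs from KL(Q || P), which is why the embodiments fix which distribution plays the role of the original P(x).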
Representation space: the number of distinct values that the parameter vectors can take.
FIG. 3 schematically shows a flow diagram of a method for estimating the size of a model parameter representation space according to an embodiment of the present disclosure. The method steps of the embodiments of the present disclosure may be executed by a terminal device, by a server (for example, the server 105 in FIG. 1), or interactively by a terminal device and a server, but the present disclosure is not limited thereto.
As shown in fig. 3, the method for estimating the size of the model parametric representation space provided by the embodiment of the present disclosure may include the following steps.
In step S310, a target parameter vector of the target model is acquired.
In the embodiments of the present disclosure, the target model may be any deep learning model. Taking a recommendation system as an example application scenario, the target model may include an embedding layer and a Deep Neural Network (DNN). The DNN may include, for example, a CNN (Convolutional Neural Network) or an RNN (Recurrent Neural Network), and the RNN may in turn include, for example, an LSTM (Long Short-Term Memory) network or a GRU (Gated Recurrent Unit).
In the embodiments of the present disclosure, the target model is trained on a training data set until it converges to a stable state, yielding the model parameters of the target model; these parameters are expressed as vectors and referred to as the parameter vectors of the target model. Generally, the target model is trained in parallel on a plurality of servers or a server cluster, with each server storing part of the model parameters. The target parameter vectors may be the parameter vectors of the entire embedding layer of the target model, i.e., the model parameters stored on all servers used for training; however, because the number of such vectors is large and the resulting computation heavy, the target parameter vectors may instead be only the parameter vectors of the embedding layer stored on some of the servers (for example, a single server).
In step S320, the distance distribution between the target parameter vectors is counted.
In step S330, a target threshold value for clustering the target parameter vector is determined according to the distance distribution.
In step S340, a size of a representation space of a parameter vector of the target model is estimated according to a distance between the target threshold and a target parameter vector.
According to the model parameter representation space size estimation method provided by the embodiment of the present disclosure, the distance distribution among the target parameter vectors of the target model is counted, and a target threshold for clustering the target parameter vectors is then determined according to that distance distribution, so that the representation space size of the parameter vectors of the target model can be estimated according to the target threshold and the distances between the target parameter vectors.
The method for estimating the size of the model parametric representation space provided by the embodiment of the present disclosure is illustrated in conjunction with fig. 4 to 15.
Fig. 4 schematically shows a basic structural diagram of an object model according to an embodiment of the present disclosure.
Large-scale machine learning, and large-scale neural networks in particular, is applied in many areas of production and daily life. The expansion of data size and the highly sparse nature of the data lead to a dramatic increase in the size of neural network models. Owing to the high-dimensional sparsity of the data features, the scale of the embedding layer expands and the model size can reach the level of hundreds of gigabytes, which is difficult to reconcile with practical storage and communication requirements. Because the embedding layer parameters are redundant, the overhead of model storage and transmission can be greatly reduced through parameter quantization.
Here, a large-scale recommendation system scenario such as browser information feeds and video is taken as an example, using a distributed machine learning system with billions of parameters; its basic model structure is shown in fig. 4. The target model provided by the embodiment of the present disclosure may include an Embedding layer and a DNN: the input features are high-dimensional and sparse, the embedding layer maps them to low-dimensional dense features, and the output of the embedding layer serves as the input of the DNN, which computes the final output. Due to the high-dimensional sparsity of the input data, the embedding layer is huge in scale and requires quantization compression.
In the embodiment of fig. 4, for the recommendation system, the input features may include user features, such as the user's age, sex, internet behavior, location, and preferences, as well as information features, such as the article type (e.g., entertainment, sports, etc.), the publishing time of an article, video, or audio item, and image-text information about the article's author.
In the embodiment of the present disclosure, quantization compression is applied mainly to the parameters of the embedding layer of the target model, and the parameters of the DNN are not quantized, because the embedding layer accounts for most of the size of the target model, often hundreds of times the size of the DNN. For example, in a large-scale online recommendation scenario, the feature dimension input to the embedding layer can reach the tens-of-billions level. Taking 10^10 features as an example, assume that the embedding layer maps each feature, according to the domain (field) to which it belongs, to a slot of length 8; that is, the embedding layer maps a sparse high-dimensional feature to a low-dimensional (here, 8-dimensional) feature called a slot, so each feature corresponds to one slot, each slot has length 8, and each embedding layer parameter is represented by a vector of length 8. The size of the embedding layer is then close to 300 GB (10^10 × 8 × 32 bit). The subsequent neural network DNN is much smaller: assuming the data has 37 fields and the hidden (fully connected) layer sizes are 512, 256, 128, and 32, respectively, the size of the entire DNN is only about 1.2 MB ((37 × 8 × 512 + 512 × 256 + 256 × 128 + 128 × 32 + 32 × 1) × 32 bit), so compressing it would not meaningfully improve the compression ratio and could reduce model performance.
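The size comparison above can be reproduced with a short back-of-envelope calculation. The sketch below is illustrative only: the function names and the GiB/MiB conversions are our own, and the layer sizes are the ones assumed in the text (10^10 features, slot length 8, 32-bit parameters, 37 fields, hidden layers 512/256/128/32).

```python
def embedding_size_gb(num_features=10**10, slot_len=8, bits=32):
    # 10^10 features, each mapped to a slot of 8 float32 values
    return num_features * slot_len * bits / 8 / 1024**3  # bytes -> GiB

def dnn_size_mb(fields=37, slot_len=8, hidden=(512, 256, 128, 32), bits=32):
    # fully connected layers: (37*8) -> 512 -> 256 -> 128 -> 32 -> 1
    dims = [fields * slot_len] + list(hidden) + [1]
    params = sum(a * b for a, b in zip(dims, dims[1:]))
    return params * bits / 8 / 1024**2  # bytes -> MiB

print(round(embedding_size_gb()))  # ~298 GiB, the text's "close to 300 GB"
print(round(dnn_size_mb(), 2))     # ~1.22 MiB, the text's "only 1.2 MB"
```

This makes the roughly five-orders-of-magnitude gap between the two components explicit, which is why only the embedding layer is worth quantizing.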
The model parameter representation space size estimation method provided by the embodiment of the present disclosure estimates the representation space size of the model parameter vectors mainly by counting the parameter distribution of the converged model, and reduces or eliminates the precision loss caused by quantization by selecting a combination of quantization bit number and embedding layer length that satisfies the representation space size.
Fig. 5 is a schematic diagram illustrating a processing procedure of step S310 shown in fig. 3 in an embodiment.
In step S311, the target model is trained in parallel on a plurality of servers so that the target model reaches a steady state.
In step S312, a model fragment is stored separately on each server.
In step S313, one of the model slices in the plurality of servers is sampled, and a parameter vector of the sampled model slice is used as the target parameter vector.
Specifically, the parameter vector of the model fragment stored in any one server may be randomly selected as the target parameter vector.
Fig. 6 schematically shows a schematic diagram of model AUC curves for different numbers of quantization bits.
As shown in fig. 6, taking an image-text recommendation scenario as an example, the AUC curves of models whose embedding layer parameters are quantized with different bit numbers, such as binary, ternary, and four-valued quantization, and of the unquantized (full-precision) model are shown; the ordinate is AUC, and the abscissa is time in hours.
For example, binary quantization here means that each element of an embedding layer parameter takes the value 0 or 1; ternary quantization means that each element takes the value −1, 0, or 1; four-valued quantization means that each element takes one of four values (e.g., −1, −0.5, 0.5, or 1). The disclosure is not limited thereto, and the quantization values may be set according to the specific application scenario.
Comparing the AUC results of binary and ternary quantization in fig. 6, binary quantization shows an AUC loss of about two thousandths relative to ternary quantization, which demonstrates that the representation space of binary quantization is not large enough to fully represent the embedding layer's original parameter space. Ternary quantization shows no performance loss relative to four-valued quantization, which means that the parameter vector space after ternary quantization can meet the required representation space size of the parameter vectors.
The model parameter representation space size estimation method provided by the embodiment of the present disclosure is based on the idea of clustering: first, the clustering target threshold is determined by observing the threshold at which the KL distance (relative entropy) begins to change rapidly. The representation space size produced by the clustering result then provides an effective theoretical reference for determining the quantization bit number.
Again taking the image-text recommendation scenario as an example, it is assumed that each feature is mapped, according to the field to which it belongs, to a slot of length 8 (the present disclosure is not limited thereto, and the dimension of the low-dimensional dense feature may be set according to specific requirements), so the representation space size of a parameter vector after ternary quantization is 3^8 = 6561. The model parameter representation space size estimation method provided by the embodiment of the present disclosure theoretically demonstrates that this size can meet the requirements of the model parameters.
In the embodiment of the present disclosure, the parameter vectors are assumed to be uniformly distributed in space. The theoretical basis that this assumption provides for predicting the parameter space size is explained first; on that basis, the parameter representation space size can then be predicted in practical cases.
Assuming that the embedding layer parameters of the target model comprise N (N a positive integer greater than or equal to 1) target parameter vectors, any one target parameter vector is randomly selected as the first cluster center, and it is assumed that this first cluster center can gather x (x a positive integer greater than or equal to 1) target parameter vectors. It is then assumed that the proportion of the remaining target parameter vectors in the space that each cluster center can gather is

p = x / N (1)
Assume that at every step the cluster center gathers the same proportion p of the remaining target parameter vectors. At time t, let the number of target parameter vectors gathered by the t-th cluster center be b_t and the number of remaining target parameter vectors be a_t, with a_0 = N. Then:

b_t = p · a_{t−1}

a_t = a_{t−1} − b_t = (1 − p) · a_{t−1} = (1 − p)^t · N (2)
assume that the clustering termination condition is:
a_t ≤ x (3)
Combining equations (2) and (3), the clustering terminates when (1 − p)^t · N ≤ x = p · N, so the expected number of cluster centers K can be expressed as:

K = log p / log(1 − p) (4)
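Equations (1) to (4) can be checked numerically. The sketch below, with hypothetical helper names, simulates the recurrence a_t = (1 − p) · a_{t−1} and compares the resulting number of cluster centers with the closed form of equation (4); as in the text, it assumes a constant gathering proportion p.

```python
import math

def expected_cluster_centers(p):
    # Equation (4): K = log(p) / log(1 - p)
    return math.log(p) / math.log(1.0 - p)

def simulate_cluster_centers(N, p):
    # Equations (1)-(3): each new cluster center gathers a fraction p of
    # the remaining vectors; stop once a_t <= x = p * N vectors remain.
    x, a, centers = p * N, float(N), 0
    while a > x:
        a -= p * a  # a_t = (1 - p) * a_{t-1}
        centers += 1
    return centers

p = 0.001
print(expected_cluster_centers(p))         # closed form, ~6.9e3
print(simulate_cluster_centers(10**6, p))  # simulation agrees to within 1
```

The simulated count and the closed form differ by less than one center, confirming that K = log p / log(1 − p) is the right closed form for the stopping time.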
fig. 7 is a schematic diagram illustrating a processing procedure of step S320 shown in fig. 3 in an embodiment.
In step S321, a first distribution of distances between target parameter vectors is counted.
The distance between target parameter vectors may be the Euclidean distance between any two target parameter vectors. Assuming the first target parameter vector is denoted (x_{11}, x_{12}, …, x_{1k}) and the second is denoted (x_{21}, x_{22}, …, x_{2k}), the distance d_{12} between the two target parameter vectors can be calculated as:

d_{12} = sqrt( Σ_{i=1}^{k} (x_{1i} − x_{2i})² ) (5)

In the above formula (5), k denotes the dimension of the target parameter vectors, i.e., the slot length; for example, k = 8.
The first distribution is obtained by counting the Euclidean distances between the target parameter vectors: the Euclidean distance between every pair of target parameter vectors is calculated, and the occurrence probability of each distance value is then counted, i.e., the number of occurrences of a given Euclidean distance value is divided by the total number of Euclidean distances to obtain that value's occurrence probability, and so on.
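A minimal sketch of this statistic, assuming the distances are binned at a fixed sampling interval (0.001 here, matching the later example); all function names are hypothetical, and the random vectors are synthetic stand-ins for embedding layer parameters.

```python
import itertools, math, random
from collections import Counter

def euclidean(u, v):
    # Equation (5): Euclidean distance between two slot vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_distribution(vectors, bin_width=0.001):
    # Q(x): occurrence probability of each (binned) pairwise distance
    dists = [euclidean(u, v) for u, v in itertools.combinations(vectors, 2)]
    bins = Counter(round(d / bin_width) * bin_width for d in dists)
    return {d: c / len(dists) for d, c in sorted(bins.items())}

random.seed(0)
vecs = [[random.gauss(0.0, 0.01) for _ in range(8)] for _ in range(200)]
Q = distance_distribution(vecs)
print(sum(Q.values()))  # ≈ 1.0: the bin probabilities form a distribution
```

The same routine, applied to the vectors after clustering, yields the second distribution P(x) used below.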
In step S322, the target parameter vectors are clustered according to different thresholds.
In the embodiment of the present disclosure, a threshold (call it the first threshold) is taken and the target parameter vectors are clustered: when the Euclidean distance between two target parameter vectors is smaller than the first threshold, the two are clustered into one class. For example, at each step, one target parameter vector is randomly selected from the remaining target parameter vectors as a cluster center, the Euclidean distances between the other remaining target parameter vectors and the cluster center are calculated, and any target parameter vector whose distance to the cluster center is smaller than the first threshold is clustered with it.
And taking another threshold (assumed as a second threshold), clustering the target parameter vectors, and clustering the target parameter vectors into a class when the Euclidean distance between the two target parameter vectors is smaller than the second threshold. For example, at any time, one target parameter vector is randomly selected from the remaining target parameter vectors as a clustering center, then the euclidean distances between the other remaining target parameter vectors and the clustering center are respectively calculated, and if the euclidean distance between one target parameter vector and the clustering center is smaller than the second threshold, the target parameter vector and the clustering center are clustered into one class.
The clustering process is repeated for the target parameter vector by selecting different thresholds.
In step S323, second distributions of distances between the target parameter vectors clustered according to different thresholds are respectively counted.
In the embodiment of the present disclosure, the second distributions P (x) of the distances between the target parameter vectors clustered according to different thresholds are respectively counted.
For example, for the target parameter vectors formed after clustering according to the first threshold, euclidean distances between any two target parameter vectors are respectively calculated, and then the occurrence probability of each euclidean distance is counted to obtain a second distribution P1 (x) of the distances between the target parameter vectors clustered according to the first threshold. For another example, for the target parameter vectors formed after clustering according to the second threshold, euclidean distances between any two target parameter vectors are respectively calculated, and then the occurrence probability of each euclidean distance is counted to obtain a second distribution P2 (x) of the distances between the target parameter vectors clustered according to the second threshold. The statistical process of the second distribution of the distances between the target parameter vectors clustered according to other thresholds is similar.
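The per-threshold clustering described above can be sketched as a greedy procedure, assuming (as the text suggests) that at each step a random remaining vector becomes the cluster center and every vector within the threshold is absorbed; names and data are illustrative only, not the patent's implementation.

```python
import math, random

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def threshold_cluster(vectors, threshold, seed=0):
    # Repeatedly pick a random remaining vector as a cluster center and
    # absorb every remaining vector whose distance to it is below threshold.
    rng = random.Random(seed)
    remaining = list(vectors)
    centers = []
    while remaining:
        center = remaining.pop(rng.randrange(len(remaining)))
        centers.append(center)
        remaining = [v for v in remaining if euclidean(v, center) >= threshold]
    return centers

random.seed(1)
vecs = [[random.gauss(0.0, 0.01) for _ in range(8)] for _ in range(300)]
for t in (0.002, 0.003):
    print(t, len(threshold_cluster(vecs, t)))  # larger t -> fewer centers here
```

The second distribution P(x) for a given threshold is then obtained by applying the same distance statistic as for Q(x) to the returned cluster centers.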
Fig. 8 schematically shows a schematic diagram of P (x) distributions at different thresholds according to an embodiment of the present disclosure.
As shown in fig. 8, the ordinate is P(x) and the abscissa is the Euclidean distance. The unclustered curve corresponds to Q(x), and the other two lines show P(x) for thresholds of 0.002 and 0.003, respectively.
Fig. 9 is a schematic diagram illustrating a processing procedure of step S330 shown in fig. 3 in an embodiment.
In step S331, relative entropies of the first distribution and the second distribution under different thresholds are respectively calculated, and a curve of variation of the relative entropies with the thresholds is obtained.
In the embodiment of the present disclosure, the KL distance may be calculated according to the following formula:
D_KL(P‖Q) = Σ_i P(i) · log( P(i) / Q(i) ) (6)

In the above formula (6), i ranges over the sampled values of the Euclidean distance. For example, assuming the sampling interval is 0.001 and the Euclidean distance ranges from 0 to 0.02, then i = 0, 0.001, 0.002, …, 0.019, 0.02 (these values are merely illustrative; the present disclosure is not limited thereto, as the actual sampling interval may be smaller and the Euclidean distance range larger).
The threshold is changed continuously, the statistics of Q(x) and P(x) and the calculation of the KL distance are repeated, and the KL distance is plotted under the different thresholds until its variation with the threshold is sufficiently evident.
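A minimal sketch of the KL-distance computation of equation (6) over two binned distance distributions; the small epsilon for bins present in only one histogram is our own assumption, not part of the original text.

```python
import math

def kl_distance(P, Q, eps=1e-12):
    # Equation (6): D_KL(P || Q) = sum_i P(i) * log(P(i) / Q(i)),
    # where i ranges over the sampled distance values; eps guards
    # bins that appear in only one of the two histograms.
    keys = set(P) | set(Q)
    return sum(P.get(i, eps) * math.log2(P.get(i, eps) / Q.get(i, eps))
               for i in keys)

Q = {0.000: 0.2, 0.001: 0.5, 0.002: 0.3}  # unclustered distance histogram
P = {0.000: 0.1, 0.001: 0.5, 0.002: 0.4}  # histogram after clustering
print(kl_distance(Q, Q))  # 0.0 for identical distributions
print(kl_distance(P, Q))  # > 0 once clustering shifts the distribution
```

Using log base 2 makes the result directly interpretable as the extra bits per sample mentioned later in the text.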
In step S332, the target threshold is determined according to the relative entropy threshold variation curve.
Here, a threshold value in which the KL distance variation is small in magnitude may be selected as the target threshold value used for clustering.
Fig. 10 is a schematic diagram illustrating a processing procedure of step S332 illustrated in fig. 9 in an embodiment.
In step S3321, a first derivative curve of the relative entropy threshold variation curve is obtained.
In step S3322, a smoothing stage of the first derivative curve is obtained.
In step S3323, the threshold corresponding to the smoothing stage is selected as the target threshold.
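Steps S3321 to S3323 can be sketched as follows, under the assumption that the "smoothing stage" ends where the first derivative's amplitude first jumps well above its early, flat level; the jump_ratio heuristic and all names are ours, not the patent's.

```python
def select_target_threshold(thresholds, kl_values, jump_ratio=5.0):
    # Step S3321: first derivative of the KL-vs-threshold curve.
    deriv = [kl_values[i] - kl_values[i - 1] for i in range(1, len(kl_values))]
    # Steps S3322/S3323: the smoothing stage is where the derivative stays
    # near its early, flat amplitude; return the threshold at the first
    # point where it jumps well above that level.
    base = max(abs(d) for d in deriv[:3]) or 1e-12
    for i, d in enumerate(deriv):
        if abs(d) > jump_ratio * base:
            return thresholds[i]
    return thresholds[-1]

thresholds = [0.0005 * i for i in range(1, 9)]         # 0.0005 ... 0.004
kl = [0.01, 0.02, 0.03, 0.04, 0.30, 0.10, 0.60, 0.20]  # flat, then jumpy
print(select_target_threshold(thresholds, kl))         # 0.002, as in fig. 11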
FIG. 11 shows a KL distance versus threshold plot versus a first derivative of KL distance versus threshold plot.
As shown in fig. 11, the vertical axis on the left side is KL distance, the vertical axis on the right side is the first derivative of KL distance, the horizontal axis is threshold, and the arrow points to the target threshold of 0.002. The curve of the KL distance with the threshold is shown by the dotted line in fig. 11. The first derivative of the curve of the KL distance along with the threshold value (the change of the KL distance at the moment relative to the KL distance at the last moment) is observed, the change is gentle when the threshold value is smaller, and the vibration amplitude becomes larger as the threshold value becomes larger.
The physical meaning of the KL distance is the number of extra bits required, on average, to encode samples from P using a code based on Q. Therefore, in the embodiment of the present disclosure, when the variation is small, the quantization scheme can be considered to represent the original parameter space well; otherwise, the clustered parameters under that threshold are considered insufficient to fully represent the original parameter space, and additional bits would be needed for the encoding. The threshold of the smoothing stage, shown by the solid line in fig. 11, is the value used for clustering; in this experiment, 0.002, the threshold at the end position of the smoothing stage, is taken as the target threshold.

When the change is small (flat), the post-clustering distance distribution is close to the pre-clustering distribution under the current clustering scheme, and the threshold can be increased further. When increasing the threshold further causes the solid line to jump, the parameters clustered under that threshold are no longer sufficient to fully represent the original parameter space: the representation capacity is inadequate and the model performance degrades. The threshold at this inflection point is the selected target threshold. In general, if the threshold is too small, the clustering effect is not evident and the compression effect is limited; if the threshold is too large, the parameter representation capacity is insufficient and model performance decreases.
Fig. 12 is a schematic diagram illustrating a processing procedure of step S340 illustrated in fig. 3 in an embodiment.
In step S341, the proportion that the distance between the target parameter vectors is smaller than the target threshold is counted.
In the embodiment of the present disclosure, the proportion of Euclidean distances between the pre-clustering target parameter vectors that are smaller than the target threshold is counted: the number of such distances is divided by the total number of Euclidean distances between the pre-clustering target parameter vectors. This proportion is the ratio p in the above equations (1) and (4).
In step S342, the size of the representation space is estimated from the ratio.
In the embodiment of the present disclosure, referring to the above formula (4), the expression space size K may be estimated according to the following formula:
K = log p / log(1 − p) (7)
in the above formula (7), p is the ratio.
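A one-line sketch of equation (7), applied to the worked numbers that appear later in this document (m = 124192 sampled vectors, of whose pairwise distances about x′ = 9 × 10^6 fall below the target threshold):

```python
import math

def representation_space_size(p):
    # Equation (7): K = log(p) / log(1 - p)
    return math.log(p) / math.log(1.0 - p)

m, x_prime = 124192, 9e6
p = x_prime / (0.5 * (m - 1) * m)  # fraction of distances below threshold
print(round(representation_space_size(p)))  # 5783, as in the worked example
```

Note that K depends only on the ratio p, not on N itself, which is what makes the sampling shortcut described later legitimate.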
FIG. 13 schematically shows a flow chart of a model parametric representation spatial size estimation method according to a further embodiment of the present disclosure. In embodiments of the present disclosure, the target model may include an embedding layer and a deep neural network.
As shown in fig. 13, compared with the embodiments described above, the model parameter representation space size estimation method provided by this embodiment of the present disclosure differs in that it may further include the following steps.
In step S1310, the length of the embedding layer is acquired.
For example, the embedding layer length refers to the slot length, which is 8 in the foregoing example.
In step S1320, the number of quantization bits of the parameter vector of the embedding layer is determined according to the length of the embedding layer and the size of the representation space.
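One plausible reading of this step, sketched below: choose the smallest number of quantization levels q such that q^L ≥ K for embedding layer length L, which for K = 5783 and L = 8 selects ternary quantization as in the text's example. The function name and the search loop are our own, not the patent's.

```python
def min_quantization_levels(K, embed_len):
    # Smallest alphabet size q whose representation space q**embed_len
    # covers the estimated parameter representation space size K.
    q = 2
    while q ** embed_len < K:
        q += 1
    return q

# With embedding layer length 8 and the text's estimate K = 5783:
print(min_quantization_levels(5783, 8))  # 3 -> ternary (256 < 5783 <= 6561)
```

The quantization bit number then follows from the chosen alphabet size (e.g., two bits suffice to store one of three levels per element).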
FIG. 14 schematically shows a flow chart of a model parametric representation spatial size estimation method according to another embodiment of the present disclosure.
In step S1410, a converged full-precision target model is trained.
The full-precision target model is the target model whose model parameters, obtained after training reaches a stable state, have not yet been subjected to quantization compression.
In step S1420, the distance distribution Q (x) between the parameter vectors of the object model is counted.
In step S1430, the parameter vectors are clustered according to different thresholds, and the distance distribution P (x) between the clustered parameter vectors is counted.
In step S1440, a KL distance-to-threshold variation curve is calculated, and a threshold with a small variation is selected as the target threshold.
In step S1450, the proportion of distances between the parameter vectors that are smaller than the target threshold is counted.
In step S1460, the size of the space represented by the parameter vector is estimated using the scale.
In step S1470, an appropriate quantization scheme is selected based on the estimated size of the representation space.
Fig. 15 is a schematic diagram illustrating a processing procedure of step S1430 shown in fig. 14 in an embodiment, i.e., the process of clustering the parameter vectors according to different thresholds and counting the distance distribution P(x) between the clustered parameter vectors.
In step S1431, a threshold value is initialized.
In step S1432, the parameter vectors are clustered according to the current threshold.
In step S1433, the distance distribution Pi (x) between the clustered parameter vectors is counted.
Wherein i is a positive integer greater than or equal to 1 and less than or equal to z, z represents the number of different thresholds, and z is a positive integer greater than 1.
In step S1434, it is determined whether or not to end; if not, entering the next threshold value and jumping back to the step S1432; otherwise, ending.
The number N of model parameter vectors in practice can reach hundreds of millions, and the number of Euclidean distances between all parameter vectors reaches N(N − 1)/2. Completely counting the distribution of all N(N − 1)/2 Euclidean distances would consume a great deal of time and computation. According to the experimental results, it suffices to count only the Euclidean distances among m (m < N, m a positive integer greater than or equal to 1) sampled parameter vectors, together with the number x′ of those distances that are smaller than the determined target threshold, and to compute

p̂ = x′ / (m(m − 1)/2)

as an estimate of p, which greatly simplifies the calculation.
The m parameter vectors may be randomly selected from the N parameter vectors; for example, m ≈ 0.1% × N is selected here. The results obtained with a larger m are almost indistinguishable from this.
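A sketch of the sampled estimate p̂ = x′/(m(m − 1)/2), with hypothetical names and synthetic data standing in for a model fragment's parameter vectors:

```python
import itertools, math, random

def estimate_p(vectors, threshold):
    # p-hat = x' / (m * (m - 1) / 2): fraction of the sample's pairwise
    # Euclidean distances that fall below the target threshold.
    m = len(vectors)
    x_prime = sum(1 for u, v in itertools.combinations(vectors, 2)
                  if math.dist(u, v) < threshold)
    return x_prime / (m * (m - 1) / 2)

random.seed(2)
population = [[random.gauss(0.0, 0.01) for _ in range(8)] for _ in range(5000)]
sample = random.sample(population, 500)  # m << N, e.g. ~0.1% of N in the text
print(estimate_p(sample, 0.002))
```

Because equation (7) depends only on p, feeding this estimate into K = log p̂ / log(1 − p̂) gives the representation space size without the full N(N − 1)/2 distance computation.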
For example, also taking a context or an information stream recommendation system scenario as an example, the process of estimating the size of the parameter space is as follows:
The target model is trained in parallel on a plurality of servers using a training data set (such as browser users' image-text recommendation history data); assume that the number of embedding layer parameter vectors obtained at convergence is N = 1.38 × 10^8. Each server stores one model fragment, that is, part of the embedding layer parameter vectors of the target model. To simplify the calculation, one model fragment may be randomly sampled; assume the number of embedding layer parameter vectors it contains is m = 124192, and these m embedding layer parameter vectors are used as the target parameter vectors for estimating the target threshold.
Based on the m target parameter vectors, the change of the KL distance under different thresholds is counted; assume the result is as shown in fig. 11, and the clustering target threshold t = 0.002 is selected.
Among all M = 0.5 × (m − 1) × m = 0.5 × (124192 − 1) × 124192 ≈ 7.71 × 10^9 Euclidean distances between the target parameter vectors, about x′ = 9 × 10^6 are smaller than the target threshold, so

p = x′ / M ≈ 9 × 10^6 / 7.71 × 10^9 ≈ 1.17 × 10^−3
According to the above equation (7), the representation space size K required by the parameters is:

K = log p / log(1 − p) ≈ 5783
Since 2^8 = 256 < 5783 ≤ 3^8 = 6561, ternary quantization can meet the parameter space size requirement in this example.
According to the model parameter representation space size estimation method provided by the embodiment of the present disclosure, the KL distance is used to estimate the extra number of bits required to encode the clustering results under different thresholds, and a suitable quantization bit number is estimated through clustering, making the parameter tuning process more automatic, fast, stable, and well-founded.
Fig. 16 schematically shows a flowchart of a recommendation method according to yet another embodiment of the present disclosure. The method steps of the embodiment of the present disclosure may be executed by a terminal device, by a server, or by the terminal device and the server interactively; for example, they may be executed by the terminal device 101, 102, or 103 in fig. 1, but the present disclosure is not limited thereto.
As shown in fig. 16, the recommendation method provided by the embodiment of the present disclosure may include the following steps.
In step S1610, the current user characteristics and current information characteristics of the target client are acquired.
For example, when a current user logs in to a target client (e.g., a browser, a web page, or an application) with the user's user name, the background server may obtain the current user features of that user for the target client, such as the user's name, age, height, weight, mobile phone number, and user name. It may also obtain current information features, such as the title of an article, its publication time, its author, its keywords and type, the category of a video, the name of an audio item, and the content of a picture. The specific data contained in the current user features and current information features may be determined according to the specific application scenario, and the present disclosure is not limited in this respect.
In step S1620, the current user characteristic and the current information characteristic are processed by a target model based on deep learning, and recommendation information is obtained to be sent to the target client.
The parameter vector of the target model is quantized and compressed according to the estimated size of the representation space, and the size of the representation space of the parameter vector of the target model is estimated by using the model parameter representation space size estimation method according to any one of the embodiments.
In the embodiment of the disclosure, the target model based on deep learning is trained by using a training data set in advance, the training data set may include historical user characteristics, historical information characteristics and corresponding labeled historical recommendation information, the target model may include an embedded layer and a deep neural network, and a parameter vector of the target model may be obtained after training. Then, the model parameter representation space size estimation method described in any of the above embodiments may be used to estimate the representation space size K required by the parameter, so as to determine the quantization bit number for performing quantization compression on the embedded layer parameter vector, compress the embedded layer parameter of the target model according to the quantization bit number, and then store the model parameter of the deep neural network and the compressed embedded layer parameter.
When information flow or image-text personalized recommendation needs to be performed online, the current user features and current information features are input into the trained deep learning based recommendation model, and the server can send the current user personalized recommendation content, such as recommended articles, audio, or video, according to the current user features and the current information features.
Fig. 17 schematically shows a block diagram of a model parametric representation space size estimation apparatus according to an embodiment of the present disclosure. The model parameter representation space size estimation apparatus according to the embodiment of the present disclosure may be provided on a terminal device, may also be provided on a server, or may be provided partially on a terminal device and partially on a server, for example, may be provided on the server 105 in fig. 1, but the present disclosure is not limited thereto.
As shown in fig. 17, the apparatus 1700 for estimating a spatial size of model parameter representation provided in the embodiment of the present disclosure may include a target parameter obtaining module 1710, a distance distribution statistics module 1720, a target threshold determining module 1730, and a spatial size estimating module 1740.
The target parameter obtaining module 1710 may be configured to obtain a target parameter vector of a target model. The distance distribution statistics module 1720 may be configured to count distance distributions between target parameter vectors. The target threshold determination module 1730 may be configured to determine a target threshold for clustering the target parameter vectors according to the distance distribution. The spatial size estimation module 1740 may be configured to estimate a representation spatial size of a parameter vector of the object model based on a distance between the object threshold and an object parameter vector.
In an exemplary embodiment, the target parameter obtaining module 1710 may include: a model training unit which can be configured to train the target model on a plurality of servers in parallel so as to enable the target model to reach a stable state; a parameter storage unit configured to store a model fragment on each server individually; the parameter sampling unit may be configured to sample one of the model fragments in the plurality of servers, and use a parameter vector of the sampled model fragment as the target parameter vector.
In an exemplary embodiment, the distance distribution statistics module 1720 may include: a first distribution statistical unit configured to count a first distribution of distances between the target parameter vectors; the parameter clustering unit can be configured to cluster the target parameter vectors according to different thresholds; the second distribution statistical unit may be configured to separately count second distributions of distances between the target parameter vectors clustered according to different thresholds.
In an exemplary embodiment, the target threshold determination module 1730 may include: a relative entropy change obtaining unit, which may be configured to calculate the relative entropy between the first distribution and the second distribution under each of the different thresholds, so as to obtain a curve of the relative entropy varying with the threshold (the relative entropy threshold variation curve); and a target threshold determination unit, which may be configured to determine the target threshold according to the relative entropy threshold variation curve.
In an exemplary embodiment, the target threshold determining unit may include: a relative entropy change degree obtaining subunit, which may be configured to obtain a first derivative curve of the relative entropy threshold variation curve; a smoothing stage obtaining subunit, which may be configured to obtain a smoothing stage of the first derivative curve; and a target threshold selecting subunit, which may be configured to select a threshold corresponding to the smoothing stage as the target threshold.
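The threshold-selection logic above can be sketched as follows, taking the relative entropy as the KL divergence between the two histograms and detecting the smoothing stage as the point where the magnitude of the curve's first derivative drops below a tolerance (`tol` is an assumed tuning knob, not specified by the source):

```python
import numpy as np


def relative_entropy(p_hist, q_hist, eps=1e-12):
    """KL divergence D(P || Q) between two histograms over the same bins."""
    p = p_hist / max(p_hist.sum(), eps)
    q = q_hist / max(q_hist.sum(), eps)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))


def pick_target_threshold(thresholds, kl_values, tol=0.1):
    """Return the first threshold at which the first derivative of the
    KL-vs-threshold curve falls below `tol` in magnitude, i.e. where the
    curve enters its smoothing stage."""
    slope = np.gradient(np.asarray(kl_values, dtype=float),
                        np.asarray(thresholds, dtype=float))
    flat = np.flatnonzero(np.abs(slope) < tol)
    return thresholds[flat[0]] if flat.size else thresholds[-1]
```

Intuitively, below the target threshold the clustering only merges parameter vectors that were already near-duplicates, so the clustered distance distribution stops drifting away from the original one and the KL curve flattens.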
In an exemplary embodiment, the space size estimation module 1740 may include: a proportion obtaining unit, which may be configured to count the proportion of distances between the target parameter vectors that are smaller than the target threshold; and a space size estimation unit, which may be configured to estimate the representation space size according to the proportion.
In an exemplary embodiment, the representation space size K is estimated according to the following formula:

K = 1/p

in the above formula, p is the proportion.
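If the target parameter vectors concentrate on roughly K representative values with similar frequencies, the probability that a random pair of vectors falls within the target threshold of each other is about 1/K, which suggests a collision-style estimate K ≈ 1/p. A sketch under that assumption (the helper name is hypothetical):

```python
import numpy as np


def estimate_representation_size(vectors, target_threshold):
    """Estimate the representation space size K as 1/p, where p is the
    fraction of vector pairs closer than the target threshold (assumed
    collision-count form of the estimator)."""
    i, j = np.triu_indices(len(vectors), k=1)
    d = np.linalg.norm(vectors[i] - vectors[j], axis=1)
    p = float((d < target_threshold).mean())
    return float('inf') if p == 0.0 else 1.0 / p
```

With 100 vectors drawn evenly from 4 well-separated centres, p is the fraction of same-centre pairs and the estimate comes out close to 4, as expected.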
In an exemplary embodiment, the target model may include an embedding layer and a deep neural network. The model parameter representation space size estimation apparatus 1700 may further include: an embedded layer length acquisition module, which may be configured to acquire the length of the embedding layer; and a quantization bit number determination module, which may be configured to determine the quantization bit number of the parameter vectors of the embedding layer according to the length of the embedding layer and the representation space size.
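One plausible reading of the bit-number determination step (an assumption, since the source only names its inputs): if the parameter vectors take about K distinct values, each embedding vector can be stored as a ceil(log2 K)-bit index into a codebook instead of `length` full-precision floats:

```python
import math


def embedding_quantization_bits(embedding_length, representation_size,
                                float_bits=32):
    """Bits needed to index one of K ~ `representation_size` codewords,
    and the compression ratio versus storing `embedding_length` floats
    directly. (Hypothetical interpretation of the determination step.)"""
    index_bits = max(1, math.ceil(math.log2(representation_size)))
    original_bits = embedding_length * float_bits
    return index_bits, original_bits / index_bits
```

For example, a length-8 float32 embedding quantized against K = 256 representative vectors needs only an 8-bit index, a 32x reduction in storage under this reading.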
The specific implementation of each module and/or unit and/or subunit in the model parameter representation space size estimation apparatus provided in the embodiment of the present disclosure may refer to the content in the model parameter representation space size estimation method, and is not described herein again.
Further, embodiments of the present disclosure may also provide a recommendation apparatus, which may include a feature data obtaining module and a recommendation information obtaining module.
The feature data acquisition module may be configured to acquire a current user feature and a current information feature of the target client. The recommendation information obtaining module may be configured to process the current user characteristics and the current information characteristics through a target model based on deep learning, and obtain recommendation information to send to the target client. The parameter vector of the target model is quantized and compressed according to the estimated size of the representation space, and the size of the representation space of the parameter vector of the target model is estimated by using the model parameter representation space size estimation method according to any of the embodiments.
The specific implementation of each module and the estimation of the size of the representation space in the recommendation device provided in the embodiment of the present disclosure may refer to the content in the method and the device for estimating the size of the representation space of the model parameter, which is not described herein again.
It should be noted that although several modules or units or sub-units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units or sub-units described above may be embodied in one module or unit or sub-unit. Conversely, the features and functions of one module or unit or sub-unit described above may be further divided so as to be embodied by a plurality of modules or units or sub-units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a touch terminal, a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (18)

1. A model parameter representation space size estimation method, comprising the following steps:
acquiring a target parameter vector of a target model;
counting the distance distribution among the target parameter vectors;
determining a target threshold value for clustering the target parameter vector according to the distance distribution;
estimating the size of a representation space of a parameter vector of the target model according to the distance between the target threshold and the target parameter vector;
the obtaining of the target parameter vector of the target model includes:
training the target model on a plurality of servers in parallel to enable the target model to reach a stable state;
storing part of model parameters of the target model on each server;
and sampling model parameters from all of the servers, or from part of the servers, among the plurality of servers as the target parameter vector.
2. The method of claim 1, wherein counting a distance distribution between target parameter vectors comprises:
counting a first distribution of distances between the target parameter vectors;
clustering the target parameter vectors according to different thresholds;
and respectively counting the second distribution of the distances between the target parameter vectors after clustering according to different thresholds.
3. The method of claim 2, wherein determining the target threshold for clustering the target parameter vectors according to the distance distribution comprises:
respectively calculating relative entropies of the first distribution and the second distribution under different thresholds to obtain a curve of the relative entropies changing along with the thresholds;
and determining the target threshold according to the relative entropy threshold change curve.
4. The method of claim 3, wherein determining the target threshold from the relative entropy threshold variation curve comprises:
obtaining a first derivative curve of the relative entropy threshold variation curve;
obtaining a smoothing stage of the first derivative curve;
and selecting a threshold corresponding to the smoothing stage as the target threshold.
5. The method of claim 1, wherein estimating a representation space size of a parameter vector of the target model based on a distance between the target threshold and a target parameter vector comprises:
counting the proportion that the distance between the target parameter vectors is smaller than the target threshold;
estimating the representation space size according to the proportion.
6. The method of claim 5, wherein the representation space size K is estimated according to the following formula:

K = 1/p

in the above formula, p is the proportion.
7. The method of claim 1, wherein the target model comprises an embedding layer and a deep neural network; the method further comprises the following steps:
acquiring the length of the embedded layer;
and determining the quantization bit number of the parameter vector of the embedded layer according to the length of the embedded layer and the size of the representation space.
8. A recommendation method, comprising:
acquiring current user characteristics and current information characteristics of a target client;
processing the current user characteristics and the current information characteristics through a target model based on deep learning to obtain recommendation information to be sent to the target client;
wherein the parameter vector of the target model is quantized and compressed according to the estimated size of the representation space, and the size of the representation space of the parameter vector of the target model is estimated by the method according to any one of claims 1 to 7.
9. An apparatus for estimating a size of a parametric representation space of a model, comprising:
a target parameter obtaining module configured to obtain a target parameter vector of a target model;
a distance distribution statistical module configured to count distance distribution between the target parameter vectors;
a target threshold determination module configured to determine a target threshold for clustering the target parameter vector according to the distance distribution;
a space size estimation module configured to estimate a representation space size of a parameter vector of the target model according to a distance between the target threshold and a target parameter vector;
wherein, the target parameter acquisition module comprises:
a model training unit configured to train the target model in parallel on a plurality of servers so that the target model reaches a steady state;
a parameter storage unit configured to store a part of model parameters of the target model on each server;
and the parameter sampling unit is configured to sample model parameters from all of the servers, or from part of the servers, among the plurality of servers as the target parameter vector.
10. The apparatus of claim 9, wherein the distance distribution statistics module comprises:
a first distribution statistical unit configured to count a first distribution of distances between the target parameter vectors;
the parameter clustering unit is configured to cluster the target parameter vectors according to different thresholds respectively;
and the second distribution statistical unit is configured to respectively count second distribution of distances between the target parameter vectors after clustering according to different thresholds.
11. The apparatus of claim 10, wherein the target threshold determination module comprises:
a relative entropy change obtaining unit configured to calculate relative entropies of the first distribution and the second distribution under different thresholds respectively, and obtain a curve of relative entropy changing with the threshold;
a target threshold determination unit configured to determine the target threshold according to the relative entropy threshold variation curve.
12. The apparatus of claim 11, wherein the target threshold determination unit comprises:
a relative entropy change degree obtaining subunit configured to obtain a first derivative curve of the relative entropy threshold change curve;
a smoothing stage obtaining subunit configured to obtain a smoothing stage of the first derivative curve;
and the target threshold selecting subunit is configured to select a threshold corresponding to the smoothing stage as the target threshold.
13. The apparatus of claim 9, wherein the spatial size estimation module comprises:
the proportion obtaining unit is configured to count the proportion that the distance between the target parameter vectors is smaller than the target threshold value;
a spatial size estimation unit configured to estimate the representation spatial size according to the ratio.
14. The apparatus of claim 13, wherein the representation space size K is estimated according to the following formula:

K = 1/p

in the above formula, p is the proportion.
15. The apparatus of claim 9, wherein the target model comprises an embedding layer and a deep neural network; the device further comprises:
an embedded layer length acquisition module configured to acquire a length of the embedded layer;
and the quantization bit number determining module is configured to determine the quantization bit number of the parameter vector of the embedded layer according to the length of the embedded layer and the size of the representation space.
16. A recommendation device, comprising:
the characteristic data acquisition module is configured to acquire the current user characteristic and the current information characteristic of the target client;
the recommendation information obtaining module is configured to process the current user characteristics and the current information characteristics through a target model based on deep learning, obtain recommendation information and send the recommendation information to the target client;
wherein the parameter vector of the target model is quantization-compressed according to the estimated size of the representation space, and the size of the representation space of the parameter vector of the target model is estimated by the model parameter representation space size estimation apparatus according to any one of claims 9 to 15.
17. A computer-readable medium, on which a computer program is stored which, when executed by a processor, implements the model parameter representation space size estimation method according to any one of claims 1 to 7.
18. An electronic device, comprising:
one or more processors;
a storage device configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the model parametric representation space size estimation method of any of claims 1 to 7.
CN201910325428.6A 2019-04-22 2019-04-22 Model parameter representation space size estimation method and device and recommendation method Active CN110135465B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910325428.6A CN110135465B (en) 2019-04-22 2019-04-22 Model parameter representation space size estimation method and device and recommendation method


Publications (2)

Publication Number Publication Date
CN110135465A CN110135465A (en) 2019-08-16
CN110135465B true CN110135465B (en) 2022-12-09

Family

ID=67570557


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116579438A (en) * 2022-01-29 2023-08-11 华为技术有限公司 Information processing method and device
CN114986833B (en) * 2022-06-06 2023-03-31 健大电业制品(昆山)有限公司 Dynamically regulated injection molding method, system, device and medium

Citations (6)

Publication number Priority date Publication date Assignee Title
US6353680B1 (en) * 1997-06-30 2002-03-05 Intel Corporation Method and apparatus for providing image and video coding with iterative post-processing using a variable image model parameter
CN107111811A (en) * 2014-11-19 2017-08-29 眼锁有限责任公司 For the prediction based on model for the optimum convenience measurement for authorizing transaction
CN107665364A (en) * 2016-07-28 2018-02-06 三星电子株式会社 Neural net method and equipment
CN107832835A (en) * 2017-11-14 2018-03-23 贵阳海信网络科技有限公司 The light weight method and device of a kind of convolutional neural networks
CN109286812A (en) * 2018-10-24 2019-01-29 华中科技大学 A kind of HEVC video quality estimation method
CN109389043A (en) * 2018-09-10 2019-02-26 中国人民解放军陆军工程大学 A kind of crowd density estimation method of unmanned plane picture

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US11321609B2 (en) * 2016-10-19 2022-05-03 Samsung Electronics Co., Ltd Method and apparatus for neural network quantization
US20180107926A1 (en) * 2016-10-19 2018-04-19 Samsung Electronics Co., Ltd. Method and apparatus for neural network quantization


Non-Patent Citations (2)

Title
Scalable Neural Network Compression and Pruning Using Hard Clustering and L1 Regularization; Yibo Yang et al.; arXiv; 2018-06-14; pp. 1-14 *
Analysis of Model Storage Compression Methods for Neural Machine Translation (面向神经机器翻译的模型存储压缩方法分析); Lin Ye et al.; Journal of Chinese Information Processing (中文信息学报); 2019-01-15; Vol. 33, No. 1; pp. 93-102 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant