CN111950015A - Data open output method and device and computing equipment - Google Patents

Data open output method and device and computing equipment

Info

Publication number
CN111950015A
Authority
CN
China
Prior art keywords
matrix
data
output
sparse matrix
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910398372.7A
Other languages
Chinese (zh)
Other versions
CN111950015B (en)
Inventor
刘勇江
李想
卢健
殷尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tendcloud Tianxia Technology Co ltd
Original Assignee
Beijing Tendcloud Tianxia Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tendcloud Tianxia Technology Co ltd filed Critical Beijing Tendcloud Tianxia Technology Co ltd
Priority to CN201910398372.7A priority Critical patent/CN111950015B/en
Publication of CN111950015A publication Critical patent/CN111950015A/en
Application granted granted Critical
Publication of CN111950015B publication Critical patent/CN111950015B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0631Item recommendations

Abstract

The invention discloses a data open output method executed in a computing device. The computing device stores a first M×N sparse matrix and trains an embedded model from that matrix; the model is adapted to decompose an input sparse matrix into at least three sub-matrices and to reconstruct two of the sub-matrices into an output matrix. The method includes: acquiring L user identifiers for which data is to be output, extracting the raw data of the N user features corresponding to each identifier, and generating an L×N second sparse matrix; splicing the second sparse matrix onto the first sparse matrix to obtain a third sparse matrix; updating the first sparse matrix to the third sparse matrix and retraining the embedded model on the third sparse matrix to obtain a new embedded model; and learning the second sparse matrix with the new embedded model to obtain the corresponding output matrix, which is then openly output. The invention also discloses a corresponding data open output apparatus and computing device.

Description

Data open output method and device and computing equipment
Technical Field
The invention relates to the technical field of the Internet, and in particular to a data open output method, a data open output apparatus, and a computing device.
Background
With the rapid development of Internet technology and the growing popularity of electronic commerce, the number of network information resources is increasing dramatically, and information providers want to obtain as much user data as possible so that they can recommend commodity resources that may interest users according to their preferences. When existing data providers output data to a data consumer (an information provider), most of them output the original data directly in plaintext, which leads to disclosure of user privacy. If the original data is instead processed into labels or similar derived data before output, much of the user information is lost; when the data consumer recommends commodity resources to clients on the basis of such data, the recommendation quality drops sharply and the conversion rate of the product suffers.
Therefore, it is desirable to provide a data output method that can better protect the privacy of the user and output more user information.
Disclosure of Invention
To this end, the present invention provides a data open output method, apparatus and computing device in an attempt to solve, or at least alleviate, the problems described above.
According to an aspect of the present invention, there is provided a data open output method adapted to be executed in a computing device, wherein the computing device stores a first M×N sparse matrix and an embedded model for data open output trained from that matrix, the embedded model being adapted to decompose an input sparse matrix into at least three sub-matrices and reconstruct two of the sub-matrices into an output matrix. The method includes: acquiring L user identifiers for which data is to be output, extracting the raw data of the N user features corresponding to each identifier, and generating an L×N second sparse matrix; splicing the second sparse matrix onto the first sparse matrix to obtain a third sparse matrix; updating the stored first sparse matrix to the third sparse matrix and retraining the embedded model with the third sparse matrix to obtain a new embedded model; and learning the second sparse matrix with the new embedded model to obtain an output matrix of the second sparse matrix, and openly outputting that output matrix so that a data consumer can predict user behavior from it.
Optionally, in the method according to the present invention, a prediction model is trained in the data consumer, and the prediction model outputs a corresponding user behavior prediction result according to the output matrix.
Optionally, in a method according to the invention, the data consumer trains the predictive model by obtaining output matrices and corresponding user behavior features for a plurality of users from a computing device.
Optionally, in the method according to the invention, the embedding model is a singular value decomposition model adapted to decompose an input sparse matrix into a first unitary matrix U, a singular value matrix S and a second unitary matrix V, and reconstruct both sub-matrices U and S into an output matrix.
Optionally, in the method according to the present invention, the step of reconstructing the U and S sub-matrices into the output matrix includes: truncating the first K columns of the U sub-matrix to obtain a U truncation matrix, truncating the first K rows and first K columns of the S sub-matrix to obtain an S truncation matrix, and multiplying the U truncation matrix by the S truncation matrix to obtain the output matrix.
Optionally, in the method according to the invention, the embedded model is a word vector model or a variational self-encoder model.
Optionally, in the method according to the present invention, the user characteristics include at least one of an installation application, a geographical location preference, device information, and device networking information.
Optionally, in the method according to the present invention, a unique enterprise identifier corresponding to each user identifier is further stored in the computing device, and the original data of the user characteristics is stored with the unique enterprise identifier as an index.
Optionally, in the method according to the present invention, the step of extracting raw data of N user features corresponding to each user identifier includes: and retrieving the unique enterprise identification corresponding to each user identification, and taking the unique enterprise identification as an index to extract corresponding original data.
Optionally, in the method according to the present invention, wherein, for each data user, a corresponding first sparse matrix and an embedded model trained according to the matrix are stored in the computing device; the step of obtaining L user identifications of data to be output comprises the following steps: receiving the L user identifications sent by a certain data user; the step of stitching the second sparse matrix into the first sparse matrix comprises: splicing the second sparse matrix into the first sparse matrix corresponding to the data user to obtain a corresponding third sparse matrix; the step of updating the stored first sparse matrix to the third sparse matrix comprises: and updating the stored first sparse matrix of the data user into a third sparse matrix.
According to another aspect of the present invention, there is provided a data open output apparatus adapted to reside in a computing device, wherein the computing device stores a first M × N sparse matrix, and trains an embedded model for data open output according to the matrix, the embedded model is adapted to decompose an input sparse matrix into at least three sub-matrices and reconstruct two of the sub-matrices into an output matrix, the apparatus comprising: the data extraction module is suitable for acquiring L user identifications of data to be output, acquiring original data of N user characteristics corresponding to each user identification and generating an L×N second sparse matrix; the matrix splicing module is suitable for splicing the second sparse matrix into the first sparse matrix to obtain a third sparse matrix; the model retraining module is suitable for updating the stored first sparse matrix into a third sparse matrix and retraining the embedded model by adopting the third sparse matrix to obtain a new embedded model; and the data output module is suitable for learning the second sparse matrix by adopting the new embedded model to obtain an output matrix of the second sparse matrix and openly outputting the output matrix so that a data user can predict the user behavior according to the output matrix.
Optionally, in the apparatus according to the present invention, a prediction model is trained in the data consumer, and the prediction model outputs a corresponding user behavior prediction result according to the output matrix.
Optionally, in an apparatus according to the invention, the data consumer trains the predictive model by obtaining output matrices and corresponding user behavior features for a plurality of users from a computing device.
Alternatively, in the apparatus according to the invention, the embedding model is a singular value decomposition model adapted to decompose an input sparse matrix into a first unitary matrix U, a singular value matrix S and a second unitary matrix V, and reconstruct both sub-matrices U and S into an output matrix.
Optionally, in the apparatus according to the present invention, the singular value decomposition model is adapted to truncate the first K columns of the U submatrix to obtain a U truncation matrix, truncate the first K rows and the first K columns of the S submatrix to obtain an S truncation matrix, and multiply the U truncation matrix by the S truncation matrix to obtain the output matrix.
According to yet another aspect of the invention, there is provided a computing device comprising at least one processor; and at least one memory including computer program instructions; the at least one memory and the computer program instructions are configured to, with the at least one processor, cause the computing device to perform a data open output method as described above.
According to still another aspect of the present invention, there is provided a readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform the data open output method as described above.
According to the technical scheme of the invention, an embedded model is constructed that can decompose an input sparse matrix into at least three sub-matrices and reconstruct two of them into an output matrix. The output matrix contains no private user information; instead it is an implicit (latent) representation of the users, and it provides the data-using (receiving) party with as much of the required user behavior feature data as possible. First, the embedded model is trained on the initially stored M×N first sparse matrix serving as the training set, where M is the number of user identifiers and N is the number of user features. When data corresponding to L user identifiers needs to be output to a data consumer, the raw data of the N user features for those identifiers is extracted first and an L×N second sparse matrix is generated; the second sparse matrix is spliced onto the first sparse matrix to obtain an (M+L)×N third sparse matrix. The first sparse matrix serving as the training set is then updated to the third sparse matrix, and the embedded model is trained again on the third sparse matrix to obtain a new embedded model. Finally, the new embedded model is used to learn the second sparse matrix, yielding the corresponding output matrix, which is output to the data consumer. In this way the model is retrained each time before data is output, which improves its precision and provides better output results for the data consumer.
That is, the invention transforms the original data and extracts implicit representations from it by means of machine learning, and outputs the transformed data externally. Such output data cannot be understood by humans, but it can be understood and used by the data consumer's machine learning models. The data consumer can build a prediction model on the output matrix and obtain corresponding user behavior predictions from it. The invention can therefore improve the data consumer's modeling results, solve the problem of secure data output, protect users' private information, and improve the service experience of enterprises.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which are indicative of various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.
FIG. 1 shows a block diagram of a computing device 100, according to one embodiment of the invention;
FIG. 2 illustrates a flow diagram of a data open export method 200 according to one embodiment of the invention;
fig. 3 is a block diagram illustrating a data open output apparatus 300 according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
FIG. 1 is a block diagram of a computing device 100 according to one embodiment of the invention. In a basic configuration 102, computing device 100 typically includes system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processor 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processing, including but not limited to: a microprocessor (μ P), a microcontroller (μ C), a Digital Signal Processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level one cache 110 and a level two cache 112, a processor core 114, and registers 116. The example processor core 114 may include an Arithmetic Logic Unit (ALU), a Floating Point Unit (FPU), a digital signal processing core (DSP core), or any combination thereof. The example memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, system memory 106 may be any type of memory, including but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. System memory 106 may include an operating system 120, one or more applications 122, and program data 124. In some embodiments, application 122 may be arranged to operate with program data 124 on an operating system. The program data 124 includes instructions, and in the computing device 100 according to the present invention, the program data 124 contains instructions for executing the data open output method 200.
Computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (e.g., output devices 142, peripheral interfaces 144, and communication devices 146) to the basic configuration 102 via the bus/interface controller 130. The example output device 142 includes a graphics processing unit 148 and an audio processing unit 150. They may be configured to facilitate communication with various external devices, such as a display or speakers, via one or more a/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 158. An example communication device 146 may include a network controller 160, which may be arranged to facilitate communications with one or more other computing devices 162 over a network communication link via one or more communication ports 164.
A network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, program modules, and may include any information delivery media, such as carrier waves or other transport mechanisms, in a modulated data signal. A "modulated data signal" may be a signal that has one or more of its data set or its changes made in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or private-wired network, and various wireless media such as acoustic, Radio Frequency (RF), microwave, Infrared (IR), or other wireless media. The term computer readable media as used herein may include both storage media and communication media.
Computing device 100 may be implemented as a server, such as a file server, a database server, an application server, a WEB server, etc., or as part of a small-form factor portable (or mobile) electronic device, such as a cellular telephone, a Personal Digital Assistant (PDA), a personal media player device, a wireless WEB-watch device, a personal headset device, an application specific device, or a hybrid device that includes any of the above functions. Computing device 100 may also be implemented as a personal computer including both desktop and notebook computer configurations. In some embodiments, the computing device 100 is configured to perform the data open output method 200. Data open output means that a data holder (such as a third-party data company) opens the right to use its data to external parties, so that a data consumer can acquire legally compliant data from the data holder.
According to one embodiment, the computing device may further store an M×N first sparse matrix, where M refers to M user identifiers and N refers to N user features, together with an embedded model for data open output trained on that matrix. The N user features may be, for example, N installed applications, N application preferences, N geographic locations, N geographic location preferences, N items of device information, or N items of device networking information. Taking installed applications as an example, the raw data of the N user features records whether the user has installed application 1, application 2, application 3, and so on. Application preferences include, for example, preferences for learning applications, video applications, or social and chat applications. A geographic location records whether the user has appeared at a certain location; geographic location preferences may be, for example, a preference for business aggregation areas, white-collar aggregation areas, or residential areas, and the corresponding raw data may record whether the user has appeared in business aggregation area 1, business aggregation area 2, white-collar aggregation area 1, and so on. Generally, a user's application preferences and geographic location preferences can be obtained by attribution analysis of the applications the user has installed and the geographic locations where the user has appeared.
The device information may include common mobile phone configuration items such as the phone manufacturer, model, system version, memory, operating system, operator, pixels, and kernel, for example whether the phone is from a certain manufacturer or of a certain model. The device networking information describes how the device communicates with and connects to the Internet, including whether it connects through a wireless network or a cellular network; for wireless networks, it also includes the attributes of the connected wireless network, the wireless sources scanned, and so on. In addition, the N user features may also be unstructured data, such as the pixel value at each position of a picture.
In addition, to extract the N user features, all features of the same type can be collected for the M users and then deduplicated to obtain the total set of user features. For example, all applications installed by each user are collected and then deduplicated to yield the final N user features. For installed applications, assuming there are M user identifiers, the extracted raw data has M rows, each row corresponding to one user's list of installed applications. After deduplication, the total number of applications across the M users is N. The raw data is then transformed into an M×N sparse matrix, with each row corresponding to a user and each column to an application. If the entry in the third row and fifth column is 0, the third user has not installed the application corresponding to the fifth column.
Compared with an ordinary (dense) matrix, a sparse matrix greatly reduces storage space. According to one embodiment, the first sparse matrix may be constructed as follows: create an M×N matrix and fill it with zeros; scan row by row, and whenever the user has the corresponding behavior at a position (for example, has installed the corresponding application), replace the value at that position with 1, until the scan is finished; then record and store only the rows and columns whose value is 1. The stored information is in effect a sparse matrix, and the original matrix can be restored from it at any time.
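As a concrete illustration of the construction just described, the sketch below (Python) builds the first sparse matrix from per-user installed-application lists. The input dictionary user_apps (user identifier mapped to a list of installed applications) and the returned vocabularies are assumptions of this example, not details fixed by the patent.

    from scipy import sparse

    def build_first_matrix(user_apps):
        """Return the M x N first sparse matrix plus its row and column vocabularies."""
        users = list(user_apps)                                        # M user identifiers
        apps = sorted({a for lst in user_apps.values() for a in lst})  # deduplicate -> N features
        col_index = {a: j for j, a in enumerate(apps)}
        rows, cols = [], []
        for i, u in enumerate(users):                                  # scan row by row
            for a in set(user_apps[u]):
                rows.append(i)                                         # store only the positions whose value is 1
                cols.append(col_index[a])
        data = [1] * len(rows)
        matrix = sparse.csr_matrix((data, (rows, cols)), shape=(len(users), len(apps)))
        return matrix, users, col_index

The returned column vocabulary can be reused later so that any newly generated matrix keeps the same N columns as the first sparse matrix.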
According to another embodiment, the embedded model may include a matrix decomposition module that decomposes an input sparse matrix into at least three sub-matrices and a matrix reconstruction module that reconstructs two of the sub-matrices into an output matrix. The embedded model may be a singular value decomposition (SVD) model, a word vector model such as Word2vec, or a variational auto-encoder model, and in particular may be a truncated singular value decomposition model. The embedded model may also truncate the three decomposed sub-matrices, that is, perform the matrix reconstruction only on the first K (K less than N) columns of the truncated sub-matrices, so as to reduce the dimensionality of the matrices and speed up data processing.
Taking the singular value decomposition model as an example, it decomposes an input sparse matrix into a first unitary matrix U, a singular value matrix S (with nonzero values only on the diagonal), and a second unitary matrix V, and reconstructs (multiplies) the U and S sub-matrices into an output matrix. The step of reconstructing the U and S sub-matrices into the output matrix may include: truncating the first K columns of the U sub-matrix to obtain a U truncation matrix, truncating the first K rows and first K columns of the S sub-matrix to obtain an S truncation matrix, and multiplying the U truncation matrix by the S truncation matrix to obtain the output matrix.
Assume the original matrix has 1,000 rows and 10,000 columns. Decomposition yields the following three sub-matrices: a 1,000 × 1,000 U matrix, a 1,000 × 1,000 Sigma matrix, and a 1,000 × 10,000 V matrix. Matrix truncation may keep only the first 512 columns of the U matrix, so the dimensions of the three matrices become 1,000 × 512, 512 × 512, and 512 × 10,000. The output matrix obtained by multiplying the two truncated matrices therefore has dimensions 1,000 × 512. The output matrix has the same number of rows as the input matrix, and each row still represents a user. However, the number of columns becomes 512: unlike the original matrix, where each column corresponds to an application, each column of the output matrix corresponds to an abstract feature. Such features have no interpretability, which ensures that the output data does not reveal the user's privacy.
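The dimension bookkeeping above can be checked with a few lines of NumPy. The dense random matrix stands in for the original data purely for illustration; a real 1,000 × 10,000 sparse matrix would normally use a truncated solver such as scipy.sparse.linalg.svds or scikit-learn's TruncatedSVD instead of a full SVD.

    import numpy as np

    X = np.random.rand(1000, 10000)                    # stands in for the original matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)   # U: 1000 x 1000, s: 1000 values, Vt: 1000 x 10000
    K = 512
    U_cut = U[:, :K]                                   # first K columns of U          -> 1000 x 512
    S_cut = np.diag(s[:K])                             # first K rows and K columns    -> 512 x 512
    output = U_cut @ S_cut                             # output matrix                 -> 1000 x 512
    assert output.shape == (1000, 512)
    # Vt is never released, so the 1000 x 10000 original cannot be reconstructed from the output.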
It should be understood that those skilled in the art can construct and train each embedded model according to the prior art, including selecting and tuning model parameters such as the matrix truncation dimension, the number of training iterations, and the learning rate, which will not be described again here. In general, training the model mainly means making the product of the decomposed sub-matrices restore the original matrix as closely as possible; for example, a gradient descent algorithm adjusts parameters such as the learning rate so that the sum of squared errors (the objective function) between the true matrix value and the restored value at each position is minimized, yielding an optimal matrix decomposition. In addition, a regularization term (a penalty against overfitting) may be added to the objective function to prevent overfitting.
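The following toy sketch makes that training objective concrete: gradient descent on the squared reconstruction error with an L2 regularization (overfitting penalty) term. The two-factor form, the factor dimension and the learning rate are illustrative assumptions; the patent does not prescribe a particular optimizer.

    import numpy as np

    def factorize(X, k=16, lr=0.01, reg=0.1, epochs=200):
        """Toy matrix factorization trained by gradient descent with an L2 penalty."""
        m, n = X.shape
        rng = np.random.default_rng(0)
        P = rng.normal(scale=0.1, size=(m, k))        # per-user factors
        Q = rng.normal(scale=0.1, size=(n, k))        # per-feature factors
        for _ in range(epochs):
            E = X - P @ Q.T                           # elementwise reconstruction error
            P += lr * (E @ Q - reg * P)               # gradient step on the regularized squared error
            Q += lr * (E.T @ P - reg * Q)
        return P, Q                                   # P @ Q.T approximates X; P is the latent user representation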
FIG. 2 illustrates a flow diagram of a data openness export method 200, suitable for execution in the computing device 100, according to one embodiment of the invention. As shown in fig. 2, the method begins at step S220.
In step S220, the L user identifiers for which data is to be output are obtained, and the raw data of the N user features corresponding to each user identifier is extracted, so as to generate an L×N second sparse matrix.
The raw data may be, for example, the list of applications installed by the user, the geographic locations where the user has appeared, the user's device information, and so on, all of which can be extracted from a data storage facility such as a database. In addition, the raw data may be extracted, transformed, and loaded (ETL), for example pre-processed by deduplication and cleaning keyed on the user identifier. The user identifier may be, for example, a user name, a device identifier, a mobile phone number, an IMEI, a MAC address, an Android ID, an IDFV, or the like. According to one embodiment, the computing device may further store the unique enterprise identifier (TDID) corresponding to each user identifier; the unique enterprise identifier is generated by the data holder for each user at the acquisition end according to a certain rule, and the raw data of the user features may also be stored with the unique enterprise identifier as the index. Therefore, when extracting the raw data of the N user features corresponding to each user identifier, the unique enterprise identifier corresponding to each user identifier can be retrieved first and then used as the index to extract the corresponding raw data.
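Purely as an illustration of this extraction step, the sketch below assembles the L×N second sparse matrix. The lookup tables tdid_of (user identifier mapped to unique enterprise identifier) and apps_by_tdid (raw data keyed by that identifier), as well as the reuse of the first matrix's column vocabulary col_index, are hypothetical details of the example rather than requirements of the patent.

    from scipy import sparse

    def build_second_matrix(user_ids, tdid_of, apps_by_tdid, col_index):
        """Return an L x N CSR matrix whose columns match the first sparse matrix."""
        rows, cols = [], []
        for i, uid in enumerate(user_ids):
            tdid = tdid_of[uid]                        # retrieve the unique enterprise identifier
            for app in set(apps_by_tdid[tdid]):        # raw data indexed by that identifier
                j = col_index.get(app)
                if j is not None:                      # ignore features outside the known N columns
                    rows.append(i)
                    cols.append(j)
        data = [1] * len(rows)
        return sparse.csr_matrix((data, (rows, cols)),
                                 shape=(len(user_ids), len(col_index)))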
Subsequently, in step S240, the second sparse matrix is spliced onto the first sparse matrix to obtain a third sparse matrix. That is, the L×N second sparse matrix is spliced onto the M×N first sparse matrix that serves as the initial training set, yielding an (M+L)×N third sparse matrix.
Subsequently, in step S260, the stored first sparse matrix is updated to a third sparse matrix, and the embedded model is retrained by using the third sparse matrix, so as to obtain a new embedded model.
That is, whenever the second sparse matrix needs to be output, it is first spliced onto the original training set, the model is trained again, and the retrained model is then used to perform representation learning on the second sparse matrix. This ensures that the sub-matrices obtained when decomposing a single request (for example, an L-row matrix) remain consistent with those obtained when decomposing the full system matrix (for example, the (M+L)-row matrix), so that no excessive deviation occurs and the precision of the matrix decomposition is improved. At this point the first sparse matrix is updated to the third sparse matrix, and the value of M is updated to M+L. If data corresponding to another P user identifiers is needed later, matrix splicing and model training are carried out again in the same way; the corresponding second sparse matrix is then P×N, the first sparse matrix is (M+L)×N, the spliced third sparse matrix is (M+L+P)×N, and so on.
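A minimal sketch of steps S240 and S260, assuming the embedded model is scikit-learn's TruncatedSVD (one possible stand-in for the truncated singular value decomposition described above); the variable names are illustrative.

    from scipy import sparse
    from sklearn.decomposition import TruncatedSVD

    def splice_and_retrain(first_matrix, second_matrix, k=512):
        """Steps S240 and S260: splice, then retrain the embedding on the result."""
        # S240: stack the L x N second matrix under the M x N first matrix.
        third_matrix = sparse.vstack([first_matrix, second_matrix]).tocsr()
        # S260: the third matrix replaces the stored first matrix (M becomes M + L)
        # and the embedded model is retrained on it.
        model = TruncatedSVD(n_components=k, random_state=0).fit(third_matrix)
        return third_matrix, model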
Subsequently, in step S280, the new embedded model is used to learn the second sparse matrix, so as to obtain an output matrix of the second sparse matrix, and the output matrix is openly output, so that the data user performs user behavior prediction according to the output matrix.
As mentioned above, the embedded model may truncate the decomposed sub-matrices, for example keeping the first K columns of features to obtain truncation matrices, and then multiply the truncation matrices to obtain the output matrix. The output matrix has the same number of rows as the input matrix, and each row still represents a user, but the number of columns becomes K. Unlike the original matrix, where each column is a user feature (for example, corresponds to an application), each column of the output matrix corresponds to an uninterpretable feature; such output data cannot be understood by humans, but a machine learning model can understand it and extract the required information from it. The output matrix is output directly, and the data-using (receiving) party can retrieve it through a data interface; however, it cannot obtain the V matrix, so the original matrix cannot be restored and no information about the users can be recovered.
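Continuing the same TruncatedSVD assumption, step S280 then reduces to a single transform call; its result approximates the product of the truncated U and S factors for the L new rows, i.e. the L×K output matrix, while the V factor (model.components_) never leaves the data holder.

    def produce_output_matrix(model, second_matrix):
        """Step S280 (sketch): compute the L x K output matrix that is released."""
        output_matrix = model.transform(second_matrix)   # approximately U_K * S_K, shape (L, K)
        # model.components_ (the V factor) stays with the data holder, so the data
        # consumer cannot multiply back to the original L x N raw-feature matrix.
        return output_matrix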
According to one embodiment, a prediction model is trained at the data consumer, and this prediction model (for example, a risk control model or a financial product preference model) can output user behavior predictions so that corresponding product recommendations (for example, financial product recommendations) can be made. A user behavior prediction may be, for example, a user's preference for a certain product (application) or the tendency to download and use it; some user feature information or feature labels can also be obtained, such as a preference for learning applications or video applications. The data consumer may obtain in advance, from the data holder (for example, the computing device 100), the output matrices of multiple users and the corresponding user behavior feature information, and train the prediction model on them; the specific training method is well known in the art and is not described further here.
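On the data consumer's side, the prediction model can be any supervised learner fitted on the released matrix. The sketch below uses logistic regression purely as an example and assumes the consumer has already paired each output row with an observed behavior label (for example, whether a financial product was purchased); neither the classifier choice nor those labels is fixed by the patent.

    from sklearn.linear_model import LogisticRegression

    def train_prediction_model(output_matrix, behavior_labels):
        """Fit a consumer-side prediction model on the uninterpretable K-dimensional features."""
        clf = LogisticRegression(max_iter=1000)
        clf.fit(output_matrix, behavior_labels)          # labels: observed user behavior
        return clf

    # For newly received output rows, predicted behavior propensities are then
    # obtained with clf.predict_proba(new_output_matrix)[:, 1].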
According to one embodiment of the invention, a computing device (data holder) may store, for each data consumer, a corresponding first sparse matrix, and an embedded model trained from the matrix. Thus, when the L user identifiers of the data to be output are obtained in step S220, the L user identifiers sent by a certain data user are actually received; in the matrix splicing in step S240, the second sparse matrix may be actually spliced to the first sparse matrix corresponding to the data user, so as to obtain a corresponding third sparse matrix; in the step S260, the first sparse matrix of the data user may be updated to the third sparse matrix.
Generally, each data consumer has a dedicated account (for example, a mailbox account). When the data holder receives, on the data output platform, a batch of user identifiers sent from a data consumer's account, it carries out the corresponding data extraction, model training, and learning to generate an output matrix for that data consumer to retrieve. Maintaining the data separately for each data consumer in this way ensures that different consumers' data never cross, improving the stability and security of the data service. It should be understood that, in this embodiment, the initial first sparse matrix of each data consumer may be the same M×N sparse matrix or a different one; accordingly, the initial embedded models may be the same or different, and the invention is not limited in this respect.
Fig. 3 shows a schematic structural diagram of a data open output apparatus 300 according to an embodiment of the present invention, which is adapted to reside in a computing device 100, the computing device stores a first M × N sparse matrix, and trains an embedded model for data open output according to the matrix, the embedded model is adapted to decompose an input sparse matrix into at least three sub-matrices, and reconstruct two of the sub-matrices into an output matrix. The embedded model can be a (truncated) singular value model, a word vector model or a variational self-encoder model, wherein the truncated singular value model decomposes an input sparse matrix into a first unitary matrix U, a singular value matrix S and a second unitary matrix V, intercepts the first K columns of the U sub-matrix to obtain a U truncated matrix, intercepts the first K rows and the first K columns of the S sub-matrix to obtain an S truncated matrix, and multiplies the U truncated matrix and the S truncated matrix to obtain an output matrix. As shown in FIG. 3, the apparatus includes a data extraction module 320, a matrix stitching module 340, a model retraining module 360, and a data output module 380.
The data extraction module 320 may obtain the L user identifiers for which data is to be output, collect the raw data of the N user features corresponding to each user identifier, and generate an L×N second sparse matrix. According to one embodiment, the user features may be, for example, installed applications, geographic location preferences, device information, or device networking information. Moreover, the computing device 100 may further store the unique enterprise identifier corresponding to each user identifier; the raw data of the user features is stored with the unique enterprise identifier as the index. In this way, the data extraction module 320 may retrieve the unique enterprise identifier corresponding to each user identifier and extract the corresponding raw data using that identifier as the index.
The matrix splicing module 340 may splice the second sparse matrix into the first sparse matrix to obtain a third sparse matrix.
The model retraining module 360 may update the stored first sparse matrix to a third sparse matrix, and retrain the embedded model using the third sparse matrix to obtain a new embedded model.
The data output module 380 may learn the second sparse matrix by using the new embedded model to obtain an output matrix of the second sparse matrix, and open-output the output matrix, so that a data user can obtain user feature information according to the output matrix. The data user can train a prediction model, the prediction model can output a corresponding user behavior prediction result according to the output matrix, and the data user can train the prediction model by acquiring the output matrices of a plurality of users and corresponding user behavior characteristics from the data holder.
According to an embodiment of the present invention, the computing device may store, for each data user, a first sparse matrix corresponding to each data user and an embedded model trained according to the first sparse matrix, where the data extraction module 320 may receive L user identifiers sent by a certain data user, and the matrix splicing module 340 may splice the second sparse matrix into the first sparse matrix corresponding to the data user to obtain a corresponding third sparse matrix; the model retraining module 360 may update the stored first sparse matrix to the third sparse matrix, and perform model retraining using the third sparse matrix, thereby implementing data sequential services for each data user.
The details of the data openness output device 300 according to the present invention are disclosed in detail in the description based on fig. 1 and fig. 2, and are not described herein again.
According to the technical scheme of the invention, the embedded model used can mine latent correlations in the original data, and the output data obtained by transforming the data through the model can improve modeling results. The output data can be spliced directly with other data (such as the consumer's own data) and used without any feature engineering, and it contains no personal information, so secure data output is achieved. In principle, any prediction model that does not require business-level interpretation of its features can use the data output by the invention, which therefore has broad application prospects in fields such as finance, retail, the Internet, and advertising.
A8, the method according to any one of A1-A6, wherein the computing device further stores therein a unique business ID corresponding to each user ID; and storing the original data of the user characteristics by taking the unique enterprise identification as an index.
A9, the method as in A8, wherein the step of extracting raw data of N user features corresponding to each user id comprises: and retrieving the unique enterprise identification corresponding to each user identification, and taking the unique enterprise identification as an index to extract corresponding original data.
A10, the method as in any one of A1-A9, wherein the computing device has stored therein, for each data user, a corresponding first sparse matrix and an embedded model trained from that matrix; the step of obtaining L user identifications of the data to be output comprises: receiving the L user identifications sent by a certain data user; the step of stitching the second sparse matrix into the first sparse matrix comprises: splicing the second sparse matrix to the first sparse matrix corresponding to the data user to obtain a corresponding third sparse matrix; the step of updating the stored first sparse matrix to the third sparse matrix comprises: and updating the stored first sparse matrix of the data user into the third sparse matrix.
The apparatus of B12, as described in B11, wherein the data consumer has a prediction model trained therein, and the prediction model outputs the corresponding user behavior prediction result according to the output matrix.
B13, the apparatus of B12, wherein the data consumer trains the predictive model by obtaining output matrices and corresponding user behavior features for a plurality of users from the computing device.
B14, the apparatus according to any of B11-B13, wherein the embedding model is a singular value decomposition model adapted to decompose an input sparse matrix into a first unitary matrix U, a singular value matrix S and a second unitary matrix V, and reconstruct both sub-matrices U and S into an output matrix.
The apparatus of B15, as in B14, wherein the singular value decomposition model is adapted to obtain a U truncation matrix by truncating the first K columns of the U submatrix, obtain an S truncation matrix by truncating the first K rows and the first K columns of the S submatrix, and multiply the U truncation matrix and the S truncation matrix to obtain the output matrix.
The various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Wherein the memory is configured to store program code; the processor is configured to execute the data open output method of the present invention according to instructions in the program code stored in the memory.
By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer-readable media includes both computer storage media and communication media. Computer storage media store information such as computer readable instructions, data structures, program modules or other data. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. Combinations of any of the above are also included within the scope of computer readable media.
In the description provided herein, algorithms and displays are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with examples of this invention. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
Furthermore, some of the described embodiments are described herein as a method or combination of method elements that can be performed by a processor of a computer system or by other means of performing the described functions. A processor having the necessary instructions for carrying out the method or method elements thus forms a means for carrying out the method or method elements. Further, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is used to implement the functions performed by the elements for the purpose of carrying out the invention.
As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The present invention has been disclosed in an illustrative rather than a restrictive sense with respect to the scope of the invention, as defined in the appended claims.

Claims (10)

1. A data open output method, adapted to be executed in a computing device, wherein an M×N first sparse matrix is stored in the computing device, and an embedded model for data open output is trained according to the matrix, the embedded model is adapted to decompose an input sparse matrix into at least three sub-matrices, and reconstruct two of the sub-matrices into an output matrix, the method comprising:
acquiring L user identifications of data to be output, extracting original data of N user characteristics corresponding to each user identification, and generating an L×N second sparse matrix;
splicing the second sparse matrix into the first sparse matrix to obtain a third sparse matrix;
updating the stored first sparse matrix into the third sparse matrix, and retraining the embedded model by adopting the third sparse matrix to obtain a new embedded model; and
learning the second sparse matrix by adopting the new embedded model to obtain an output matrix of the second sparse matrix, and openly outputting the output matrix so that a data user can predict user behavior according to the output matrix.
2. The method of claim 1, wherein a prediction model is trained in the data consumer, the prediction model outputting corresponding user behavior predictions based on the output matrix.
3. The method of claim 2, wherein the data consumer trains the predictive model by obtaining output matrices and corresponding user behavior features for a plurality of users from the computing device.
4. Method according to any of claims 1-3, wherein the embedding model is a singular value decomposition model adapted to decompose an input sparse matrix into a first unitary matrix U, a singular value matrix S and a second unitary matrix V and reconstruct both the U and S sub-matrices into an output matrix.
5. The method of claim 4, wherein the step of reconstructing the U and S sub-matrices as an output matrix comprises:
truncating the first K columns of the U submatrix to generate a U truncation matrix, truncating the first K rows and the first K columns of the S submatrix to generate an S truncation matrix, and multiplying the U truncation matrix and the S truncation matrix to obtain the output matrix.
6. The method of any of claims 1-3, wherein the embedded model is a word vector model or a variational auto-encoder model.
7. The method of any of claims 1-6, wherein the user characteristics include at least one of an installation application, a geographic location preference, device information, and device networking information.
8. A data open output apparatus adapted to reside in a computing device having a first M × N sparse matrix stored therein and from which an embedded model for open output of data is trained, the embedded model being adapted to decompose an input sparse matrix into at least three sub-matrices and to reconstruct two of the sub-matrices into an output matrix, the apparatus comprising:
the data extraction module is suitable for acquiring L user identifications of data to be output, acquiring original data of N user characteristics corresponding to each user identification and generating an L×N second sparse matrix;
the matrix splicing module is suitable for splicing the second sparse matrix into the first sparse matrix to obtain a third sparse matrix;
the model retraining module is suitable for updating the stored first sparse matrix into a third sparse matrix and retraining the embedded model by adopting the third sparse matrix to obtain a new embedded model; and
the data output module is suitable for learning the second sparse matrix by adopting a new embedded model to obtain an output matrix of the second sparse matrix and openly outputting the output matrix so that a data user can predict user behavior according to the output matrix.
9. A computing device, comprising:
at least one processor; and
at least one memory including computer program instructions;
the at least one memory and the computer program instructions are configured to, with the at least one processor, cause the computing device to perform the method of any of claims 1-7.
10. A readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-7.
CN201910398372.7A 2019-05-14 2019-05-14 Data open output method and device and computing equipment Active CN111950015B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910398372.7A CN111950015B (en) 2019-05-14 2019-05-14 Data open output method and device and computing equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910398372.7A CN111950015B (en) 2019-05-14 2019-05-14 Data open output method and device and computing equipment

Publications (2)

Publication Number Publication Date
CN111950015A true CN111950015A (en) 2020-11-17
CN111950015B (en) 2024-02-20

Family

ID=73335597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910398372.7A Active CN111950015B (en) 2019-05-14 2019-05-14 Data open output method and device and computing equipment

Country Status (1)

Country Link
CN (1) CN111950015B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107480777A (en) * 2017-08-28 2017-12-15 北京师范大学 Sparse self-encoding encoder Fast Training method based on pseudo- reversal learning
US20190073580A1 (en) * 2017-09-01 2019-03-07 Facebook, Inc. Sparse Neural Network Modeling Infrastructure
CN107808278A (en) * 2017-10-11 2018-03-16 河海大学 A kind of Github open source projects based on sparse self-encoding encoder recommend method

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114817845A (en) * 2022-05-20 2022-07-29 昆仑芯(北京)科技有限公司 Data processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111950015B (en) 2024-02-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100027 302, 3 / F, aviation service building, Dongzhimen street, Dongcheng District, Beijing
Applicant after: BEIJING TENDCLOUD TIANXIA TECHNOLOGY Co.,Ltd.
Address before: 100027 1003a, 10th floor, 33 Suzhou street, Haidian District, Beijing
Applicant before: BEIJING TENDCLOUD TIANXIA TECHNOLOGY Co.,Ltd.

REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40036774
Country of ref document: HK

GR01 Patent grant
GR01 Patent grant