CN111753897A

CN111753897A - Identification persistence method and device, electronic equipment and storage medium thereof

Info

Publication number: CN111753897A
Application number: CN202010552453.0A
Authority: CN
Inventors: 邱德波; 靳泽雯
Original assignee: Beike Technology Co Ltd
Current assignee: Beike Technology Co Ltd
Priority date: 2020-06-17
Filing date: 2020-06-17
Publication date: 2020-10-09

Abstract

The invention provides an identification continuation method and device, an electronic device, and a storage medium. The method comprises: obtaining a preset number of paired training samples; obtaining a similarity training model through deep learning training on the paired training samples; receiving a to-be-continued sample; obtaining, from historical data, a plurality of historical samples from different sources according to the to-be-continued sample; calculating a target historical sample with the similarity training model, the target historical sample being the historical sample with the highest similarity; and mapping the to-be-continued identifier to the target historical sample to realize identification continuation. Because deep learning associates the new user identifier with the data under the original user identifier for synchronization or retrieval, the method addresses the prior-art problem that a new user identifier cannot be associated with the data under the original user identifier after a user replaces a device or modifies device parameters.

Description

Identification persistence method and device, electronic equipment and storage medium thereof

Technical Field

The present invention relates to the field of computers, and in particular to an identification continuation method and device, an electronic device, and a storage medium.

Background

User tracking technology is now widely used. For example, a user owns tablet computer A and later purchases mobile phone A. Because both devices belong to the same user, it is desirable to associate their data so that tablet computer A and mobile phone A can synchronize with each other or retrieve each other's data. The prior-art approach to this scenario is id-Mapping, which maps and aggregates identifiers from different sources (such as device identifiers and user identifiers) and concatenates the fragmented data, eliminating data islands and providing a complete information graph of the user. With the complete information graph, tablet computer A or mobile phone A can synchronize or retrieve data from other devices, achieving the purpose of user tracking.

However, with prior-art id-Mapping, the history of an identifier can no longer be traced once a device parameter changes, whether because the user alters device parameters, because of APP product iteration, or for similar reasons; this is a common weakness in the industry. For example, if some parameters of mobile phone A differ from the corresponding parameters of tablet computer A, the two devices may be judged not to belong to the same user. id-Mapping then cannot map the user identifier of mobile phone A into the information graph, that is, cannot associate it with a historical (original) identifier, so tablet computer A and mobile phone A cannot synchronize or retrieve data, which hinders the application of the user identifier.

Therefore, a method and system are needed that can associate a new user identifier with the data under the original user identifier after a user replaces a device or modifies device parameters.

Disclosure of Invention

The application provides an identification continuation method that uses deep learning to associate a new user identifier with the data under the original user identifier, for synchronization or retrieval, after a user replaces a device or modifies device parameters. This helps solve the prior-art technical problem that the new user identifier cannot be associated with the data under the original user identifier in those situations.

The method comprises the following steps:

obtaining a preset number of paired training samples, wherein each pair of training samples consists of two training samples whose similarity has been determined, and each training sample comprises a training sample identifier and training sample feature data corresponding to the training sample identifier;

obtaining a similarity training model through deep learning training according to the paired training samples, wherein the similarity training model is a model for calculating similarity;

receiving a to-be-continued sample, wherein the to-be-continued sample comprises a to-be-continued identifier and continuation feature data;

obtaining a plurality of historical samples from different sources in historical data according to the to-be-continued sample, wherein each historical sample comprises a historical sample identifier and historical sample feature data, and the historical samples are those whose historical sample feature data corresponds to the continuation feature data;

calculating a target historical sample through the similarity training model according to the to-be-continued sample and the plurality of historical samples, wherein the target historical sample is the historical sample with the highest similarity to the to-be-continued identifier;

mapping the to-be-continued identifier to the target historical sample in the historical data to realize identification continuation.

In an embodiment, the obtaining of the similarity training model through deep learning training according to the pair of training samples, where the similarity training model is a model for calculating similarity includes:

respectively encoding the training sample feature data of two samples in the pair of training samples into first fixed dimension vectors;

splicing the two first fixed dimension vectors of the same pair of training samples according to the determined similarity through deep learning to obtain a similarity neural network;

and carrying out full-network splicing on the similarity neural networks of different pairs of training samples to obtain the similarity training model.

In an embodiment, the obtaining the similarity training model after performing full-network stitching on the similarity neural networks of different pairs of training samples includes:

carrying out full-network splicing on the similarity neural networks of different pairs of training samples to obtain a to-be-detected similarity training model;

substituting the paired training samples into the to-be-detected similarity training model to calculate to obtain detection similarity;

calculating, as an error value, the difference between the similarity determined for the pair of training samples and the detection similarity;

determining whether the error value is less than a predetermined error threshold,

and if the error value is smaller than the preset error threshold value, taking the similarity training model to be detected as the similarity training model.

In one embodiment, the determining whether the error value is less than a predetermined error threshold further comprises:

and if the error value is greater than or equal to the preset error threshold value, adjusting parameters in the to-be-detected similarity training model until the error value is smaller than the preset error threshold value, and then taking the to-be-detected similarity training model as the similarity training model.
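The accept-or-adjust loop described in the two embodiments above can be sketched as follows. This is a minimal illustration in pure Python, not the patent's implementation: a single logistic unit stands in for the DNN, the error value is taken as the mean absolute difference, and `train_until_threshold`, the learning rate, and the toy pairs are all assumptions introduced for illustration.

```python
import math

def predict(weights, bias, features):
    """Detection similarity in (0, 1) from one logistic unit (a stand-in for the DNN)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def train_until_threshold(pairs, error_threshold=0.05, lr=1.0, max_rounds=20000):
    """pairs: list of (pair_features, determined_similarity). All names are hypothetical."""
    weights, bias = [0.0] * len(pairs[0][0]), 0.0
    for _ in range(max_rounds):
        detected = [predict(weights, bias, x) for x, _ in pairs]
        # Error value: mean absolute difference between determined and detection similarity.
        error_value = sum(abs(p - y) for p, (_, y) in zip(detected, pairs)) / len(pairs)
        if error_value < error_threshold:
            return weights, bias, error_value  # accept the candidate model
        # Otherwise adjust the parameters: one batch gradient step on the squared error.
        for p, (x, y) in zip(detected, pairs):
            grad = (p - y) * p * (1.0 - p)
            bias -= lr * grad
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
    raise RuntimeError("error value never dropped below the threshold")

# Toy pairs: features are per-field agreement flags, labels are known similarities.
toy_pairs = [([1.0, 1.0], 0.9), ([0.0, 0.0], 0.1), ([1.0, 0.0], 0.5)]
weights, bias, error_value = train_until_threshold(toy_pairs)
```

The loop returns only once the error value is below the threshold, which mirrors the rule that the to-be-detected model becomes the similarity training model only after it passes the check.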

In an embodiment, calculating the target historical sample through the similarity training model according to the to-be-continued sample and the plurality of historical samples includes:

encoding the continuation feature data and the historical sample feature data into second fixed dimension vectors;

and obtaining the target historical sample from the second fixed dimension vectors through the similarity training model, wherein the target historical sample is the historical sample with the highest similarity to the to-be-continued identifier.

In an embodiment, after the step of obtaining a preset number of paired training samples, where each pair of training samples consists of two training samples whose similarity has been determined, and each training sample includes a training sample identifier and training sample feature data corresponding to the training sample identifier, the method further includes:

and cleaning the training sample feature data in the paired training samples to obtain training sample feature data of numerical type and categorical type.

In an embodiment, before the step of obtaining a plurality of historical samples from different sources in historical data according to the to-be-continued sample, where the historical samples include historical sample identifiers and historical sample feature data and are the samples whose historical sample feature data corresponds to the continuation feature data, the method further includes:

judging whether the historical sample feature data of a historical sample in a data source corresponds to the continuation feature data, and if so, executing the subsequent steps.

In one embodiment, the present application further provides an identification continuation device, including:

an obtaining module, used for obtaining a preset number of paired training samples, wherein each pair of training samples consists of two training samples whose similarity has been determined, and each training sample comprises a training sample identifier and training sample feature data corresponding to the training sample identifier;

a training module, used for obtaining a similarity training model through deep learning training according to the paired training samples, wherein the similarity training model is a model for calculating similarity;

a receiving module, used for receiving a to-be-continued sample, wherein the to-be-continued sample comprises a to-be-continued identifier and continuation feature data;

the obtaining module is further used for obtaining a plurality of historical samples from different sources in historical data according to the to-be-continued sample, wherein each historical sample comprises a historical sample identifier and historical sample feature data, and the historical samples are those whose historical sample feature data corresponds to the continuation feature data;

a calculation module, used for calculating a target historical sample through the similarity training model according to the to-be-continued sample and the plurality of historical samples, wherein the target historical sample is the historical sample with the highest similarity to the to-be-continued identifier;

and a mapping module, used for mapping the to-be-continued identifier to the target historical sample in the historical data to realize identification continuation.

In one embodiment, the apparatus further comprises:

the encoding module is used for respectively encoding the training sample feature data of the two samples in a pair of training samples into first fixed dimension vectors;

the splicing module is used for splicing the two first fixed dimension vectors of the same pair of training samples according to the determined similarity through deep learning to obtain a similarity neural network;

In an embodiment, the stitching module is further configured to perform full-network stitching on the similarity neural networks of different pairs of training samples to obtain a to-be-detected similarity training model;

the device also includes:

a substitution module for substituting the pair of training samples into the to-be-detected similarity training model to calculate the detection similarity;

the calculating module is further used for calculating a difference value as an error value according to the similarity determined by the pair of training samples and the detection similarity;

and the judging module is used for judging whether the error value is smaller than a preset error threshold value or not, and if the error value is smaller than the preset error threshold value, taking the to-be-detected similarity training model as the similarity training model.

In an embodiment, the determining module is further configured to, if the error value is greater than or equal to the predetermined error threshold, adjust parameters in the to-be-detected similarity training model so that the to-be-detected similarity training model is used as the similarity training model after the error value is smaller than the predetermined error threshold.

In an embodiment, the encoding module is configured to encode the continuation feature data and the historical sample feature data into second fixed dimension vectors;

and the calculation module is used for obtaining the target historical sample from the second fixed dimension vectors through the similarity training model, wherein the target historical sample is the historical sample with the highest similarity to the to-be-continued identifier.

In one embodiment, the apparatus further comprises:

and the cleaning module is used for cleaning the training sample feature data in the paired training samples to obtain training sample feature data of numerical type and categorical type.

In an embodiment, the determining module is configured to determine whether the historical sample feature data of a historical sample in a data source corresponds to the continuation feature data, and if so, to execute the subsequent steps.

In one embodiment, the present application provides an electronic device, comprising: a processor and a memory;

the memory has stored therein an application executable by the processor for causing the processor to perform the steps of the identification continuation method.

In an embodiment, the present application provides a computer-readable storage medium having a computer program stored thereon, which, when being executed by a processor, performs the steps of the method for identification continuation.

As can be seen from the above embodiments, the present application provides an identification continuation method that uses deep learning to associate a new user identifier with the data under the original user identifier, for synchronization or retrieval, after a user replaces a device or modifies device parameters. This helps solve the technical problem that prior-art id-Mapping cannot associate the new user identifier with the data under the original user identifier in those situations.

Drawings

FIG. 1 is a flow chart 100 of the identification continuation method of the present invention;

FIG. 2 is a flow chart 200 of the identification continuation method of the present invention;

FIG. 3 is a flow chart 300 of the identification continuation method of the present invention;

FIG. 4 is a flow chart 400 of the identification continuation method of the present invention;

FIG. 5 is a schematic diagram of a DNN neural network architecture according to the present invention;

FIG. 6 is a schematic diagram of an embodiment of the identification continuation device.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and examples.

Although prior-art id-Mapping can address the user tracking problem described in the background, it may fail because of changes to user device parameters, APP product iteration, and the like, leaving the user tracking problem unsolved. The reason for the failure is explained below through a specific case. id-Mapping can establish an information graph across different devices because the devices share common data. For example, a user has two devices, tablet computer A and mobile phone A, whose historical information shares a common bound mobile phone number.

Because the bound mobile phone number is the same, the two devices are taken to belong to the same user. id-Mapping can therefore aggregate the six parameters, namely the imei, bound mobile phone number, and mac address of tablet computer A and of mobile phone A, into one array. Because both devices hold the array composed of these six parameters, the association between tablet computer A and mobile phone A is established, enabling the two devices to synchronize or retrieve each other's data.

The following is a specific example illustrating the problems that exist in the above example:

For example, the ID information of two logs is as follows:

Device 1: <mac1, mac2> <imei1> <tel1>;

Device 2: <mac1> <imei2> <tel1, tel2>;

Here mac1 and mac2 are two different mac addresses, imei1 and imei2 are two different imei numbers, and tel1 and tel2 are two different mobile phone numbers.

Since mac1 appears in both logs, the two logs may come from the same user, and after several rounds of merging the result is <mac1, mac2> <imei1, imei2> <tel1, tel2>.

Map output:

Sent to device 1: <mac1, mac2> <imei1, imei2> <tel1, tel2>;

Sent to device 2: <mac1, mac2> <imei1, imei2> <tel1, tel2>;

Device 1 and device 2 can establish the association between them according to the map (i.e., the array). If the mobile phone number of one of the devices is changed to tel3, the map becomes invalid and the association between device 1 and device 2 fails as well.
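The merge in this example can be sketched in a few lines of Python. The text only says that several rounds of merging are used, so the union-find grouping below is an assumed strategy, and the function name `merge_logs` and the dictionary layout are hypothetical.

```python
from collections import defaultdict

def merge_logs(logs):
    """Group devices that share any identifier and return the merged map per device.

    logs: {device: {"mac": [...], "imei": [...], "tel": [...]}} (hypothetical layout)
    """
    parent = {device: device for device in logs}

    def find(device):
        # Follow parent links to the group representative, compressing the path.
        while parent[device] != device:
            parent[device] = parent[parent[device]]
            device = parent[device]
        return device

    first_seen = {}  # (kind, value) -> first device that reported it
    for device, ids in logs.items():
        for kind, values in ids.items():
            for value in values:
                key = (kind, value)
                if key in first_seen:
                    parent[find(device)] = find(first_seen[key])  # shared id: same group
                else:
                    first_seen[key] = device

    merged = defaultdict(lambda: defaultdict(set))
    for device, ids in logs.items():
        for kind, values in ids.items():
            merged[find(device)][kind].update(values)
    return {d: {k: sorted(v) for k, v in merged[find(d)].items()} for d in logs}

logs = {
    "device1": {"mac": ["mac1", "mac2"], "imei": ["imei1"], "tel": ["tel1"]},
    "device2": {"mac": ["mac1"], "imei": ["imei2"], "tel": ["tel1", "tel2"]},
}
maps = merge_logs(logs)
```

Both devices receive the same merged map here because they share mac1 and tel1. The association exists only through literally shared identifier values, which is the dependence the method below avoids by using similarity instead.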

Therefore, how to keep all devices of the same user associated even when parameters in the array change after the device identifiers have been established has become an urgent problem for research and development personnel.

Fig. 1 is a flow chart 100 of the identification continuation method according to the present invention. As shown in fig. 1, in one embodiment, the present application provides an identification continuation method, which includes:

S101, obtaining a preset number of paired training samples, wherein each pair of training samples consists of two training samples whose similarity has been determined, and each training sample comprises a training sample identifier and training sample feature data corresponding to the training sample identifier.

This step obtains a preset number of paired training samples. Each pair contains two training samples: similarity is a relative concept, so at least two samples are needed to define it. The training sample feature data may be understood as device attribute data or user behavior data, such as the user's gender, age, and geographical location.

In each pair of training samples, the similarity between the two samples is known and determined. Many similarity algorithms exist in the prior art and are not repeated here; any of them may be used as long as it can represent the similarity between the two training samples. It should be noted that the similarity may indicate either that the two samples point to the same user or that they point to different users. For example, a similarity greater than 80% indicates that the two training samples belong to the same user, whereas a similarity less than or equal to 80% indicates that they do not. Both cases are possible, and how they are handled is described further below.

S102, obtaining a similarity training model through deep learning training according to the paired training samples, wherein the similarity training model is used for calculating the similarity.

This step trains the similarity training model on the paired training samples. The similarity training model is used to calculate similarity, and the training may be implemented with a DNN neural network. For example, if the known similarity between the two training samples of a pair is 76%, the neural network constructs an objective function over the training sample feature data; if the similarity calculated by the objective function is also 76%, the constructed objective function may be correct and can then be used as the similarity training model for subsequent processing. More specific implementations are described below.

S103, receiving a to-be-continued sample, wherein the to-be-continued sample comprises a to-be-continued identifier and continuation feature data.

This step receives the to-be-continued sample. In the example above, the to-be-continued sample corresponds to the user identifier of mobile phone A and its feature data, that is, the to-be-continued identifier and the continuation feature data. Like the training sample feature data, the continuation feature data may be understood as device attribute data or user behavior data, such as the user's gender, age, and geographical location; the difference is that it is the feature data associated with the to-be-continued sample.

S104, obtaining a plurality of historical samples from different sources in historical data according to the to-be-continued sample, wherein each historical sample comprises a historical sample identifier and historical sample feature data, and the historical samples are those whose historical sample feature data corresponds to the continuation feature data.

This step obtains the historical samples. In the example of tablet computer A and mobile phone A, a historical sample corresponds to the sample from tablet computer A. Other devices, such as tablet computer B, smart watch B, and mobile phone B, may also exist and can be understood as different sources. The user identifiers and feature data on these devices are the historical sample identifiers and historical sample feature data of this step. The historical sample feature data includes, for example, attribute data such as the user's gender and age, and behavior data such as the user's use of the mobile phone number and the device.

It should be noted that in this step the continuation feature data and the historical sample feature data should correspond to each other, which simplifies the processing flow. The reason is that calculating similarity requires comparing feature data of the same type; computing a similarity between the age of the tablet computer's user and the phone number of mobile phone A would obviously be unreasonable. Therefore, historical samples whose feature data does not correspond to the continuation feature data are not acquired. Even among those that do correspond, there may still be many historical samples: if tablet computer B, smart watch B, and mobile phone B all meet the correspondence requirement, the historical samples of all these devices may be obtained.
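The correspondence check can be illustrated with a short Python sketch. It assumes, as one possible reading, that a historical sample "corresponds" when it carries every feature field present in the to-be-continued sample; the field names, device ids, and `select_candidates` are hypothetical.

```python
def select_candidates(pending_sample, history):
    """Keep historical samples whose feature fields cover those of the pending sample."""
    wanted = set(pending_sample["features"])
    return [h for h in history if wanted <= set(h["features"])]

pending = {"id": "phone_A", "features": {"gender": 0, "age": 31, "tel": "AAAAAA"}}
history = [
    {"id": "tablet_A", "features": {"gender": 0, "age": 31, "tel": "AAAAAA"}},
    {"id": "watch_B", "features": {"gender": 1, "age": 45, "tel": "BBBBBB", "steps": 9000}},
    {"id": "tablet_B", "features": {"age": 52}},  # lacks gender and tel, so it is skipped
]
candidates = select_candidates(pending, history)
```

Several samples can pass the check at once, which is why the next step still has to rank them by similarity.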

S105, calculating a target historical sample through the similarity training model according to the to-be-continued sample and the plurality of historical samples, wherein the target historical sample is the historical sample with the highest similarity to the to-be-continued identifier.

This step calculates the target historical sample. It may be understood as substituting the continuation feature data and the historical sample feature data into the objective function to calculate the similarity between each historical sample and the to-be-continued sample, and then selecting the historical sample with the highest similarity as the target historical sample. For example, if the target historical sample is the sample of tablet computer A, then tablet computer A and mobile phone A most probably belong to the same user.
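Selecting the target historical sample amounts to scoring every candidate and taking the maximum. In the sketch below, `field_match_similarity` is only a placeholder for the trained similarity training model (it counts matching fields), and the sample data is invented for illustration.

```python
def field_match_similarity(a, b):
    """Placeholder score: share of fields in `a` whose values are equal in `b`."""
    return sum(1 for key in a if b.get(key) == a[key]) / len(a)

def pick_target(pending_sample, candidates, similarity=field_match_similarity):
    """Return (historical sample id, score) for the most similar candidate."""
    scored = [
        (similarity(pending_sample["features"], c["features"]), c["id"])
        for c in candidates
    ]
    best_score, best_id = max(scored, key=lambda item: item[0])
    return best_id, best_score

pending = {"id": "phone_A", "features": {"gender": 0, "age": 31, "city": "Beijing"}}
candidates = [
    {"id": "tablet_A", "features": {"gender": 0, "age": 31, "city": "Beijing"}},
    {"id": "phone_B", "features": {"gender": 0, "age": 45, "city": "Beijing"}},
    {"id": "watch_B", "features": {"gender": 1, "age": 45, "city": "Shanghai"}},
]
target_id, target_score = pick_target(pending, candidates)
```

With a trained model, `similarity` would be replaced by a call that encodes both samples and runs them through the network; the selection logic stays the same.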

S106, mapping the identifier to be continued to the target history sample in the history data to realize identifier continuation.

This step maps the to-be-continued identifier into the historical data. The to-be-continued identifier of mobile phone A is finally mapped to tablet computer A in the historical data and associated with it, which eliminates the data island of tablet computer A and completes the continuation of mobile phone A's to-be-continued identifier in the historical data.

This embodiment provides an identification continuation method. First, a preset number of paired training samples is obtained from a data source for the subsequent training of the similarity training model. Each pair consists of two training samples with a determined similarity between them, because similarity is a relative concept; the preset number is a number of pairs sufficient to train the similarity training model in the subsequent step. It should be noted that the similarity here does not mean the two training samples must be similar; they may be dissimilar, as long as a similarity can be determined for them. The similarity training model, a model for calculating similarity, is then obtained through deep learning training on the paired training samples. The specific training mode can be implemented by those skilled in the art and is not described further here. Once the similarity training model is obtained, identification continuation can be performed. A to-be-continued sample is received, comprising a to-be-continued identifier and continuation feature data, which can be understood as the user identifier of mobile phone A and the feature data of its user, such as gender, age, and geographical location. Then, a plurality of historical samples from different sources are obtained according to the to-be-continued sample; the historical samples are the samples in the historical data that correspond to the continuation feature data, and each comprises a historical sample identifier and historical sample feature data.
The historical data may be understood as the data of tablet computer A; it may of course also include data of devices such as tablet computer B, smart watch B, and mobile phone B. The historical samples from these different sources (i.e., different devices) should be samples whose historical sample feature data corresponds to the continuation feature data. For example, the user identifier of tablet computer A is its historical sample identifier, and the feature data of tablet computer A is its historical sample feature data. The other devices, such as tablet computer B, smart watch B, and mobile phone B, likewise have their own historical sample identifiers and historical sample feature data. As for correspondence: the continuation feature data of mobile phone A may contain many items, such as the user's gender, age, and mobile phone number, in which case the historical sample feature data should likewise cover the gender, age, and mobile phone number of the user of tablet computer A.
Then, the target historical sample, namely the historical sample with the highest similarity to the to-be-continued identifier, is calculated through the similarity training model according to the to-be-continued sample and the plurality of historical samples. As in the previous example, the historical samples from different sources may be those of tablet computer A, tablet computer B, smart watch B, and mobile phone B; the similarity training model determines which of these devices has the highest similarity to the to-be-continued sample, for example tablet computer A for mobile phone A. Finally, the to-be-continued identifier is mapped to the target historical sample in the historical data to realize identification continuation. Mobile phone A and tablet computer A thereby become associated in data, and the data island of tablet computer A is eliminated. Because this process relies on similarity rather than on a specific array (the map described above), the final data association is not affected even if the device is replaced or its parameters are modified. The method thus helps solve the prior-art technical problem that data cannot remain associated between devices after a user replaces a device or modifies device parameters.

Fig. 2 is a flow chart 200 of the identification continuation method according to the present invention. As shown in fig. 2, in an embodiment, obtaining the similarity training model through deep learning training according to the paired training samples, where the similarity training model is a model for calculating similarity, includes:

S201, respectively encoding the training sample feature data of the two training samples in a pair of training samples into first fixed dimension vectors.

This step encodes the training sample feature data of each sample into a first fixed dimension vector. For example, the categorical features in the training sample feature data are converted into low-dimensional dense vectors through an embedding layer (i.e., a splicing layer) and then spliced with the corresponding numerical features, so that the two training samples are each converted into a first fixed dimension vector.

A low-dimensional dense vector is explained first. Take the vector (1.0, 0.0, 3.0) as an example. It is a one-dimensional array and can therefore be understood as a low-dimensional array, and it can be written in two ways: dense and sparse. The dense form is (1.0, 0.0, 3.0), which is no different from an ordinary array. The sparse form is (3, [0, 2], [1.0, 3.0]), where the first item is the number of elements, the second item lists the indices of the non-zero elements, and the third item lists the values of those elements. This step uses the dense form.
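The two notations can be illustrated with a minimal Python sketch. The function names `to_sparse` and `to_dense` are hypothetical and are not part of the application; the sparse triple follows the (size, indices, values) layout of the example above.

```python
def to_sparse(dense):
    """Return (size, indices, values) for the non-zero entries of a dense vector."""
    indices = [i for i, v in enumerate(dense) if v != 0.0]
    values = [dense[i] for i in indices]
    return (len(dense), indices, values)


def to_dense(sparse):
    """Rebuild the dense vector from a (size, indices, values) triple."""
    size, indices, values = sparse
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


sparse_form = to_sparse([1.0, 0.0, 3.0])  # (3, [0, 2], [1.0, 3.0])
dense_form = to_dense(sparse_form)        # [1.0, 0.0, 3.0]
```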

A fixed dimension vector is a vector in which each dimension is fixed to specific content. For example, the user gender is represented by 0 for male and 1 for female, and the mobile phone number is a specific numerical value such as AAAAAA. The user gender is converted into a one-dimensional low-dimensional array (0 or 1), that is, the low-dimensional dense vector, which is then spliced with the specific numerical value of the mobile phone number. For example, if in the first training sample user 1 is male with mobile phone number AAAAAA, and in the second training sample user 2 is male with mobile phone number BBBAAAAA, then the first fixed dimension vector of the first training sample is (0, AAAAAA) and that of the second training sample is (0, BBBAAAAA).
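The gender and mobile phone number example can be written as a short Python sketch. The helper `encode_sample` and the table `GENDER_CODES` are hypothetical names, and the placeholder phone numbers are kept as strings because the text gives no concrete digits.

```python
# Gender codes as stated in the example: 0 for male, 1 for female.
GENDER_CODES = {"male": 0, "female": 1}


def encode_sample(gender, phone_number):
    """Splice the gender code with the phone number into one fixed-layout vector."""
    return (GENDER_CODES[gender], phone_number)


first_vector = encode_sample("male", "AAAAAA")     # first training sample
second_vector = encode_sample("male", "BBBAAAAA")  # second training sample
```

Each position of the resulting tuple always carries the same kind of content, which is what allows two samples to be compared dimension by dimension.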

S202, splicing the two first fixed dimension vectors of the same pair of training samples according to the determined similarity through deep learning to obtain a similarity neural network.

In this step, the two first fixed dimension vectors are spliced. The similarity neural network, which uses a DNN, can be understood as a function obtained by splicing the first fixed dimension vectors, and the similarity of the pair of training samples can be calculated by this function. Because the similarity of the pair of training samples is a known quantity, a new similarity is calculated for the pair through the similarity neural network, and the function of the similarity neural network is adjusted continuously until the calculated similarity matches the determined similarity, at which point the training of the similarity neural network is complete.

S203, carrying out full-network splicing on the similarity neural networks of different pairs of training samples to obtain the similarity training model.

In this step, a specific implementation of training the similarity training model is provided. Because the training sample feature data of different pairs of training samples may differ, the similarity neural networks trained from those pairs are based on different feature data. The similarity training model obtained by full-network splicing of the similarity neural networks of different pairs can therefore calculate the similarity for all of the trained pairs of training samples.

In this embodiment, a specific implementation of training the similarity training model is provided. First, the training sample feature data of the two samples in the pair of training samples are respectively encoded into first fixed dimension vectors. Then, the two first fixed dimension vectors are spliced to obtain the similarity neural network. Finally, full-network splicing is performed on the similarity neural networks to obtain the similarity training model.

Fig. 3 is a flow chart 300 illustrating a tag renewal method according to the present invention. As shown in fig. 3, in an embodiment, the obtaining the similarity training model after performing full-network stitching on the similarity neural networks of different pairs of training samples includes:

S301, carrying out full-network splicing on the similarity neural networks of different pairs of training samples to obtain a to-be-detected similarity training model.

In this step, a specific implementation is provided for obtaining the to-be-detected similarity training model by splicing the similarity neural networks. The to-be-detected similarity training model can be understood as the objective function, but this function may still be inaccurate, because the splicing process described above may reduce the accuracy of the similarity calculation.

S302, substituting the paired training samples into the to-be-detected similarity training model to calculate the detection similarity.

In this step, a specific implementation is provided in which the pair of training samples is substituted into the to-be-detected similarity training model and the detection similarity is calculated.

S303, calculating a difference value as an error value according to the similarity determined by the pair of training samples and the detection similarity.

In this step, a specific implementation is provided in which the difference between the determined similarity and the detection similarity is taken as the error value.

S304, judging whether the error value is smaller than a preset error threshold value, and if the error value is smaller than the preset error threshold value, taking the similarity training model to be detected as the similarity training model.

In this step, the error value is compared with the predetermined error threshold to determine whether the to-be-detected similarity training model can be used as the similarity training model.

In this embodiment, a specific implementation of determining whether the to-be-detected similarity training model can be used as the similarity training model is provided. First, full-network splicing is performed on the similarity neural networks of different pairs of training samples to obtain the to-be-detected similarity training model. Next, the paired training samples are substituted into the to-be-detected similarity training model to calculate the detection similarity. A difference value is then calculated as the error value from the similarity determined for the pair of training samples and the detection similarity. Finally, it is judged whether the error value is smaller than the preset error threshold, and if so, the to-be-detected similarity training model is taken as the similarity training model.

This embodiment also provides a specific implementation for the case in which the error value is not smaller than the predetermined error threshold. If the error value is greater than or equal to the preset error threshold, the parameters in the to-be-detected similarity training model are adjusted until the error value is smaller than the preset error threshold, after which the to-be-detected similarity training model is taken as the similarity training model. The parameters are to be understood as the parameters of the objective function.
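The check-and-adjust loop of S302 to S304 can be sketched in Python as follows. This is only an illustration under stated assumptions: a single sigmoid unit stands in for the to-be-detected similarity training model, the error value is taken as the mean absolute difference over all pairs, and the parameters are adjusted by plain gradient descent. None of these choices, nor the name `train_until_threshold`, is specified by the application.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_until_threshold(X, y, threshold=0.1, lr=1.0, max_steps=10_000):
    """Adjust parameters until the error value drops below the threshold."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for step in range(max_steps):
        detected = sigmoid(X @ w + b)                  # detection similarity
        error = float(np.mean(np.abs(y - detected)))   # error value (assumed form)
        if error < threshold:
            return w, b, error, step                   # model is accepted
        grad = detected - y                            # log-loss gradient w.r.t. logits
        w -= lr * (X.T @ grad) / len(y)                # adjust parameters
        b -= lr * float(np.mean(grad))
    raise RuntimeError("error value never fell below the threshold")


# Hypothetical pair features: [same gender, same phone number]; label 1 = similar.
X = np.array([[1.0, 1.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])
w, b, error, steps = train_until_threshold(X, y)
```

The loop returns as soon as the error value is below the threshold, which corresponds to accepting the to-be-detected model as the similarity training model.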

Fig. 4 is a flow chart 400 illustrating a tag renewal method according to the present invention. As shown in fig. 4, in an embodiment, the calculating, according to the to-be-continued sample and the plurality of historical samples, a target historical sample through the similarity training model, where the target historical sample is the historical sample with the highest similarity to the to-be-continued identifier, includes:

S401, encoding the persistent feature data and the historical sample feature data into a second fixed dimension vector.

In this step, a specific implementation of encoding the persistent feature data and the historical sample feature data into the second fixed dimension vector is provided. Since the objective function of the similarity training model takes fixed dimension vectors as its input, fixed dimension vectors must also be supplied in the subsequent calculation, so the persistent feature data and the historical sample feature data are encoded into a second fixed dimension vector in this step.

The second fixed dimension vector has the same data structure as the first fixed dimension vector. The only difference is that the first fixed dimension vector is derived from the two training samples of a pair, whereas the second fixed dimension vector is derived from the persistent feature data and the historical sample feature data.

S402, obtaining a target historical sample by the second fixed dimension vector through the similarity training model, wherein the target historical sample is a sample with the highest similarity with the to-be-continued identifier in the historical samples.

In this step, a specific implementation of calculating the target historical sample from the second fixed dimension vector is provided, where the target historical sample is the historical sample with the highest similarity to the to-be-continued identifier. The second fixed dimension vector is substituted into the objective function to obtain the similarity, and the sample with the highest similarity value is selected.

In this embodiment, a specific implementation of calculating the target historical sample from the second fixed dimension vector is provided. The persistent feature data and the historical sample feature data are first encoded into a second fixed dimension vector. The second fixed dimension vector is then passed through the similarity training model to obtain the target historical sample, which is the historical sample with the highest similarity to the to-be-continued identifier.
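Selecting the target historical sample amounts to scoring every historical sample and keeping the one with the highest similarity. The Python sketch below assumes a stand-in `similarity` function (the fraction of matching feature values) in place of the trained similarity training model; the identifiers and feature values of the devices other than the tablet computer A are hypothetical.

```python
def similarity(a, b):
    """Stand-in for the trained model: fraction of shared features that match."""
    keys = a.keys() & b.keys()
    if not keys:
        return 0.0
    return sum(a[k] == b[k] for k in keys) / len(keys)


def find_target(to_be_continued, history, model=similarity):
    """Return the historical sample with the highest similarity score."""
    return max(history, key=lambda h: model(to_be_continued["features"], h["features"]))


to_be_continued = {"id": "user identifier A",
                   "features": {"gender": "male", "age": 20, "phone": "XXX"}}
history = [
    {"source": "tablet computer A", "id": "user identifier B",
     "features": {"gender": "male", "age": 20, "phone": "XXX"}},
    {"source": "tablet computer B", "id": "user identifier C",
     "features": {"gender": "female", "age": 35, "phone": "YYY"}},
    {"source": "smart watch B", "id": "user identifier D",
     "features": {"gender": "male", "age": 41, "phone": "ZZZ"}},
    {"source": "mobile phone B", "id": "user identifier E",
     "features": {"gender": "female", "age": 20, "phone": "YYY"}},
]
target = find_target(to_be_continued, history)
```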

In this embodiment, a specific implementation manner of performing cleaning and screening on the training sample data before the pair of training samples is trained is provided.

In this embodiment, a specific implementation of determining whether the historical sample feature data corresponds to the persistent feature data is provided. The subsequent steps are performed only when the historical sample feature data corresponds to the persistent feature data. The essence of this embodiment is to select the historical samples for which the similarity training model can calculate a similarity. If the historical sample feature data does not correspond to the persistent feature data, no similarity can be calculated between them, and carrying such historical samples through the subsequent steps has no value and only increases energy consumption.
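The screening step can be sketched in Python as follows. The sketch assumes that "corresponds" means the historical sample provides every feature field present in the persistent feature data; the application does not define the criterion more precisely, and the names `corresponds` and `screen_history` are hypothetical.

```python
def corresponds(history_features, persistent_features):
    """True when the historical sample covers every persistent feature field."""
    return set(persistent_features) <= set(history_features)


def screen_history(history, persistent_features):
    """Keep only the historical samples that can be compared with the new sample."""
    return [h for h in history if corresponds(h["features"], persistent_features)]


persistent_features = {"gender": "male", "age": 20, "phone": "XXX"}
candidates = [
    {"source": "tablet computer A",
     "features": {"gender": "male", "age": 20, "phone": "XXX"}},
    {"source": "smart watch B",
     "features": {"steps_per_day": 8000}},  # no comparable fields, dropped
]
screened = screen_history(candidates, persistent_features)
```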

Basic principle and working process:

The present application is further explained below, taking as an example a mobile phone A newly added by the user, with historical samples composed of different device sources such as the tablet computer A, the tablet computer B, the smart watch B and the mobile phone B. The tablet computer A holds a user identifier B, the gender and age of the user, and a mobile phone number B, where the mobile phone number B is XXX.

The user establishes a user identifier A in the mobile phone A, together with the feature data mapped to or associated with the user identifier A. In this example the feature data are the gender, age and mobile phone number of the user, for example: gender male, age 20, and mobile phone number A being XXX.

In one embodiment, the present application provides an identification continuation method, the method comprising:

obtaining a rated number of paired training samples, wherein the paired training samples are two training samples with determined similarity, and the training samples comprise training sample identifications and training sample characteristic data corresponding to the training sample identifications.

In this step, a rated number of pairs of training samples is obtained. The rated number can be understood as the number of pairs needed to train a correct model; how many pairs are required is set by the developer. Note that each pair of training samples must contain two training samples whose similarity has been determined, and each training sample includes a training sample identifier and training sample feature data. The training sample feature data should be chosen from the kinds of feature data that may be associated with the user identifier A, such as the gender, age and mobile phone number of the user, because only a model trained in this way is applicable to the continuation of the user identifier A. If, for example, the training sample feature data were not attributes of the user or of the device but two pictures, the similarity of the two pictures could still be calculated, but the trained model could not be applied to the similarity calculation for the user identifier A and would not meet the requirement of this step.

In addition, the similarity is only a numerical value, and it may indicate either that the two training samples of a pair are similar or that they are not. Training can be performed in either case, because the purpose of the paired training samples is to train a model: a pair whose two samples are not similar is still informative, since it tells the model what dissimilar samples look like.

In this step, the training sample feature data is cleaned to obtain numerical-type and category-type data. The reason for cleaning is that numerical-type and category-type data greatly reduce the amount of calculation. The gender of the user, for example, can be understood as category-type data: since gender generally takes only the values male and female, 0 and 1 can be used to represent the two categories. The age of the user, being a specific numerical value, is numerical-type data. To determine the similarity of category-type data, only the gender of the users needs to be compared; the calculation for numerical-type data is also small, for example only judging whether the mobile phone numbers in the two training samples are the same. Calculating similarity from numerical-type or category-type data therefore requires little computation. By contrast, calculating the similarity of two pictures requires a complex algorithm and a huge amount of computation, and pictures do not readily form the subsequent fixed dimension vector. For these reasons the paired training samples are cleaned in this step.

In a broad sense, the training sample feature data may include attribute data, such as the gender and age of the user, and behavior data of the user in using a mobile phone number, a device and the like. The data is cleaned and converted into numerical-type and category-type features. Data such as gender belong to the category type, while the frequency and number of days on which the user uses the mobile phone number belong to the numerical type.
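The cleaning step can be illustrated with a minimal Python sketch that splits a raw record into category-type and numerical-type features and drops everything else. The field names, the tables `CATEGORY_FEATURES` and `NUMERICAL_FEATURES`, and the function `clean` are hypothetical; the application does not prescribe a record layout.

```python
# Hypothetical schema: which fields are category-type and which are numerical-type.
CATEGORY_FEATURES = {"gender": {"male": 0, "female": 1}}
NUMERICAL_FEATURES = ("age", "phone_use_days")


def clean(raw):
    """Split a raw record into category-type codes and numerical-type values."""
    category = {}
    for name, codes in CATEGORY_FEATURES.items():
        value = raw.get(name)
        if isinstance(value, str) and value in codes:
            category[name] = codes[value]
    numerical = {}
    for name in NUMERICAL_FEATURES:
        try:
            numerical[name] = float(raw[name])
        except (KeyError, TypeError, ValueError):
            continue  # missing or malformed values are dropped
    return category, numerical


category, numerical = clean(
    {"gender": "male", "age": "20", "phone_use_days": None, "avatar": "photo.png"}
)
```

Fields that are neither category-type nor numerical-type, such as the picture in this example, are simply not carried forward.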

And respectively encoding the training sample feature data of two samples in the pair of training samples into first fixed dimension vectors.

In this step, the category-type training sample feature data of each of the two training samples in the pair is converted into a low-dimensional dense vector through an embedding layer, spliced with the corresponding numerical-type features, and finally converted into a first fixed dimension vector through the embedding layer.
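The embed, splice and convert sequence can be sketched with NumPy. The sketch is an assumption-laden illustration: the embedding table and the final projection are random and untrained, the dimensions `EMBED_DIM` and `FIXED_DIM` are arbitrary, and the last conversion is modelled as a linear projection followed by tanh, which the application does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM, NUM_NUMERIC, FIXED_DIM = 4, 2, 8  # arbitrary illustrative sizes

# One embedding row per gender code (0 = male, 1 = female); untrained values.
gender_table = rng.normal(scale=0.1, size=(2, EMBED_DIM))
# Assumed projection that maps the spliced vector to the fixed dimension.
projection = rng.normal(scale=0.1, size=(EMBED_DIM + NUM_NUMERIC, FIXED_DIM))


def encode(gender_code, numeric_features):
    """Embed the category feature, splice in numeric features, project to FIXED_DIM."""
    dense = gender_table[gender_code]
    spliced = np.concatenate([dense, np.asarray(numeric_features, dtype=float)])
    return np.tanh(spliced @ projection)


first_vector = encode(0, [20.0, 3.0])   # e.g. male, age 20, 3 days of phone use
second_vector = encode(1, [35.0, 7.0])
```

Both samples end up with vectors of the same length regardless of their raw values, which is the property the later splicing step relies on.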

And splicing the two first fixed dimension vectors of the same pair of training samples according to the determined similarity through deep learning to obtain a similarity neural network.

In this step, the two first fixed dimension vectors of the same pair of training samples are spliced to obtain the similarity neural network, and the similarity neural network is only a model formed by splicing one pair of training samples.

And carrying out full-network splicing on the similarity neural networks of different pairs of training samples to obtain a to-be-detected similarity training model.

In this step, the individual similarity neural networks of the rated number of pairs of training samples are spliced to obtain an overall to-be-detected similarity training model. Because this step is only a simple DNN splicing and does not by itself guarantee a model that meets the requirements, what is obtained at this point is the to-be-detected similarity training model.

Fig. 5 is a schematic diagram of the DNN neural network architecture of the present invention, and as shown in fig. 5, an explanation is made on the formation of the above-mentioned similarity training model to be detected in the form of a DNN data model.

The vector obtained by the vector generation module is first fed into a fully-connected module, in which the activation function of each fully-connected layer is the tanh function, so that the output of each layer is limited to the interval (-1, 1). The activation function is not limited to tanh; a similar function such as sigmoid may also be used, as long as the output of each layer in the fully-connected module is normalized to a fixed range of values. In addition, a cross-network module is provided in parallel with the fully-connected module, in order to construct bounded high-order cross features automatically in an explicit, controllable and efficient manner. The output of each of its layers is $x_{l+1} = x_0 x_l^{T} w_l + b_l + x_l = f(x_l, w_l, b_l) + x_l$, where $x_0$ is the input to the first layer, $x_l$ is the output of layer $l$, $x_{l+1}$ is the output of layer $l+1$, and $w_l$ and $b_l$ are the parameters of layer $l$. The number of neurons is the same in every layer, that is, the input and output dimensions of each layer are the same. Furthermore, inspired by the residual network, the function $f$ of each layer fits the residual $x_{l+1} - x_l$, which helps address the vanishing-gradient problem and allows the network to be deeper. Finally, the outputs of the fully-connected module and the cross-network module are spliced and fed into a fully-connected layer whose activation function is the sigmoid function, so that the final output is limited to the interval (0, 1) and represents the probability that the two samples are similar.
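The forward pass of this architecture can be sketched with NumPy. The layer counts, dimensions and random untrained weights below are assumptions made only for illustration; `cross_layer` and `forward` are hypothetical names, and the cross layer implements the per-layer formula given in the paragraph above.

```python
import numpy as np


def cross_layer(x0, xl, w, b):
    """One cross-network layer: x0 * (xl . w) + b + xl (residual form)."""
    return x0 * (xl @ w) + b + xl


def forward(x0, deep_layers, cross_layers, out_w, out_b):
    """Run both modules in parallel, splice their outputs, apply a sigmoid unit."""
    h = x0
    for W, b in deep_layers:            # fully-connected module with tanh
        h = np.tanh(W @ h + b)
    xl = x0
    for w, b in cross_layers:           # cross-network module
        xl = cross_layer(x0, xl, w, b)
    spliced = np.concatenate([h, xl])
    return float(1.0 / (1.0 + np.exp(-(out_w @ spliced + out_b))))


rng = np.random.default_rng(0)
d, hidden = 6, 5                        # arbitrary illustrative sizes
x0 = rng.normal(size=d)                 # stands in for the spliced pair vector
deep_layers = [(rng.normal(scale=0.1, size=(hidden, d)), np.zeros(hidden)),
               (rng.normal(scale=0.1, size=(hidden, hidden)), np.zeros(hidden))]
cross_layers = [(rng.normal(scale=0.1, size=d), np.zeros(d)) for _ in range(2)]
out_w = rng.normal(scale=0.1, size=hidden + d)
probability = forward(x0, deep_layers, cross_layers, out_w, 0.0)
```

Because `x0 * (xl @ w)` multiplies the original input by a scalar, each cross layer keeps the input dimension unchanged, which matches the statement that input and output dimensions are the same in every layer.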

When training is performed with this model, the objective function is:

where $y_i$ represents the true similarity of the two IDs, $p_i$ represents the ID similarity output by the similarity discrimination model, and $W_l$ represents the network parameters. The regularization term used is the L2 regularization term.
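The formula itself is not reproduced in the text above. Given a sigmoid output interpreted as a probability, the symbols just defined, and an L2 regularization term, one form consistent with that description is the binary cross-entropy with an L2 penalty. This is offered only as an assumption and not as the applicant's formula; the number of pairs $N$ and the regularization weight $\lambda$ are symbols introduced here for illustration.

```latex
\mathrm{loss} = -\frac{1}{N}\sum_{i=1}^{N}\Bigl[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \Bigr]
              + \lambda \sum_{l} \lVert W_l \rVert_2^{2}
```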

The above is an explanation of the fully-spliced mathematical model in the prior art, which is not a limitation of the present application, but is only for better explanation of the present application.

And substituting the paired training samples into the to-be-detected similarity training model to calculate to obtain the detection similarity.

In this step, a specific step of detecting the similarity training model to be detected is provided, and the pair of training samples is substituted into the similarity training model to be detected. Specifically, the two training samples in the pair of training samples are vectorized and then substituted into the objective function.

Calculating a difference value as an error value according to the similarity determined by the pair of training samples and the detection similarity.

The difference calculated in this step from the similarity and the detected similarity is the error value.

Judging whether the error value is smaller than a preset error threshold; if so, taking the to-be-detected similarity training model as the similarity training model; if not, adjusting parameters in the to-be-detected similarity training model until the error value is smaller than the preset error threshold, and then taking the to-be-detected similarity training model as the similarity training model.

In this step, it is determined whether the similarity training model to be detected can be used as the similarity training model, and if the predetermined error threshold is not met, parameters in the similarity training model to be detected are adjusted, where the parameters may be understood as parameters of the objective function. Thus, the similarity training model is obtained.

Receiving a to-be-continued sample, wherein the to-be-continued sample comprises a to-be-continued identifier and persistent feature data.

In this step, the user identifier A of the mobile phone A and the feature data mapped to or associated with it, that is, the to-be-continued identifier and the persistent feature data, are received.

Acquiring, according to the to-be-continued sample, a plurality of historical samples from different sources in historical data, wherein each historical sample comprises a historical sample identifier and historical sample feature data, and the historical samples are samples whose historical sample feature data corresponds to the persistent feature data.

In this step, the historical samples composed of different device sources, such as the tablet computer A, the tablet computer B, the smart watch B and the mobile phone B, are obtained, and the historical sample feature data is the feature data held in those devices. It should be noted that a historical sample whose feature data does not correspond to the persistent feature data can be filtered out directly: similarity is a relative concept, and it can only be calculated when the historical sample feature data corresponds to the persistent feature data.

And calculating a target historical sample through the similarity training model according to the to-be-continued sample and the plurality of historical samples, wherein the target historical sample is the sample with the highest similarity with the to-be-continued identifier in the historical samples.

In this step, a specific implementation of calculating the target historical sample is provided, where the target historical sample is the historical sample with the highest similarity to the to-be-continued identifier. In this example, the tablet computer A corresponds to the historical sample with the highest similarity, which means that the mobile phone A and the tablet computer A belong, with high probability, to the same user.

The to-be-continued identifier is then mapped to the target historical sample in the historical data to implement identification continuation. After that, the to-be-continued identifier, that is, the user identifier A of the mobile phone A, can access the data in the target historical sample, that is, the data of the tablet computer A. The mobile phone A can synchronize or retrieve the data of the tablet computer A, and the user identifiers of the mobile phone A and the tablet computer A are associated with each other, which completes the continuation.
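The mapping that realizes the continuation can be sketched in Python with two dictionaries. The structures `history_data` and `id_mapping`, the record contents, and the functions `continue_identifier` and `fetch_data` are hypothetical; the application does not describe how the mapping is stored.

```python
# Historical data keyed by historical sample identifier (contents are made up).
history_data = {
    "user identifier B": {"device": "tablet computer A",
                          "records": ["record 1", "record 2"]},
}
id_mapping = {}


def continue_identifier(id_mapping, to_be_continued_id, target_history_id):
    """Map the new identifier onto the identifier of the target historical sample."""
    id_mapping[to_be_continued_id] = target_history_id


def fetch_data(id_mapping, history_data, identifier):
    """Resolve an identifier through the mapping and return the associated data."""
    resolved = id_mapping.get(identifier, identifier)
    return history_data.get(resolved)


continue_identifier(id_mapping, "user identifier A", "user identifier B")
phone_view = fetch_data(id_mapping, history_data, "user identifier A")
```

After the mapping is recorded, a lookup made with the identifier of the mobile phone A returns the data originally stored under the identifier of the tablet computer A.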

Fig. 6 is a schematic diagram of an embodiment of the identification continuation apparatus. As shown in fig. 6, in one embodiment, the present application provides an identification continuation apparatus, including:

an obtaining module 101, configured to obtain a rated number of paired training samples, where the paired training samples are two samples for which similarity has been determined, and the training samples include training sample identifiers and training sample feature data;

the training module 102 is configured to obtain a similarity training model through deep learning training according to the pair of training samples, where the similarity training model is a model used for calculating similarity between the samples;

the receiving module 103 is configured to receive a to-be-continued sample, where the to-be-continued sample includes a to-be-continued identifier and persistent feature data;

the obtaining module 101 is further configured to obtain a plurality of historical samples from different sources according to the to-be-continued sample, where a historical sample is a sample in the historical data whose feature data corresponds to the persistent feature data, and the historical sample includes a historical sample identifier and historical sample feature data;

a calculating module 104, configured to calculate, according to the to-be-continued sample and the plurality of historical samples, a target historical sample through the similarity training model, where the target historical sample is the historical sample with the highest similarity to the to-be-continued identifier;

a mapping module 105, configured to map the to-be-continued identifier to the target historical sample in the historical data to implement identification continuation.

In one embodiment, the apparatus further comprises:

an encoding module 106, configured to encode the training sample feature data of two samples in the pair of training samples into first fixed dimension vectors respectively;

a splicing module 107, configured to splice the two first fixed dimension vectors of the same pair of training samples according to the determined similarity through deep learning to obtain a similarity neural network;

and the similarity training model is obtained by carrying out full-network splicing on the similarity neural networks of different pairs of training samples.

The apparatus further comprises:

a substitution module 108, configured to substitute the pair of training samples into the to-be-detected similarity training model to calculate a detection similarity;

the calculating module 104 is further configured to calculate a difference value as an error value according to the similarity and the detection similarity determined by the pair of training samples;

the determining module 109 is configured to determine whether the error value is smaller than a predetermined error threshold, and if the error value is smaller than the predetermined error threshold, use the to-be-detected similarity training model as the similarity training model.

In an embodiment, the determining module 109 is further configured to, if the error value is greater than or equal to the predetermined error threshold, adjust parameters in the to-be-detected similarity training model so that the to-be-detected similarity training model is used as the similarity training model after the error value is smaller than the predetermined error threshold.

In an embodiment, the encoding module 106 is configured to encode the persistent feature data and the historical sample feature data into a second fixed-dimension vector;

the calculating module 104 is configured to obtain a target history sample from the second fixed dimension vector through the similarity training model, where the target history sample is a sample with a highest similarity to the identifier to be persisted in the history samples.

In one embodiment, the apparatus further comprises:

a cleaning module 110, configured to clean the training sample feature data in the pair of training samples to obtain data type and category type of the training sample feature data.

In an embodiment, the determining module 109 is configured to determine whether the historical sample feature data of a historical sample in the data source corresponds to the persistent feature data, and if the historical sample feature data corresponds to the persistent feature data, to perform the subsequent steps.

In one embodiment, the present application further provides an electronic device, the apparatus comprising: a processor and a memory;

In practical applications, the computer-readable medium may be included in the apparatus/device/system described in the above embodiments, or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement the identification continuation method described in the above embodiments.

According to embodiments disclosed herein, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example and without limitation: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing, without limiting the scope of the present disclosure. In the embodiments disclosed herein, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

It should be understood that the present invention is not limited to the particular embodiments described herein, but is intended to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.

Claims

1. A method for tag renewal, the method comprising:

2. The method according to claim 1, wherein the obtaining a similarity training model through deep learning training according to the pair of training samples, wherein the similarity training model is a model for calculating similarity includes:

3. The method according to claim 2, wherein the obtaining the similarity training model by full-network stitching the similarity neural networks of different pairs of training samples comprises:

4. The tag lifetime method of claim 3, wherein said determining if said error value is less than a predetermined error threshold value further comprises:

5. The method according to claim 1, wherein the calculating a target history sample according to the pending sample and the plurality of history samples by the similarity training model comprises:

6. The method of claim 1, wherein the obtaining a nominal number of pairs of training samples, wherein the pairs of training samples are two training samples with determined similarity, and the training samples comprise training sample identifiers and training sample feature data corresponding to the training sample identifiers, further comprises:

7. The identification continuation method according to claim 1, wherein before the step of acquiring, according to the to-be-continued sample, a plurality of historical samples from different sources in the historical data, wherein the historical samples comprise historical sample identifiers and historical sample feature data and are samples whose historical sample feature data corresponds to the persistent feature data, the method comprises:

determining whether the historical sample feature data of a historical sample in a data source corresponds to the persistent feature data,

if the historical sample feature data corresponds to the persistent feature data, performing subsequent steps.

8. An identification continuation apparatus, characterized in that the identification continuation apparatus comprises:

the acquisition module is used for acquiring a plurality of historical samples from different sources in historical data according to the to-be-resumed sample, wherein the historical samples comprise historical sample identifications and historical sample characteristic data, and the historical samples are samples corresponding to the historical sample characteristic data and the resume characteristic data;

9. An electronic device, wherein the electronic device comprises: a processor and a memory;

the memory has stored therein an application program executable by the processor for causing the processor to perform the steps of the identification continuation method as claimed in any one of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the identification continuation method of any one of claims 1 to 7.