CN111967518B

CN111967518B - Application labeling method, application labeling device and terminal equipment

Info

Publication number: CN111967518B
Application number: CN202010832244.1A
Authority: CN
Inventors: 黄崇远
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd; Shenzhen Huantai Technology Co Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd; Shenzhen Huantai Technology Co Ltd
Priority date: 2020-08-18
Filing date: 2020-08-18
Publication date: 2023-10-13
Anticipated expiration: 2040-08-18
Also published as: CN111967518A

Abstract

The application provides an application labeling method, which comprises the following steps: acquiring application sequences corresponding to a plurality of users respectively, wherein each application sequence comprises text information for describing a first application related to the corresponding user; inputting each application sequence into a trained natural language processing model, and obtaining an output result of the trained natural language processing model based on each application sequence, wherein the output result comprises feature vectors corresponding to each first application; determining, according to the feature vectors, similarity between a basic application and other applications, wherein the basic application is a first application corresponding to a preset label, and the other applications are the first applications other than the basic application; and determining application labels corresponding to the other applications respectively according to the similarity and the preset labels of the basic application. By this method, the accuracy of labeling applications can be improved.

Description

Application labeling method, application labeling device and terminal equipment

Technical Field

The application belongs to the technical field of application, and particularly relates to an application labeling method, an application labeling device, terminal equipment and a computer readable storage medium.

Background

Labeling applications plays an important role in many internet application scenarios. For example, accurately labeling the applications in an application store improves the efficiency with which users search for applications. In addition, related applications can be accurately recommended to users according to the application labels, which improves the user experience.

At present, commonly used methods for obtaining application labels include manually labeling applications and extracting keywords from the description information of an application. However, manual labeling is inefficient and highly subjective, so the accuracy of the resulting labels is difficult to ensure. Keyword extraction depends on the description information of the application, but that description information is not necessarily accurate and may be too brief, so accurate application labels are difficult to extract. Therefore, current methods for labeling applications have poor accuracy.

Disclosure of Invention

The embodiment of the application provides an application labeling method, an application labeling device, terminal equipment and a computer readable storage medium, which can improve the accuracy of labeling an application.

In a first aspect, an embodiment of the present application provides an application labeling method, including:

acquiring application sequences corresponding to a plurality of users respectively, wherein each application sequence comprises text information for describing a first application related to the corresponding user;

inputting each application sequence into a trained natural language processing model, and obtaining an output result of the trained natural language processing model based on each application sequence, wherein the output result comprises feature vectors corresponding to each first application;

according to the feature vector, similarity between a basic application and other applications is determined, wherein the basic application is a first application corresponding to a preset label, and the other applications are applications except the basic application in the first application;

and determining application labels corresponding to other applications respectively according to the similarity and the preset labels of the basic application.

In a second aspect, an embodiment of the present application provides an application labeling apparatus, including:

an acquisition module, used for acquiring application sequences respectively corresponding to a plurality of users, wherein each application sequence contains text information for describing a first application related to the corresponding user;

the processing module is used for inputting each application sequence into a trained natural language processing model, and obtaining an output result of the trained natural language processing model based on each application sequence, wherein the output result comprises feature vectors corresponding to each first application;

the first determining module is used for determining similarity between a basic application and other applications according to the feature vector, wherein the basic application is a first application corresponding to a preset label, and the other applications are applications except the basic application in the first application;

and the second determining module is used for determining application labels corresponding to other applications respectively according to the similarity and the preset labels of the basic application.

In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, a display, and a computer program stored in the memory and capable of running on the processor, where the processor implements the application labeling method as described in the first aspect when executing the computer program.

In a fourth aspect, an embodiment of the present application provides a computer readable storage medium storing a computer program, where the computer program is executed by a processor to implement the application labeling method as described in the first aspect.

In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to perform the application labeling method of the first aspect.

Compared with the prior art, the embodiment of the application has the beneficial effects that: in the embodiment of the application, application sequences corresponding to a plurality of users can be acquired respectively, and each application sequence contains text information for describing a first application related to the corresponding user, so that the text information describing the associated application of each user can be acquired, each application sequence is input into a trained natural language processing model, the output result of the trained natural language processing model based on each application sequence is obtained, and the feature vector of each first application can be accurately and efficiently extracted through the trained natural language processing model; and then, according to the feature vectors respectively corresponding to the first applications, determining the similarity between the basic application and other applications, so that the application labels respectively corresponding to the other applications can be determined according to the similarity and the preset labels of the basic application. At this time, the similarity between each basic application and other applications can be evaluated according to the feature vectors obtained by accurate and efficient extraction, so that the labels of other similar applications can be accurately determined by combining the preset labels obtained by pre-labeling the basic applications, and the accuracy of labeling the applications is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of an application labeling method according to an embodiment of the present application;

FIG. 2 is a flowchart of another application labeling method according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of an application labeling device according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.

As used in the present description and the appended claims, the term "if" may be interpreted as "when" or "once" or "in response to a determination" or "in response to detection" depending on the context. Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted, depending on the context, as meaning "upon determination" or "in response to determination" or "upon detection of a [described condition or event]" or "in response to detection of a [described condition or event]".

Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.

The application labeling method provided by the embodiment of the application can be applied to terminal equipment such as a server, a desktop computer, a mobile phone, a tablet personal computer, wearable equipment, vehicle-mounted equipment, augmented reality (AR)/virtual reality (VR) equipment, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA) and the like, and the embodiment of the application does not limit the specific type of the terminal equipment.

Specifically, fig. 1 shows a flowchart of an application labeling method provided by an embodiment of the present application, where the application labeling method may be applied to a terminal device.

As shown in fig. 1, the application labeling method may include:

step S101, obtaining application sequences respectively corresponding to a plurality of users, wherein each application sequence comprises text information for describing a first application related to the corresponding user.

In the embodiment of the application, for each user, the first application related to the user can be an application used by the user in a specified time period and/or an installed application and the like.

For example, the application sequence may include a name of the corresponding user and a name of the corresponding first application, and further, the application sequence may include at least one of attribute information, application description information, and the like of the corresponding first application. The specific form of the application sequence is not limited herein.

In some embodiments, the acquiring the application sequences respectively corresponding to the plurality of users may include:

for each user, sorting the first applications used by the user in a specified time period according to the order in which they were used;

and generating an application sequence corresponding to the user according to the sequencing result.

For example, when user A sequentially uses application A, application B, application C, application D, and application E during a day, the application sequence corresponding to user A is:

User A: application A, application B, application C, application D, application E.

Of course, in some examples, the same first application may appear multiple times in the application sequence corresponding to user A, according to the user's usage.

According to the application sequence acquisition mode, application sequences corresponding to a plurality of users can be acquired as follows:

User B: application B, application C, application D, application F, application G.

User C: ...

At this time, the application sequence may include information about the applications used by the user at different times within the specified time period (e.g., during a day), and may include some inherent association features between applications. For example, when shopping with a shopping application, a user often needs to subsequently invoke a payment application to make a payment. Thus, the application sequence may include attribute features of each first application and association information between the first applications, so that feature vectors for each first application can be obtained from the application sequences by the subsequent natural language processing model.

In addition, the application sequence acquisition mode in the embodiment of the application can quickly and efficiently acquire application sequences for a large number of users, so the data processing efficiency is high. The resulting application sequences also have a clear form, which makes them convenient for the subsequent natural language processing model to process.
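The sequence construction described above can be sketched as follows. This is a minimal illustration, assuming usage records arrive as `(user, timestamp, app_name)` tuples; that record format and the function name `build_app_sequences` are hypothetical and do not come from the source.

```python
from collections import defaultdict


def build_app_sequences(usage_events):
    """Group usage events by user and order each user's applications by time.

    usage_events: iterable of (user, timestamp, app_name) tuples (assumed format).
    Returns a dict mapping each user to the list of application names in usage
    order. Repeated use of the same application is kept, so an application may
    appear several times in one sequence.
    """
    per_user = defaultdict(list)
    for user, timestamp, app in usage_events:
        per_user[user].append((timestamp, app))
    return {
        user: [app for _, app in sorted(events, key=lambda e: e[0])]
        for user, events in per_user.items()
    }


# Illustrative records only; the timestamps are arbitrary integers.
events = [
    ("User A", 3, "application C"),
    ("User A", 1, "application A"),
    ("User A", 2, "application B"),
    ("User B", 1, "application B"),
    ("User B", 2, "application C"),
]
sequences = build_app_sequences(events)
```

Each resulting list plays the role of one "sentence" whose "words" are application names, which is the form a word-embedding model expects.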

Step S102, inputting each application sequence into a trained natural language processing model, and obtaining an output result of the trained natural language processing model based on each application sequence, wherein the output result comprises feature vectors corresponding to each first application.

In the embodiment of the application, the natural language processing model can be used for acquiring the feature vector of each first application from the words and characters in the application sequence and the corresponding context information. By way of example, the natural language processing model may be a neural probabilistic language model, an n-gram model, a word2vec model, or the like.

The natural language processing model may be trained from a preset data set. In one example, the preset data set may include preset applications and ground-truth labels for the preset applications. In another example, if the natural language processing model is a Continuous Bag-of-Words (CBOW) model in a word2vec model, the natural language processing model may be trained based on each of the application sequences, and after the training is completed, an output result of the trained natural language processing model based on each of the application sequences is obtained.

One principle of processing in a natural language processing model after inputting each of the application sequences into the natural language processing model is described below by taking a Continuous Bag-of-Words (CBOW) model in a word2vec model as an example.

In the CBOW model, conditional probabilities can be used to model the prediction of a first application in the application sequence. The modeling target is given by formula 001:

$$P(w_t \mid w_{t-c} : w_{t+c})$$

where $w_t$ is the predicted first application, $w_{t-c}$ denotes the information of the $c$ first applications preceding $w_t$ in the corresponding application sequence, and $w_{t+c}$ denotes the information of the $c$ first applications following $w_t$ in the corresponding application sequence.
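To make the role of the window size c concrete, the sketch below lists, for each position t in a sequence, the context (up to c items on each side) paired with the target item that the model would predict. The function name `context_target_pairs` is hypothetical, and truncating the window at the ends of the sequence is an assumption; the source does not say how boundaries are handled.

```python
def context_target_pairs(sequence, c):
    """Return (context, target) pairs using a window of c items on each side.

    The window is simply truncated at the start and end of the sequence
    (an assumption made for this illustration).
    """
    pairs = []
    for t, target in enumerate(sequence):
        left = sequence[max(0, t - c):t]       # up to c preceding items
        right = sequence[t + 1:t + 1 + c]      # up to c following items
        pairs.append((left + right, target))
    return pairs


pairs = context_target_pairs(["A", "B", "C", "D", "E"], c=1)
```

With c = 1, the target "C" is paired with the context ["B", "D"], mirroring the conditional probability of an item given its neighbors.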

Given an application sequence $w_1, w_2, w_3, \ldots, w_T$, the objective of the CBOW model is to maximize the log-likelihood of formula 001, as in formula 002:

$$\mathcal{L} = \frac{1}{T} \sum_{t=1}^{T} \log P(w_t \mid w_{t-c} : w_{t+c})$$

where $T$ is the length of the corresponding application sequence, and $w_t$, $w_{t-c}$ and $w_{t+c}$ are as defined for formula 001.

A softmax function is then computed from the conditional probability through formula 003, and the CBOW model is constructed based on this softmax function.

Formula 003 is as follows:

[Formula 003 and the definition accompanying it are not recoverable from the source text.]
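As a point of reference, the textbook CBOW formulation (an assumption here, not necessarily the exact form of formula 003) averages the input embeddings of the context items and applies a softmax over the dot products with every item's output embedding. The sketch below shows only that forward computation; the names `input_vecs` and `output_vecs` and the toy numbers are hypothetical. The embodiment also mentions a Huffman tree, which suggests hierarchical softmax rather than the full softmax used here for simplicity.

```python
import math


def cbow_probabilities(context_ids, input_vecs, output_vecs):
    """Return P(w | context) for every vocabulary item w using a full softmax.

    input_vecs[i] and output_vecs[i] are the input and output embeddings of
    item i (lists of floats of equal length; hypothetical names).
    """
    dim = len(input_vecs[0])
    # Hidden layer: average of the context items' input embeddings.
    hidden = [
        sum(input_vecs[i][d] for i in context_ids) / len(context_ids)
        for d in range(dim)
    ]
    # Score of each candidate: dot product of its output embedding with hidden.
    scores = [sum(o[d] * hidden[d] for d in range(dim)) for o in output_vecs]
    # Softmax, shifted by the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy embeddings for a three-item vocabulary (illustrative values only).
input_vecs = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.1]]
output_vecs = [[0.2, 0.1], [-0.1, 0.4], [0.0, 0.0]]
probs = cbow_probabilities([0, 2], input_vecs, output_vecs)
```

After training, the rows of the input embedding table are what would typically serve as the feature vectors of the first applications.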

In the embodiment of the application, when the natural language processing model is trained based on each application sequence, whether training is complete may be determined according to the loss function and/or a preset number of iterations. For example, training may be determined to be complete once the number of iterations of the natural language processing model reaches the preset number of iterations; alternatively, completion may be determined based on the loss function of the natural language processing model. The loss function may be any existing or future loss function, selected according to actual requirements.
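The two stopping criteria can be combined in a simple loop, sketched below. The helper `train_until_done`, the `loss_tolerance` parameter, and the toy training step are all hypothetical; the source only states that an iteration count and/or a loss function may be used.

```python
def train_until_done(step_fn, max_iterations, loss_tolerance=None):
    """Run step_fn() until an iteration cap or a loss threshold is reached.

    step_fn: callable that performs one training iteration and returns the loss.
    Returns (iterations_run, last_loss).
    """
    loss = None
    for iteration in range(1, max_iterations + 1):
        loss = step_fn()
        # Optional early stop once the loss is small enough.
        if loss_tolerance is not None and loss <= loss_tolerance:
            return iteration, loss
    return max_iterations, loss


# Toy stand-in for a training step: the loss halves on every call.
state = {"loss": 1.0}


def toy_step():
    state["loss"] *= 0.5
    return state["loss"]


iterations, final_loss = train_until_done(
    toy_step, max_iterations=100, loss_tolerance=0.1
)
```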

In the embodiment of the application, the feature vector can take various specific forms; for example, it can be represented by a matrix, a vector, or the like. In some applications, the feature vector may be a word vector, also referred to as a word embedding. In this case, the names of the respective first applications in the input application sequence may be converted into word vectors by the natural language processing model. The dimensionality of the word vector may be set by a developer based on test results or the like. For example, the word vector corresponding to each first application may be an 8-dimensional vector.

Step S103, according to the feature vector, determining similarity between the basic application and other applications, wherein the basic application is a first application corresponding to a preset label, and the other applications are applications except the basic application in the first application.

In the embodiment of the application, the corresponding features of the first application can be represented by each numerical value in the feature vector. For example, taking a Continuous Bag-of-Words (CBOW) model in a word2vec model as an example, each value in a feature vector output by the CBOW model for a certain first application may represent a classification probability of each node in a Huffman tree of the CBOW model for the corresponding first application. The number of nodes may be the dimension of the feature vector.

The basic application can be determined in a plurality of ways. For example, the base application may be determined from the respective first applications according to information of operation behavior (e.g., installation, uninstallation, use, etc.) of the respective first applications by the user, release time of the respective first applications, and the like. It should be noted that the number of the basic applications may be determined according to the actual scenario, which is not limited herein. For example, the number of the base applications may be determined according to the number of the first applications, or may be predetermined by a developer. In some application scenarios, if there are hundreds of thousands of first applications, there may be 1000 base applications. The preset label of the basic application can be obtained in various modes. For example, it can be obtained by manual labeling; alternatively, it may be obtained by extracting keywords from the description information of the base application.

In some embodiments, the feature vector is an N-dimensional numeric vector, N being an integer greater than 1;

and determining the similarity between the basic application and other applications according to the feature vector, wherein the method comprises the following steps of:

for any one basic application and any one other application, calculating the inner product of the feature vector of the basic application and the feature vector of the other application;

and determining the similarity between the basic application and the other applications according to the inner product.

In the embodiment of the present application, the feature vector may be an N-dimensional numeric vector, where the value in each dimension may represent probability information of the corresponding first application in a certain feature dimension, so that the similarity between the base application and the other applications may be determined by calculating the inner product of the feature vector of the base application and the feature vector of the other application.

For example, if the feature vector of base application A is $a = [a_1, a_2, a_3, \ldots, a_n]$ and the feature vector of other application B is $b = [b_1, b_2, b_3, \ldots, b_n]$, then the inner product of the feature vector of the base application and the feature vector of the other application is:

$$S = a \cdot b = a_1 b_1 + a_2 b_2 + a_3 b_3 + \cdots + a_n b_n$$

In some examples, the inner product may be used directly as the similarity between the base application and the other application. Alternatively, the inner product may be combined with other comparison information between the base application and the other application (such as a comparison of their corresponding keywords) to determine the similarity.
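The inner product above is straightforward to compute. A minimal sketch follows, with hypothetical three-dimensional vectors chosen only for illustration.

```python
def inner_product(a, b):
    """Inner product of two feature vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


# Illustrative vectors; real feature vectors come from the trained model.
base_vec = [1.0, 2.0, 0.0]
other_vec = [3.0, 0.5, 4.0]
similarity = inner_product(base_vec, other_vec)  # 1*3 + 2*0.5 + 0*4
```

Because the inner product is not normalized, its scale depends on the vector magnitudes; whether and how it is rescaled before being combined with other scores is left open by the source.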

In some embodiments, the determining the similarity between the base application and the other applications according to the inner product includes:

taking the inner product as a first similarity between the base application and the other applications;

determining a second similarity between the basic application and the other applications according to the description information of the basic application and the description information of the other applications;

and determining the similarity between the basic application and the other applications according to the first similarity, the second similarity, the third weight of the first similarity and the fourth weight of the second similarity.

In the embodiment of the application, the inherent attribute information about the basic application and other applications can be acquired according to the description information of the basic application and the description information of the other applications, so that the second similarity between the basic application and the other applications is determined according to the inherent attribute.

For example, keywords corresponding to the base application may be extracted from the description information of the base application, keywords corresponding to the other applications may be extracted from the description information of the other applications, and the second similarity between the base application and the other applications may be determined according to the keywords corresponding to the base application and the keywords corresponding to the other applications.

For example, for application a, the description information of application a is:

1. thousands of voices can be sent and received through a small amount of flow, so that electricity and flow are saved; 2. a friend circle for sharing the life drops with friends; 3. shaking to check nearby people, and no strangers exist in the world; 4. the scanner can scan commodity bar codes, book covers and CD covers, and even scan English words to translate into Chinese; 5. public account, pay attention to stars, watch news and set reminding through the public account; 6. a game center for playing a game with friends; 7. expression store, interesting and fun expression.

From this description information, the following keywords of application A can be extracted: voice, message, tag, picture, video, friends, etc.

After the keywords corresponding to the base application and the keywords corresponding to the other application are obtained, the two sets of keywords may, for example, be compared, and the second similarity between the base application and the other application determined according to the proportion of keywords that match between them.
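One way to turn "the proportion of matching keywords" into a number is the Jaccard ratio (shared keywords divided by all distinct keywords). The source does not specify the denominator, so this choice is an assumption, and the function name `keyword_overlap` and the keyword lists are hypothetical.

```python
def keyword_overlap(base_keywords, other_keywords):
    """Jaccard ratio of two keyword collections (assumed matching measure)."""
    base, other = set(base_keywords), set(other_keywords)
    union = base | other
    if not union:
        return 0.0  # no keywords on either side: treat as no similarity
    return len(base & other) / len(union)


second_similarity = keyword_overlap(
    ["voice", "message", "friends", "video"],
    ["video", "friends", "music"],
)
```

Here two keywords are shared out of five distinct keywords, giving a ratio of 2/5.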

Alternatively, the second similarity between the base application and the other application may be calculated by cosine similarity. For example, an n-dimensional sample point $a = (x_{11}, x_{12}, \ldots, x_{1n})$ of the base application may be constructed according to the keywords corresponding to the base application, and an n-dimensional sample point $b = (x_{21}, x_{22}, \ldots, x_{2n})$ of the other application may be constructed according to the keywords corresponding to the other application, so that the second similarity is calculated from $a$ and $b$ using the cosine similarity formula.

The calculation formula of cosine similarity is as follows:

$$\cos\theta = \frac{\sum_{k=1}^{n} x_{1k}\, x_{2k}}{\sqrt{\sum_{k=1}^{n} x_{1k}^{2}}\;\sqrt{\sum_{k=1}^{n} x_{2k}^{2}}}$$

Then, the similarity between the base application and the other applications may be determined according to the first similarity, the second similarity, the third weight of the first similarity, and the fourth weight of the second similarity. By way of example, the third weight may be 0.7 and the fourth weight may be 0.3.
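The cosine similarity and the weighted combination can be sketched together as follows. The binary keyword-indicator vectors, the example first similarity of 0.8, and the assumption that the combination is a plain weighted sum are illustrative choices, not details confirmed by the source; only the example weights 0.7 and 0.3 come from the text.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an all-zero keyword vector matches nothing
    return dot / (norm_a * norm_b)


def combined_similarity(first, second, third_weight=0.7, fourth_weight=0.3):
    """Weighted sum of the first and second similarities (assumed form)."""
    return third_weight * first + fourth_weight * second


# Indicator vectors over a hypothetical shared keyword vocabulary:
# ["voice", "message", "friends", "video"]
a = [1, 1, 1, 0]
b = [0, 0, 1, 1]
second = cosine_similarity(a, b)
overall = combined_similarity(first=0.8, second=second)
```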

In the embodiment of the application, after the inner product is obtained, the description information of the base application and of the other applications can also be taken into account when judging the similarity between them. The similarity is thus evaluated more comprehensively, considering the description information in addition to users' usage behavior, which adds a dimension to the judgment and improves the accuracy of the similarity.

In some embodiments, before step S103, further comprising:

step S201, determining a basic application according to application operation data corresponding to a plurality of users respectively;

step S202, for each basic application, acquiring at least two groups of initial tag sets of the basic application, wherein each group of initial tag sets comprises at least one initial tag;

step S203, taking the initial labels that are common to all of the initial label sets as the preset labels of the base application.

In the embodiment of the present application, the application operation data may include data of operations such as installation, use, uninstallation, etc. of the corresponding first application by the user. According to application operation data corresponding to a plurality of users, a more representative application can be determined from the first applications and used as a basic application. For example, it may be determined that the Y first applications with the largest number of cumulative installations are the base applications in a certain period of time, or that the Y first applications with the highest frequency of use by the user are the base applications in a certain period of time, or the like.

After the base application is determined, at least two initial label sets for the base application may be obtained. The initial label sets can be obtained in various ways, and each initial label set can be generated by a different human annotator and/or a different labeling method. For example, the initial label sets obtained when several human annotators each manually label a given base application may be acquired; in addition, keywords may be extracted from the description information of the base application to form a further initial label set. The initial labels that are common to all of the initial label sets are then used as the preset labels of the base application.

By the embodiment of the application, multiple groups of initial label sets of the same basic application can be obtained, so that cross labeling of the same basic application is realized, and the same initial label is extracted from the cross labeling to serve as a preset label of the basic application. At this time, because the information of a plurality of initial labels of the same basic application is combined, the accuracy of the labels of the basic application can be greatly improved, and the accurate labels of the basic application are realized.
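Extracting the labels shared by all initial label sets amounts to a set intersection. A minimal sketch follows; the function name `preset_labels` and the example labels are hypothetical.

```python
def preset_labels(initial_label_sets):
    """Return the labels that appear in every initial label set."""
    if len(initial_label_sets) < 2:
        raise ValueError("at least two initial label sets are required")
    common = set(initial_label_sets[0])
    for labels in initial_label_sets[1:]:
        common &= set(labels)
    return common


# Three illustrative label sets, e.g. two annotators and one keyword extractor.
labels = preset_labels([
    {"social", "chat", "payment"},
    {"social", "chat", "news"},
    {"chat", "social", "video"},
])
```

A strict intersection can come back empty when annotators disagree; how that case is handled is not addressed in the source.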

In some embodiments, for each user, the application operation data of the user includes a second application corresponding to the user, and a use duration and a use number of times of the user for each second application used in the preset time period, where the second application is a first application used by the user in the preset time period;

the determining the basic application according to the application operation data respectively corresponding to the plurality of users comprises the following steps:

for each second application, determining a heat score of the second application according to the use duration and the use times of each user for the second application in the preset time period;

and determining a basic application from each second application according to the heat scores of the second applications.

In the embodiment of the application, the heat score can indicate how actively users use the corresponding second application. The heat score of the second application may be determined from the usage duration and the number of uses in several ways. For example, the usage duration and the number of uses may each be normalized, and the normalized usage duration and the normalized number of uses may then be added or multiplied. Alternatively, weights may be set for the usage duration and the number of uses respectively, and a weighted calculation performed to obtain the heat score of the second application.

And quantitatively evaluating the user using liveness of each second application according to the heat scores, so as to determine the basic application from each second application.

For example, each second application whose heat score is greater than a preset score may be used as a basic application, or the R applications with the highest heat scores (e.g., the 1000 applications with the highest heat scores) may be used as the basic applications.
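The two selection rules above can be sketched as follows. The function names and the sample scores are hypothetical and serve only to illustrate the threshold rule and the top-R rule.

```python
def select_by_threshold(heat_scores, preset_score):
    """Return the applications whose heat score exceeds the preset score."""
    return {app for app, score in heat_scores.items() if score > preset_score}


def select_top_r(heat_scores, r):
    """Return the r applications with the highest heat scores, best first."""
    ranked = sorted(heat_scores, key=heat_scores.get, reverse=True)
    return ranked[:r]


# Hypothetical heat scores keyed by application name.
scores = {"app_a": 0.9, "app_b": 0.4, "app_c": 0.7}
print(select_by_threshold(scores, 0.5))
print(select_top_r(scores, 2))
```

The threshold rule yields a variable number of basic applications, while the top-R rule fixes the number in advance, which may be convenient when the manual pre-labeling budget is limited.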

In some embodiments, for each second application, determining the heat score of the second application according to the usage duration and the usage times of the second application by the respective user in the preset time period includes:

For each second application, calculating a heat score for the second application according to a first formula, wherein the first formula is:

wherein S_h is the heat score of the second application h, n is the number of users using the second application h in the preset time period, F_i is the number of times the user i uses the second application h in the preset time period, T_i is the duration for which the user i uses the second application h in the preset time period, F_max is the maximum number of uses corresponding to the second application in the preset time period, T_max is the longest use duration corresponding to the second application in the preset time period, w_f is the first weight, and w_t is the second weight.

In this embodiment, the user i is the i-th of the n users who use the second application h in the preset time period. F_i and T_i can be normalized by F_max and T_max respectively, so that features of two different dimensions can be weighted and combined to obtain the heat score.

The specific values of w_f and w_t may be determined according to the specific scenario. For example, in some examples the number of uses may be considered more important than the use duration, in which case w_f may be greater than w_t; for example, w_f may be 0.65 and w_t may be 0.35.
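The first formula itself is not reproduced in this text, so the following is only a sketch of one form that is consistent with the symbol definitions above: each user's number of uses and use duration are normalized by F_max and T_max, weighted by w_f and w_t, and summed over the n users. The summation over users (rather than, say, an average) and the function name `heat_score` are assumptions, not the claimed formula.

```python
def heat_score(usage, f_max, t_max, w_f=0.65, w_t=0.35):
    """Assumed form: S_h = sum_i (w_f * F_i / F_max + w_t * T_i / T_max).

    usage: list of (F_i, T_i) pairs, one per user of application h, where
           F_i is the number of uses and T_i the use duration.
    f_max, t_max: maximum number of uses and longest use duration used for
           normalization.
    """
    if f_max <= 0 or t_max <= 0:
        raise ValueError("f_max and t_max must be positive")
    return sum(w_f * f / f_max + w_t * t / t_max for f, t in usage)


# Two hypothetical users: (number of uses, use duration in minutes).
print(heat_score([(10, 100.0), (5, 50.0)], f_max=10, t_max=100.0))
```

The default weights follow the example values given above (w_f = 0.65, w_t = 0.35).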

Step S104, determining application labels corresponding to other applications respectively according to the similarity and the preset labels of the basic applications.

In the embodiment of the application, since the preset label of the basic application is obtained in advance, whether the types of the basic application and other applications are similar can be judged according to the similarity between other applications and the basic application, and if so, the application labels of the other applications can be determined according to the preset label of the basic application.

For example, suppose application A is a basic application and the preset label of application A is social communication. If the similarity between application B, which belongs to the other applications, and application A is detected to be greater than a preset similarity threshold (such as 60%), the application label of application B is set to social communication. In some examples, the similarity between application B and the basic application A may be greater than the preset similarity threshold, and the similarity between application B and a basic application C may also be greater than the preset similarity threshold; in that case the application labels of application B may include both the preset label of the basic application A and the preset label of the basic application C.

In some embodiments, the determining, according to the similarity and the preset label of the base application, an application label corresponding to each other application includes:

for each basic application, taking the preset label of the basic application as at least part of the application labels of the M other applications with the highest similarity to the basic application, wherein M is a positive integer;

or, for each basic application, taking the preset label of the basic application as at least part of application labels of other applications with the similarity with the basic application being larger than a preset similarity threshold value.

In the embodiment of the application, since the preset label of the basic application is the accurate label related to the basic application, the application of which the type is the same as that of each basic application can be determined according to the similarity, thereby realizing the rapid classification of each first application and labeling each other application after classification according to the preset label of the basic application.
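The two labeling rules can be sketched as follows. The data layout (a dictionary keyed by pairs of basic application and other application), the function names, and the sample values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def propagate_by_threshold(similarity, preset, threshold):
    """Give each other application the preset labels of every basic
    application whose similarity to it exceeds the threshold."""
    labels = defaultdict(set)
    for (base, other), score in similarity.items():
        if score > threshold:
            labels[other] |= preset[base]
    return dict(labels)


def propagate_top_m(similarity, preset, m):
    """Give the m other applications most similar to each basic
    application that basic application's preset labels."""
    by_base = defaultdict(list)
    for (base, other), score in similarity.items():
        by_base[base].append((score, other))
    labels = defaultdict(set)
    for base, scored in by_base.items():
        for _, other in sorted(scored, reverse=True)[:m]:
            labels[other] |= preset[base]
    return dict(labels)


# Hypothetical similarities between basic applications (A, C) and others (B, D).
sim = {("A", "B"): 0.8, ("C", "B"): 0.7, ("A", "D"): 0.3, ("C", "D"): 0.65}
preset = {"A": {"social communication"}, "C": {"video"}}
print(propagate_by_threshold(sim, preset, 0.6))
print(propagate_top_m(sim, preset, 1))
```

Because labels are accumulated per application, an application that is similar to several basic applications ends up with the union of their preset labels, matching the example of application B above.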

Therefore, according to the embodiment of the application, the application labels of a large number of applications can be rapidly determined according to the preset labels of a small number of basic applications, so that rapid labeling of the large number of applications can be realized, and the labeling efficiency is higher.

In some embodiments, after the application labels corresponding to the other applications are determined, the first applications that share the same label (including preset labels and application labels) may be counted, and a statistics list may be output according to the counting result, so that a quick manual review can be carried out based on the statistics list.
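One possible shape for such a statistics list is a mapping from each label to the applications that carry it. The function name `statistics_list` and the sample data are hypothetical.

```python
from collections import defaultdict


def statistics_list(app_labels):
    """Group applications by label so a reviewer can scan one label at a time.

    app_labels: mapping from application name to its set of labels
                (preset labels and application labels alike).
    """
    groups = defaultdict(list)
    for app, labels in app_labels.items():
        for label in labels:
            groups[label].append(app)
    return {label: sorted(apps) for label, apps in sorted(groups.items())}


# Hypothetical labeling result.
result = statistics_list({"A": {"social"}, "B": {"social", "video"}, "D": {"video"}})
for label, apps in result.items():
    print(label, len(apps), apps)
```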

Of course, tag review may be performed in other ways. For example, in some embodiments, after determining the application label of each first application according to the similarity and the preset label of the base application, the method further includes:

and for each other application, checking the application label of the other application according to the description information of the other application.

In an exemplary embodiment, the application labels of the other applications may be compared with information such as keywords in the corresponding description information, and if the application labels are matched with the information, it is determined that the application labels of the other applications are correct.

For example, for each other application, it may be determined whether at least X of its application labels each match a keyword extracted from the description information of that application; if so, the application labels of that application may be determined to be correct.

In this way, the application labels can be verified automatically using the description information of the other applications. Because the labels are checked against application information from more than one dimension, the accuracy and efficiency of labeling the applications are improved.
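The keyword check can be sketched as follows. How a label "matches" a keyword is not specified in the text; the case-insensitive exact match used here, the function name `labels_verified`, and the sample keywords are assumptions.

```python
def labels_verified(app_labels, keywords, x):
    """Return True if at least x application labels match a keyword
    extracted from the application's description information.

    Assumption: a label matches when it equals a keyword, ignoring case.
    """
    keyword_set = {k.lower() for k in keywords}
    matched = sum(1 for label in app_labels if label.lower() in keyword_set)
    return matched >= x


# Hypothetical labels and description keywords for one application.
labels = {"social communication", "video"}
keywords = ["Video", "chat", "friends"]
print(labels_verified(labels, keywords, 1))
print(labels_verified(labels, keywords, 2))
```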

In the embodiment of the application, application sequences corresponding to a plurality of users can be acquired respectively, and each application sequence contains text information for describing a first application related to the corresponding user, so that the text information describing the associated application of each user can be acquired, each application sequence is input into a trained natural language processing model, the output result of the trained natural language processing model based on each application sequence is obtained, and the feature vector of each first application can be accurately and efficiently extracted through the trained natural language processing model; and then, according to the feature vectors respectively corresponding to the first applications, determining the similarity between the basic application and other applications, so that the application labels respectively corresponding to the other applications can be determined according to the similarity and the preset labels of the basic application. At this time, the similarity between each basic application and other applications can be evaluated according to the feature vectors obtained by accurate and efficient extraction, so that the labels of other similar applications can be accurately determined by combining the preset labels obtained by pre-labeling the basic applications, and the accuracy of labeling the applications is improved.

It should be understood that the sequence number of each step in the foregoing embodiments does not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation process of the embodiments of the present application.

Corresponding to the application labeling method described in the above embodiments, fig. 3 shows a block diagram of an application labeling device according to an embodiment of the present application, and for convenience of explanation, only the portions related to the embodiment of the present application are shown.

Referring to fig. 3, the application labeling apparatus 3 includes:

the acquiring module 301 is configured to acquire application sequences respectively corresponding to a plurality of users, where each application sequence includes text information for describing a first application related to the corresponding user;

the processing module 302 is configured to input each application sequence into a trained natural language processing model, obtain an output result of the trained natural language processing model based on each application sequence, where the output result includes feature vectors corresponding to each first application;

a first determining module 303, configured to determine, according to the feature vector, similarities between a base application and other applications, where the base application is a first application corresponding to a preset tag, and the other applications are applications in the first application except for the base application;

and a second determining module 304, configured to determine application labels corresponding to the other applications respectively according to the similarity and the preset labels of the basic applications.

Optionally, the application labeling device 3 further includes:

the third determining module is used for determining basic application according to application operation data corresponding to a plurality of users respectively;

the second acquisition module is used for acquiring at least two groups of initial tag sets of each basic application, wherein each group of initial tag sets comprises at least one initial tag;

the setting module is used for taking the same initial label among the initial label sets as the preset label of the basic application.

Optionally, for each user, the application operation data of the user includes the second applications corresponding to the user, and the use duration and the number of uses of each second application by the user in the preset time period, where a second application is a first application used by the user in the preset time period;

the third determining module includes:

a first determining unit, configured to determine, for each second application, a heat score of the second application according to a usage duration and a usage number of the second application by each user in the preset time period;

and the second determining unit is used for determining the basic application from each second application according to the heat score of each second application.

Optionally, the first determining unit is specifically configured to:

Optionally, the feature vector is an N-dimensional digital vector, and N is an integer greater than 1;

the first determining module 303 includes:

a calculating unit, configured to calculate, for any one basic application and any one other application, an inner product of a feature vector of the basic application and a feature vector of the other application;

and a third determining unit, configured to determine a similarity between the base application and the other applications according to the inner product.

Optionally, the third determining unit includes:

a processing subunit configured to take the inner product as a first similarity between the base application and the other application;

a first determining subunit, configured to determine, according to the description information of the base application and the description information of the other application, a second similarity between the base application and the other application;

and the second determining subunit is used for determining the similarity between the basic application and the other applications according to the first similarity, the second similarity, the third weight of the first similarity and the fourth weight of the second similarity.
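The calculation performed by the calculating unit and the subunits above can be sketched as follows. The inner product as the first similarity follows the text. The Jaccard overlap of description keywords used for the second similarity, the weighted sum used to combine the two, the example weights, and all function names are assumptions made for illustration.

```python
def inner_product(u, v):
    """First similarity: inner product of two N-dimensional feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same dimension N")
    return sum(a * b for a, b in zip(u, v))


def jaccard(a, b):
    """Assumed second similarity: keyword overlap between two descriptions."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(u, v, desc_u, desc_v, w3=0.7, w4=0.3):
    """Assumed combination: w3 * first similarity + w4 * second similarity."""
    first = inner_product(u, v)
    second = jaccard(desc_u, desc_v)
    return w3 * first + w4 * second


# Hypothetical 2-dimensional feature vectors and description keywords.
print(combined_similarity([1.0, 0.0], [0.5, 0.5],
                          {"chat", "video"}, {"chat", "call"}))
```

If the feature vectors are not length-normalized, the inner product is unbounded, so the two similarities may be on different scales; normalizing the vectors first is one way to keep the weights meaningful.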

The second determining module 304 is specifically configured to:

It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.

Fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 4, the terminal device 4 of this embodiment includes: at least one processor 40 (only one is shown in fig. 4), a memory 41 and a computer program 42 stored in the memory 41 and executable on the at least one processor 40, the processor 40 implementing the steps in any of the various application labeling method embodiments described above when executing the computer program 42.

The terminal device 4 may be a server, a mobile phone, a wearable device, an Augmented Reality (AR)/Virtual Reality (VR) device, a desktop computer, a notebook computer, a palm computer, or another computing device. The terminal device may include, but is not limited to, a processor 40 and a memory 41. It will be appreciated by those skilled in the art that fig. 4 is merely an example of the terminal device 4 and does not constitute a limitation of the terminal device 4, which may include more or fewer components than illustrated, may combine certain components, or may use different components; for example, it may also include input devices, output devices, network access devices, and the like. The input devices may include a keyboard, a touch pad, a fingerprint collection sensor (for collecting fingerprint information of a user and direction information of the fingerprint), a microphone, a camera, and the like, and the output devices may include a display, a speaker, and the like.

The processor 40 may be a central processing unit (Central Processing Unit, CPU), and the processor 40 may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The memory 41 may in some embodiments be an internal storage unit of the terminal device 4, such as a hard disk or a memory of the terminal device 4. The memory 41 may also be an external storage device of the terminal device 4 in other embodiments, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the terminal device 4. Further, the memory 41 may include both the internal storage unit and the external storage device of the terminal device 4. The memory 41 is used for storing an operating system, an application program, a Boot Loader (Boot Loader), data, other programs, and the like, such as program codes of the computer programs. The above-described memory 41 may also be used to temporarily store data that has been output or is to be output.

In addition, although not shown, the terminal device 4 may further include a network connection module, such as a Bluetooth module, a Wi-Fi module, a cellular network module, and so on, which will not be described herein.

In the embodiment of the present application, when the processor 40 executes the computer program 42 to implement the steps in any of the foregoing embodiments of the application labeling method, an application sequence corresponding to each of a plurality of users may be obtained, where each application sequence includes text information describing a first application related to the corresponding user, so that text information describing an associated application of each user may be obtained, and each application sequence may be input into a trained natural language processing model, so that an output result of the trained natural language processing model based on each application sequence may be obtained, and thus, feature vectors of each first application may be accurately and efficiently extracted through the trained natural language processing model; and then, according to the feature vectors respectively corresponding to the first applications, determining the similarity between the basic application and other applications, so that the application labels respectively corresponding to the other applications can be determined according to the similarity and the preset labels of the basic application. At this time, the similarity between each basic application and other applications can be evaluated according to the feature vectors obtained by accurate and efficient extraction, so that the labels of other similar applications can be accurately determined by combining the preset labels obtained by pre-labeling the basic applications, and the accuracy of labeling the applications is improved.

The embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the various method embodiments described above.

Embodiments of the present application provide a computer program product enabling a terminal device to carry out the steps of the method embodiments described above when the computer program product is run on the terminal device.

The integrated units described above, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the methods of the above embodiments may be implemented by a computer program instructing related hardware. The computer program may be stored in a computer readable storage medium and, when executed by a processor, may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing device/terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer readable media may not include electrical carrier signals and telecommunications signals.

In the foregoing embodiments, each embodiment is described with its own emphasis. For parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of the other embodiments.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are merely illustrative, e.g., the division of modules or elements described above is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.

The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims

1. An application labeling method, comprising:

determining application labels corresponding to other applications respectively according to the similarity and the preset labels of the basic application;

before determining the similarity between the basic application and other applications in the first application according to the feature vector, the method further comprises the following steps:

determining basic application according to application operation data corresponding to a plurality of users respectively;

for each basic application, acquiring at least two groups of initial tag sets of the basic application, wherein each group of initial tag sets comprises at least one initial tag;

the same initial label is used as a preset label of the basic application among the initial label sets;

The application operation data of each user comprises second applications corresponding to the user, and the use duration and the use times of the second applications used by the user in a preset time period, wherein the second applications are first applications used by the user in the preset time period;

determining a basic application from each second application according to the heat score of each second application;

for each second application, determining a heat score of the second application according to the use duration and the use times of each user for the second application in the preset time period, including:

2. The method of claim 1, wherein the feature vector is an N-dimensional number vector, N being an integer greater than 1;

3. The application labeling method of claim 2, wherein the determining the similarity between the base application and the other applications based on the inner product comprises:

4. The application labeling method according to any one of claims 1 to 3, wherein the determining, according to the similarity and the preset label of the base application, the application label corresponding to each of the other applications includes:

5. An application labeling apparatus, comprising:

the second determining module is used for determining application labels corresponding to other applications respectively according to the similarity and the preset labels of the basic application;

the application labeling device further comprises:

the setting module is used for taking the same initial label as a preset label of the basic application among the initial label sets;

the third determining module includes:

a second determining unit, configured to determine a base application from each second application according to a heat score of each second application;

the first determining unit is further configured to:

6. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the application tagging method according to any one of claims 1 to 3 when executing the computer program.

7. A computer-readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the application tagging method according to any one of claims 1 to 3.