CN114385903A

CN114385903A - Application account identification method and device, electronic equipment and readable storage medium

Info

Publication number: CN114385903A
Application number: CN202011139997.0A
Authority: CN
Inventors: 康战辉
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2020-10-22
Filing date: 2020-10-22
Publication date: 2022-04-22
Anticipated expiration: 2040-10-22
Also published as: CN114385903B

Abstract

The application relates to the technical field of the internet, and discloses an application account identification method and device, an electronic device, and a readable storage medium. The application account identification method comprises the following steps: obtaining at least one target hot search word of an application program, and obtaining a title of at least one piece of first content information issued by at least one application account through the application program; determining a first probability that the application account belongs to a specific category based on the target hot search word and the title of the first content information; acquiring at least one piece of second content information issued by the application account through the application program, and determining a second probability that the application account belongs to the specific category based on the second content information; and determining a category of the application account based on the first probability and the second probability. By means of artificial intelligence technology, the application account identification method can effectively improve the accuracy of application account category identification.

Description

Application account identification method and device, electronic equipment and readable storage medium

Technical Field

The application relates to the technical field of internet, in particular to an application account identification method and device, an electronic device and a readable storage medium.

Background

As mobile internet content moves from image-text alone to image-text combined with short video, more and more application accounts on various application programs issue large amounts of short video content; for example, more and more public accounts on the WeChat platform issue videos.

Hot search terms appear in various application programs, and more and more application accounts publish information associated with the hot search terms, such as videos or image-text information, for guiding purposes, that is, for SEO (Search Engine Optimization). The types of such application accounts therefore need to be identified.

At present, the type of an application account is generally identified by keyword recognition on the information issued by the application account, and the accuracy of identification in this way is not high enough.

Disclosure of Invention

The purpose of the present application is to solve at least one of the above technical drawbacks, and to provide the following solutions:

in a first aspect, a method for identifying an application account is provided, including:

obtaining at least one target hot search word of an application program, and obtaining a title of at least one piece of first content information issued by at least one application account through the application program;

determining a first probability that the application account belongs to a specific category based on the target hot search term and the title of the first content information;

acquiring at least one piece of second content information issued by the application account through the application program, and determining a second probability that the application account belongs to the specific category based on the second content information;

determining a category of the application account based on the first probability and the second probability.

In an optional embodiment of the first aspect, the first content information comprises at least one of first image-text information and video; the second content information includes second image-text information.

In an optional embodiment of the first aspect, determining a first probability that the application account belongs to a particular category based on the target hot search term and the title of the first content information comprises:

determining semantic similarity between the title of the first content information and the target hot search word;

determining the first probability based on the semantic similarity.

In an optional embodiment of the first aspect, determining a semantic similarity between the title of the first content information and the target hot search term comprises:

converting the title of the first content information into a title vector, and converting the target hot search word into a corresponding hot search word vector;

determining semantic similarity between the title vector and the hot search term vector.

In an optional embodiment of the first aspect, converting the title of the first content information into a title vector comprises:

splitting the title into at least one word;

if the number of words obtained by splitting is greater than or equal to a preset number, converting the first preset number of words in the title, in order, into the title vector;

and if the number of the words obtained by splitting is less than the preset number, repeating the last word in the title until the number of the words is equal to the preset number, and converting the title after the words are repeated into the title vector.

In an optional embodiment of the first aspect, determining the first probability based on the semantic similarity comprises:

determining a first quantity of the at least one first content information, and determining a second quantity of the at least one target hot search word;

normalizing the determined semantic similarity based on a maximum value of the first number and the second number to obtain the first probability.

In an optional embodiment of the first aspect, determining a second probability that the application account belongs to the particular category based on the second content information comprises:

converting the second content information into text information in a preset format;

performing word segmentation on the text information to obtain at least one word, and acquiring a word vector corresponding to the at least one word;

converting the word vector into a vector to be classified, classifying the vector to be classified, and determining the type of the second content information;

and determining the second probability based on the type corresponding to at least one piece of second content information issued by the application account.

In an optional embodiment of the first aspect, converting the word vector into a vector to be classified comprises:

obtaining the average value of the numerical values of every adjacent preset dimension in the word vector;

and constructing the vector to be classified based on the obtained average value.

In an optional embodiment of the first aspect, identifying the category of the application account based on the first probability and the second probability comprises:

fusing the first probability and the second probability based on preset weight to obtain a fused numerical value;

and determining the category of the application account corresponding to the fusion numerical value.

In an optional embodiment of the first aspect, fusing the first probability and the second probability based on a preset weight to obtain a fused value includes:

acquiring the registration time of the application account, and determining a third probability corresponding to the registration time;

and fusing the first probability, the second probability and the third probability based on preset weight to obtain a fused numerical value.

In an optional embodiment of the first aspect, the target hot search word is a hot search word of the application program within a preset time period, and the first content information is issued by the application account through the application program within the preset time period.

In a second aspect, an apparatus for identifying an application account is provided, including:

an acquisition module, configured to acquire at least one target hot search word of an application program, and to acquire a title of at least one piece of first content information issued by at least one application account through the application program;

a first determining module, configured to determine, based on the target hot search term and a title of the first content information, a first probability that the application account belongs to a specific category;

the second determining module is used for acquiring at least one piece of second content information issued by the application account through the application program, and determining a second probability that the application account belongs to the specific category based on the second content information;

an identification module to determine a category of the application account based on the first probability and the second probability.

In an optional embodiment of the second aspect, the first content information comprises at least one of first image-text information and video; the second content information includes second image-text information.

In an optional embodiment of the second aspect, the first determining module, when determining the first probability that the application account belongs to the specific category based on the target hot search term and the title of the first content information, is specifically configured to:

determine semantic similarity between the title of the first content information and the target hot search word;

determine the first probability based on the semantic similarity.

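In an optional embodiment of the second aspect, when determining the semantic similarity between the title of the first content information and the target hot search term, the first determining module is specifically configured to:

convert the title of the first content information into a title vector, and convert the target hot search word into a corresponding hot search word vector;

determine semantic similarity between the title vector and the hot search word vector.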

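In an optional embodiment of the second aspect, when the first determining module converts the title of the first content information into a title vector, the first determining module is specifically configured to:

split the title into at least one word;

if the number of words obtained by splitting is greater than or equal to a preset number, convert the first preset number of words in the title, in order, into the title vector;

and if the number of words obtained by splitting is less than the preset number, repeat the last word in the title until the number of words is equal to the preset number, and convert the title after the words are repeated into the title vector.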

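In an optional embodiment of the second aspect, the first determining module, when determining the first probability based on the semantic similarity, is specifically configured to:

determine a first quantity of the at least one piece of first content information, and determine a second quantity of the at least one target hot search word;

normalize the determined semantic similarity based on the maximum value of the first quantity and the second quantity to obtain the first probability.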

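In an optional embodiment of the second aspect, when determining, based on the second content information, a second probability that the application account belongs to the specific category, the second determining module is specifically configured to:

convert the second content information into text information in a preset format;

perform word segmentation on the text information to obtain at least one word, and acquire a word vector corresponding to the at least one word;

convert the word vector into a vector to be classified, classify the vector to be classified, and determine the type of the second content information;

and determine the second probability based on the type corresponding to at least one piece of second content information issued by the application account.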

In an optional embodiment of the second aspect, the second determining module, when converting the word vector into a vector to be classified, is specifically configured to:

acquiring an average value of numerical values of every adjacent preset dimension in the word vector;

and constructing a vector to be classified based on the obtained average value.

In an optional embodiment of the second aspect, when determining the category of the application account based on the first probability and the second probability, the identification module is specifically configured to:

fuse the first probability and the second probability based on a preset weight to obtain a fused value;

and determine the category of the application account corresponding to the fused value.

In an optional embodiment of the second aspect, when fusing the first probability and the second probability based on the preset weight to obtain a fused value, the identification module is specifically configured to:

acquire the registration time of the application account, and determine a third probability corresponding to the registration time;

and fuse the first probability, the second probability and the third probability based on the preset weight to obtain a fused value.

In an optional embodiment of the second aspect, the target hot search word is a hot search word of the application program within a preset time period, and the first content information is issued by the application account through the application program within the preset time period.

In a third aspect, an electronic device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method for identifying an application account according to the first aspect of the present application is implemented.

In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for identifying an application account according to the first aspect of the present application.

The technical solution provided by the present application brings the following beneficial effects:

A first probability that an application account belongs to a specific category is determined according to a target hot search word of an application program and a title of first content information issued by the application account; a second probability that the application account belongs to the specific category is determined according to second content information issued by the application account; and the category of the application account is identified by combining the first probability and the second probability. Because both the relationship between the titles and the hot search words and the category of the second content information are taken into account, the accuracy of identifying the category of the application account can be improved.

Furthermore, converting the word vector into a vector to be classified combines the semantics of at least two adjacent words. Because the combined words may be related, the resulting semantics are more complete, which can improve the accuracy of the classification result and reduce the amount of calculation in the classification process.

Furthermore, a third probability that the application account belongs to a specific category is determined according to the registration time of the application account, the category of the application account is identified according to the first probability, the second probability and the third probability, and the accuracy of identification of the category of the application account can be further improved.

Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

fig. 1 is an application environment diagram of an application account identification method according to an embodiment of the present application;

fig. 2 is a schematic flowchart of an identification method for an application account according to an embodiment of the present disclosure;

fig. 3 is a schematic flowchart of an identification method for an application account according to an embodiment of the present disclosure;

fig. 4 is a schematic diagram illustrating a scheme for setting the number of words in a title of a video to a preset number in an example provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of a scheme for classifying words in an example provided by an embodiment of the present application;

fig. 6 is a schematic diagram of a scheme for classifying vectors to be classified in an example provided by an embodiment of the present application;

fig. 7 is a schematic diagram of a scheme for identifying categories of application accounts according to an embodiment of the present disclosure;

fig. 8 is a schematic diagram of a scheme for identifying categories of application accounts according to an embodiment of the present disclosure;

fig. 9 is a flowchart illustrating an identification method for an application account in an example provided by the embodiment of the present application;

fig. 10 is a schematic structural diagram of an application account identification apparatus according to an embodiment of the present disclosure;

fig. 11 is a schematic structural diagram of an electronic device for application account identification according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.

As used herein, the singular forms "a", "an", "said" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly comprises computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning, and the like.

Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable efficient communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graphs, and the like.

The method and the device can determine the category of the application account based on the target hot search word of the application program, the title of the first content information issued by the application account and the second content information issued by the application account through a natural language processing technology.

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

The application account identification method and device, the electronic device and the computer-readable storage medium aim to solve the technical problems in the prior art.

The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.

SEO uses the rules of search engines to improve the natural ranking of a website in the relevant search engines, with the aim of making the website a leader in its industry and obtaining brand benefits. To a large extent, it is a business activity in which a website operator moves its own or its company's ranking forward.

The application account identification method provided by the application can be applied to the application environment shown in fig. 1. Specifically, the application program generates a target hot search word according to the search words of the multiple users, acquires a title of first content information issued by the application account, acquires second content information issued by the application account, and identifies the type of the application account according to the second content information and the title of the first content information, for example, determines whether the application account is an SEO application account.

The identification method of the application account can be performed in a terminal, and can also be applied to a server.

Those skilled in the art will understand that the "terminal" used herein may be a Mobile phone, a tablet computer, a PDA (Personal Digital Assistant), an MID (Mobile Internet Device), etc.; a "server" may be implemented as a stand-alone server or as a server cluster comprised of multiple servers.

A possible implementation manner is provided in the embodiment of the present application, and as shown in fig. 2, a method for identifying an application account is provided, where the method may be applied to a terminal or a server, and may include the following steps:

Step S201, acquiring at least one target hot search word of the application program, and acquiring a title of at least one piece of first content information issued by the application account through the application program.

The hot search term is a term of the hot search list of the application program, and may be determined according to a search term or a search sentence input by a plurality of users when searching through the application program, and the hot search term is not necessarily in the form of a term, and may be in the form of a phrase or a sentence.

Specifically, the target hot search word may be a hot search word of the application program within a preset time period, and the first content information may be issued by the application account through the application program within the preset time period.

The first content information may include at least one of first image-text information and video, that is, a video published by the application account in the preset time period, or first image-text information published by the application account in the preset time period. The first image-text information may include an article published by the application account, where the article includes text information and may also include an image or a video. Specifically, the application account may be an account registered by a developer, a user, or a merchant on the platform of the application program; for example, if the application program is WeChat, the application account may be a public account.

The preset time period may be a time period between the current hot search word being updated and the next hot search word being updated, for example, if the hot search word is updated once a day, the preset time period is a time period of the current day corresponding to the current hot search word.
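The selection of first content information by preset time period can be sketched as follows. This is only an illustration under stated assumptions: the hot search list is assumed to be refreshed once a day, and the `ContentItem` record, its field names and the `titles_in_period` helper are hypothetical names introduced here, not part of the application.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ContentItem:
    """Hypothetical record for one piece of first content information."""
    title: str
    published_at: datetime


def titles_in_period(items, period_start, period_length=timedelta(days=1)):
    """Return the titles published in [period_start, period_start + period_length).

    The one-day default assumes the hot search list is updated daily.
    """
    period_end = period_start + period_length
    return [
        item.title
        for item in items
        if period_start <= item.published_at < period_end
    ]


items = [
    ContentItem("title A", datetime(2020, 10, 22, 9, 30)),
    ContentItem("title B", datetime(2020, 10, 23, 0, 5)),
]
# Only content published on the day of the current hot search list is kept.
today_titles = titles_in_period(items, datetime(2020, 10, 22))
```

The half-open interval is a design choice of this sketch so that content published exactly at the next update falls into the next period.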

Step S202, determining a first probability that the application account belongs to a specific category based on the target hot search term and the title of the first content information.

The specific category may be a category representing the purpose for which the application account publishes the video, and may include, for example, an SEO category and a non-SEO category.

Specifically, the first probability belonging to the specific category may be determined by determining a semantic similarity between the target hot search term and the title of the first content information, and a specific process of determining the first probability will be described in detail below.

Step S203, at least one piece of second content information issued by the application account through the application program is acquired, and a second probability that the application account belongs to the specific category is determined based on the second content information.

The second content information may be second image-text information released by the application account, and the second image-text information may also include an article released by the application account, where the article includes the text information and may also include an image or a video.

Specifically, the first image-text information and the second image-text information may be the same or different.

Specifically, the text information in the second content information may be acquired, the text information may be classified to obtain a second probability, and a process of specifically determining the second probability is described in detail below.

Step S204, determining the category of the application account based on the first probability and the second probability.

Specifically, the first probability and the second probability may be fused to obtain a fusion result, and the category of the application account may be identified according to the fusion result. The category of the application account may also be identified by combining the registration time of the application account with the fusion result. The specific process of identifying the category of the application account will be described in detail below.

According to the above method for identifying an application account, the first probability that the application account belongs to the specific category is determined according to the target hot search word of the application program and the title of the first content information issued by the application account; the second probability that the application account belongs to the specific category is determined according to the second content information issued by the application account; and the category of the application account is identified by combining the first probability and the second probability. Because the method considers both the relationship between the title of the first content information and the hot search word and the category of the second content information issued by the application account, the accuracy of identifying the category of the application account can be improved.
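The fusion in step S204 can be sketched as a weighted sum followed by a decision rule. This is a minimal sketch under stated assumptions: the weights (0.5, 0.5), the 0.5 decision threshold and the function names are illustrative values chosen here, since the text only states that preset weights are used.

```python
def fuse_probabilities(probabilities, weights):
    """Weighted sum of probabilities; the weights are assumed to sum to 1."""
    if len(probabilities) != len(weights):
        raise ValueError("each probability needs exactly one weight")
    return sum(p * w for p, w in zip(probabilities, weights))


def classify_account(first_prob, second_prob, weights=(0.5, 0.5), threshold=0.5):
    """Return (category, fused value) for one application account.

    The weights and the threshold are hypothetical preset values.
    """
    fused = fuse_probabilities((first_prob, second_prob), weights)
    category = "SEO" if fused >= threshold else "non-SEO"
    return category, fused


category, fused = classify_account(first_prob=0.8, second_prob=0.6)
```

A third probability, such as one derived from the registration time, could be fused the same way by passing three probabilities and three weights to `fuse_probabilities`.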

The specific process of determining the first probability will be described below in conjunction with specific embodiments.

As shown in fig. 3, the determining, based on the target hot search term and the title of the first content information in step S202, a first probability that the application account belongs to a specific category may include:

Step S210, determining semantic similarity between the title of the first content information and the target hot search term.

Specifically, the title of the first content information and the target hot search word may be converted into corresponding vectors, and the similarity between the vectors is calculated to obtain the semantic similarity between the title and the target hot search word.

In a specific implementation process, the determining the semantic similarity between the title of the first content information and the target hot search term in step S210 may include:

(1) Converting the title of the first content information into a title vector, and converting the target hot search word into a corresponding hot search word vector.

The process of converting the title into the title vector will be described in detail below.

Specifically, converting the title of the first content information into a title vector may include:

a. Splitting the title into at least one word.

Specifically, a word segmentation tool, such as the open-source jieba word segmentation tool, may be used to segment the title to obtain a plurality of words, that is, a word sequence.

b. If the number of words obtained by splitting is greater than or equal to the preset number, converting the first preset number of words in the title, in order, into the title vector.

Specifically, the lengths of the title and the target hot search word are generally inconsistent, that is, the dimensions of the vectors obtained from them would be inconsistent, so the title may be converted into a vector of a fixed preset dimension.

c. And if the number of the words obtained by splitting is less than the preset number, repeating the last word in the title until the number of the words is equal to the preset number, and converting the title after the words are repeated into a title vector.

For example, with each word mapped to a 100-dimensional vector, the title can be fixed at 20 words in a "truncate if longer, pad if shorter" manner: if there are more than 20 words, the first 20 words are selected; if there are fewer than 20 words, the last word is repeated to fill up the remainder.

As shown in fig. 4, taking a preset number of 5 as an example, if there are only three words, word 3 is repeated until 5 words are obtained; if there are 6 words, the first 5 words are taken.
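Steps b and c can be sketched as a small helper that fixes the length of a word sequence. The sketch assumes the title has already been segmented into a list of words (for example by a segmentation tool); the function name `fix_length` is hypothetical.

```python
def fix_length(words, preset_number):
    """Truncate or pad a word sequence to exactly preset_number words.

    Longer sequences keep their first preset_number words; shorter ones
    repeat their last word until the preset number is reached.
    """
    if not words:
        raise ValueError("the title must contain at least one word")
    if len(words) >= preset_number:
        return words[:preset_number]
    padding = [words[-1]] * (preset_number - len(words))
    return words + padding


# The two cases of the example with a preset number of 5.
padded = fix_length(["word1", "word2", "word3"], 5)
truncated = fix_length(["w1", "w2", "w3", "w4", "w5", "w6"], 5)
```

Here `padded` ends with three copies of `"word3"`, and `truncated` keeps only `"w1"` through `"w5"`.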

Specifically, taking a 100-dimensional vector and 20 words as an example, the title may be mapped to a 100 × 20 two-dimensional vector, and the formal representation may be as follows:

v_{text} = f_{text}(x_{text}) \in \mathbb{R}^{100 \times 20}    (1)

where v_{text} is the converted title vector.

The above describes the process of converting the title into a vector. The process of converting the target hot search word into a hot search word vector is the same: the target hot search word is also split and converted into a vector of the preset dimension, likewise truncating longer inputs and padding shorter ones as described above.

Specifically, by truncating longer inputs and padding shorter ones in this way, the amount of calculation can be effectively reduced when the number of pieces of first content information is large and the titles are long. In addition, the title vector and the hot search word vector have the same dimension, that is, contain the same number of elements, which improves the accuracy of the similarity calculation.
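The conversion of a segmented title or hot search word into a fixed-size vector can be sketched as follows. The embedding lookup is an assumption: `toy_word_vector` is a hypothetical stand-in that derives deterministic values from a hash, whereas a real system would presumably look up trained word embeddings. The values 100 and 20 follow the example in the text, and the 100 × 20 matrix is stored flattened so that it can be compared element by element.

```python
import hashlib

EMBEDDING_DIM = 100  # dimension of one word vector, as in the example
PRESET_NUMBER = 20   # fixed number of words per text, as in the example


def toy_word_vector(word, dim=EMBEDDING_DIM):
    """Hypothetical stand-in for a trained word-embedding lookup.

    Derives dim deterministic values in [0, 1) from hashes of the word.
    """
    values = []
    counter = 0
    while len(values) < dim:
        digest = hashlib.sha256(f"{word}:{counter}".encode("utf-8")).digest()
        values.extend(byte / 256 for byte in digest)
        counter += 1
    return values[:dim]


def text_to_vector(words, preset_number=PRESET_NUMBER, dim=EMBEDDING_DIM):
    """Map a word sequence to a flattened vector of preset_number * dim values."""
    if not words:
        raise ValueError("the text must contain at least one word")
    if len(words) >= preset_number:
        fixed = words[:preset_number]
    else:
        fixed = words + [words[-1]] * (preset_number - len(words))
    vector = []
    for word in fixed:
        vector.extend(toy_word_vector(word, dim))
    return vector


title_vector = text_to_vector(["hot", "topic", "video"])
query_vector = text_to_vector(["hot", "topic"])
```

Both vectors have 2000 elements regardless of the original number of words, which is what makes an element-wise similarity calculation possible.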

(2) Semantic similarity between the title vector and the hot search term vector is determined.

Specifically, a cosine similarity algorithm may be used to calculate the semantic similarity between each title vector and each hot search word vector, where the calculation formula is as follows:

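$$\cos\left(V_{title(x)}, V_{query(y)}\right) = \frac{\sum_{i} V_{title(x)_i} \, V_{query(y)_i}}{\sqrt{\sum_{i} V_{title(x)_i}^{2}} \, \sqrt{\sum_{i} V_{query(y)_i}^{2}}} \tag{2}$$

where $V_{title(x)}$ represents a title vector; $V_{query(y)}$ represents a hot search word vector; $V_{title(x)_i}$ represents the i-th element of the title vector; $V_{query(y)_i}$ represents the i-th element of the hot search word vector; and i is a natural number.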
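As an illustration, the cosine similarity between a title vector and a hot search word vector can be computed element by element as in the sketch below. The helper names `flatten` and `cosine_similarity` are hypothetical, and flattening a two-dimensional title representation into one list of elements is an assumption about how the "i-th element" is indexed.

```python
import math


def flatten(matrix):
    """Flatten a 2-D list (e.g. 100 x 20) into one list of elements."""
    return [value for row in matrix for value in row]


def cosine_similarity(title_vec, query_vec):
    """Cosine similarity between two equal-length vectors."""
    if len(title_vec) != len(query_vec):
        raise ValueError("vectors must have the same number of elements")
    dot = sum(a * b for a, b in zip(title_vec, query_vec))
    norm_title = math.sqrt(sum(a * a for a in title_vec))
    norm_query = math.sqrt(sum(b * b for b in query_vec))
    if norm_title == 0 or norm_query == 0:
        # Assumption: an all-zero vector is treated as having no similarity.
        return 0.0
    return dot / (norm_title * norm_query)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Because both vectors are fixed to the same number of elements beforehand, the length check should never fail in normal use.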

Step S220, determining a first probability based on the semantic similarity.

In a specific implementation, the first probability may be determined according to the first number of the first content information, the second number of the target hot search terms, and the determined semantic similarity.

Specifically, the determining the first probability based on the determined semantic similarity in step S220 may include:

(1) determining a first quantity of at least one first content information, and determining a second quantity of at least one target hot search word;

(2) based on the maximum value in the first quantity and the second quantity, the determined semantic similarity is normalized to obtain a first probability.

Specifically, the following formula can be adopted:

where V_title(x) represents a title vector; V_query(y) represents a hot search word vector; m represents the number of hot search words; k represents the number of videos issued by the application account; and m and k are both natural numbers.

The numerator of the formula reflects that the more target hot search words on the current hot list are matched by the titles of all first content information of an application account, the stronger the correlation between the titles and the target hot list. The denominator normalizes by the larger of the number of hot-list words and the number of new videos of the current application account, so that the overall value lies between 0 and 1.

The above embodiments illustrate the process of determining the first probability, and the specific process of determining the second probability will be described below in conjunction with specific embodiments.

A possible implementation manner is provided in this embodiment of the application, and the determining, based on the second content information, the second probability that the application account belongs to the specific category in step S203 may include:

(1) converting the second content information into text information in a preset format;

(2) the text information is subjected to word segmentation to obtain at least one word, and a word vector corresponding to the at least one word is obtained.

Specifically, the following steps may be adopted:

a. detecting the font of the text information, and converting the font of the text information into a preset font; for example, a traditional character is converted into a simplified character;

b. segmenting the text information; for example, Chinese word segmentation is performed with ansj (a word segmentation tool);

c. removing the preset characters after word segmentation to obtain text information in a preset format; for example, the blank characters and punctuation marks are filtered to obtain the text information in the preset format.
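Step c above can be sketched as a simple token filter. The sketch assumes that font conversion and word segmentation (steps a and b) have already been performed by external tools and that their output is a list of string tokens; the function names are hypothetical.

```python
import unicodedata


def is_blank_or_punctuation(token):
    """True if every character is whitespace or Unicode punctuation."""
    return all(
        ch.isspace() or unicodedata.category(ch).startswith("P")
        for ch in token
    )


def clean_tokens(tokens):
    """Drop blank and punctuation-only tokens after word segmentation."""
    return [t for t in tokens if t and not is_blank_or_punctuation(t)]


segmented = ["今天", "，", " ", "天气", "。", "good", "!"]
cleaned = clean_tokens(segmented)
print(cleaned)
```

Using Unicode categories means that both full-width Chinese punctuation and ASCII punctuation are filtered by the same rule.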

Word vectors corresponding to a plurality of words can be preset, and the word vectors corresponding to the plurality of words obtained by word segmentation can be inquired.

(3) And converting the word vectors into vectors to be classified, classifying the vectors to be classified, and determining the type of the second content information.

Specifically, converting the word vector into a vector to be classified may include:

a. acquiring an average value of numerical values of every adjacent preset dimension in the word vector;

b. and constructing a vector to be classified based on the obtained average value.

For example, let (x₁, x₂, …, x_{N-1}, x_N) denote the N-gram (N-dimensional) vector corresponding to one piece of text information; the vector to be classified can then be formed from the average value of every two adjacent elements.

The preset dimension may also be another number, such as 3, in which case the average value of every three adjacent elements is selected.
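The averaging of adjacent elements can be sketched as follows. The text does not state whether adjacent groups overlap, so this sketch assumes an overlapping sliding window with a stride of one; the function name `average_adjacent` and the `window` parameter are hypothetical.

```python
def average_adjacent(word_vectors, window=2):
    """Average every `window` adjacent word vectors, element by element.

    Assumption: groups overlap (sliding window, stride 1), so N word
    vectors yield N - window + 1 averaged vectors.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(word_vectors) < window:
        raise ValueError("not enough word vectors for the chosen window")
    dim = len(word_vectors[0])
    averaged = []
    for start in range(len(word_vectors) - window + 1):
        group = word_vectors[start:start + window]
        averaged.append([sum(vec[d] for vec in group) / window for d in range(dim)])
    return averaged


vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pairs = average_adjacent(vectors, window=2)
triples = average_adjacent(vectors, window=3)
print(pairs)
print(triples)
```

A non-overlapping grouping would only change the step of the `range` call; the element-wise averaging stays the same.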

As shown in fig. 5, words 1 to N are respectively converted into (x₁, x₂, …, x_{N-1}, x_N), the average value of every two adjacent elements is selected to obtain the vector to be classified, and the vector to be classified is input into a classification model for classification.

Specifically, by converting the word vector into the vector to be classified, the semantics of at least two adjacent words can be combined, the combined at least two words may have relevance, the obtained semantics are more complete, the accuracy of the classification result can be improved, and the calculation amount in the classification process is reduced.

As shown in fig. 6, the vector to be classified may be input to the classification model, which outputs a corresponding classification result. The classification result may be the probability that the second content information belongs to a specific type, or may be the probabilities corresponding to each of a plurality of types.

Specifically, if the probability is greater than or equal to a preset threshold, it may be determined that the second content information is of a specific category; if the corresponding probability is smaller than the preset threshold, it may be determined that the second content information does not belong to the specific category.

For example, suppose the application account is a public account, the second content information is an article, and the specific category is the advertisement category. If the probability that the classification model identifies an article X_i as belonging to the advertisement category is greater than a threshold K (e.g., K = 0.8), that is, if FastText(X_i = advertisement) > K, the public account article X_i is considered an advertisement article.

(4) And determining a second probability based on the type corresponding to at least one piece of second content information issued by the application account.

Specifically, among all the second content information issued by the account, the number of second content information items of the specific category and the total number of second content information items may be counted, and the second probability may be determined from these two numbers.

Specifically, the second probability may be calculated by using the following formula:

$$BrandProb = \frac{M}{T} \tag{4}$$

where BrandProb represents the second probability; M represents the number of second content information items issued by the application account that are identified as the specific category; and T represents the total number of second content information items issued by the application account.
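The second probability can be illustrated with the sketch below, which takes the per-article probabilities output by a classifier and computes the share of articles judged to be of the specific category. The function name, the default threshold of 0.8 (taken from the example value of K) and the handling of an account with no articles are assumptions for illustration.

```python
def second_probability(article_probs, threshold=0.8):
    """Share of articles whose class probability reaches the threshold.

    article_probs: probabilities, one per second content information item,
    that the item belongs to the specific category (e.g. advertisement).
    """
    if not article_probs:
        # Assumption: an account with no articles gets a probability of 0.
        return 0.0
    m = sum(1 for p in article_probs if p >= threshold)  # items of the specific category
    t = len(article_probs)                               # all items issued by the account
    return m / t


brand_prob = second_probability([0.9, 0.85, 0.3, 0.1])
print(brand_prob)
```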

The above embodiment illustrates a process for determining the second probability, and a process for identifying a category of an application account according to the first probability and the second probability will be described below with reference to a specific embodiment.

In one embodiment, the identifying and determining the category of the account based on the first probability and the second probability in step S204 may include:

(1) fusing the first probability and the second probability based on preset weight to obtain a fused value;

(2) and determining the category of the application account corresponding to the fusion numerical value.

Specifically, the category of the application account may be determined by a weighted sum calculation method, and the specific fusion calculation method is as follows:

$$S = \alpha \, Rel(X, Y) + (1 - \alpha) \, BrandProb \tag{5}$$

where S is the fusion numerical value; Rel(X, Y) is the first probability; BrandProb is the second probability; and α is the preset weight.
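The weighted sum of formula (5) is small enough to show directly. In the sketch below the function name `fuse` and the sample values are hypothetical; only the weighted-sum form comes from the text.

```python
def fuse(rel, brand_prob, alpha=0.5):
    """Weighted fusion of the first and second probabilities (formula (5))."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return alpha * rel + (1.0 - alpha) * brand_prob


fused_value = fuse(rel=0.6, brand_prob=0.4, alpha=0.5)
print(fused_value)
```

With α between 0 and 1 and both probabilities between 0 and 1, the fused value also stays between 0 and 1.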

As shown in fig. 7, in the present embodiment, the first probability is determined according to the title of the first content information issued by the application account and the target hot search term; determining a second probability according to second content information issued by the application account; and determining a fusion numerical value according to the first probability and the second probability, and determining the category of the application account.

In another embodiment, the fusing the first probability and the second probability based on the preset weight in step S204 to obtain a fused value may include:

(1) and acquiring the registration time of the application account, and determining a third probability corresponding to the registration time.

In a specific implementation process, determining a third probability corresponding to the registration time may include:

a. determining the generation time of a target hot search word and determining the registration time of an application account;

b. if the registration time is after the generation time, determining a time difference between the registration time and the generation time;

c. a third probability is determined based on the time difference.

Specifically, the registration time of the application account may be after the time of generating the target hot-search term, and the third probability is negatively related to the time difference, that is, the shorter the time difference between the registration time of the application account and the time of generating the target hot-search term, the greater the third probability is.
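The text only requires that the third probability decrease as the time difference grows; it does not give a concrete mapping. The sketch below uses an exponential decay with a hypothetical half-life parameter purely as one possible choice, and returns 0 when the account was registered before the hot search word was generated, which is also an assumption.

```python
from datetime import datetime


def third_probability(registration_time, generation_time, half_life_days=7.0):
    """Map the registration delay to a probability that shrinks with the delay.

    Assumptions (not from the source): exponential decay with a configurable
    half-life, and 0.0 for accounts registered at or before the generation time.
    """
    if registration_time <= generation_time:
        return 0.0
    diff_days = (registration_time - generation_time).total_seconds() / 86400.0
    return 0.5 ** (diff_days / half_life_days)


hot_word_time = datetime(2020, 10, 1)
p_one_week = third_probability(datetime(2020, 10, 8), hot_word_time)
p_one_day = third_probability(datetime(2020, 10, 2), hot_word_time)
p_earlier = third_probability(datetime(2020, 9, 1), hot_word_time)
print(p_one_week, p_one_day, p_earlier)
```

A lookup table keyed by time-difference ranges would satisfy the same negative relationship and may be easier to tune in practice.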

(2) And fusing the first probability, the second probability and the third probability based on the preset weight to obtain a fused numerical value.

Specifically, the registration time may also be taken into account in the fusion. The third probability corresponding to the registration time is determined first; for example, the third probability may be queried according to the time difference between the registration time and the generation time of the current hot search word. The category of the application account is then determined by a weighted sum calculation method, and the specific fusion calculation method is as follows:

$$S = \alpha \, Rel(X, Y) + \beta \, BrandProb + \gamma \, t \tag{6}$$

where S is the fusion numerical value; Rel(X, Y) is the first probability; BrandProb is the second probability; t is the third probability; and α, β and γ are all preset weights.

As shown in fig. 8, in the present embodiment, the first probability is determined according to the title of the first content information issued by the application account and the target hot search term; determining a second probability according to second content information issued by the application account; determining a third probability according to the registration time of the application account; and determining a fusion numerical value according to the first probability, the second probability and the third probability, and determining the category of the application account.

Specifically, if the fusion numerical value is greater than the preset numerical value, it can be determined that the application account is of a specific category; if the fusion value is smaller than the preset value, it can be determined that the application account does not belong to the specific category.

In the above embodiment, the third probability that the application account belongs to the specific category is determined according to the registration time of the application account, and the category of the application account is identified according to the first probability, the second probability and the third probability, so that the accuracy of identifying the category of the application account can be further improved.
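The three-way fusion and the final decision can be sketched together. The function names, weights and preset value below are hypothetical; treating a fused value equal to the preset value as belonging to the specific category follows the "greater than or equal to" wording used for the final decision step elsewhere in this description.

```python
def fuse_three(rel, brand_prob, t, alpha, beta, gamma):
    """Weighted sum of the first, second and third probabilities."""
    return alpha * rel + beta * brand_prob + gamma * t


def is_specific_category(fused_value, preset_value):
    """Decide the account category by comparing against the preset value."""
    return fused_value >= preset_value


score = fuse_three(rel=0.8, brand_prob=0.5, t=0.5, alpha=0.5, beta=0.3, gamma=0.2)
decision = is_specific_category(score, preset_value=0.6)
print(score, decision)
```

Choosing weights that sum to 1 keeps the fused value on the same 0 to 1 scale as the individual probabilities, which makes the preset value easier to interpret.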

In order to better understand the above identification method of the application account, as shown in fig. 9, an example of the identification method of the application account according to the present invention is described in detail as follows:

in one example, the application account identification method provided by the application includes the following steps:

step S900, acquiring at least one target hot search word of the application program in a preset time period;

step S901, acquiring a title of at least one piece of first content information issued by an application account to be identified through an application program;

step S902, determining semantic similarity between the title of the first content information and the target hot search word;

step S903, determining a first probability that the application account belongs to a specific category based on the determined semantic similarity;

step S904, converting the second content information issued by the application account into text information in a preset format;

step S905, performing word segmentation on the text information to obtain a plurality of words, and acquiring word vectors corresponding to the words;

step S906, converting the word vectors into vectors to be classified, classifying the vectors to be classified, and determining the type of the second content information;

step S907, determining a second probability based on the type corresponding to each of the at least one piece of second content information issued by the application account;

step S908, fusing the first probability and the second probability based on preset weight to obtain a fused value;

step S909, determining whether the fused value is greater than or equal to a preset value; if yes, the application account is of a specific category; if not, the application account does not belong to the specific category.

According to the identification method of the application account, the first probability that the application account belongs to the specific category is determined according to the target hot search word of the application program and the title of the first content information issued by the application account, the second probability that the application account belongs to the specific category is determined according to the second content information issued by the application account, and the category of the application account is identified by combining the first probability and the second probability, so that the accuracy of the category identification of the application account can be improved by considering the relation between the title issued by the application account and the target hot search word and also considering the category corresponding to the second content information issued by the application account.

A possible implementation manner is provided in the embodiment of the present application, as shown in fig. 10, an application account identification apparatus 100 is provided, where the application account identification apparatus 100 may include: an obtaining module 1001, a first determining module 1002, a second determining module 1003 and an identifying module 1004, wherein,

an obtaining module 1001, configured to obtain at least one target hot search term of an application program, and obtain a title of at least one piece of first content information issued by at least one application account through the application program;

a first determining module 1002, configured to determine, based on the target hot search word and a title of the first content information, a first probability that the application account belongs to a specific category;

a second determining module 1003, configured to obtain at least one piece of second content information issued by the application account through the application program, and determine, based on the second content information, a second probability that the application account belongs to a specific category;

an identifying module 1004 for determining a category of the application account based on the first probability and the second probability.

The embodiment of the application provides a possible implementation manner, wherein the first content information comprises at least one of first image-text information and video; the second content information includes second image-text information.

In an embodiment of the present application, a possible implementation manner is provided, where the first determining module 1002 is specifically configured to, when determining, based on a target hot search term and a title of first content information, a first probability that an application account belongs to a specific category:

determining semantic similarity between a title of the first content information and the target hot search word;

based on the semantic similarity, a first probability is determined.

In the embodiment of the present application, a possible implementation manner is provided, and when determining the semantic similarity between the title of the first content information and the target hot search term, the first determining module 1002 is specifically configured to:

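converting the title of the first content information into a title vector, and converting the target hot search word into a hot search word vector;

determining the semantic similarity between the title vector and the hot search word vector.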

In the embodiment of the present application, a possible implementation manner is provided, and when the first determining module 1002 converts the title of the first content information into a title vector, the first determining module is specifically configured to:

splitting the title into at least one word;

if the number of the words obtained by splitting is larger than or equal to the preset number, converting the words with the preset number in the front in the sequence in the title into a title vector;

and if the number of the words obtained by splitting is less than the preset number, repeating the last word in the title until the number of the words is equal to the preset number, and converting the title after the words are repeated into a title vector.

In the embodiment of the present application, a possible implementation manner is provided, and when determining the first probability based on the semantic similarity, the first determining module 1002 is specifically configured to:

determining a first quantity of at least one first content information, and determining a second quantity of at least one target hot search word;

based on the maximum value in the first quantity and the second quantity, the determined semantic similarity is normalized to obtain a first probability.

In this embodiment of the present application, a possible implementation manner is provided, and when determining, based on the second content information, a second probability that the application account belongs to a specific category, the second determining module 1003 is specifically configured to:

converting the second content information into text information in a preset format;

segmenting the text information to obtain at least one word, and acquiring a word vector corresponding to the at least one word;

converting the word vectors into vectors to be classified, classifying the vectors to be classified, and determining the type of the second content information;

and determining a second probability based on the type corresponding to at least one piece of second content information issued by the application account.

In the embodiment of the present application, a possible implementation manner is provided, and when the second determining module 1003 converts the word vector into a vector to be classified, the second determining module is specifically configured to:

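acquiring an average value of the numerical values of every adjacent preset dimension in the word vector;

and constructing a vector to be classified based on the obtained average value.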

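In an embodiment of the present application, a possible implementation manner is provided, and when the identifying module 1004 determines the category of the application account based on the first probability and the second probability, it is specifically configured to:

fusing the first probability and the second probability based on a preset weight to obtain a fused value;

and determining the category of the application account corresponding to the fused value.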

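In the embodiment of the present application, a possible implementation manner is provided, and when the identification module 1004 fuses the first probability and the second probability based on the preset weight and obtains a fused value, the identification module is specifically configured to:

acquiring the registration time of the application account, and determining a third probability corresponding to the registration time;

and fusing the first probability, the second probability and the third probability based on the preset weight to obtain a fused value.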

The embodiment of the application provides a possible implementation manner, the target hot search word is a hot search word of an application program in a preset time period, and the first content information is issued by the application account through the application program in the preset time period.

According to the identification device for the application account, the first probability that the application account belongs to the specific category is determined according to the target hot search word of the application program and the title of the first content information issued by the application account, the second probability that the application account belongs to the specific category is determined according to the second content information issued by the application account, and the category of the application account is identified by combining the first probability and the second probability, so that the accuracy of the category identification of the application account can be improved by considering the relation between the title issued by the application account and the target hot search word and also considering the category corresponding to the second content information issued by the application account.

The identification device for an application account according to the embodiments of the present disclosure may execute the identification method for an application account provided by the embodiments of the present disclosure, and the implementation principles are similar. The actions performed by each module of the device correspond to the steps of the method; for a detailed functional description of each module, reference may be made to the description of the corresponding method above, and details are not repeated here.

Based on the same principle as the method shown in the embodiments of the present disclosure, embodiments of the present disclosure also provide an electronic device, which may include but is not limited to: a processor and a memory; a memory for storing computer operating instructions; and the processor is used for executing the identification method of the application account shown in the embodiment by calling the computer operation instruction. Compared with the prior art, the identification method of the application account number can improve the accuracy of the category identification of the application account number.

In an alternative embodiment, an electronic device is provided. As shown in fig. 11, the electronic device 4000 comprises a processor 4001 and a memory 4003. The processor 4001 is coupled to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may further comprise a transceiver 4004. In practical applications, the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of the present application.

The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 4001 may also be a combination that performs a computing function, for example, a combination of one or more microprocessors, a combination of a DSP and a microprocessor, or the like.

The bus 4002 may include a path that carries information between the aforementioned components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 11, but this does not indicate that there is only one bus or one type of bus.

The Memory 4003 may be a ROM (Read Only Memory) or other types of static storage devices that can store static information and instructions, a RAM (Random Access Memory) or other types of dynamic storage devices that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical Disc storage, optical Disc storage (including Compact Disc, laser Disc, optical Disc, digital versatile Disc, blu-ray Disc, etc.), a magnetic Disc storage medium or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.

The memory 4003 is used for storing application codes for executing the scheme of the present application, and the execution is controlled by the processor 4001. Processor 4001 is configured to execute application code stored in memory 4003 to implement what is shown in the foregoing method embodiments.

Among them, electronic devices include but are not limited to: mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 11 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

The present application provides a computer-readable storage medium, on which a computer program is stored, which, when running on a computer, enables the computer to execute the corresponding content in the foregoing method embodiments. Compared with the prior art, the identification method of the application account number can improve the accuracy of the category identification of the application account number.

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.

It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above embodiments.

Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a module does not, in some cases, constitute a limitation on the module itself; for example, the identification module may also be described as "a module for identifying a category of an application account".

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments in which any combination of the features described above or their equivalents does not depart from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with features having similar functions disclosed in (but not limited to) the present disclosure.

Claims

1. A method for identifying an application account, characterized by comprising the following steps:

2. The method for identifying an application account according to claim 1, wherein the first content information includes at least one of first image-text information and video; the second content information includes second image-text information.

3. The method for identifying an application account according to claim 1, wherein the determining a first probability that the application account belongs to a specific category based on the target hot search term and the title of the first content information includes:

determining the first probability based on the semantic similarity.

4. The method for identifying an application account according to claim 3, wherein the determining semantic similarity between the title of the first content information and the target hot search term includes:

5. The method for identifying an application account according to claim 4, wherein the converting the title of the first content information into a title vector includes:

splitting the title into at least one word;

6. The method for identifying an application account according to claim 3, wherein the determining the first probability based on the semantic similarity includes:

7. The method for identifying an application account according to claim 1, wherein the determining a second probability that the application account belongs to the specific category based on the second content information includes:

8. The method for identifying an application account according to claim 7, wherein the converting the word vector into a vector to be classified includes:

9. The method for identifying an application account according to any one of claims 1 to 8, wherein the determining the category of the application account based on the first probability and the second probability includes:

10. The method for identifying the application account according to any one of claims 1 to 8, wherein the fusing the first probability and the second probability based on a preset weight to obtain a fused value comprises:

11. The method for identifying the application account according to any one of claims 1 to 8, wherein the target hot search term is a hot search term of the application program within a preset time period, and the first content information is issued by the application account through the application program within the preset time period.

12. An apparatus for identifying an application account, comprising:

13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for identifying an application account according to any one of claims 1 to 11 when executing the program.

14. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program implements the method for identifying an application account according to any one of claims 1 to 11.
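
The similarity step of claims 3 to 5 and the weighted fusion named in claim 10 can be illustrated with a minimal Python sketch. The claim bodies are not reproduced in this text, so every concrete choice here is an assumption made for illustration rather than the claimed method: averaging word vectors to form the title vector, using cosine similarity, taking the best match over the hot search words, averaging over titles, and the hypothetical names `title_vector`, `first_probability`, `fuse` and `is_specific_category`. The second probability is treated as an input that some text classifier has already produced.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def title_vector(text: str, embeddings: Dict[str, Vector]) -> List[float]:
    """Split text into words and average their word vectors.

    Assumption: whitespace tokenization and mean pooling; words without an
    embedding are skipped. Returns [] when no word is known.
    """
    words = [w for w in text.lower().split() if w in embeddings]
    if not words:
        return []
    dim = len(embeddings[words[0]])
    return [sum(embeddings[w][i] for w in words) / len(words) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 for empty or zero-length vectors."""
    if not a or not b:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def first_probability(
    titles: List[str], hot_words: List[str], embeddings: Dict[str, Vector]
) -> float:
    """Assumed scoring: per title, best similarity to any hot search word,
    clamped at 0, then averaged over all titles of the account."""
    if not titles or not hot_words:
        return 0.0
    hot_vecs = [title_vector(h, embeddings) for h in hot_words]
    scores = []
    for title in titles:
        tv = title_vector(title, embeddings)
        best = max(cosine(tv, hv) for hv in hot_vecs)
        scores.append(max(0.0, best))
    return sum(scores) / len(scores)


def fuse(p1: float, p2: float, weight: float = 0.5) -> float:
    """Weighted fusion: weight for the first probability, 1 - weight for the
    second. The default of 0.5 is an illustrative assumption."""
    return weight * p1 + (1.0 - weight) * p2


def is_specific_category(
    p1: float, p2: float, weight: float = 0.5, threshold: float = 0.5
) -> bool:
    """Assumed decision rule: the account belongs to the specific category
    when the fused value reaches the threshold."""
    return fuse(p1, p2, weight) >= threshold
```

With toy two-dimensional embeddings, a title that shares its direction with a hot search word yields a first probability near 1, and an unrelated title yields 0. In practice the embeddings, the weight and the threshold would be chosen on labelled accounts.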