CN110717038B

CN110717038B - Object classification method and device

Info

Publication number: CN110717038B
Application number: CN201910877089.2A
Authority: CN
Inventors: 康战辉
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-09-17
Filing date: 2019-09-17
Publication date: 2022-10-04
Anticipated expiration: 2039-09-17
Also published as: CN110717038A

Abstract

The embodiments of the present application provide an object classification method, an object classification device, a computer-readable medium and an electronic device. The object classification method comprises: acquiring text data associated with an object to be classified, wherein the text data comprises description data and search data of the object to be classified, and the search data comprises corpus data obtained by searching for the object to be classified in a search engine; extracting subject words of the text data from the text data based on a preset topic model; and determining the category label of the object to be classified according to the subject words of the text data. By drawing on several kinds of text data, the technical solution makes the text data of the object to be classified more comprehensive and broadens the data basis for subject word extraction. By using text processing techniques from natural language processing in artificial intelligence, it improves the accuracy of subject word extraction. Finally, determining the category label of the object to be classified from the subject words achieves object classification, improves its accuracy and efficiency, and makes it more intelligent.

Description

Object classification method and device

Technical Field

The present application relates to the field of computer and communication technologies, and in particular, to an object classification method, an object classification device, a computer-readable medium, and an electronic device.

Background

When searching for terminal application products, a user enters a keyword to find the desired product. This approach, however, requires every product to be labeled in advance, or classified according to its product information, so that the products matching the search term can be inferred when the user searches. Traditionally, products are labeled manually, but as the number of products grows sharply and their functions become more comprehensive and diverse, manual labeling becomes costly and inefficient. Especially now that artificial intelligence is developing rapidly, the low precision and efficiency of this traditional approach mean it cannot keep pace with related technologies in the field.

Disclosure of Invention

The embodiments of the present application provide an object classification method and device that can improve, at least to some extent, the accuracy and efficiency of object classification.

Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.

According to an aspect of an embodiment of the present application, there is provided an object classification method, including: acquiring text data associated with an object to be classified, wherein the text data comprises description data and search data of the object to be classified, and the search data comprises corpus data obtained by searching for the object to be classified in a search engine; extracting subject words of the text data from the text data based on a preset topic model; and determining the category label of the object to be classified according to the subject words of the text data.

According to an aspect of an embodiment of the present application, there is provided an object classification apparatus including: a first acquisition unit, configured to acquire text data associated with an object to be classified, wherein the text data comprises description data and search data of the object to be classified, and the search data comprises corpus data obtained by searching for the object to be classified in a search engine; an extraction unit, configured to extract subject words of the text data from the text data based on a preset topic model; and a label unit, configured to determine the category label of the object to be classified according to the subject words of the text data.

In some embodiments of the present application, based on the foregoing solution, the object classification apparatus further includes: a second acquisition unit, configured to acquire a search entry sent by a terminal; a search unit, configured to search for a target category label matching the search entry and a target object corresponding to the target category label; and a first sending unit, configured to return information of the target object to the terminal.
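The search flow handled by the second acquisition unit, the search unit and the first sending unit can be sketched as a lookup in an index from category labels to objects. This is a minimal illustration under stated assumptions, not the claimed implementation: the index structure, the functions `build_label_index` and `search`, and the matching rule (exact match after trimming whitespace and lowercasing) are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple


def build_label_index(object_labels: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Hypothetical index: normalized category label -> objects carrying that label."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for obj, labels in object_labels.items():
        for label in labels:
            index[label.strip().lower()].add(obj)
    return dict(index)


def search(index: Dict[str, Set[str]], search_entry: str) -> Tuple[Optional[str], List[str]]:
    """Return (target category label, target objects) for a search entry.

    Assumed matching rule: exact match after trimming and lowercasing.
    Returns (None, []) when no label matches.
    """
    key = search_entry.strip().lower()
    if key in index:
        return key, sorted(index[key])
    return None, []
```

A fuzzier matcher (synonyms, partial matches) could replace the exact lookup; the text does not say how the search entry is matched to a label.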

In some embodiments of the present application, based on the foregoing solution, the first acquisition unit is configured to: obtain description data associated with the object to be classified; and extract word vector data from the description data according to a preset word vector model, and add the word vector data to the text data.

In some embodiments of the present application, based on the foregoing solution, the extraction unit includes: a first calculating unit, configured to extract at least one topic from the text data based on the preset topic model, and determine the probability distribution of the at least one topic and the word probability distribution corresponding to each topic; a second calculating unit, configured to calculate a keyword probability distribution corresponding to the text data based on the probability distribution of the at least one topic and the word probability distribution corresponding to each topic; and a subject word extraction unit, configured to extract the subject words of the text data according to the keyword probability distribution corresponding to the text data.

In some embodiments of the present application, based on the foregoing solution, the first calculating unit is configured such that the probability distribution of the at least one topic is $\langle p_{t1}, \dots, p_{ti}, \dots, p_{tk} \rangle$, where $p_{ti} = n_{ti}/n$ is the probability of the i-th topic in the text data, $n_{ti}$ is the number of words in the text data corresponding to the i-th topic, and $n$ is the total number of words in the text data.

In some embodiments of the present application, based on the foregoing solution, the second calculating unit is configured such that the word probability distribution is $\langle p_{w1}, \dots, p_{wi}, \dots, p_{wm} \rangle$, where $p_{wi} = N_{wi}/N$ is the probability of the i-th keyword in the w-th topic, $N_{wi}$ is the number of occurrences of the i-th keyword in the w-th topic, and $N$ is the total number of keywords in the w-th topic.

In some embodiments of the present application, based on the foregoing solution, the subject word extraction unit is configured to calculate the keyword probability distribution corresponding to the text data from the probability distribution of the at least one topic and the word probability distribution corresponding to each topic, using the following formula:

$$p(w \mid d) = p(w \mid t) \cdot p(t \mid d)$$

where $p(w \mid d)$ is the probability that the w-th keyword appears given the text data d; $p(w \mid t)$ is the probability that the w-th keyword appears given the t-th topic; and $p(t \mid d)$ is the probability that the t-th topic appears given the text data d.
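As a worked illustration with made-up values (not taken from the source), if the w-th keyword has probability 0.4 within the t-th topic and the t-th topic has probability 0.25 in the text data d, the formula gives:

```latex
p(w \mid d) = p(w \mid t) \cdot p(t \mid d) = 0.4 \times 0.25 = 0.1
```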

In some embodiments of the present application, based on the foregoing solution, the object classification apparatus further includes: a third acquisition unit, configured to acquire label adjustment information triggered by a user; and an adjusting unit, configured to adjust the category label according to the label adjustment information.

In some embodiments of the present application, based on the foregoing solution, the first acquisition unit is configured to: search for information of a game application; and identify the title and summary fields of the search results as the search data of the game application.

According to an aspect of embodiments of the present application, there is provided a computer-readable medium on which a computer program is stored, which computer program, when executed by a processor, implements the object classification method as described in the above embodiments.

According to an aspect of an embodiment of the present application, there is provided an electronic device including: one or more processors; a storage device to store one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the object classification method as in the above embodiments.

In the technical solution provided by some embodiments of the present application, acquiring various types of text data expands the corpus of the object to be classified, makes its text data more comprehensive, and broadens the data basis for subject word extraction. Meanwhile, extracting subject words from the text data based on a preset topic model, a text processing technique from natural language processing in artificial intelligence, improves the accuracy of subject word extraction. Finally, determining the category label of the object to be classified according to the subject words achieves object classification, improves its accuracy and efficiency, and makes it more intelligent.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:

FIG. 1 shows a schematic diagram of an exemplary system architecture to which aspects of embodiments of the present application may be applied;

FIG. 2 schematically shows a flow diagram of an object classification method according to an embodiment of the present application;

FIG. 3 schematically illustrates a flow diagram for obtaining text data associated with an object to be classified according to one embodiment of the present application;

FIG. 4 schematically shows a flow diagram for obtaining text data associated with an object to be classified according to an embodiment of the present application;

FIG. 5 schematically illustrates a flow chart for extracting subject words from the text data according to an embodiment of the present application;

FIG. 6 schematically shows a keyword extraction diagram of textual data according to an embodiment of the present application;

FIG. 7 schematically shows a flow diagram of an object classification method according to an embodiment of the present application;

FIG. 8 schematically shows a flow diagram of an object classification method according to an embodiment of the present application;

FIG. 9 schematically illustrates a terminal game search diagram according to one embodiment of the present application;

FIG. 10 schematically shows a block diagram of an object classification apparatus according to an embodiment of the present application;

FIG. 11 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.

The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.

Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present application can be applied.

As shown in fig. 1, the system architecture may include a terminal device (e.g., one or more of a smartphone 101, a tablet computer 102, and a portable computer 103 shown in fig. 1, and of course a desktop computer, etc.), a network 104, and a server 105. The network 104 serves as a medium for providing communication links between terminal devices and the server 105. Network 104 may include various connection types, such as wired communication links, wireless communication links, and so forth.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, and the like.

In an embodiment of the present application, a user may upload text data of an object to be classified to the server 105 using the terminal device 103 (or the terminal device 101 or 102). The text data may be description data and search data of the object to be classified; alternatively, description data associated with the object to be classified may be obtained first, and word vector data extracted from it according to a preset word vector model. After obtaining the text data, the server 105 extracts the subject words of the text data based on the preset topic model, and then determines the category label of the object to be classified according to those subject words. By acquiring various types of text data, the technical solution of this embodiment expands the corpus of the object to be classified, makes its text data more comprehensive, and broadens the data basis for subject word extraction. Extracting subject words with a preset topic model, a text processing technique from natural language processing in artificial intelligence, improves extraction accuracy. Finally, determining the category label of the object to be classified according to the subject words achieves object classification, improves its accuracy and efficiency, and makes it more intelligent.

It should be noted that the object classification method provided in the embodiment of the present application is generally executed by the server 105, and accordingly, the object classification apparatus is generally disposed in the server 105. However, in other embodiments of the present application, the terminal device may also have a similar function as the server, so as to execute the scheme of object classification provided in the embodiments of the present application.

Artificial Intelligence (AI) is the theory, methods, technology and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that they have the capabilities of perception, reasoning and decision making. AI software technology mainly includes computer vision, speech processing, natural language processing, machine learning/deep learning, and the like.

Natural Language Processing (NLP) is an important direction in computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science and mathematics. Because research in this field involves natural language, i.e., the language people use every day, it is closely related to linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robotic question answering, knowledge graphs, and the like.

The solution provided by the embodiments of the present application involves artificial intelligence technologies such as natural language processing, and is described in detail through the following embodiments:

Fig. 2 shows a flow diagram of an object classification method according to an embodiment of the present application. The method may be performed by a server, such as the server shown in Fig. 1. Referring to Fig. 2, the object classification method includes at least steps S210 to S240, which are described in detail as follows:

In step S210, text data associated with an object to be classified is obtained, where the text data includes description data and search data of the object to be classified, and the search data includes corpus data obtained by searching for the object to be classified in a search engine.

In an embodiment of the present application, the object to be classified may be any product that needs to be classified. The product may be a physical object, or data, an application program, or the like stored in a computer or a server, which is not limited herein. For example, if the object to be classified is an application program, each application program can be classified in an application management program by obtaining its text data, such as data about its application environment, usage method and functions, so that users can find application programs of different types. The object to be classified may also be a file, in which case the text data related to the file is its content data, functional data, file information, and the like, which is not limited herein.

In one embodiment of the present application, the text data associated with the object to be classified may include description data and search data of the object to be classified. The description data may be introduction information of the object to be classified, and the search data is corpus data, such as a web page or a document, of a search result obtained by searching the object to be classified in a search engine, which is not limited herein.

In an embodiment of the present application, as shown in fig. 3, the process of acquiring the text data associated with the object to be classified in step S210 includes the following steps S310 to S320, which are described in detail as follows:

In step S310, description data associated with the object to be classified is acquired.

In an embodiment of the present application, the description data associated with the object to be classified may be related information, application data, and the like of the object to be classified, which is not limited herein. The description data may be obtained from the official website of the object to be classified, or by searching for the entry of the object to be classified in a search engine and using the search results as its description data.

In step S320, word vector data is extracted from the description data according to a preset word vector model, and the word vector data is added to the text data.

In an embodiment of the application, word vector data is extracted from description data through a preset word vector model, and the word vector data is added to text data of an object to be classified.

In one embodiment of the present application, a word vector model is used to extract word vector data from the description data. Specifically, the word vector model may be built from historical data: for each target word in the historical data, the words in its context are used to predict the target word. A word vector dimension d is set, and every word in the historical data is randomly initialized as a d-dimensional vector. The vectors of all context words are then encoded into a hidden-layer vector, from which the target word in the historical data is predicted; the word vectors obtained through this prediction serve as the word vector data.
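The procedure described above matches the continuous bag-of-words (CBOW) architecture used by word2vec, although the text does not name a specific model. The following pure-Python sketch assumes CBOW with a full softmax output layer; the function name `train_cbow` and its hyperparameters (`d`, `window`, `lr`, `epochs`, `seed`) are hypothetical, and, as is conventional for CBOW, the learned input vectors are returned as the word vectors.

```python
import math
import random
from typing import Dict, List


def train_cbow(sentences: List[List[str]], d: int = 8, window: int = 2,
               lr: float = 0.05, epochs: int = 50, seed: int = 0) -> Dict[str, List[float]]:
    """Minimal CBOW sketch: predict each target word from the average of its context vectors."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    idx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    # Randomly initialise every word as a d-dimensional input vector; output weights start at zero.
    W_in = [[rng.uniform(-0.5, 0.5) / d for _ in range(d)] for _ in range(V)]
    W_out = [[0.0] * d for _ in range(V)]

    for _ in range(epochs):
        for s in sentences:
            for pos, target in enumerate(s):
                lo, hi = max(0, pos - window), min(len(s), pos + window + 1)
                ctx = [idx[s[j]] for j in range(lo, hi) if j != pos]
                if not ctx:
                    continue
                # Hidden layer: average of the context word vectors.
                h = [sum(W_in[c][k] for c in ctx) / len(ctx) for k in range(d)]
                # Output scores and a numerically stable softmax over the vocabulary.
                scores = [sum(W_out[v][k] * h[k] for k in range(d)) for v in range(V)]
                top = max(scores)
                exps = [math.exp(x - top) for x in scores]
                z = sum(exps)
                # Cross-entropy gradient w.r.t. the scores: softmax - one_hot(target).
                err = [e / z for e in exps]
                err[idx[target]] -= 1.0
                # Gradient w.r.t. the hidden layer, computed before W_out changes.
                grad_h = [sum(err[v] * W_out[v][k] for v in range(V)) for k in range(d)]
                for v in range(V):
                    for k in range(d):
                        W_out[v][k] -= lr * err[v] * h[k]
                # Each context vector contributed 1/len(ctx) of h, so it gets that share of grad_h.
                for c in ctx:
                    for k in range(d):
                        W_in[c][k] -= lr * grad_h[k] / len(ctx)
    return {w: W_in[i] for w, i in idx.items()}
```

The full softmax scales with vocabulary size, so for a real corpus a library implementation using negative sampling or hierarchical softmax would normally be used instead.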

In an embodiment of the present application, as shown in fig. 4, the process of acquiring the text data associated with the object to be classified in step S210 further includes the following steps S410 to S420, which are described in detail as follows:

In step S410, information of the game application is searched for.

In one embodiment of the present application, the object to be classified is a game application. The information of the game application may include game data, event information, community interaction information, player information, and the like, which is not limited herein.

In one embodiment of the present application, the information of the game application may be obtained by searching the game's official website or the publisher's website, or by searching for the entry of the game application in a search engine and using the search results as the information of the game application.

Illustratively, the name of the game application is searched in an article search engine to obtain the corresponding article search results, and the titles and abstract fields of a preset number of articles, selected according to the popularity of each search result, are used as the information of the game application.
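The popularity-based selection of article results can be sketched as follows. The `ArticleResult` record, its `popularity` field and the function `search_data_from_results` are hypothetical; the text does not specify how popularity is measured or how title and abstract are combined, so this sketch simply concatenates them with a space.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ArticleResult:
    # Hypothetical shape of one article search result.
    title: str
    abstract: str
    popularity: float  # e.g. views or clicks; the actual popularity measure is an assumption


def search_data_from_results(results: Sequence[ArticleResult], top_n: int) -> List[str]:
    """Keep the title and abstract of the top_n most popular results as search data."""
    ranked = sorted(results, key=lambda r: r.popularity, reverse=True)
    return [f"{r.title} {r.abstract}" for r in ranked[:top_n]]
```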

In step S420, the searched title and summary fields are identified as the search data of the game application.

In one embodiment of the present application, the title and summary fields of the search results are identified as the search data of the game application. In this embodiment, obtaining search data by searching further expands the categories of text data of the object to be classified, so that the object can be described by more diverse text data and the data basis for subject word extraction is broadened.

In step S220, a subject word of the text data is extracted from the text data based on a preset subject model.

In an embodiment of the present application, after the text data associated with the object to be classified is acquired, the subject words of the text data are extracted from the text data based on a preset topic model, by calculating the probability distribution of the topics in the text data and the word probability distribution of each topic, so that the object to be classified is represented by the extracted subject words.

In this embodiment, the topic model is a text processing technique based on natural language processing in artificial intelligence: historical text data is identified and analyzed to obtain the topics corresponding to various text data, which yields the topic model. The topic model in this embodiment is used to extract subject words from the text data to represent the information in the text data.

Natural language processing is an important direction in computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. In this embodiment, text processing techniques based on natural language processing in artificial intelligence improve the accuracy and efficiency of object classification, making it more efficient and intelligent.

In an embodiment of the present application, as shown in fig. 5, the process of extracting the topic word of the text data from the text data in step S220 includes the following steps S510 to S530, which are described in detail as follows:

In step S510, at least one topic is extracted from the text data based on the preset topic model, and the probability distribution of the at least one topic and the word probability distribution corresponding to each topic are determined.

In one embodiment of the present application, the topic model is used to extract topics from a corpus of text data. The input of the topic model is text data, and the output is the topics corresponding to the text data.

Referring to Fig. 6, in an embodiment of the present application, the text data in the corpus is denoted by D, and a single item of text data, such as a single document, is denoted by d. The set of topics corresponding to all the text data is denoted by T; T contains k topics in total, and a single topic is denoted by t. All distinct keywords in D form a keyword set V, which contains m keywords in total. Each topic can be regarded as composed of these keywords, i.e., of a keyword sequence $\langle w_1, w_2, \dots, w_i, \dots, w_m \rangle$, where $w_i$ denotes the i-th keyword; the position of each keyword has no influence on topic extraction.
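Because keyword position has no influence on topic extraction, each item of text data d can be represented as a bag of words, and the keyword set V is the set of distinct keywords across D. A minimal sketch, assuming the text has already been tokenized into keywords (the function name `build_corpus` is hypothetical):

```python
from collections import Counter
from typing import List, Sequence, Tuple


def build_corpus(docs: Sequence[Sequence[str]]) -> Tuple[List[str], List[Counter]]:
    """Return (keyword set V as a sorted list, one bag-of-words Counter per document d)."""
    bags = [Counter(doc) for doc in docs]  # keyword -> count; word order is discarded
    vocabulary = sorted(set().union(*bags))  # all distinct keywords in D
    return vocabulary, bags
```

Since a `Counter` only records counts, two documents containing the same keywords in a different order produce identical bags.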

In an embodiment of the present application, the process of determining the probability distribution of the at least one topic in step S510 includes:

the probability distribution of the at least one topic is $\langle p_{t1}, \dots, p_{ti}, \dots, p_{tk} \rangle$;

where $p_{ti} = n_{ti}/n$ is the probability of the i-th topic in the text data, $n_{ti}$ is the number of words in the text data corresponding to the i-th topic, and $n$ is the total number of words in the text data.

In an embodiment of the present application, each piece of text data in the corpus is composed of several topics, and the probability that a topic occurs in a piece of text data measures how important that topic is in it. Each piece of text data in the corpus therefore corresponds to a distribution over the k topics, denoted θ, i.e., the topic probability distribution.

When determining the topic probability distribution, the probability of the i-th topic in the text data is first calculated by the formula $p_{ti} = n_{ti}/n$, where $n_{ti}$ is the number of words in the text data corresponding to the i-th topic and $n$ is the total number of words in the text data.

After the probability $p_{ti}$ of the i-th topic in the text data has been calculated, the probabilities that document d in corpus D corresponds to the different topics are $p_{t1}, \dots, p_{ti}, \dots, p_{tk}$. Combining the probabilities of the k topics in the topic set T gives the topic probability distribution of the text data: $\langle p_{t1}, \dots, p_{ti}, \dots, p_{tk} \rangle$.

In an embodiment of the present application, the probability distribution of the topics is used to represent the distribution of each topic in one text data, such as the number of occurrences or the probability, so as to measure the occurrence frequency or the importance degree of each topic in the text data according to the distribution.
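The topic probability distribution follows directly from $p_{ti} = n_{ti}/n$ once each word of the text data has been assigned to a topic. In this sketch the per-word topic assignments are assumed to be given (for example, as produced by a topic model such as LDA, which assigns a topic to each word); the function name `topic_distribution` is hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def topic_distribution(word_topics: Sequence[int], k: int) -> List[float]:
    """Return [p_t1, ..., p_tk] for one piece of text data, with p_ti = n_ti / n.

    word_topics: the topic index (0..k-1) assigned to each word of the text data.
    """
    n = len(word_topics)  # total number of words in the text data
    if n == 0:
        raise ValueError("text data has no words")
    counts = Counter(word_topics)  # n_ti for every topic i
    return [counts[i] / n for i in range(k)]
```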

In an embodiment of the present application, the process of determining the word probability distribution corresponding to each topic in step S510 includes:

the word probability distribution is $\langle p_{w1}, \dots, p_{wi}, \dots, p_{wm} \rangle$, where $p_{wi} = N_{wi}/N$ is the probability of the i-th keyword in the w-th topic, $N_{wi}$ is the number of occurrences of the i-th keyword in the w-th topic, and $N$ is the total number of keywords in the w-th topic.

In one embodiment of the present application, a topic may be composed of several different keywords, and the probability of a keyword appearing in a topic measures how important that keyword is to the topic. The probabilistic relationship between each topic and the m keywords in the vocabulary can therefore be represented by a multinomial distribution, denoted φ, i.e., the word probability distribution.

Specifically, the probability that the i-th keyword in the keyword set appears in the topic is calculated as $p_{wi} = N_{wi}/N$, where $N_{wi}$ is the number of occurrences of the i-th keyword in the w-th topic and $N$ is the total number of keywords in the topic.

Calculating the probabilities $p_{w1}, \dots, p_{wi}, \dots, p_{wm}$ of all keywords in the topic in the same way as $p_{wi}$, the probabilistic relationship between each topic and the m keywords in the vocabulary can be represented by the multinomial distribution $\langle p_{w1}, \dots, p_{wi}, \dots, p_{wm} \rangle$, i.e., the word probability distribution.
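Likewise, the word probability distribution of a topic follows from $p_{wi} = N_{wi}/N$. This sketch assumes the keyword occurrences assigned to topic w are already available as a list; `word_distribution` is a hypothetical name, and keywords of V that never occur in the topic receive probability 0.

```python
from collections import Counter
from typing import List, Sequence


def word_distribution(topic_keywords: Sequence[str], vocabulary: Sequence[str]) -> List[float]:
    """Return [p_w1, ..., p_wm] for one topic w, where p_wi = N_wi / N.

    topic_keywords: every keyword occurrence assigned to topic w (repeats allowed).
    vocabulary: the m keywords of the keyword set V, in a fixed order.
    """
    N = len(topic_keywords)  # total number of keywords in topic w
    if N == 0:
        raise ValueError("topic has no keywords")
    counts = Counter(topic_keywords)  # N_wi; missing keywords count as 0
    return [counts[v] / N for v in vocabulary]
```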

In step S520, a keyword probability distribution corresponding to the text data is calculated based on the probability distribution of the at least one topic and the word probability distribution corresponding to each topic.

In an embodiment of the present application, the keyword probability distribution corresponding to the text data is calculated from the probability distribution of the at least one topic and the word probability distribution corresponding to each topic, using the following formula:

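$$p(w \mid d) = \sum_{t} p(w \mid t)\, p(t \mid d)$$

where $p(w \mid d)$ is the probability of the w-th keyword given the text data d, $p(w \mid t)$ is the probability of the w-th keyword given the t-th topic, $p(t \mid d)$ is the probability of the t-th topic given the text data d, and the sum runs over all extracted topics.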
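A minimal Python sketch of this combination step, assuming p(t|d) and p(w|t) are supplied as plain dictionaries; it combines the per-topic products p(w|t)·p(t|d) by summing them over all topics, as is usual for topic-model mixtures. The name `keyword_distribution` and the example values are hypothetical.

```python
from typing import Dict


def keyword_distribution(
    topic_dist: Dict[str, float],               # p(t|d) for each topic t
    word_dists: Dict[str, Dict[str, float]],    # p(w|t) for each topic t
) -> Dict[str, float]:
    """Return p(w|d) = sum over t of p(w|t) * p(t|d)."""
    result: Dict[str, float] = {}
    for topic, p_t in topic_dist.items():
        for word, p_w in word_dists[topic].items():
            # Accumulate this topic's contribution to keyword w.
            result[word] = result.get(word, 0.0) + p_w * p_t
    return result


p_t_d = {"t1": 0.75, "t2": 0.25}
p_w_t = {"t1": {"jump": 0.5, "obstacle": 0.5}, "t2": {"level": 0.5, "jump": 0.5}}
print(keyword_distribution(p_t_d, p_w_t))
# {'jump': 0.5, 'obstacle': 0.375, 'level': 0.125}
```

Because p(t|d) sums to 1 over topics and each p(w|t) sums to 1 over keywords, the resulting p(w|d) also sums to 1 over keywords.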

In step S530, a subject word of the text data is extracted according to the keyword probability distribution corresponding to the text data.

In an embodiment of the application, once the keyword probability distribution corresponding to the text data has been determined, the subject words of the text data are selected from all the keywords according to that distribution. For example, a preset number of keywords with the largest probabilities in the keyword probability distribution may be selected as the subject words.
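A minimal sketch of this selection, assuming the keyword probability distribution is a dictionary from keyword to probability. The preset number is the parameter `n`; the function name and the alphabetical tie-breaking are assumptions added only to keep the output deterministic.

```python
from typing import Dict, List


def top_subject_words(keyword_dist: Dict[str, float], n: int) -> List[str]:
    """Return the n keywords with the largest p(w|d), in descending order.

    Ties are broken alphabetically so that the result is deterministic.
    """
    ranked = sorted(keyword_dist.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


print(top_subject_words({"jump": 0.5, "obstacle": 0.375, "level": 0.125}, 2))
# ['jump', 'obstacle']
```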

It should be noted that the number of subject words in this embodiment may be one, two, or more, and is not limited herein. For example, the resulting subject words may be: play, jump, obstacle, gather, control, action, and so on. The more subject words are extracted, the more comprehensive and representative the labeling of the object to be classified is likely to be.

In step S230, a category label of the object to be classified is determined according to a subject term of the text data.

In one embodiment of the application, after determining the subject term of the text data, the category label of the object to be classified is determined according to the subject term of the text data, so as to realize the classification of the object to be classified.

Optionally, the category label may be chosen manually from among the subject words. Combining the topic model algorithm with manual screening in this way yields finer-grained category labels.

In addition, in this embodiment, besides determining a category label of the object to be classified from the subject words, all the subject words may be retained as fine-grained category labels in addition to the category label, so that the object to be classified is described in more detail and more accurately.

In one embodiment of the present application, examples of subject words and their category labels of a game application may be as shown in table 1:

TABLE 1

As shown in Table 1, when classifying game applications: for the extracted subject words "game, jump, obstacle, collection, running, avoidance, action, stop, control, on the road, running, prop, avoidance, challenge, escape, danger", the first probability corresponding to "parkour" is the largest, so the type of the game application is determined to be "parkour"; for the extracted subject words "game, player, level, challenge, difficulty, intelligence, leisure, playing, picture, prop, stage clearing, breakthrough, fun, level, commons, features", the first probability corresponding to "level" is the largest, so the type of the game application is determined to be "level".
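A hedged sketch of this labeling step, assuming the probability of each candidate category (the "first probability" referred to above) has already been computed upstream; how that probability is obtained is not detailed in this part of the description. The candidate with the largest probability becomes the category label, and all subject words are kept as fine-grained labels. `Labels` and `assign_labels` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Labels:
    category: str                                          # coarse category label
    fine_grained: List[str] = field(default_factory=list)  # all retained subject words


def assign_labels(candidate_probs: Dict[str, float], subject_words: List[str]) -> Labels:
    """Pick the candidate category with the largest probability and keep
    every subject word as a fine-grained label.

    candidate_probs is assumed to be produced by an earlier step.
    """
    if not candidate_probs:
        raise ValueError("no candidate categories")
    category = max(candidate_probs, key=candidate_probs.get)
    return Labels(category=category, fine_grained=list(subject_words))
```

Keeping the subject words alongside the single category label is what later allows searches to match on fine-grained labels as well.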

In an embodiment of the present application, as shown in fig. 7, after the process of determining the category label of the object to be classified according to the subject term of the text data in step S230, steps S710 to S720 are further included, which are described in detail as follows:

In step S710, tag adjustment information triggered by the user is acquired.

In an embodiment of the application, the tag adjustment information is determined by the user on the basis of the generated category tag. It may be triggered directly on the server, or generated at the terminal and then sent by the terminal to the server.

In one embodiment of the present application, the tag adjustment information may take the form of an instruction, for example, an instruction to change the category tag "level" to "parkour".

In step S720, the category label is adjusted according to the label adjustment information.

In one embodiment of the present application, the category labels are adjusted according to the specific instruction in the label adjustment information. The adjustment may be, for example, a deletion or a replacement, and is not limited herein.
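A minimal sketch of applying one adjustment instruction, assuming the instruction has already been parsed into an operation name and the affected labels. Only the deletion and replacement operations mentioned above are shown; the operation strings and the name `adjust_labels` are hypothetical.

```python
from typing import List, Optional


def adjust_labels(labels: List[str], op: str, old: str, new: Optional[str] = None) -> List[str]:
    """Apply one user-triggered adjustment to a list of category labels.

    op == "delete": remove every occurrence of old.
    op == "replace": substitute new for every occurrence of old.
    """
    if op == "delete":
        return [lab for lab in labels if lab != old]
    if op == "replace":
        if new is None:
            raise ValueError("a replacement needs a new label")
        return [new if lab == old else lab for lab in labels]
    raise ValueError(f"unsupported operation: {op}")


print(adjust_labels(["level", "puzzle"], "replace", "level", "parkour"))
# ['parkour', 'puzzle']
```

The function returns a new list rather than modifying its input, so the server-generated labels remain available if an adjustment has to be undone.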

In an embodiment of the application, manual adjustment incorporates the user's subjective judgment into the category labels produced by the server. This prevents the server from retaining an improper category label and allows the labels to represent the category of the object to be classified more vividly.

In an embodiment of the present application, as shown in fig. 8, after the process of determining the category label of the object to be classified according to the subject word of the text data in step S230, steps S810 to S830 are further included, which are described in detail as follows:

In step S810, a search term sent by the terminal is obtained.

In one embodiment of the present application, the category label may be used for retrieval or query of the object. When a user queries an object, a search term is input at a terminal, the terminal sends the search term to a server, and the server obtains the search term sent by the terminal.

In one embodiment of the present application, the object to be classified may be an application. For example, once the category label of each application has been determined, an application management program can classify each application and determine its type according to the category label. When searching for an application, the user can then enter a search term to find the corresponding application and then download it, install it, and so on.

In step S820, a target category label matching the search term and a target object corresponding to the target category label are searched.

In an embodiment of the present application, a target category tag matching the search term is searched for in the category tags of all existing objects, and a target object corresponding to the target category tag is determined from all existing objects.

In an embodiment of the present application, the similarity between the search term and each category label may be calculated to determine how well they match, and a category label with a high similarity is then determined to be a target category label.
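The similarity measure is not specified here, so the following sketch uses Python's standard `difflib.SequenceMatcher.ratio()` purely as a stand-in; the threshold of 0.6 and the name `match_category_labels` are assumptions for illustration. Labels scoring at or above the threshold are returned, most similar first.

```python
from difflib import SequenceMatcher
from typing import List


def match_category_labels(term: str, labels: List[str], threshold: float = 0.6) -> List[str]:
    """Return labels whose similarity to term is at least threshold,
    most similar first.

    Similarity is difflib's ratio (0.0 to 1.0), used only as a stand-in
    for whatever measure an implementation actually chooses.
    """
    scored = [
        (SequenceMatcher(None, term.lower(), lab.lower()).ratio(), lab)
        for lab in labels
    ]
    hits = [(score, lab) for score, lab in scored if score >= threshold]
    hits.sort(key=lambda sl: (-sl[0], sl[1]))   # highest similarity first
    return [lab for _, lab in hits]


print(match_category_labels("level", ["levels", "parkour", "puzzle"]))
# ['levels']
```

A character-based ratio also works on Chinese search terms, but an embedding-based similarity could be substituted without changing the surrounding logic.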

In an embodiment of the present application, as described in step S230, besides the category label determined for an object to be classified, the subject words of the object may also be used as fine-grained category labels. Therefore, when searching for a target category label matching the search term, the category labels are searched for a target category label matching the search term, the search term is also matched against all the subject words to determine a target subject word matching it, and the target objects corresponding to the target category label and to the target subject word are then identified. It should be noted that the target objects corresponding to the target category label and to the target subject word may be identified simultaneously or sequentially (for example, the objects for the target category label first, then those for the target subject word), which is not limited herein.
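A minimal sketch of this two-stage lookup, assuming each object's category label and subject words are held in in-memory dictionaries and that matching is exact string equality for simplicity (a similarity function could be substituted). Objects matched by category label come first, followed by objects matched only through a subject word. `search_objects` and the example game names are hypothetical.

```python
from typing import Dict, List, Set


def search_objects(
    term: str,
    category_labels: Dict[str, str],        # object name -> category label
    subject_words: Dict[str, List[str]],    # object name -> subject words
) -> List[str]:
    """Return objects whose category label matches term, then objects that
    match only through a subject word (a fine-grained label)."""
    by_category = [obj for obj, label in category_labels.items() if label == term]
    seen: Set[str] = set(by_category)
    by_subject = [
        obj for obj, words in subject_words.items()
        if term in words and obj not in seen   # skip objects already returned
    ]
    return by_category + by_subject


labels = {"game1": "level", "game2": "parkour", "game3": "puzzle"}
words = {"game1": ["level", "challenge"], "game2": ["jump", "level"], "game3": ["puzzle"]}
print(search_objects("level", labels, words))
# ['game1', 'game2']
```

Here game2 is returned even though its category label is "parkour", because "level" is one of its fine-grained labels.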

In this way, more target objects related to the user's search term can be pushed to the user, which widens the user's range of choices, raises the success rate of the user's search, and raises the success rate of object pushing for developers.

For example, referring to fig. 9, when a user searches for a game through the terminal 900, the user first enters "level" 901 in the search field, and a search instruction for "level" is sent to the server. After receiving the search instruction from the terminal 900, the server matches the search term "level" against all category tags to determine the target category tag and the target game 1 corresponding to it, i.e., 902. The server then matches "level" against the subject words of all games, determines the target subject words matching "level", and identifies the corresponding objects as target games 2 to n, i.e., 903 to 904. Thus, in addition to pushing the game whose category label is "level" as target game 1, this embodiment also returns games that have "level" among their subject words even though their category label may not be "level", i.e., games whose fine-grained category labels include "level". This widens the user's choice, raises the success rate of the user's game search, and raises the success rate of game pushing for developers.

In step S830, information of the target object is returned to the terminal.

In one embodiment of the application, after the target object is determined, information of the target object is returned to the terminal. The information of the target object may include a name of the target object, a website link, a download link, text data, and the like, which is not limited herein.

According to the technical solution of the embodiments of the present application, the corpus of the object to be classified is expanded by acquiring multiple types of text data, which makes the text data of the object more comprehensive and broadens the data on which subject word extraction is based; meanwhile, the subject words are extracted from the text data based on a preset topic model, so that text processing techniques from natural language processing in artificial intelligence improve the accuracy of subject word extraction; finally, the category label of the object to be classified is determined from the subject words, which realizes object classification, improves its accuracy and efficiency, and makes it more intelligent.

Embodiments of the apparatus of the present application are described below, which may be used to perform the object classification methods in the above-described embodiments of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the object classification method described above in the present application.

Fig. 10 schematically shows a block diagram of an object classification apparatus according to an embodiment of the present application.

Referring to fig. 10, an object classification apparatus 1000 according to an embodiment of the present application includes: a first acquisition unit 1002, an extraction unit 1004, and a labeling unit 1006.

The first obtaining unit 1002 is configured to obtain text data associated with an object to be classified, where the text data includes description data and search data of the object to be classified, and the search data includes corpus data obtained by searching for the object to be classified in a search engine; the extracting unit 1004 is configured to extract a subject word of the text data from the text data based on a preset subject model; the label unit 1006 is configured to determine a category label of the object to be classified according to the subject word of the text data.

In an embodiment of the present application, the object classification apparatus 1000 further includes: a second acquisition unit, configured to acquire the search term sent by the terminal; a search unit, configured to search for a target category label matching the search term and a target object corresponding to the target category label; and a first sending unit, configured to return the information of the target object to the terminal.

In an embodiment of the present application, the first obtaining unit 1002 is configured to: obtaining description data associated with the object to be classified; and extracting word vector data from the description data according to a preset word vector model, and adding the word vector data to the text data.

In one embodiment of the present application, the extraction unit 1004 includes: a first calculation unit, configured to extract at least one topic from the text data based on the preset topic model and to determine the probability distribution of the at least one topic and the word probability distribution corresponding to each topic; a second calculation unit, configured to calculate the keyword probability distribution corresponding to the text data based on the probability distribution of the at least one topic and the word probability distribution corresponding to each topic; and a subject word extraction unit, configured to extract the subject words of the text data according to the keyword probability distribution corresponding to the text data.

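In one embodiment of the present application, the first calculation unit is configured to determine the probability distribution of the at least one topic as $\langle p_{t1}, \ldots, p_{ti}, \ldots, p_{tk} \rangle$, where $p_{ti} = n_{ti}/n$ is the probability of the i-th topic for the text data, $n_{ti}$ is the number of words in the text data corresponding to the i-th topic, and $n$ is the total number of words in the text data.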

In one embodiment of the present application, the second calculation unit is configured to determine the word probability distribution as $\langle p_{w1}, \ldots, p_{wi}, \ldots, p_{wm} \rangle$;

where $p_{wi} = N_{wi}/N$ is the probability of the i-th keyword for the w-th topic, $N_{wi}$ is the number of occurrences of the i-th keyword in the w-th topic, and $N$ is the total number of keywords in the w-th topic.

In one embodiment of the present application, the subject word extraction unit is configured to calculate the keyword probability distribution corresponding to the text data from the probability distribution of the at least one topic and the word probability distribution corresponding to each topic by the following formula:

$$p(w \mid d) = \sum_{t} p(w \mid t)\, p(t \mid d)$$

In an embodiment of the present application, the object classification apparatus 1000 further includes: a third obtaining unit, configured to obtain tag adjustment information triggered by a user; and the adjusting unit is used for adjusting the category label according to the label adjusting information.

In an embodiment of the present application, the first obtaining unit 1002 is configured to search for information about the game application and to identify the title and summary fields of the search results as the search data of the game application.
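A minimal sketch of assembling the text data, assuming each search result is a dictionary whose title and summary live under the keys `"title"` and `"abstract"`; these key names, the flat list of snippets, and the function name `build_text_data` are assumptions, since no search-engine API is specified here.

```python
from typing import Dict, List


def build_text_data(description: str, search_results: List[Dict[str, str]]) -> List[str]:
    """Combine an application's description with the title and summary
    fields of its search results into one list of text snippets."""
    texts = [description] if description else []
    for result in search_results:
        for key in ("title", "abstract"):   # assumed field names
            value = result.get(key, "").strip()
            if value:                       # skip missing or empty fields
                texts.append(value)
    return texts


print(build_text_data("A parkour game", [{"title": "Run fast", "abstract": "Jump over obstacles"}]))
# ['A parkour game', 'Run fast', 'Jump over obstacles']
```

The resulting snippets would then be tokenized into keywords before being passed to the topic model.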

FIG. 11 is a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiments of the present application.

It should be noted that the computer system 1100 of the electronic device shown in fig. 11 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.

As shown in fig. 11, a computer system 1100 includes a Central Processing Unit (CPU) 1101, which can perform various appropriate actions and processes in accordance with a program stored in a Read-Only Memory (ROM) 1102 or a program loaded from a storage section 1108 into a Random Access Memory (RAM) 1103. The RAM 1103 also stores various programs and data necessary for system operation. The CPU 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An Input/Output (I/O) interface 1105 is also connected to the bus 1104.

The following components are connected to the I/O interface 1105: an input section 1106 including a keyboard, a mouse, and the like; an output section 1107 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 1108 including a hard disk and the like; and a communication section 1109 including a network interface card such as a Local Area Network (LAN) card or a modem. The communication section 1109 performs communication processing via a network such as the Internet. A drive 1110 is also connected to the I/O interface 1105 as needed. A removable medium 1111, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1110 as needed, so that a computer program read from it can be installed into the storage section 1108 as needed.

In particular, according to embodiments of the application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1109 and/or installed from the removable medium 1111. When the computer program is executed by the Central Processing Unit (CPU) 1101, the various functions defined in the system of the present application are executed.

It should be noted that the computer readable media shown in the embodiments of the present application may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present application may be implemented in software or in hardware, and the described units may also be disposed in a processor. The names of these units do not, in some cases, constitute a limitation on the units themselves.

As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may be separate and not incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the above embodiments.

It should be noted that although several modules or units of the device for performing actions are mentioned in the detailed description above, such a division is not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided among a plurality of modules or units.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and includes several instructions that cause a computing device (such as a personal computer, a server, a touch terminal, or a network device) to execute the method according to the embodiments of the present application.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.

It will be understood that the present application is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

1. An object classification method, comprising:

acquiring text data associated with an object to be classified, wherein the text data comprises description data and search data of the object to be classified, and the search data comprises corpus data obtained by searching for the object to be classified in a search engine; characterized in that the object to be classified is a game application, and acquiring the text data associated with the object to be classified comprises: searching for information about the game application, and identifying the title and abstract fields of the search results as the search data of the game application;

extracting at least one theme from the text data based on a preset theme model, and determining probability distribution of the at least one theme and word probability distribution corresponding to each theme;

calculating the probability distribution of keywords corresponding to the text data based on the probability distribution of the at least one topic and the word probability distribution corresponding to each topic;

extracting subject words of the text data according to the probability distribution of the keywords corresponding to the text data;

determining a category label of the object to be classified according to the subject term of the text data;

acquiring a search term sent by a terminal;

searching for a target category label matching the search term, matching the search term against all subject words to determine a target subject word matching the search term, and identifying an object corresponding to the target category label and the target subject word as a target object;

and returning the information of the target object to the terminal.

2. The object classification method of claim 1, wherein the obtaining text data associated with the object to be classified comprises:

obtaining description data associated with the object to be classified;

and extracting word vector data from the description data according to a preset word vector model, and adding the word vector data to the text data.

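3. The object classification method according to claim 1, characterized in that the probability distribution of the at least one topic is $\langle p_{t1}, \ldots, p_{ti}, \ldots, p_{tk} \rangle$; wherein $p_{ti} = n_{ti}/n$, $p_{ti}$ represents the probability of the i-th topic for the text data, $n_{ti}$ represents the number of words in the text data corresponding to the i-th topic, and $n$ represents the total number of words in the text data.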

4. The object classification method according to claim 1, characterized in that the word probability distribution is $\langle p_{w1}, \ldots, p_{wi}, \ldots, p_{wm} \rangle$;

wherein $p_{wi} = N_{wi}/N$, $p_{wi}$ represents the probability of the i-th keyword for the w-th topic, $N_{wi}$ represents the number of occurrences of the i-th keyword in the w-th topic, and $N$ represents the total number of keywords in the w-th topic.

5. The object classification method according to claim 1, wherein calculating the probability distribution of the keywords corresponding to the text data based on the probability distribution of the at least one topic and the probability distribution of the words corresponding to the respective topics by the following formula comprises:

$$p(w \mid d) = \sum_{t} p(w \mid t)\, p(t \mid d)$$

wherein $p(w \mid d)$ represents the probability of the w-th keyword given the text data d; $p(w \mid t)$ represents the probability of the w-th keyword given the t-th topic; $p(t \mid d)$ represents the probability of the t-th topic given the text data d; and the sum runs over all topics t.

6. The object classification method according to claim 1, wherein after determining the class label of the object to be classified according to the subject term of the text data, further comprising:

acquiring label adjustment information triggered by a user;

and adjusting the category label according to the label adjustment information.

7. An object classification apparatus, comprising:

a first acquisition unit, configured to acquire text data associated with an object to be classified, wherein the text data comprises description data and search data of the object to be classified, and the search data comprises corpus data obtained by searching for the object to be classified in a search engine; the object to be classified is a game application, and the first acquisition unit is specifically configured to search for information about the game application and to identify the title and abstract fields of the search results as the search data of the game application;

the extraction unit is used for extracting at least one theme from the text data based on a preset theme model, and determining the probability distribution of the at least one theme and the word probability distribution corresponding to each theme; calculating the probability distribution of keywords corresponding to the text data based on the probability distribution of the at least one topic and the word probability distribution corresponding to each topic; extracting subject words of the text data according to the probability distribution of the keywords corresponding to the text data;

the label unit is used for determining the class label of the object to be classified according to the subject term of the text data;

the second acquisition unit is used for acquiring the search terms sent by the terminal;

the search unit, configured to search for a target category label matching the search term, match the search term against all subject words, determine a target subject word matching the search term, and identify an object corresponding to the target category label and the target subject word as a target object;

and the first sending unit is used for returning the information of the target object to the terminal.

8. The object classification device according to claim 7, wherein the first obtaining unit is specifically configured to:

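obtaining description data associated with the object to be classified;

extracting word vector data from the description data according to a preset word vector model, and adding the word vector data to the text data.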

9. The object classification apparatus according to claim 7, wherein the extraction unit includes a first calculation unit, and the first calculation unit is specifically configured to determine the probability distribution of the at least one topic as $\langle p_{t1}, \ldots, p_{ti}, \ldots, p_{tk} \rangle$;

wherein $p_{ti} = n_{ti}/n$, $p_{ti}$ represents the probability of the i-th topic for the text data, $n_{ti}$ represents the number of words in the text data corresponding to the i-th topic, and $n$ represents the total number of words in the text data.

10. The object classification apparatus according to claim 7, wherein the extraction unit includes a second calculation unit, and the second calculation unit is specifically configured to determine the word probability distribution as $\langle p_{w1}, \ldots, p_{wi}, \ldots, p_{wm} \rangle$;

wherein $p_{wi} = N_{wi}/N$, $p_{wi}$ represents the probability of the i-th keyword for the w-th topic, $N_{wi}$ represents the number of occurrences of the i-th keyword in the w-th topic, and $N$ represents the total number of keywords in the w-th topic.

11. The object classification device according to claim 7, wherein the extraction unit comprises a subject word extraction unit, the subject word extraction unit being specifically configured to calculate the keyword probability distribution corresponding to the text data by the following formula:

$$p(w \mid d) = \sum_{t} p(w \mid t)\, p(t \mid d)$$

12. The object classification apparatus of claim 7, further comprising:

a third obtaining unit, configured to obtain tag adjustment information triggered by a user;

and the adjusting unit is used for adjusting the category label according to the label adjusting information.

13. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the object classification method according to any one of claims 1 to 6.

14. An electronic device, comprising:

one or more processors;

storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the object classification method according to any one of claims 1 to 6.