CN111858983A - Picture type determining method and device, electronic equipment and storage medium

Info

Publication number: CN111858983A
Application number: CN202010698782.6A
Authority: CN
Inventors: 徐瑞聪; 刘瑞峰; 曹佐; 李东帅; 周鑫; 左凯; 黄彦春; 腊磊; 马潮; 张弓; 王仲远
Original assignee: Beijing Sankuai Online Technology Co Ltd
Current assignee: Beijing Sankuai Online Technology Co Ltd
Priority date: 2020-07-20
Filing date: 2020-07-20
Publication date: 2020-10-30

Abstract

The application discloses a method and a device for determining picture categories, electronic equipment and a storage medium, and belongs to the technical field of computers. The method comprises the following steps: extracting a first keyword corresponding to a picture from text information of the picture to be classified, wherein the text information is comment information corresponding to the picture; determining a second keyword in the first keyword corresponding to the picture based on the target search term corresponding to the picture; determining semantic relevance of the target search word and the second keyword; and determining the picture category corresponding to the picture based on the target search word and the semantic relevance between the target search word and the second keyword. According to the method for determining the picture category, due to the fact that the text information corresponding to the picture is considered, under the conditions that the picture is not clear enough and the picture quality is poor, the picture category can be accurately obtained based on the keywords and the search words of the picture, and the accuracy of the determined picture category can be improved to a certain extent.

Description

Picture type determining method and device, electronic equipment and storage medium

Technical Field

The embodiment of the application relates to the technical field of computers, in particular to a method and a device for determining picture categories, electronic equipment and a storage medium.

Background

With the continuous development of computer technology, electronic devices have more and more powerful functions, users can obtain various pictures through various social applications and web browsing, and a large number of pictures of different types can exist in the electronic devices. Therefore, a method for determining the picture category is needed to classify the pictures.

In the related technology, a picture to be classified is acquired, the picture is input into a picture classification model, the picture classification model is used for extracting the characteristics of the picture, and the picture is classified according to the characteristics of the picture, so that the picture category of the picture is determined.

However, because picture quality is unstable and the picture content may be blurry, the features extracted by the picture classification model are not accurate enough. This in turn reduces the classification accuracy of the picture classification model, so the accuracy of the determined picture category is low.

Disclosure of Invention

The embodiment of the application provides a method and a device for determining picture categories, electronic equipment and a storage medium, which can be used for solving the problems in the related art. The technical scheme is as follows:

In one aspect, an embodiment of the present application provides a method for determining a picture category, where the method includes:

extracting a first keyword corresponding to a picture from text information of the picture to be classified, wherein the text information is comment information corresponding to the picture;

determining a second keyword in the first keyword corresponding to the picture based on the target search term corresponding to the picture;

determining semantic relevancy between the target search word and the second keyword;

and determining the picture category corresponding to the picture based on the target search term and the semantic relevance between the target search term and the second keyword.

In a possible implementation manner, the determining, based on the target search term corresponding to the picture, a second keyword from the first keyword corresponding to the picture includes:

performing character matching on the first keywords corresponding to the picture and the target search word to obtain the character matching number of each first keyword of the picture;

and determining the first keywords of which the character matching numbers meet the target requirement as second keywords.

In a possible implementation manner, the determining, as the second keyword, the first keyword whose number of character matches meets a target requirement includes:

and calculating the ratio of the character matching number to the total number of the characters of the target search word, and determining the first keyword of which the ratio meets a target threshold value as a second keyword.

In a possible implementation manner, before determining a second keyword in a first keyword corresponding to the picture based on the target search term corresponding to the picture, the method further includes:

inputting the picture into a picture classification model, wherein the picture classification model is used for determining a search term corresponding to the picture;

determining a search term corresponding to the picture based on an output result of the picture classification model;

and determining a target search term in the search terms corresponding to the pictures.

In a possible implementation manner, the determining the semantic relevance of the target search term and the second keyword includes:

coding the second keyword, and determining keyword information corresponding to the second keyword;

encoding the target search word, and determining search word information corresponding to the target search word;

calculating semantic correlation between the keyword information and the search word information according to an attention mechanism based on the keyword information and the search word information;

and determining the semantic relevance between the keyword information and the search word information as the semantic relevance between the target search word and the second keyword.

In a possible implementation manner, the encoding the second keyword and determining keyword information corresponding to the second keyword includes:

coding each word included in the second keyword to obtain a vector corresponding to each word in the second keyword;

determining keyword information corresponding to the second keyword based on the vector corresponding to each word in the second keyword;

the encoding the target search term and determining the search term information corresponding to the target search term includes:

coding each character included in the target search word to obtain a vector corresponding to each character in the target search word;

and determining the search word information corresponding to the target search word based on the vector corresponding to each character in the target search word.

In a possible implementation manner, the determining, based on the target search term and the semantic relevance between the target search term and the second keyword, a picture category corresponding to the picture includes:

inputting the target search word and the semantic relevance between the target search word and the second keyword into a picture category determination model, wherein the picture category determination model is used for determining a picture category corresponding to the picture in the target search word based on the semantic relevance between the target search word and the second keyword;

and determining the picture category of the picture based on the output result of the picture category determination model.

In another aspect, an embodiment of the present application provides an apparatus for determining a picture category, where the apparatus includes:

the extraction module is used for extracting a first keyword corresponding to a picture from text information of the picture to be classified, wherein the text information is comment information corresponding to the picture;

the first determining module is used for determining a second keyword in the first keyword corresponding to the picture based on the target search term corresponding to the picture;

the second determining module is used for determining the semantic relevance of the target search word and the second keyword;

and the third determining module is used for determining the picture category corresponding to the picture based on the target search term and the semantic relevance between the target search term and the second keyword.

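In a possible implementation manner, the first determining module is configured to perform character matching on the first keywords corresponding to the picture and the target search word to obtain the character matching number of each first keyword of the picture, and to determine the first keywords whose character matching numbers meet the target requirement as second keywords.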

In a possible implementation manner, the first determining module is configured to calculate a ratio of the number of character matches to a total number of characters of the target search term, and determine a first keyword of which the ratio satisfies a target threshold as a second keyword.

In one possible implementation, the apparatus further includes:

the input module is used for inputting the picture into a picture classification model, and the picture classification model is used for determining a search term corresponding to the picture;

the fourth determining module is used for determining a search term corresponding to the picture based on an output result of the picture classification model;

and the fifth determining module is used for determining a target search term in the search terms corresponding to the pictures.

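In a possible implementation manner, the second determining module is configured to encode the second keyword and determine keyword information corresponding to the second keyword; encode the target search word and determine search word information corresponding to the target search word; calculate the semantic relevance between the keyword information and the search word information according to an attention mechanism based on the keyword information and the search word information; and determine the semantic relevance between the keyword information and the search word information as the semantic relevance between the target search word and the second keyword.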

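In a possible implementation manner, the second determining module is configured to encode each word included in the second keyword to obtain a vector corresponding to each word in the second keyword; determine keyword information corresponding to the second keyword based on the vector corresponding to each word in the second keyword; encode each character included in the target search word to obtain a vector corresponding to each character in the target search word; and determine the search word information corresponding to the target search word based on the vector corresponding to each character in the target search word.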

In a possible implementation manner, the third determining module is configured to input the target search term and the semantic relevance between the target search term and the second keyword into a picture category determining model, where the picture category determining model is configured to determine, based on the target search term and the semantic relevance between the target search term and the second keyword, a picture category corresponding to the picture in the target search term; and to determine the picture category of the picture based on the output result of the picture category determining model.

In another aspect, an electronic device is provided, which includes a processor and a memory, where at least one program code is stored in the memory, and the at least one program code is loaded and executed by the processor to implement any one of the above-mentioned picture category determination methods.

In another aspect, a computer-readable storage medium is provided, in which at least one program code is stored, and the at least one program code is loaded and executed by a processor to implement any one of the above methods for determining a picture category.

The technical scheme provided by the embodiment of the application at least has the following beneficial effects:

The technical scheme provided by the embodiment of the application determines the first keyword of the picture based on the text information of the picture to be classified, and determines the second keyword among the first keywords corresponding to the picture based on the target search word corresponding to the picture, so that the second keyword is determined more accurately. The picture category of the picture is then determined among the target search words according to the semantic relevance between the second keyword and the target search word. Because the text information corresponding to the picture is taken into account, the picture category can still be obtained accurately from the keywords and the search words of the picture when the picture is not clear enough or the picture quality is poor, which can improve the accuracy of the determined picture category to a certain extent.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

Fig. 1 is a schematic diagram of an implementation environment of a method for determining a picture category according to an embodiment of the present application;

Fig. 2 is a flowchart of a method for determining a picture category according to an embodiment of the present application;

Fig. 3 is a schematic diagram of a process for determining a picture category according to an embodiment of the present application;

Fig. 4 is a schematic diagram of a picture to be classified according to an embodiment of the present application;

Fig. 5 is a schematic diagram illustrating a determination of picture categories according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of an apparatus for determining picture categories according to an embodiment of the present application;

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application;

Fig. 8 is a schematic structural diagram of a server according to an embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

Fig. 1 is a schematic diagram of an implementation environment of a method for determining a picture category according to an embodiment of the present application, and as shown in fig. 1, the implementation environment includes: an electronic device 101 and a server 102.

The electronic device 101 may be at least one of a smart phone, a game console, a desktop computer, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, and a laptop computer. The electronic device 101 is configured to extract first keywords corresponding to the picture from text information of the picture to be classified, and to determine second keywords from the first keywords corresponding to the picture based on the target search term corresponding to the picture. The electronic device 101 is further configured to determine the semantic relevance between the target search term and the second keyword, and to determine the picture category corresponding to the picture based on the target search term and the semantic relevance between the target search term and the second keyword.

The electronic device 101 may be generally referred to as one of a plurality of electronic devices, and the embodiment of the present application is illustrated by the electronic device 101. Those skilled in the art will appreciate that the number of electronic devices 101 described above may be greater or fewer. For example, the number of the electronic devices 101 may be only one, or the number of the electronic devices 101 may be tens or hundreds, or more, and the number and the device type of the electronic devices 101 are not limited in the embodiment of the present application.

The server 102 may be one server or multiple servers, and may be at least one of a cloud computing platform and a virtualization center. The server 102 may communicate with the electronic device 101 over a wired network or a wireless network. The server 102 may receive an acquisition request sent by the electronic device 101, and send text information corresponding to the pictures to be classified to the electronic device 101 based on the acquisition request. Optionally, the number of the servers 102 may be greater or fewer, and the embodiment of the present application is not limited thereto. Of course, the server 102 may also include other functional servers to provide more comprehensive and diverse services.

Based on the foregoing implementation environment, the embodiment of the present application provides a method for determining a picture category. Fig. 2 is a flowchart of the method, which is described here by taking its execution by the electronic device 101 in fig. 1 as an example. As shown in fig. 2, the method comprises the following steps:

In step 201, a first keyword corresponding to a picture is extracted from text information of the picture to be classified.

In the embodiment of the application, the text information of the picture to be classified is comment information corresponding to the picture, and the first keyword corresponding to the picture is a noun in the comment information corresponding to the picture.

In a possible implementation manner, an application program for determining a picture category is installed and run in the electronic device. A user inputs a picture in the application program, the picture is taken as the picture to be classified, and the electronic device acquires the text information corresponding to the picture, that is, the comment information corresponding to the picture. If there is only one piece of text information corresponding to the picture, that piece is used as the text information corresponding to the picture. If there are multiple pieces, one or more of them are randomly selected as the text information corresponding to the picture.

In a possible implementation manner, the electronic device may have any one of the following implementation manners to obtain text information corresponding to the picture to be classified.

In the first implementation manner, all pictures and the text information corresponding to all the pictures are stored in the electronic device. The electronic device determines the storage space corresponding to the picture based on the code corresponding to the picture to be classified, and acquires the text information corresponding to the picture from that storage space.

In the second implementation manner, the storage space of the server stores text information corresponding to all pictures, and the electronic device sends an acquisition request to the server, wherein the acquisition request carries codes of the pictures to be classified. The server receives the acquisition request, analyzes the acquisition request, obtains the code of the picture to be classified carried in the acquisition request, determines the storage space corresponding to the picture to be classified based on the code of the picture to be classified, acquires the text information corresponding to the picture to be classified in the storage space, and sends the acquired text information to the electronic equipment.

It should be noted that any implementation manner may be selected to obtain text information corresponding to the picture to be classified, which is not limited in the embodiment of the present application.
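The lookup by picture code and the random selection among multiple comments can be sketched roughly as follows. This is only an illustration under stated assumptions: the in-memory dictionary `COMMENT_STORE`, the picture codes, the sample comments, and the function name `get_text_information` are all hypothetical and stand in for whatever storage space the electronic device or the server actually uses.

```python
import random

# Hypothetical store mapping a picture code to its comment texts.
# The codes and comments are illustrative only.
COMMENT_STORE = {
    "pic-001": ["Great fireworks tonight.", "The sky was full of flowers."],
    "pic-002": ["Sweet red bean paste."],
}


def get_text_information(picture_code, store, k=1, rng=None):
    """Return the comment text(s) used as text information for a picture.

    With zero or one stored comment, that is returned as is; with several,
    k of them are chosen at random, as described in the text.
    """
    comments = store.get(picture_code, [])
    if len(comments) <= 1:
        return list(comments)
    rng = rng or random.Random()
    return rng.sample(comments, min(k, len(comments)))


single = get_text_information("pic-002", COMMENT_STORE)
picked = get_text_information("pic-001", COMMENT_STORE, k=1, rng=random.Random(0))
```

Passing an explicit `rng` is a design choice of this sketch that keeps the random selection reproducible in tests.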

In a possible implementation manner, after the electronic device acquires text information of a picture, the text information is identified to obtain a noun in the text information, and the noun in the text information is used as a first keyword corresponding to the picture.

Fig. 3 is a schematic diagram of a process for determining a picture category according to an embodiment of the present application. In fig. 3, the text information corresponding to the picture is "a big meeting of flowery and romantic fireworks in Japan is unexpectedly met, a flower blooms in the air, and is like a gold coin full of gold laugh in the sky and like a fireworks rising up in the bonfire evening". Keyword extraction is performed on the text information, and the obtained first keywords are "flower, sky, fireworks, gold coin, bonfire, and fireworks".

In step 202, a second keyword is determined among the first keywords corresponding to the picture based on the target search term corresponding to the picture.

In a possible implementation manner, before determining the second keyword in the first keyword corresponding to the picture based on the target search term corresponding to the picture, the target search term corresponding to the picture needs to be obtained first, and the obtaining process of the target search term corresponding to the picture includes the following steps one to three.

Step one, inputting the picture into a picture classification model, wherein the picture classification model is used for determining a search term corresponding to the picture.

In a possible implementation manner, the picture classification model identifies the picture to obtain the features of the picture, determines the picture categories to which the picture may correspond according to those features, and uses these candidate picture categories as the search terms corresponding to the picture.

In a possible implementation manner, before inputting a picture into a picture classification model, the picture classification model needs to be obtained first, and the obtaining process of the picture classification model is as follows:

Pictures whose picture categories have been determined are obtained together with the picture category corresponding to each picture, and the initial picture classification model is trained based on these pictures to obtain the picture classification model.

In a possible implementation manner, the initial picture classification model may be a Visual Geometry Group network (VGGNet) model, a generative adversarial network (GAN) model, or a Generation Image Description (GID) model, and the type of the initial picture classification model is not limited in the embodiment of the present application.

In a possible implementation manner, once the picture classification model has been obtained, the picture is input into the picture classification model to obtain the search terms corresponding to the picture.

Step two, determining the search terms corresponding to the picture based on the output result of the picture classification model.

In a possible implementation manner, after the picture is input into the picture classification model in step one, the picture classification model identifies the picture to obtain the features of the picture and determines the picture categories corresponding to the picture according to those features, that is, determines the search terms corresponding to the picture.

Exemplarily, fig. 4 is a schematic diagram of a picture to be classified according to an embodiment of the present application, the picture of fig. 4 is input into the picture classification model, and a search term corresponding to the picture is obtained based on an output result of the picture classification model, where the search term includes red bean paste, plum juice, dark plum juice, tangerine peel red bean paste, plum, water, and brown sugar water.

Step three, determining a target search term among the search terms corresponding to the picture.

In a possible implementation manner, based on the search terms corresponding to the picture determined in the second step, the first N search terms in the search terms corresponding to the picture are determined as target search terms corresponding to the picture, where N is an integer greater than or equal to 1. For example, the first 5 search terms in the search terms corresponding to the picture are determined as the target search terms. Exemplarily, the target search words corresponding to the picture corresponding to fig. 4 are red bean paste, plum juice, dark plum juice, and tangerine peel red bean paste.
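Selecting the first N search terms can be sketched as below. The sketch assumes that the list of search terms is already in the order produced by the picture classification model, which the text implies but does not state. The function name `select_target_search_terms` is hypothetical, and `n = 4` is used only because it reproduces the four target search terms listed for fig. 4.

```python
def select_target_search_terms(search_terms, n):
    """Keep the first n search terms.

    The input order is assumed to be the order in which the picture
    classification model returned the terms.
    """
    if n < 1:
        raise ValueError("n must be an integer greater than or equal to 1")
    return search_terms[:n]


# Search terms listed for the picture of fig. 4, in the order given in the text.
search_terms = [
    "red bean paste",
    "plum juice",
    "dark plum juice",
    "tangerine peel red bean paste",
    "plum",
    "water",
    "brown sugar water",
]
target_terms = select_target_search_terms(search_terms, 4)
```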

In a possible implementation manner, the process of determining the second keyword from the first keyword corresponding to the picture based on the target search term corresponding to the picture includes the following steps 2021 to 2022.

In step 2021, the first keywords corresponding to the picture are character-matched with the target search term to obtain the character matching number of each first keyword corresponding to the picture.

In a possible implementation manner, the first keyword corresponding to the picture obtained in step 201 is character-matched with the target search term corresponding to the picture obtained in step 202, so as to obtain the number of character matches corresponding to each first keyword. The character matching number refers to the number of identical characters between two words, that is, the number of identical characters between the first keyword and the target search word. For example, the first keyword is "braised pork in soy sauce", the target search word is "pork in pot", and the two words are subjected to character matching, so that the number of identical characters between the two words is 1, that is, the number of character matching of "braised pork in soy sauce" relative to "pork in pot" is 1.

In step 2022, the first keyword whose number of character matches satisfies the target requirement is determined as the second keyword.

In a possible implementation manner, based on the character matching number of each first keyword calculated in the above step 2021, the first keyword whose character matching number meets the target requirement is determined as the second keyword. For example, a first keyword with a character matching number greater than 1 is determined as a second keyword, and the embodiment of the present application is only described by taking the target requirement as greater than 1 as an example, and is not used to limit the present application. Of course, the target requirement may be other requirements, and the embodiment of the present application does not limit this.
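Steps 2021 and 2022 can be sketched as follows. Several points are assumptions of this sketch rather than statements of the application: identical characters are counted as a multiset intersection (a repeated character counts once per shared occurrence), the Chinese strings are assumed renderings of the translated example words ("braised pork in soy sauce" and "pork in pot"), the extra keyword in the usage lines is invented for illustration, and the function names are hypothetical.

```python
from collections import Counter


def character_match_count(first_keyword, target_search_term):
    """Count the identical characters shared by two words.

    Assumption: a multiset intersection is used, so a character that
    appears twice in both words contributes 2.
    """
    common = Counter(first_keyword) & Counter(target_search_term)
    return sum(common.values())


def select_second_keywords(first_keywords, target_search_term, min_exclusive=1):
    """Keep first keywords whose match count is greater than min_exclusive.

    The default mirrors the 'greater than 1' target requirement used as an
    example in the text; other requirements are possible.
    """
    return [
        keyword
        for keyword in first_keywords
        if character_match_count(keyword, target_search_term) > min_exclusive
    ]


# Assumed Chinese forms of the example words; they share one character.
count = character_match_count("红烧肉", "锅包肉")
# "锅包肉片" is an invented keyword that shares three characters with the target.
selected = select_second_keywords(["红烧肉", "锅包肉片"], "锅包肉")
```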

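In a possible implementation manner, in addition to the above method, the second keyword may also be determined among the first keywords corresponding to the picture as follows: the ratio of the character matching number of the first keyword to the total number of characters of the target search term is calculated, and a first keyword whose ratio meets a target threshold is determined as a second keyword.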

Illustratively, the first keyword is "braised pork in soy sauce", the target search term is "pork in pot", the total number of characters of the target search term is 3, the number of character matches of the first keyword with the target search term is 1, and the ratio of the number of character matches of the first keyword to the total number of characters of the target search term is 1/3. If the target threshold is set to 1/2, since 1/3 is smaller than 1/2, the first keyword "braised pork in soy sauce" cannot be determined as the second keyword. If the target threshold is set to 1/4, since 1/3 is greater than 1/4, the first keyword "braised pork in soy sauce" can be determined as the second keyword.

It should be noted that the target threshold may be determined based on experience, or may be adjusted according to an implementation environment, and the determination method and the numerical value of the target threshold are not limited in the embodiments of the present application.
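The ratio-based variant can be sketched as below, reproducing the 1/3 ratio and the two thresholds from the example. As assumptions of this sketch: "meets the target threshold" is read as "greater than or equal to", identical characters are counted as a multiset intersection, the Chinese strings are assumed renderings of the translated example words, and the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def match_ratio(first_keyword, target_search_term):
    """Ratio of the character matching number to the length of the target search term."""
    if not target_search_term:
        raise ValueError("target search term must not be empty")
    common = Counter(first_keyword) & Counter(target_search_term)
    return Fraction(sum(common.values()), len(target_search_term))


def is_second_keyword(first_keyword, target_search_term, threshold):
    """Assumption: the ratio 'meets' the threshold when it is >= threshold."""
    return match_ratio(first_keyword, target_search_term) >= threshold


# Assumed Chinese forms of "braised pork in soy sauce" and "pork in pot".
ratio = match_ratio("红烧肉", "锅包肉")
with_half = is_second_keyword("红烧肉", "锅包肉", Fraction(1, 2))
with_quarter = is_second_keyword("红烧肉", "锅包肉", Fraction(1, 4))
```

Using `Fraction` avoids floating-point rounding when the ratio is compared with thresholds such as 1/2 or 1/4.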

It should be further noted that the second keyword may be determined in the first keyword corresponding to the picture based on any one of the above implementation manners, and the determination process of the second keyword is not limited in the embodiment of the present application.

In step 203, the semantic relevance of the target search term to the second keyword is determined.

In one possible implementation, determining the semantic relevance of the target search term to the second keyword includes steps 2031 to 2034 described below.

Step 2031, encoding the second keyword, and determining keyword information corresponding to the second keyword.

In a possible implementation manner, the process of determining the keyword information corresponding to the second keyword is as follows:

Step 1, coding each word included in the second keyword to obtain a vector corresponding to each word in the second keyword.

In a possible implementation manner, each word in the second keyword is input into a word vector layer (word_embedding), and each word in the second keyword is encoded by word_embedding, so that a vector corresponding to each word is obtained.

Illustratively, the three words "red", "burned" and "meat" are respectively input into word_embedding and encoded by it, giving a vector of shape (1, 768) for each of "red", "burned" and "meat".

Step 2, determining keyword information corresponding to the second keyword based on the vector corresponding to each word in the second keyword.

In a possible implementation manner, the vectors of the words included in the second keyword obtained in step 1 are combined to obtain a vector corresponding to the second keyword, that is, keyword information corresponding to the second keyword.

Illustratively, since the vectors of "red", "burned" and "meat" obtained in the above step 1 each have shape (1, 768), the vector of the second keyword made up of these three words, "braised pork in soy sauce", has shape (3, 768). That is, the keyword information corresponding to this second keyword is a vector of shape (3, 768).

It should be noted that, the above is only an example, the vector corresponding to each word in the second keyword may be determined empirically, or may be adjusted according to an implementation environment, and the determination of the vector of each word of the second keyword is not limited in the embodiment of the present application.
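The per-character encoding and the combination into keyword information can be sketched as follows. This is a minimal illustration that assumes a lookup table of random vectors in place of a trained word_embedding layer; the table, the seed, and the function names are hypothetical, and the Chinese string is an assumed rendering of the three words "red", "burned" and "meat". Only the shapes correspond to the example in the text.

```python
import random

EMBEDDING_DIM = 768  # vector width used in the example in the text


def make_embedding_table(characters, dim=EMBEDDING_DIM, seed=0):
    """Hypothetical stand-in for a trained word_embedding layer.

    Each character gets a fixed random vector of length dim.
    """
    rng = random.Random(seed)
    return {ch: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for ch in characters}


def encode(word, table):
    """Stack the per-character vectors into a (len(word), dim) matrix."""
    return [table[ch] for ch in word]


table = make_embedding_table("红烧肉")
keyword_info = encode("红烧肉", table)
shape = (len(keyword_info), len(keyword_info[0]))
```

The same `encode` function would apply unchanged to a target search term, which matches the statement that steps 2031 and 2032 follow the same process.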

Step 2032, encoding the target search term, and determining the search term information corresponding to the target search term.

In one possible implementation, the process of determining the search term information of the target search term is as follows:

Step 1, coding each character included in the target search term to obtain a vector corresponding to each character in the target search term.

In a possible implementation manner, the process is identical to the process of step 1 in step 2031, and is not described herein again.

Step 2, determining search word information corresponding to the target search word based on the vector corresponding to each character in the target search word.

In a possible implementation manner, the process is identical to the process of step 2 in step 2031, and is not described herein again.

Step 2033, based on the keyword information and the search word information, calculating the semantic relevance between the keyword information and the search word information according to the attention mechanism.

The attention mechanism is a technique modeled on human attention: put simply, it quickly screens out high-value information from a large amount of information. It is mainly used to address the difficulty of obtaining a reasonable final vector representation when the input sequence of an LSTM (Long Short-Term Memory) network or RNN (Recurrent Neural Network) model is long. The attention mechanism assigns a weight to each character, so that the semantic relevance between the keyword information and the search word information can be calculated according to the weights.

In a possible implementation manner, the keyword information and the search term information are input into a model added with an Attention mechanism, and the semantic correlation between the keyword information and the search term information can be obtained according to the output result of the model.
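The text does not specify the attention model, so the following is only one possible sketch, not the model of the application. It assumes scaled dot-product attention in which each character vector of the search term weights the character vectors of the keyword, and it assumes that the final relevance is the mean cosine similarity between each search-term vector and its attention-weighted keyword context. All function names are hypothetical, and the tiny two-dimensional vectors are illustrative only.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_relevance(keyword_info, search_info):
    """Assumed scoring: scaled dot-product attention plus mean cosine similarity.

    keyword_info and search_info are lists of equal-width character vectors.
    """
    dim = len(search_info[0])
    similarities = []
    for query in search_info:
        # One weight per keyword character, as in the attention mechanism.
        weights = softmax([dot(query, key) / math.sqrt(dim) for key in keyword_info])
        context = [
            sum(w * key[i] for w, key in zip(weights, keyword_info))
            for i in range(dim)
        ]
        similarities.append(cosine(query, context))
    return sum(similarities) / len(similarities)


same = attention_relevance([[1.0, 0.0]], [[1.0, 0.0]])
unrelated = attention_relevance([[0.0, 1.0]], [[1.0, 0.0]])
mixed = attention_relevance([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
```

In this sketch, identical vectors score close to 1, orthogonal vectors score 0, and a keyword that only partly overlaps the search term falls in between.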

Step 2034, determining the semantic relevance between the keyword information and the search word information as the semantic relevance between the target search word and the second keyword.

In a possible implementation manner, the semantic relevance between the keyword information and the search term information obtained in step 2033 is determined as the semantic relevance between the corresponding second keyword and the corresponding target search term. For example, if the semantic relevance between one piece of keyword information and one piece of search term information is 0.5, the semantic relevance between the second keyword and the target search term to which they correspond is determined to be 0.5.

In step 204, a picture category corresponding to the picture is determined based on the target search term and the semantic relevance between the target search term and the second keyword.

In a possible implementation manner, determining the picture category corresponding to the picture based on the target search term and the semantic relevance between the target search term and the second keyword includes the following steps 2041 to 2042.

Step 2041, inputting the target search term and the semantic relevance between the target search term and the second keyword into the picture category determination model.

In a possible implementation manner, the picture category determination model is configured to determine, from among the target search terms, the picture category corresponding to the picture based on the target search term and the semantic relevance between the target search term and the second keyword.

In a possible implementation manner, the picture category determination model is used for fusing the target search term of a picture, the semantic relevance between the target search term and the second keyword of the picture, and the picture features of the picture, and for determining the picture category corresponding to the picture by using the fused information. The picture category determination model is used as follows:

The target search term, the semantic relevance between the target search term and the second keyword, and the picture features of the picture are input into the picture category determination model, and the picture category corresponding to the picture is obtained based on the output result of the picture category determination model.

The picture features of the picture are obtained from an intermediate layer of the picture classification model. The picture classification model comprises a plurality of layers; the result output by its last layer is the search term corresponding to the picture, and an intermediate layer can output the picture features of the picture. A picture feature is a characteristic that distinguishes pictures of one category from pictures of other categories. Illustratively, the picture features of the picture include brightness features, edge features, texture features, color features, and the like of the picture.

For example, the picture classification model has 5 layers, where layer 3 is the picture feature extraction layer: the result output by layer 3 is the picture feature of the picture, and the result output by layer 5 is the search term corresponding to the picture.
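The idea of reading picture features from an intermediate layer while still obtaining the final output can be sketched as below. The helper `run_with_features` and the notion of a layer as a plain function are hypothetical simplifications; a real classification network would expose intermediate activations through its own framework.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Layer = Callable[[List[float]], List[float]]


def run_with_features(
    layers: Sequence[Layer], x: List[float], feature_layer: int
) -> Tuple[List[float], List[float]]:
    """Run every layer in order and also capture the output of one layer.

    `feature_layer` is 1-based, so 3 means "the output of layer 3".
    Returns (captured intermediate output, final output).
    """
    if not 1 <= feature_layer <= len(layers):
        raise ValueError("feature_layer must be between 1 and len(layers)")
    features: Optional[List[float]] = None
    for index, layer in enumerate(layers, start=1):
        x = layer(x)
        if index == feature_layer:
            features = list(x)  # copy so later layers cannot alter it
    assert features is not None
    return features, x
```

With a 5-layer model, calling `run_with_features(layers, pixels, 3)` returns the layer-3 output as the picture features together with the layer-5 output, mirroring the example above.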

In one possible implementation, a mask attention mechanism is added to the picture category determination model. The picture category determination model may be a Visual Geometry Group network (VGGNet) model, a generative adversarial network (GAN) model, or a Generated Image Description (GID) model; the type of the picture category determination model is not limited in the embodiment of the present application.

Step 2042, determining the picture category of the picture based on the output result of the picture category determination model.

In one possible implementation, the result output by the picture category determination model is determined as the picture category of the picture.
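To make the fusion idea concrete, the sketch below replaces the learned picture category determination model with a hand-weighted linear combination. This is a deliberate simplification and not the model of the application: `determine_category`, the per-term `visual_scores` (assumed to be derived from the picture features), and the mixing weight `alpha` are all hypothetical.

```python
from typing import Dict, Optional


def determine_category(
    relevance: Dict[str, Dict[str, float]],
    visual_scores: Dict[str, float],
    alpha: float = 0.5,
) -> Optional[str]:
    """Pick the target search term with the highest fused score.

    relevance[term][keyword] is the semantic relevance between a target
    search term and a second keyword; visual_scores[term] is an assumed
    score derived from the picture features for that term.
    """
    best_term: Optional[str] = None
    best_score = float("-inf")
    for term, per_keyword in relevance.items():
        # Text evidence: average relevance over all second keywords.
        text_score = sum(per_keyword.values()) / len(per_keyword) if per_keyword else 0.0
        fused = alpha * visual_scores.get(term, 0.0) + (1.0 - alpha) * text_score
        if fused > best_score:
            best_term, best_score = term, fused
    return best_term
```

The returned term is one of the target search terms, which matches the description that the picture category is determined from among the target search terms.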

Fig. 5 is a schematic diagram illustrating determination of picture categories according to an embodiment of the present application. Taking the category determination of the first picture in fig. 5 as an example, the first keywords corresponding to the picture are decoration, radish soaking, luxury, plum juice, sour, taste, radish, appetizing, pickled, chinese cabbage, and peanut; the target search terms of the picture comprise 5 search terms, namely red bean paste, plum juice, plum soup, dark plum juice, and tangerine peel red bean paste; and, based on the first keywords and the target search terms of the picture, the picture category of the picture is determined to be plum juice. The process of determining the picture category of the other pictures is the same as that of the first picture, and is not described herein again.

According to the method, the first keywords of the picture are determined based on the text information of the picture to be classified, and the second keywords are determined among the first keywords corresponding to the picture based on the target search terms corresponding to the picture, so that the picture category of the picture is determined among the target search terms according to the semantic relevance between the second keywords and the target search terms. Because the text information corresponding to the picture is taken into account, the picture category can be accurately obtained based on the keywords and the search terms of the picture even when the picture is not clear enough or the picture quality is poor, and the accuracy of the determined picture category can be improved to a certain extent.

In addition, the method for determining the picture category not only fully utilizes the result of the picture classification model, but also takes the text information corresponding to the picture into consideration, fuses the target search word, the semantic relevance of the target search word and the second keyword and the picture characteristics of the picture, and determines the picture category of the picture by utilizing the fusion information, so that the accuracy of the determined picture category can be further improved.

Fig. 3 is a schematic diagram illustrating a process for determining a picture category according to an embodiment of the present application. In fig. 3, the text information corresponding to the picture is "the unexpected meeting of japanese flowery and romantic fireworks majors in japan, looks at a certain fireworks and soars up, just as if the sky spills over the golden coin that blooms and laugh, also just as if the bonfire goes up the fireworks that ran and rise late". Keyword extraction is performed on this text information, and the first keywords obtained are fireworks, sky, gold coin, bonfire, and fireworks; the target search terms corresponding to the picture are dandelion, fireworks, urchin, shaddock, and loquat. Based on the first keywords and the target search terms corresponding to the picture, the second keywords are determined among the first keywords to be fireworks, bonfire, and fireworks. The semantic relevance between the keyword information and the search term information is calculated from the keyword information of the second keywords and the search term information of the target search terms, which gives the semantic relevance between each target search term and each second keyword. Taking dandelion as an example, the semantic relevance between dandelion and fireworks is 0.3, between dandelion and bonfire is 0.2, and between dandelion and fireworks is 0.2; the semantic relevance between the other target search terms and the second keywords is detailed in fig. 3 and is not repeated here. The picture is input into the picture classification model, and the picture features corresponding to the picture are obtained from an intermediate layer of the picture classification model. The target search terms, the picture features, and the semantic relevance between the target search terms and the second keywords are then input into the picture category determination model, and the picture category of the picture is determined to be fireworks based on the output result of the picture category determination model.

Fig. 6 is a schematic structural diagram of an apparatus for determining a picture category according to an embodiment of the present application, and as shown in fig. 6, the apparatus includes:

the extraction module 601 is configured to extract a first keyword corresponding to a picture from text information of the picture to be classified, where the text information is comment information corresponding to the picture;

a first determining module 602, configured to determine a second keyword from the first keyword corresponding to the picture based on the target search term corresponding to the picture;

a second determining module 603, configured to determine semantic relevance between the target search term and the second keyword;

the third determining module 604 is configured to determine a picture category corresponding to the picture based on the target search term and the semantic relevance between the target search term and the second keyword.

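In a possible implementation manner, the first determining module 602 is configured to perform character matching on the first keyword corresponding to the picture and the target search term to obtain a character matching number of each first keyword of the picture;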

and determining the first keyword of which the character matching number meets the target requirement as a second keyword.

In a possible implementation manner, the first determining module 602 is configured to calculate a ratio of the number of character matches to the total number of characters of the target search term, and determine a first keyword of which the ratio satisfies a target threshold as a second keyword.
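The ratio-based selection can be sketched as follows. The exact definition of a character match, the `>=` comparison, and the default threshold of 0.5 are assumptions made for illustration; the function names are hypothetical.

```python
from typing import List


def char_match_count(keyword: str, search_term: str) -> int:
    """Count the characters of the search term that also occur in the keyword."""
    keyword_chars = set(keyword)
    return sum(1 for ch in search_term if ch in keyword_chars)


def select_second_keywords(
    first_keywords: List[str], search_term: str, threshold: float = 0.5
) -> List[str]:
    """Keep the first keywords whose match ratio reaches the threshold.

    ratio = character matching number / total characters of the search term
    """
    total = len(search_term)
    if total == 0:
        return []
    selected = []
    for keyword in first_keywords:
        ratio = char_match_count(keyword, search_term) / total
        if ratio >= threshold:
            selected.append(keyword)
    return selected
```

Dividing by the length of the target search term, as the text describes, keeps the ratio comparable across first keywords of different lengths for the same search term.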

In one possible implementation, the apparatus further includes:

a fifth determining module, configured to determine a target search term among the search terms corresponding to the picture.

In a possible implementation manner, the second determining module 603 is configured to encode the second keyword, and determine keyword information corresponding to the second keyword;

encoding the target search word and determining search word information corresponding to the target search word;

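based on the keyword information and the search term information, calculating the semantic relevance between the keyword information and the search term information according to an attention mechanism; and determining the semantic relevance between the keyword information and the search term information as the semantic relevance between the target search term and the second keyword.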

In a possible implementation manner, the second determining module 603 is configured to encode each word included in the second keyword to obtain a vector corresponding to each word in the second keyword;

and determining the search term information corresponding to the target search term based on the vector corresponding to each character in the target search term.

In a possible implementation manner, the third determining module 604 is configured to input the target search term and the semantic relevance between the target search term and the second keyword into a picture category determining model, where the picture category determining model is configured to determine a picture category corresponding to the picture in the target search term based on the target search term and the semantic relevance between the target search term and the second keyword;

and determining the picture category of the picture based on the output result of the picture category determination model.

The device determines the first keywords of the picture based on the text information of the picture to be classified, and determines the second keywords among the first keywords corresponding to the picture based on the target search terms corresponding to the picture, so that the second keywords are determined more accurately. The device then determines the picture category of the picture among the target search terms according to the semantic relevance between the second keywords and the target search terms. Because the text information corresponding to the picture is taken into account, the picture category can be accurately obtained based on the keywords and the search terms of the picture even when the picture is not clear enough or the picture quality is poor, and the accuracy of the determined picture category can be improved to a certain extent.

It should be noted that, when the picture category determining apparatus provided in the foregoing embodiment determines a picture category, the division into the functional modules described above is only used as an example. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus for determining a picture category and the method for determining a picture category provided in the foregoing embodiments belong to the same concept; the specific implementation process of the apparatus is described in detail in the method embodiments and is not described herein again.

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 700 may be: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The electronic device 700 may also be referred to by other names such as user equipment, portable electronic device, laptop electronic device, desktop electronic device, and so on.

In general, the electronic device 700 includes: one or more processors 711 and one or more memories 702.

The processor 711 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 711 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 711 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 711 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, the processor 711 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.

Memory 702 may include one or more computer-readable storage media, which may be non-transitory. Memory 702 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 702 is used to store at least one program code for execution by the processor 711 to implement the method for determining a picture category provided by the method embodiments herein.

In some embodiments, the electronic device 700 may further optionally include: a peripheral interface 703 and at least one peripheral. The processor 711, memory 702, and peripheral interface 703 may be connected by buses or signal lines. Various peripheral devices may be connected to peripheral interface 703 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 704, display 705, camera 706, audio circuitry 707, positioning components 708, and power source 709.

The peripheral interface 703 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 711 and the memory 702. In some embodiments, the processor 711, memory 702, and peripheral interface 703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 711, the memory 702 and the peripheral interface 703 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.

The radio frequency circuit 704 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 704 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 704 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 704 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 704 may communicate with other electronic devices via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 704 may also include NFC (Near Field Communication) related circuits, which are not limited in this application.

The display screen 705 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 705 is a touch display screen, the display screen 705 also has the ability to capture touch signals on or over the surface of the display screen 705. The touch signal may be input to the processor 711 as a control signal for processing. In this case, the display screen 705 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 705, disposed on the front panel of the electronic device 700; in other embodiments, there may be at least two display screens 705, respectively disposed on different surfaces of the electronic device 700 or in a folding design; in still other embodiments, the display screen 705 may be a flexible display screen disposed on a curved surface or on a folded surface of the electronic device 700. Furthermore, the display screen 705 may be arranged in a non-rectangular irregular shape, that is, a special-shaped screen. The display screen 705 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).

The camera assembly 706 is used to capture images or video. Optionally, the camera assembly 706 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the electronic device, and the rear camera is disposed on the rear surface of the electronic device. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, the camera assembly 706 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.

The audio circuit 707 may include a microphone and a speaker. The microphone is used for collecting sound waves of the user and the environment, converting the sound waves into electrical signals, and inputting the electrical signals into the processor 711 for processing or into the radio frequency circuit 704 to realize voice communication. For stereo capture or noise reduction purposes, there may be multiple microphones, disposed at different locations of the electronic device 700. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 711 or the radio frequency circuit 704 into sound waves. The speaker may be a conventional diaphragm speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert an electrical signal not only into a sound wave audible to humans, but also into a sound wave inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuit 707 may also include a headphone jack.

The positioning component 708 is used to locate the current geographic location of the electronic device 700 to implement navigation or LBS (Location Based Service). The positioning component 708 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.

The power supply 709 is used to supply power to various components in the electronic device 700. The power source 709 may be alternating current, direct current, disposable batteries, or rechargeable batteries. When power source 709 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.

In some embodiments, the electronic device 700 also includes one or more sensors 710. The one or more sensors 710 include, but are not limited to: acceleration sensor 711, gyro sensor 712, pressure sensor 713, fingerprint sensor 714, optical sensor 715, and proximity sensor 716.

The acceleration sensor 711 may detect the magnitude of acceleration in three coordinate axes of a coordinate system established with the electronic device 700. For example, the acceleration sensor 711 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 711 may control the display screen 705 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 711. The acceleration sensor 711 may also be used for acquisition of motion data of a game or a user.

The gyro sensor 712 may detect a body direction and a rotation angle of the electronic device 700, and the gyro sensor 712 may cooperate with the acceleration sensor 711 to acquire a 3D motion of the user with respect to the electronic device 700. The processor 711 may implement the following functions according to the data collected by the gyro sensor 712: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.

Pressure sensors 713 may be disposed on a side bezel of electronic device 700 and/or underlying display screen 705. When the pressure sensor 713 is disposed on a side frame of the electronic device 700, a user's holding signal of the electronic device 700 may be detected, and the processor 711 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 713. When the pressure sensor 713 is disposed at a lower layer of the display screen 705, the processor 711 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 705. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.

The fingerprint sensor 714 is used for collecting a fingerprint of the user, and the processor 711 identifies the user according to the fingerprint collected by the fingerprint sensor 714, or the fingerprint sensor 714 identifies the user according to the collected fingerprint. Upon identifying that the user's identity is a trusted identity, the processor 711 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 714 may be disposed on the front, back, or side of the electronic device 700. When a physical button or vendor logo is provided on the electronic device 700, the fingerprint sensor 714 may be integrated with the physical button or vendor logo.

The optical sensor 715 is used to collect the ambient light intensity. In one embodiment, the processor 711 may control the display brightness of the display screen 705 according to the ambient light intensity collected by the optical sensor 715. Specifically, when the ambient light intensity is high, the display brightness of the display screen 705 is increased; when the ambient light intensity is low, the display brightness of the display screen 705 is adjusted down. In another embodiment, the processor 711 may also dynamically adjust the shooting parameters of the camera assembly 706 according to the intensity of the ambient light collected by the optical sensor 715.

The proximity sensor 716, also referred to as a distance sensor, is typically disposed on the front panel of the electronic device 700. The proximity sensor 716 is used to capture the distance between the user and the front of the electronic device 700. In one embodiment, when the proximity sensor 716 detects that the distance between the user and the front surface of the electronic device 700 gradually decreases, the processor 711 controls the display screen 705 to switch from the bright screen state to the dark screen state; when the proximity sensor 716 detects that the distance between the user and the front surface of the electronic device 700 gradually increases, the processor 711 controls the display screen 705 to switch from the dark screen state to the bright screen state.

Those skilled in the art will appreciate that the configuration shown in fig. 7 does not constitute a limitation of the electronic device 700 and may include more or fewer components than those shown, or combine certain components, or employ a different arrangement of components.

Fig. 8 is a schematic structural diagram of a server according to an embodiment of the present application. The server 800 may vary considerably depending on its configuration or performance, and may include one or more processors (CPUs) 801 and one or more memories 802, where at least one program code is stored in the one or more memories 802 and is loaded and executed by the one or more processors 801 to implement the method for determining the picture category provided by the foregoing method embodiments. Of course, the server 800 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and the server 800 may also include other components for implementing the functions of the device, which are not described herein again.

In an exemplary embodiment, a computer-readable storage medium is further provided, in which at least one program code is stored, and the at least one program code is loaded and executed by a processor of a computer device to implement any one of the above-mentioned picture category determination methods.

Alternatively, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.

It should be understood that reference to "a plurality" herein means two or more. "And/or" describes the association relationship of the associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.

The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.

The above description is only exemplary of the present application and is not intended to limit the present application, and any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims

1. A method for determining picture category, the method comprising:

2. The method according to claim 1, wherein the determining a second keyword among the first keywords corresponding to the picture based on the target search term corresponding to the picture comprises:

3. The method according to claim 2, wherein the determining the first keyword with the character matching number meeting the target requirement as the second keyword comprises:

4. The method according to any one of claims 1 to 3, wherein before determining the second keyword in the first keyword corresponding to the picture based on the target search term corresponding to the picture, the method further comprises:

5. The method according to any one of claims 1 to 3, wherein the determining the semantic relevance of the target search term to the second keyword comprises:

6. The method of claim 5, wherein the encoding the second keyword and determining the keyword information corresponding to the second keyword comprises:

7. The method according to any one of claims 1 to 3, wherein the determining the picture category corresponding to the picture based on the target search term and the semantic relevance between the target search term and the second keyword comprises:

8. An apparatus for determining picture category, the apparatus comprising:

9. The apparatus according to claim 8, wherein the first determining module is configured to perform character matching on the first keyword corresponding to the picture and the target search term to obtain a character matching number of each first keyword of the picture;

10. The apparatus of claim 9, wherein the first determining module is configured to calculate a ratio of the number of character matches to a total number of characters of the target search term, and determine a first keyword with the ratio satisfying a target threshold as a second keyword.

11. The apparatus of any of claims 8-10, further comprising:

12. The apparatus according to any one of claims 8-10, wherein the second determining module is configured to encode the second keyword and determine keyword information corresponding to the second keyword;

13. The apparatus according to claim 12, wherein the second determining module is configured to encode each word included in the second keyword to obtain a vector corresponding to each word in the second keyword;

14. An electronic device, comprising a processor and a memory, wherein at least one program code is stored in the memory, and wherein the at least one program code is loaded and executed by the processor to implement the method for determining a picture class according to any one of claims 1 to 7.

15. A computer-readable storage medium having stored therein at least one program code, the at least one program code being loaded and executed by a processor, to implement the method for determining a picture class according to any one of claims 1 to 7.