CN115761717A - Method and device for identifying topic image, electronic equipment and storage medium - Google Patents

Method and device for identifying topic image, electronic equipment and storage medium

Info

Publication number
CN115761717A
Authority
CN
China
Prior art keywords
image
target image
topic
target
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211399051.7A
Other languages
Chinese (zh)
Inventor
兴百桥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xingtong Technology Co ltd
Original Assignee
Shenzhen Xingtong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xingtong Technology Co ltd
Priority to CN202211399051.7A
Publication of CN115761717A
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a method and an apparatus for identifying a topic image, an electronic device, and a storage medium, and belongs to the field of image processing. The method comprises the following steps: acquiring a target image to be identified; determining whether the target image is a topic image based on the image features of the target image; in response to determining that the target image is a topic image based on the image features of the target image, determining a confidence that the target image is a topic image based on the image features and the text features of the target image; determining again, based on the confidence, whether the target image is a topic image; in response to determining, based on the confidence, that the target image is a topic image, determining again whether the target image is a topic image based on the confidence and the similarity between the target image and each preset topic; and in response to determining, based on the confidence and the similarity between the target image and each preset topic, that the target image is a topic image, determining the target image as a topic image and determining a topic identification result corresponding to the target image. With the method and the apparatus, non-topic images can be accurately filtered.

Description

Method and device for identifying topic image, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing, and in particular, to a method and an apparatus for identifying a topic image, an electronic device, and a storage medium.
Background
When using the answering system, a user can upload the question to be answered in image form, and the answering system automatically returns, based on the uploaded image, the question that is most similar to the uploaded one together with a detailed answer.
In practice, a user may casually capture a non-topic image, for which no solution needs to be returned. However, existing systems often have difficulty filtering such non-topic images, such as commercial advertisements, newspapers, product manuals, and the like. This weakness of the answering system can therefore easily be exploited: non-topic images can be generated programmatically and used to request the answering system, and the normal questions and answers of the answering system can then be crawled for analysis, causing leakage of the answering system's data.
Therefore, a method for identifying topic images is needed to filter non-topic images.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a method and an apparatus for identifying a topic image, an electronic device, and a storage medium, which can accurately filter non-topic images.
According to an aspect of the present disclosure, there is provided a method for identifying a topic image, the method including:
acquiring a target image to be identified;
determining whether the target image is a topic image based on the image features of the target image;
in response to determining that the target image is a topic image based on the image features of the target image, determining a confidence that the target image is a topic image based on the image features and the text features of the target image;
determining again, based on the confidence, whether the target image is a topic image;
in response to determining, based on the confidence, that the target image is a topic image, determining again whether the target image is a topic image based on the confidence and the similarity between the target image and each preset topic;
and in response to determining, based on the confidence and the similarity between the target image and each preset topic, that the target image is a topic image, determining the target image as a topic image and determining a topic identification result corresponding to the target image.
According to another aspect of the present disclosure, there is provided an apparatus for identifying a topic image, the apparatus including:
the acquisition module is used for acquiring a target image to be identified;
the first judgment module is used for determining whether the target image is a topic image based on the image features of the target image;
the second judgment module is used for determining, in response to determining that the target image is a topic image based on the image features of the target image, a confidence that the target image is a topic image based on the image features and the text features of the target image, and determining again, based on the confidence, whether the target image is a topic image;
the third judgment module is used for determining again, in response to determining that the target image is a topic image based on the confidence, whether the target image is a topic image based on the confidence and the similarity between the target image and each preset topic;
and the determination module is used for determining, in response to determining that the target image is a topic image based on the confidence and the similarity between the target image and each preset topic, the target image as a topic image and determining a topic identification result corresponding to the target image.
According to another aspect of the present disclosure, there is provided an electronic device including:
a processor; and
a memory for storing a program,
wherein the program includes instructions that, when executed by the processor, cause the processor to execute the method of identifying the topic image.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the above method for identifying a topic image.
According to the method and the device, after the target image to be identified is acquired, three layers of judgment can be performed to determine whether the target image is a topic image: the first layer of judgment is based on the image features of the target image, the second layer of judgment is based on the image features and text features of the target image, and the third layer of judgment is based on the similarity between the target image and each preset topic together with the confidence calculated in the second layer of judgment. The amount of information used by the three layers of judgment increases step by step, and the accuracy improves accordingly. Therefore, with the present disclosure, accurate filtering of non-topic images can be achieved.
Drawings
Further details, features and advantages of the disclosure are disclosed in the following description of exemplary embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 illustrates a flow chart of a method for identifying a topic image provided in accordance with an exemplary embodiment of the present disclosure;
FIG. 2 illustrates a second determination processing flow diagram provided in accordance with an exemplary embodiment of the present disclosure;
fig. 3 illustrates a fully connected network schematic provided in accordance with an exemplary embodiment of the present disclosure;
FIG. 4 shows a schematic block diagram of an apparatus for identifying a topic image provided in accordance with an exemplary embodiment of the present disclosure;
FIG. 5 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein is intended to be open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description. It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will understand that they should be read as "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The method performs multiple rounds of judgment on an image, which improves the accuracy of identifying non-topic images. The method may be performed by a terminal, a server, and/or another device with processing capability. The method provided by the embodiments of the present disclosure may be completed by any one of the above devices, or may be completed jointly by a plurality of devices, which is not limited in the present disclosure.
The method will be described with reference to the flowchart of the topic image identification method shown in FIG. 1. The method includes the following steps 101 to 105.
Step 101, a target image to be identified is acquired.
In some application scenarios, the target image may be identified to determine whether it is a topic image. For example, when a user needs to search for the answer to a question, the user may use a terminal to photograph an exercise book or the like and use the captured image as the target image. For another example, in some image processing flows, the original image may be cropped, and whether each cropped part is a topic image may be determined.
The present embodiment does not limit the specific application scenario of identifying the topic image.
Step 102, determining whether the target image is a topic image based on the image features of the target image.
In one possible implementation, after the target image is acquired, a preliminary determination may be made thereon. Based on the image characteristics of the target image, images which are different from the topic image can be preliminarily screened out.
Optionally, the processing of step 102 may be as follows: inputting the target image into a lightweight binary image classification model, and determining, within the lightweight binary image classification model, the probability that the target image is a topic image based on the image features of the target image; and determining that the target image is a topic image in response to the probability being greater than or equal to a preset probability threshold.
The lightweight binary image classification model may adopt a model such as SqueezeNet, MobileNet, or ShuffleNet, and this embodiment does not limit the specific model adopted. By adopting a lightweight image classification model, the operating efficiency of the system can be improved while normal topic images are still recalled.
In one possible implementation, the lightweight binary image classification model may be trained in advance so that it sufficiently learns the image features of topic images, and can therefore make a binary judgment, based on image features, on whether an image is a topic image. This embodiment does not limit the specific training process.
The target image is preprocessed, for example scaled to a fixed size and normalized, to fit the lightweight binary image classification model. The preprocessed target image is input into the model, which processes the image data and determines the probability that the target image is a topic image, so that it can then be judged whether the probability is greater than or equal to the preset probability threshold.
When the probability is smaller than the probability threshold value, the target image is regarded as a non-topic image, and corresponding prompt information can be returned to remind the user that the target image is not a topic image. Further, the subsequent process may not be continued.
When the probability is greater than or equal to the probability threshold, the target image is regarded as the topic image, and the process may proceed to step 103 for subsequent processing.
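For illustration only, the following Python sketch shows one way this first-stage check could look, assuming a torchvision MobileNetV3-Small backbone with a two-class (topic / non-topic) head; the model choice, input size, normalization statistics, and the 0.5 threshold are assumptions made for the example and are not part of the disclosed method.

import torch
import torchvision.transforms as T
from torchvision.models import mobilenet_v3_small
from PIL import Image

preprocess = T.Compose([
    T.Resize((224, 224)),                       # scale to a fixed size
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],     # image normalization
                std=[0.229, 0.224, 0.225]),
])

model = mobilenet_v3_small(num_classes=2)       # lightweight binary classifier
model.eval()                                    # in practice, fine-tuned weights would be loaded here

def first_stage_is_topic(image_path, prob_threshold=0.5):
    """Return True if the image passes the first (image-feature-only) check."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        logits = model(x)
        prob_topic = torch.softmax(logits, dim=1)[0, 1].item()  # probability of the "topic" class
    return prob_topic >= prob_threshold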
Step 103, in response to determining that the target image is a topic image based on the image features of the target image, determining a confidence that the target image is a topic image based on the image features and text features of the target image, and determining again, based on the confidence, whether the target image is a topic image.
In a possible implementation manner, after the target image passes the first determination, the image feature and the text feature may be used to further determine whether the target image is a topic image.
The specific treatment may be as follows:
performing text recognition on the target image to obtain a text feature vector of the target image;
performing image feature extraction on the target image to obtain an image feature vector of the target image;
performing feature extraction on the text feature vector and the image feature vector through a first fully-connected network to obtain an intermediate feature vector;
and determining, through a second fully-connected network, a confidence that the target image is a topic image based on the intermediate feature vector.
In one possible implementation, referring to the second-judgment processing flow shown in FIG. 2, the target image may be input into an OCR (Optical Character Recognition) module for text recognition, where the OCR module may include a text detection module and a text recognition module. Within the OCR module, the target image is first input into the text detection module, which processes it and outputs the position information of the text lines in the target image. The text line position information and the target image are then input into the text recognition module, which extracts the text feature vector corresponding to the text content in the target image. Optionally, after text recognition is performed on the target image, the text content in the target image may also be obtained.
The target image is also input into a feature extraction module (for example, the backbone network of a MobileNet model), which outputs the image feature vector of the target image. The image feature vector and the text feature vector are input into the first fully-connected network, which performs feature extraction on them and outputs an intermediate feature vector. The intermediate feature vector is then input into the second fully-connected network, which computes from it the confidence that the target image is a topic image, so that it can be judged whether the target image is a topic image.
When the confidence is smaller than a preset confidence threshold, the target image is considered to be a non-topic image, and corresponding prompt information can be returned at the moment to remind a user that the target image is not a topic image. Further, the subsequent process may not be continued.
When the confidence is greater than or equal to the preset confidence threshold, the target image is considered as the topic image, and the process may proceed to step 104 for subsequent processing.
Optionally, after the intermediate feature vector is obtained, feature extraction may be performed on it through a third fully-connected network to obtain a target feature vector of the target image, where the target feature vector carries both image feature information and text feature information, that is, it represents the image features and the text features at the same time.
As shown in the fully-connected network diagram of fig. 3, the fully-connected network may include a first fully-connected network, a second fully-connected network, and a third fully-connected network, where the second fully-connected network may be used to determine the confidence that the target image is the topic image, and the third fully-connected network may be used to determine the target feature vector.
The image feature vector and the text feature vector are input into the first fully-connected network, and the resulting intermediate feature vector can be input into the second fully-connected network and the third fully-connected network respectively. The second fully-connected network outputs the confidence that the target image is a topic image, and the third fully-connected network outputs the target feature vector.
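The fusion just described can be sketched as follows, as an assumption-laden illustration rather than the disclosed implementation: the image and text feature vectors are concatenated, the first fully-connected network produces the intermediate feature vector, the second outputs the confidence, and the third outputs the target feature vector. All layer dimensions, the ReLU activation, and the sigmoid output are assumptions.

import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Illustrative sketch of the three fully-connected networks (dimensions are assumptions)."""
    def __init__(self, img_dim=576, txt_dim=256, mid_dim=256, emb_dim=128):
        super().__init__()
        # first fully-connected network: fuses image and text feature vectors
        self.fc1 = nn.Sequential(nn.Linear(img_dim + txt_dim, mid_dim), nn.ReLU())
        # second fully-connected network: confidence that the image is a topic image
        self.fc2 = nn.Linear(mid_dim, 1)
        # third fully-connected network: target feature vector used later for retrieval
        self.fc3 = nn.Linear(mid_dim, emb_dim)

    def forward(self, img_vec, txt_vec):
        mid = self.fc1(torch.cat([img_vec, txt_vec], dim=-1))   # intermediate feature vector
        confidence = torch.sigmoid(self.fc2(mid)).squeeze(-1)   # confidence in [0, 1]
        target_vec = self.fc3(mid)                               # carries image and text information
        return confidence, target_vec

# usage with placeholder vectors standing in for the OCR and backbone outputs
head = FusionHead()
conf, target_vec = head(torch.randn(1, 576), torch.randn(1, 256))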
It should be noted that the OCR module can be replaced by other models for text recognition, such as Attention-based models; the above feature extraction model may also be replaced by other models that can be used to extract image features, such as convolutional neural networks. The present embodiment does not limit the specific model used.
The lightweight MobileNet model is used to reduce the processing load, so that the topic image identification method provided by the present disclosure can also be applied on a mobile terminal, which expands the scope of application of the present disclosure.
Step 104, in response to determining, based on the confidence, that the target image is a topic image, determining again whether the target image is a topic image based on the confidence and the similarity between the target image and each preset topic.
In one possible implementation, topic contents may be stored in advance in an ES (Elasticsearch) search library, and the topic contents may include the text content of each topic (such as the stem content, the answer content, and the like) and the corresponding topic image.
When the target image passes the second judgment, the similarity between the target image and each preset topic is calculated. Optionally, the similarity may refer to a first similarity determined based on the target feature vector, and/or a second similarity determined based on the text content. The similarity between the target image and each preset topic is determined in the following way: determining a first similarity between the target feature vector of the target image and the target feature vector of each preset topic, and/or determining a second similarity between the text content of the target image and the text content of each preset topic.
For similarity calculation based on target feature vectors, possible implementations are as follows:
and extracting the text characteristic vector and the image characteristic vector of each preset topic in the same way of constructing the target characteristic vector, and outputting the target characteristic vector of the preset topic. Therefore, after the target feature vector of the target image is obtained in step 103, each preset topic may be traversed in the ES search library, and a first similarity between the target feature vector of the target image and the target feature vector of the preset topic may be calculated. For example, a cosine similarity algorithm may be specifically used for calculation, and the specific vector similarity algorithm is not limited in this embodiment. Optionally, in order to improve processing efficiency, topic contents and similarities of k1 (k 1 is an integer greater than 0) preset topics with the largest first similarity may be obtained and returned for subsequent processing.
For similarity calculation based on text content, possible embodiments are as follows:
and acquiring the text content of the target image identified in the process, traversing each preset topic in the ES search library, and calculating a second similarity between the text content of the target image and the text content of the preset topic. For example, the calculation may specifically be performed by using a TF-IDF (Term Frequency-Inverse Document Frequency) algorithm, and the specific text similarity calculation method is not limited in this embodiment. Optionally, in order to improve the processing efficiency, the topic contents and the similarity of k2 (k 2 is an integer greater than 0) preset topics with the largest second similarity may be obtained and returned for subsequent processing.
Optionally, after the similarity is calculated, the processing of step 104 includes: determining a reference topic among the preset topics based on the similarity between the target image and each preset topic, and obtaining the target similarity corresponding to the reference topic; and determining whether the target image is a topic image based on the confidence and the target similarity corresponding to the reference topic.
In one possible implementation, after the topic contents and corresponding similarities of the returned preset topics are obtained, the confidence of the target image calculated in step 103 may also be obtained; the confidence of the target image, the returned topic contents, and the corresponding similarities are then used as input, and the returned preset topics are precisely ranked through a fine-ranking algorithm to obtain a ranked topic sequence. Alternatively, the determined similarities may simply be sorted from largest to smallest, and the topic sequence corresponding to the sorted similarity sequence is obtained.
In the topic sequence, the first-ranked topic (i.e., the topic that best matches the target image) may be used as the reference topic, or the topics ranked within a preset number of positions (e.g., the top 5) may be used as reference topics, and the target similarity of the reference topic is obtained.
Further, the similarity score of the reference topic may be calculated by the following formula:
Pq = W1*Pq1 + W2*Pq2
where Pq is the similarity score of the reference topic, Pq1 is the confidence of the target image, Pq2 is the target similarity corresponding to the reference topic, W1 is the weight corresponding to the confidence of the target image, and W2 is the weight corresponding to the target similarity of the reference topic.
The similarity score of the reference topic is then compared with a preset score threshold. If the similarity score is greater than the score threshold, the target image can be regarded as a topic image; if the similarity score is not greater than the score threshold, the target image can be regarded as a non-topic image.
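A short sketch of this third-stage decision rule follows; the weight values and the score threshold are assumptions chosen only to make the example concrete.

def third_stage_is_topic(pq1, pq2, w1=0.4, w2=0.6, score_threshold=0.5):
    """Combine the second-stage confidence (Pq1) and the reference topic's target similarity (Pq2)."""
    pq = w1 * pq1 + w2 * pq2       # Pq = W1*Pq1 + W2*Pq2
    return pq > score_threshold    # topic image only if the score exceeds the threshold

# example: confidence 0.8 from step 103, reference similarity 0.7 from step 104
print(third_stage_is_topic(0.8, 0.7))   # True for the assumed weights and threshold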
Step 105, in response to determining, based on the confidence and the similarity between the target image and each preset topic, that the target image is a topic image, determining the target image as a topic image and determining a topic identification result corresponding to the target image.
In one possible implementation, if the target image passes the third judgment and is determined to be a topic image, the topic contents of a preset number of top-ranked topics in the topic sequence can be obtained as the corresponding topic identification result.
If the target image is determined to be a non-topic image in step 104, corresponding prompt information may be returned as a corresponding topic identification result to remind the user that the target image is not a topic image.
In the embodiments of the present disclosure, after the target image to be identified is acquired, three layers of judgment can be performed to determine whether the target image is a topic image: the first layer of judgment is based on the image features of the target image, the second layer of judgment is based on the image features and text features of the target image, and the third layer of judgment is based on the similarity between the target image and each preset topic together with the confidence calculated in the second layer of judgment. The amount of information used by the three layers of judgment increases step by step, and the accuracy improves accordingly. Therefore, with the present disclosure, accurate filtering of non-topic images can be achieved.
The embodiments of the present disclosure provide an apparatus for identifying a topic image, which is used to implement the above method for identifying a topic image. As shown in FIG. 4, the apparatus 400 for identifying a topic image includes: an acquisition module 401, a first judgment module 402, a second judgment module 403, a third judgment module 404, and a determination module 405.
An acquisition module 401, configured to acquire a target image to be identified;
a first judgment module 402, configured to determine whether the target image is a topic image based on the image features of the target image;
a second judgment module 403, configured to determine, in response to determining that the target image is a topic image based on the image features of the target image, a confidence that the target image is a topic image based on the image features and the text features of the target image, and to determine again, based on the confidence, whether the target image is a topic image;
a third judgment module 404, configured to determine again, in response to determining that the target image is a topic image based on the confidence, whether the target image is a topic image based on the confidence and the similarity between the target image and each preset topic;
and a determination module 405, configured to determine, in response to determining that the target image is a topic image based on the confidence and the similarity between the target image and each preset topic, the target image as a topic image and to determine a topic identification result corresponding to the target image.
Optionally, the first judgment module 402 is configured to:
inputting the target image into a lightweight binary image classification model, and determining, within the lightweight binary image classification model, the probability that the target image is a topic image based on the image features of the target image;
and determining that the target image is a topic image in response to the probability being greater than or equal to a preset probability threshold.
Optionally, the second judgment module 403 is configured to:
performing text recognition on the target image to obtain a text feature vector of the target image;
performing image feature extraction on the target image to obtain an image feature vector of the target image;
performing feature extraction on the text feature vector and the image feature vector through a first fully-connected network to obtain an intermediate feature vector;
determining, through a second fully-connected network, a confidence that the target image is a topic image based on the intermediate feature vector.
Optionally, the second judgment module 403 is further configured to: after text recognition is performed on the target image, obtain the text content in the target image; and after the intermediate feature vector is obtained, perform feature extraction on the intermediate feature vector through a third fully-connected network to obtain a target feature vector of the target image, wherein the target feature vector carries image feature information and text feature information;
the second judgment module 403 is configured to:
determining a first similarity between the target feature vector of the target image and the target feature vector of each preset topic; and/or
determining a second similarity between the text content of the target image and the text content of each preset topic.
Optionally, the third judgment module 404 is configured to:
determining a reference topic among the preset topics based on the similarity between the target image and each preset topic, and acquiring a target similarity corresponding to the reference topic, wherein the similarity includes a first similarity determined based on the target feature vector and/or a second similarity determined based on the text content;
calculating the similarity score of the reference topic by the following formula:
Pq = W1*Pq1 + W2*Pq2
wherein Pq is the similarity score of the reference topic, Pq1 is the confidence of the target image, Pq2 is the target similarity corresponding to the reference topic, W1 is the weight corresponding to the confidence of the target image, and W2 is the weight corresponding to the target similarity of the reference topic;
and determining that the target image is a topic image in response to the similarity score being greater than a preset score threshold.
In the embodiments of the present disclosure, after the target image to be identified is acquired, three layers of judgment can be performed to determine whether the target image is a topic image: the first layer of judgment is based on the image features of the target image, the second layer of judgment is based on the image features and text features of the target image, and the third layer of judgment is based on the similarity between the target image and each preset topic together with the confidence calculated in the second layer of judgment. The amount of information used by the three layers of judgment increases step by step, and the accuracy improves accordingly. Therefore, with the present disclosure, accurate filtering of non-topic images can be achieved.
An exemplary embodiment of the present disclosure also provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor, the computer program, when executed by the at least one processor, is for causing the electronic device to perform a method according to an embodiment of the disclosure.
The exemplary embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, is adapted to cause the computer to perform a method according to an embodiment of the present disclosure.
The exemplary embodiments of the present disclosure also provide a computer program product comprising a computer program, wherein the computer program, when executed by a processor of a computer, is adapted to cause the computer to perform a method according to an embodiment of the present disclosure.
Referring to FIG. 5, a block diagram of the structure of an electronic device 500, which may be a server or a client of the present disclosure and is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described. The electronic device is intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not intended to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the electronic device 500 includes a computing unit 501, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
A number of components in the electronic device 500 are connected to the I/O interface 505, including: an input unit 506, an output unit 507, a storage unit 508, and a communication unit 509. The input unit 506 may be any type of device capable of inputting information to the electronic device 500, and the input unit 506 may receive input numerical or text information and generate key signal inputs related to user settings and/or function controls of the electronic device. The output unit 507 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 508 may include, but is not limited to, a magnetic disk and an optical disk. The communication unit 509 allows the electronic device 500 to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 501 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 501 performs the various methods and processes described above. For example, in some embodiments, the above-described method of identifying a topic image can be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 500 via the ROM 502 and/or the communication unit 509. In some embodiments, the computing unit 501 may be configured to perform the above-described identification method of the topic image by any other suitable means (e.g., by means of firmware).
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As used in this disclosure, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Claims (10)

1. A method for identifying a topic image, the method comprising:
acquiring a target image to be identified;
determining whether the target image is a topic image based on the image features of the target image;
in response to determining that the target image is a topic image based on the image features of the target image, determining a confidence that the target image is a topic image based on the image features and the text features of the target image;
determining again, based on the confidence, whether the target image is a topic image;
in response to determining, based on the confidence, that the target image is a topic image, determining again whether the target image is a topic image based on the confidence and the similarity between the target image and each preset topic;
and in response to determining, based on the confidence and the similarity between the target image and each preset topic, that the target image is a topic image, determining the target image as a topic image and determining a topic identification result corresponding to the target image.
2. The method of claim 1, wherein determining whether the target image is a topic image based on image features of the target image comprises:
inputting the target image into a lightweight binary image classification model, and determining, within the lightweight binary image classification model, the probability that the target image is a topic image based on the image features of the target image;
and determining that the target image is a topic image in response to the probability being greater than or equal to a preset probability threshold.
3. The method of claim 1, wherein determining the confidence level that the target image is the topic image based on the image features and the text features of the target image comprises:
performing text recognition on the target image to obtain a text feature vector of the target image;
carrying out image feature extraction on the target image to obtain an image feature vector of the target image;
performing feature extraction on the text feature vector and the image feature vector through a first fully-connected network to obtain an intermediate feature vector;
determining, through a second fully connected network, a confidence level that the target image is a topic image based on the intermediate feature vectors.
4. The method of claim 3, wherein after the text recognition of the target image, further comprising: obtaining text content in the target image;
after the obtaining of the intermediate feature vector, the method further comprises: performing feature extraction on the intermediate feature vector through a third fully-connected network to obtain a target feature vector of the target image, wherein the target feature vector carries image feature information and text feature information;
the similarity between the target image and each preset topic is determined in the following way:
determining a first similarity between the target feature vector of the target image and the target feature vector of each preset topic; and/or
determining a second similarity between the text content of the target image and the text content of each preset topic.
5. The method according to any one of claims 1 to 4, wherein determining again whether the target image is a topic image based on the confidence and the similarity between the target image and each preset topic comprises:
determining a reference topic in each preset topic based on the similarity between the target image and each preset topic, and acquiring a target similarity corresponding to the reference topic, wherein the similarity comprises a first similarity determined based on a target feature vector and/or a second similarity determined based on text content;
calculating the similarity score of the reference topic by the following formula:
Pq = W1*Pq1 + W2*Pq2
wherein Pq is the similarity score of the reference topic, Pq1 is the confidence of the target image, Pq2 is the target similarity corresponding to the reference topic, W1 is the weight corresponding to the confidence of the target image, and W2 is the weight corresponding to the target similarity of the reference topic;
and determining that the target image is a topic image in response to the similarity score being greater than a preset score threshold.
6. An apparatus for recognizing a topic image, the apparatus comprising:
the acquisition module is used for acquiring a target image to be identified;
the first judgment module is used for determining whether the target image is a topic image based on the image features of the target image;
the second judgment module is used for determining, in response to determining that the target image is a topic image based on the image features of the target image, a confidence that the target image is a topic image based on the image features and the text features of the target image, and determining again, based on the confidence, whether the target image is a topic image;
the third judgment module is used for determining again, in response to determining that the target image is a topic image based on the confidence, whether the target image is a topic image based on the confidence and the similarity between the target image and each preset topic;
and the determination module is used for determining, in response to determining that the target image is a topic image based on the confidence and the similarity between the target image and each preset topic, the target image as a topic image and determining a topic identification result corresponding to the target image.
7. The apparatus of claim 6, wherein the second judgment module is configured to:
carrying out image feature extraction on the target image to obtain an image feature vector of the target image;
performing text recognition on the target image to obtain a text feature vector of the target image;
performing feature extraction on the text feature vector and the image feature vector through a first fully-connected network to obtain an intermediate feature vector;
determining, through a second fully connected network, a confidence level that the target image is a topic image based on the intermediate feature vectors.
8. The apparatus according to any one of claims 6-7, wherein the third judgment module is configured to:
determining a reference topic in each preset topic based on the similarity between the target image and each preset topic, and acquiring a target similarity corresponding to the reference topic, wherein the similarity comprises a first similarity determined based on a target feature vector and/or a second similarity determined based on text content;
calculating the similarity score of the reference topic by the following formula:
Pq = W1*Pq1 + W2*Pq2
wherein Pq is the similarity score of the reference topic, Pq1 is the confidence of the target image, Pq2 is the target similarity corresponding to the reference topic, W1 is the weight corresponding to the confidence of the target image, and W2 is the weight corresponding to the target similarity of the reference topic;
and determining that the target image is a topic image in response to the similarity score being greater than a preset score threshold.
9. An electronic device, comprising:
a processor; and
a memory for storing the program, wherein the program is stored in the memory,
wherein the program comprises instructions which, when executed by the processor, cause the processor to carry out the method according to any one of claims 1-5.
10. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-5.
CN202211399051.7A 2022-11-09 2022-11-09 Method and device for identifying topic image, electronic equipment and storage medium Pending CN115761717A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211399051.7A CN115761717A (en) 2022-11-09 2022-11-09 Method and device for identifying topic image, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211399051.7A CN115761717A (en) 2022-11-09 2022-11-09 Method and device for identifying topic image, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115761717A true CN115761717A (en) 2023-03-07

Family

ID=85369811

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211399051.7A Pending CN115761717A (en) 2022-11-09 2022-11-09 Method and device for identifying topic image, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115761717A (en)

Similar Documents

Publication Publication Date Title
CN113255694B (en) Training image feature extraction model and method and device for extracting image features
US20200004815A1 (en) Text entity detection and recognition from images
CN110555372A (en) Data entry method, device, equipment and storage medium
CN112995414B (en) Behavior quality inspection method, device, equipment and storage medium based on voice call
CN110046648B (en) Method and device for classifying business based on at least one business classification model
CN113240510B (en) Abnormal user prediction method, device, equipment and storage medium
CN114242113B (en) Voice detection method, training device and electronic equipment
CN112800919A (en) Method, device and equipment for detecting target type video and storage medium
CN113722438A (en) Sentence vector generation method and device based on sentence vector model and computer equipment
CN110909578A (en) Low-resolution image recognition method and device and storage medium
CN110956038A (en) Repeated image-text content judgment method and device
CN114550731B (en) Audio identification method and device, electronic equipment and storage medium
CN111353514A (en) Model training method, image recognition method, device and terminal equipment
CN114817478A (en) Text-based question and answer method and device, computer equipment and storage medium
CN115700845B (en) Face recognition model training method, face recognition device and related equipment
CN117333889A (en) Training method and device for document detection model and electronic equipment
CN114724144B (en) Text recognition method, training device, training equipment and training medium for model
CN113936286B (en) Image text recognition method, device, computer equipment and storage medium
CN116363444A (en) Fuzzy classification model training method, fuzzy image recognition method and device
CN115761717A (en) Method and device for identifying topic image, electronic equipment and storage medium
CN112071331B (en) Voice file restoration method and device, computer equipment and storage medium
CN114117037A (en) Intention recognition method, device, equipment and storage medium
CN114186039A (en) Visual question answering method and device and electronic equipment
CN114120341A (en) Resume document identification model training method, resume document identification method and device
CN113610064B (en) Handwriting recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination