CN116612488A - Book cover identification method and device, electronic equipment and storage medium


Info

Publication number
CN116612488A
CN116612488A
Authority
CN
China
Prior art keywords
book
text
search result
book information
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310603847.8A
Other languages
Chinese (zh)
Inventor
兴百桥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xingtong Technology Co., Ltd.
Original Assignee
Shenzhen Xingtong Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xingtong Technology Co., Ltd.
Priority to CN202310603847.8A
Publication of CN116612488A
Status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 Document-oriented image-based pattern recognition
    • G06V 30/41 Analysis of document content
    • G06V 30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F 16/5846 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/19 Recognition using electronic means
    • G06V 30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V 30/19147 Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/19 Recognition using electronic means
    • G06V 30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V 30/19173 Classification techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a book cover identification method and apparatus, an electronic device and a storage medium. The method includes: performing text recognition on a book cover image to be identified to obtain the text content contained in the book cover image; inputting the book cover image into a pre-trained press classification model to obtain the target press contained in the book cover image; performing feature extraction with a pre-trained feature extraction model based on the text content, the target press and the book cover image to obtain a feature vector corresponding to the book cover image; performing a text search based on the text content to obtain a text search result and a vector search based on the feature vector to obtain a vector search result; and determining the target book information corresponding to the book cover image according to the text search result and the vector search result. With this scheme, more reliable vector search results can be retrieved and the accuracy of book cover identification is improved.

Description

Book cover identification method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of image processing, and in particular relates to a book cover identification method and device, electronic equipment and a storage medium.
Background
At present, book covers are generally identified by optical character recognition (OCR): the characters contained in the cover are recognized by OCR, and information such as the book name and the publisher is returned.
OCR works well on book covers with good image quality and can return accurate book information; however, when the image quality is poor, the book information cannot be accurately recognized by OCR alone.
Disclosure of Invention
In order to solve the above technical problems or at least partially solve the above technical problems, embodiments of the present disclosure provide a method, an apparatus, an electronic device, and a storage medium for identifying book covers.
According to an aspect of the present disclosure, there is provided a method for identifying book covers, including:
performing text recognition on a book cover image to be recognized to obtain text content contained in the book cover image;
inputting the book cover image into a pre-trained press classification model to obtain a target press contained in the book cover image;
Based on the text content, the target publishing company and the book cover image, performing feature extraction by utilizing a pre-trained feature extraction model to obtain a feature vector corresponding to the book cover image;
text searching is carried out based on the text content to obtain text searching results, vector searching is carried out based on the feature vector to obtain vector searching results, and the text searching results and the vector searching results respectively contain at least one book information;
and determining target book information corresponding to the book cover image according to the text search result and the vector search result.
According to another aspect of the present disclosure, there is provided an identification device for book covers, including:
the first recognition module is used for carrying out text recognition on the book cover image to be recognized to obtain text content contained in the book cover image;
the second recognition module is used for inputting the book cover image into a pre-trained press classification model so as to obtain a target press contained in the book cover image;
the feature extraction module is used for carrying out feature extraction by utilizing a pre-trained feature extraction model based on the text content, the target publishing company and the book cover image to obtain feature vectors corresponding to the book cover image;
The searching module is used for carrying out text searching based on the text content to obtain text searching results, carrying out vector searching based on the feature vectors to obtain vector searching results, wherein the text searching results and the vector searching results respectively contain at least one book information;
and the determining module is used for determining target book information corresponding to the book cover image according to the text search result and the vector search result.
According to another aspect of the present disclosure, there is provided an electronic device including:
a processor; and
a memory in which a program is stored,
wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the method of identifying book covers according to the preceding aspect.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method of identifying book covers according to the previous aspect.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of identifying book covers according to the previous aspect.
According to one or more technical schemes provided by the embodiment of the disclosure, text recognition is performed on a book cover image to be recognized to obtain text content contained in the book cover image, the book cover image is input into a pre-trained press classification model to obtain a target press corresponding to the book cover, then feature extraction is performed by using a pre-trained feature extraction model based on the text content, the target press and the book cover image to obtain feature vectors corresponding to the book cover image, text search is performed based on the text content to obtain a text search result, vector search is performed based on the feature vectors to obtain a vector search result, and target book information corresponding to the book cover image is determined according to the text search result and the vector search result. According to the scheme, the corresponding target publishing houses are obtained by conducting publishing houses identification on the book cover images, and the identified target publishing houses are used for feature extraction to obtain the feature vectors corresponding to the book cover images, so that more robust feature vectors can be obtained, more reliable vector search results can be searched when searching is conducted based on the feature vectors, and accuracy of book cover identification is improved.
Drawings
Further details, features and advantages of the present disclosure are disclosed in the following description of exemplary embodiments, with reference to the following drawings, wherein:
FIG. 1 illustrates a flowchart of a method of identifying book covers according to an exemplary embodiment of the present disclosure;
FIG. 2 illustrates a flowchart of a method of identifying book covers according to another exemplary embodiment of the present disclosure;
FIG. 3 illustrates a flowchart of a method of identifying book covers according to yet another exemplary embodiment of the present disclosure;
FIG. 4 shows a schematic block diagram of an identification device for book covers according to an exemplary embodiment of the disclosure;
fig. 5 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below. It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "a" or "a plurality" in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The following describes a method, an apparatus, an electronic device, and a storage medium for identifying book covers provided by the present disclosure with reference to the accompanying drawings.
In educational scenarios, it is often necessary to photograph a book cover and identify book information such as the book name and the publisher corresponding to the cover. For example, in a book knowledge activation application, a student uploads a textbook cover image, a question-answering system identifies the cover image to determine information such as the name and publisher of the textbook, the knowledge points in the book are automatically activated, and the student can then view the page contents of the book in the system.
At present, book cover identification is generally realized by OCR. For book cover images with good image quality taken with a mobile phone, OCR can produce fairly accurate results and return the correct book name, press and other book information. However, for cover images that are blurred or that were photographed from a computer screen and therefore contain moire patterns, the recognition effect of OCR is poor and the book information cannot be accurately recognized.
In view of the above problems, the present disclosure provides a book cover identification method: text recognition is performed on a book cover image to be identified to obtain the text content contained in the cover; the book cover image is input into a pre-trained press classification model to obtain the target press corresponding to the cover; feature extraction is then performed with a pre-trained feature extraction model based on the text content, the target press and the book cover image to obtain a feature vector corresponding to the book cover image; a text search is performed based on the text content to obtain a text search result and a vector search is performed based on the feature vector to obtain a vector search result; and the target book information corresponding to the book cover image is determined according to the text search result and the vector search result. In this scheme, press recognition is performed on the book cover image to obtain the corresponding target press, and the identified target press takes part in feature extraction, so that a more robust feature vector is obtained; searching with this feature vector therefore yields a more reliable vector search result and improves the accuracy of book cover identification.
Fig. 1 illustrates a flowchart of a method for identifying book covers according to an exemplary embodiment of the present disclosure, which may be performed by an apparatus for identifying book covers provided by an embodiment of the present disclosure, where the apparatus may be implemented in software and/or hardware, and may be generally integrated in an electronic device, where the electronic device includes a mobile phone, a tablet computer, a wearable device, and so on.
As shown in fig. 1, the method for identifying book covers may include the steps of:
step 101, performing text recognition on a book cover image to be recognized to obtain text content contained in the book cover image.
The book cover image to be identified may be, for example, the cover image of a book such as a textbook, a tutoring book or a picture book, and it may be input by uploading an existing picture, by photographing the cover and submitting the photo, or in a similar way.
In the embodiment of the disclosure, text recognition can be performed on the book cover image to be identified to obtain the text content contained in the book cover image.
The text content may include, but is not limited to, a name of a book, a grade to which the book is applicable, a publisher, a writer, etc.
For example, OCR may be used to perform text recognition on the book cover image to be identified. OCR on the book cover image can be divided into two parts: text detection and text line recognition. In the text detection part, the book cover image to be identified is input into a pre-trained text detection model for forward inference, and the text detection model outputs the position coordinate information of each text line in the book cover image. Because the book cover images uploaded by users vary widely and text lines are not always horizontal, all kinds of vertical and slanted artistic characters make text detection challenging; experiments show that, compared with other models, a Mask R-CNN detection model achieves the best trade-off of detection efficiency and detection accuracy for book cover text, so in the embodiment of the disclosure a Mask R-CNN detection model is used to detect the position coordinate information of the text lines in the book cover image. It can be appreciated that other text detection models, such as the EAST model or the PSENet model, may be used in practice. Then, in the text line recognition part, the text image of each line is cropped from the book cover image according to the position coordinate information output by the text detection part, and vertical text is uniformly converted into horizontal text, for example by rotating it 90 degrees counter-clockwise. The cropped line images may then be corrected, and the corrected text line data may be preprocessed, for example scaled or padded to a standard size and normalized. Finally, the preprocessed text data is input into a pre-trained text recognition model for forward inference, and the text recognition model outputs the text corresponding to each line image, yielding the text content contained in the book cover image. In the embodiment of the disclosure, a CRNN+Attention model can be used as the text recognition model; other text recognition models, such as a CRNN+CTC model, can also be used in practical applications.
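As an illustration of the two-stage flow described above, the following is a minimal sketch in Python, assuming hypothetical `det_model` and `rec_model` wrappers around pre-trained detection (e.g. Mask R-CNN style) and recognition (e.g. CRNN+Attention style) models; the interfaces and preprocessing sizes are assumptions and not part of the disclosure.

```python
import cv2
import numpy as np

def ocr_book_cover(image_bgr, det_model, rec_model):
    """Two-stage OCR: text-line detection followed by text-line recognition.

    det_model.predict / rec_model.predict are assumed wrapper interfaces,
    shown only to illustrate the flow described in the disclosure.
    """
    # 1. Text detection: forward inference returns one box (point set) per text line.
    boxes = det_model.predict(image_bgr)

    texts, confidences = [], []
    for box in boxes:
        # 2. Crop the line image from the cover using the detected coordinates.
        x, y, w, h = cv2.boundingRect(np.asarray(box, dtype=np.int32))
        line_img = image_bgr[y:y + h, x:x + w]

        # 3. Convert vertical text to horizontal by rotating 90 degrees counter-clockwise.
        if h > w:
            line_img = cv2.rotate(line_img, cv2.ROTATE_90_COUNTERCLOCKWISE)

        # 4. Preprocess: scale to a standard size and normalize to [0, 1].
        line_img = cv2.resize(line_img, (320, 32)).astype(np.float32) / 255.0

        # 5. Text-line recognition: forward inference returns the line text and a confidence.
        text, conf = rec_model.predict(line_img)
        texts.append(text)
        confidences.append(conf)

    return texts, confidences
```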
In an alternative embodiment of the present disclosure, the book cover image to be identified may also be pre-processed prior to text recognition of the book cover image to be identified, which may include, but is not limited to, scaling the image to a fixed size, image data normalization, and so forth.
Step 102, inputting the book cover image into a pre-trained press classification model to obtain a target press included in the book cover image.
In the embodiment of the disclosure, in addition to performing text recognition on the book cover image to be identified to obtain the text content it contains, press recognition is performed to obtain the press contained in the book cover image, referred to as the target press. Specifically, the book cover image to be identified is input into a pre-trained press classification model, the press classification model identifies the press contained in the book cover image, and the target press contained in the book cover image is output.
The target press may be a name of the identified press, a class number of the identified press, or a press expressed in another form, and the expression form of the target press is not limited in this disclosure.
Illustratively, the press classification model may employ a ResNet-18 model, which runs efficiently and performs well in image classification. Other lightweight classification models, such as a MobileNet model or a SqueezeNet model, may also be used in practice.
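A minimal inference sketch for such a press classifier, using a torchvision ResNet-18 backbone; the number of press classes, the checkpoint path and the preprocessing parameters are assumptions for illustration only.

```python
import torch
from torchvision import models, transforms
from PIL import Image

NUM_PRESS_CLASSES = 200  # assumed number of press classes; not specified in the disclosure

# ResNet-18 backbone with its final layer replaced by a press classification head.
model = models.resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, NUM_PRESS_CLASSES)
model.load_state_dict(torch.load("press_classifier.pth", map_location="cpu"))  # hypothetical checkpoint
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),                  # scale to a fixed size
    transforms.ToTensor(),                          # image data normalization
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def classify_press(cover_path: str) -> int:
    """Return the predicted press class number for a book cover image."""
    img = preprocess(Image.open(cover_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        logits = model(img)
    return int(logits.argmax(dim=1))
```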
In an alternative embodiment of the present disclosure, the book cover image to be identified may also be pre-processed before being input into the press classification model, and the pre-processing may include, but is not limited to, scaling the image to a fixed size, image data normalization, and so forth.
It should be noted that, in the embodiment of the present disclosure, the execution order of step 101 and step 102 is not limited: they may be executed sequentially or simultaneously. The embodiment shown in fig. 1, in which step 102 is executed after step 101, is only an example used to illustrate the present disclosure and should not be taken as limiting.
And step 103, performing feature extraction by utilizing a pre-trained feature extraction model based on the text content, the target publishing company and the book cover image to obtain feature vectors corresponding to the book cover image.
In the embodiment of the disclosure, after the text content and the target press contained in the book cover image are obtained, feature extraction can be performed with a pre-trained feature extraction model based on the text content, the target press and the book cover image to be identified, so as to obtain the feature vector corresponding to the book cover image.
For training, a plurality of sample cover images can be collected in advance, text recognition and press recognition are performed on each sample cover image to obtain the corresponding sample text content and sample press, each sample cover image together with its sample text content and sample press is used as a training sample, the feature vector corresponding to the training sample is used as its training target, and the training samples and their training targets are used to train an initial feature extraction model until a trained feature extraction model is obtained. Then, to obtain the feature vector of the book cover image to be identified, the book cover image, the corresponding text content and the target press are input into the trained feature extraction model, and the output of the feature extraction model is taken as the feature vector corresponding to the book cover image.
And 104, performing text search based on the text content to obtain text search results, and performing vector search based on the feature vector to obtain vector search results.
In the embodiment of the disclosure, after the text content included in the book cover image is obtained, text searching can be performed based on the text content, so as to obtain a text searching result. The text search result may include one book information or a plurality of book information, and the number of the book information included in the obtained text search result is not limited in the present disclosure, and the book information may include, but is not limited to, a book name, a publisher, and an applicable grade.
For example, a query can be performed, based on the text content, in a text search library with pre-built indexes; during the query, the similarity between the text content and each index in the library is calculated, the matching results are sorted from high similarity to low similarity, and the top N results are returned as the text search result, where N is a positive integer. The text search library may be an Elasticsearch (ES) library. In the embodiment of the disclosure, the text search library is indexed with accurate book cover text, such as the press, the grade and the book name, and this text is manually verified to ensure the accuracy of the index. OCR-recognized text is deliberately not used for indexing: OCR output may contain errors, and using manually verified text prevents such recognition errors from degrading search accuracy.
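A minimal Elasticsearch query sketch for this text-search step; the index name `book_covers`, the field names and the local host address are assumptions, since the disclosure only specifies an ES library, manually verified index content and a similarity-ranked top-N result.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local ES instance

def text_search(ocr_text: str, top_n: int = 10):
    """Query the pre-built text index and return the top-N hits by similarity."""
    resp = es.search(
        index="book_covers",          # hypothetical index built from manually verified book text
        query={
            "multi_match": {
                "query": ocr_text,
                "fields": ["book_name", "press", "grade"],  # assumed field names
            }
        },
        size=top_n,
    )
    # The ES relevance score of each hit is used as its similarity.
    return [{"book": hit["_source"], "similarity": hit["_score"]}
            for hit in resp["hits"]["hits"]]
```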
In the embodiment of the disclosure, after the feature vector corresponding to the book cover image is obtained, vector search is performed based on the feature vector, and a vector search result is obtained. The number of the book information contained in the obtained vector search result is not limited, and the book information can include, but is not limited to, book names, publishers and applicable grades.
For example, a query can be performed, based on the obtained feature vector, in a vector search library with pre-built indexes; during the query, the similarity between the feature vector and each index in the library is calculated, the matching results are sorted from high similarity to low similarity, and the top N results are returned as the vector search result, where N is a positive integer. The indexes in the vector search library are stored in vector form, i.e. the feature vectors corresponding to the book information serve as the indexes of the searchable content.
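The disclosure does not name a particular vector search library; the following is a minimal cosine-similarity sketch over an in-memory index, where `index_vectors` and `index_books` stand in for the pre-built vector search library.

```python
import numpy as np

def vector_search(query_vec, index_vectors, index_books, top_n=10):
    """Rank indexed book feature vectors by cosine similarity to the query vector."""
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    m = index_vectors / (np.linalg.norm(index_vectors, axis=1, keepdims=True) + 1e-12)
    sims = m @ q                         # cosine similarity to every index entry
    order = np.argsort(-sims)[:top_n]    # highest similarity first
    return [{"book": index_books[i], "similarity": float(sims[i])} for i in order]
```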
It can be appreciated that in the embodiments of the present disclosure, the number of search results included in the text search result and the vector search result may be the same or different, and the present disclosure does not limit the number of search results included in the text search result and the vector search result, respectively.
And 105, determining target book information corresponding to the book cover image according to the text search result and the vector search result.
In the embodiment of the disclosure, after the text search result and the vector search result are obtained, the target book information corresponding to the book cover image may be determined according to the text search result and the vector search result.
For example, the candidate book information including the target publishing company may be screened out from the text search result and the vector search result, respectively, and then the book information with the highest similarity may be screened out from the candidate book information as the target book information. It can be understood that the similarity herein refers to the similarity calculated when searching in the text search library and the vector search library to obtain the text search result and the vector search result.
For example, one book information having the highest similarity may be selected from the text search result and the vector search result as the target book information corresponding to the book cover image.
According to the method for identifying the book covers, text content contained in the book cover images to be identified is obtained through text identification of the book cover images, the book cover images are input into a pre-trained press classification model to obtain target press corresponding to the book covers, then feature extraction is carried out by utilizing a pre-trained feature extraction model based on the text content, the target press and the book cover images to obtain feature vectors corresponding to the book cover images, text searching is carried out based on the text content to obtain text searching results, vector searching is carried out based on the feature vectors to obtain vector searching results, and target book information corresponding to the book cover images is determined according to the text searching results and the vector searching results. According to the scheme, the corresponding target publishing houses are obtained by conducting publishing houses identification on the book cover images, and the identified target publishing houses are used for feature extraction to obtain the feature vectors corresponding to the book cover images, so that more robust feature vectors can be obtained, more reliable vector search results can be searched when searching is conducted based on the feature vectors, and accuracy of book cover identification is improved.
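For orientation, the overall flow of steps 101 to 105 can be summarized as the following sketch, in which each stage is injected as a callable; the function names are placeholders and not part of the disclosure.

```python
def identify_book_cover(cover_image, ocr_fn, press_fn, feature_fn,
                        text_search_fn, vector_search_fn, decide_fn):
    """End-to-end flow of Fig. 1, with each stage supplied as a callable."""
    text, conf = ocr_fn(cover_image)                           # step 101: text recognition
    target_press = press_fn(cover_image)                       # step 102: press classification
    feature_vec = feature_fn(cover_image, text, target_press)  # step 103: feature extraction
    text_results = text_search_fn(text)                        # step 104: text search
    vector_results = vector_search_fn(feature_vec)             #           vector search
    return decide_fn(text_results, vector_results,             # step 105: decision policies
                     text, conf, target_press)
```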
In an alternative embodiment of the present disclosure, the feature extraction model includes a first output branch and a second output branch, the first output branch is used for outputting a feature vector, the second output branch is used for outputting a publishing company recognition result, and when the feature extraction model is obtained by training, iterative training is performed based on the feature vector output by the first output branch and the publishing company recognition result output by the second output branch.
In the embodiment of the disclosure, the feature extraction model includes two output branches, which are respectively marked as a first output branch and a second output branch, where the first output branch is used to output a feature vector with a fixed dimension, and the second output branch is used to output a recognition result of a publishing company, where when the feature extraction model is obtained by training, the first output branch and the second output branch are trained at the same time, and the initial feature extraction model is iteratively trained based on the feature vector output by the first output branch and the recognition result of the publishing company output by the second output branch until a trained feature extraction model is obtained. Therefore, the multi-task learning is completed by setting two different output branches, the feature learning capacity of the feature extraction model can be improved, and the finally obtained feature extraction model can extract better features.
For example, a Swin Transformer model may be adopted, and an additional output branch may be added to its network structure to obtain the initial feature extraction model, which is then trained to obtain the feature extraction model. The added output branch is a fully connected layer used to output the press recognition result, for example the class number of the press. Other network models may be employed in actual use.
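A hedged PyTorch sketch of this two-branch idea: a shared backbone (the disclosure names a Swin Transformer; a generic encoder module is assumed here), a first branch producing a fixed-dimension feature vector and a second fully connected branch producing press logits, trained jointly. The feature-branch loss and all dimensions are assumptions, as the disclosure does not fix them.

```python
import torch
import torch.nn as nn

class DualBranchExtractor(nn.Module):
    """Shared backbone with two output branches: feature vector and press classification."""

    def __init__(self, backbone: nn.Module, backbone_dim: int,
                 feat_dim: int = 256, num_press_classes: int = 200):
        super().__init__()
        self.backbone = backbone                                       # e.g. a Swin Transformer encoder
        self.feat_head = nn.Linear(backbone_dim, feat_dim)             # first branch: fixed-dimension feature vector
        self.press_head = nn.Linear(backbone_dim, num_press_classes)   # second branch: added fully connected layer

    def forward(self, x):
        h = self.backbone(x)          # assumed to return a pooled [batch, backbone_dim] feature
        return self.feat_head(h), self.press_head(h)

def joint_loss(model, batch, press_labels, feat_targets, alpha=1.0):
    """Multi-task loss over both branches, trained simultaneously as described above.

    The feature branch is shown regressing to target vectors (an assumption);
    the press branch uses a standard classification loss.
    """
    feats, press_logits = model(batch)
    feat_loss = nn.functional.mse_loss(feats, feat_targets)
    cls_loss = nn.functional.cross_entropy(press_logits, press_labels)
    return feat_loss + alpha * cls_loss
```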
Further, in an alternative implementation of the present disclosure, as shown in fig. 2, based on the example shown in fig. 1, step 103 may include the following sub-steps:
and step 201, performing data conversion on the word vector corresponding to the text content and the target publishing company to obtain a text data matrix, wherein the column number of the text data matrix is consistent with the column number of the image data matrix corresponding to the book cover image.
The word vectors corresponding to the text content may be obtained with a word2vec tool, or in other ways, for example with a pre-trained word vector model; the present disclosure does not limit how the word vectors corresponding to the text content are obtained.
In the embodiment of the disclosure, the word vector corresponding to the text content and the target publishing agency can be subjected to data conversion to obtain the text data matrix.
For example, assume that the target press is represented by a press classification number, i.e. the target press is a number. When converting the word vectors corresponding to the text content together with the target press, the target press may be replicated according to the dimension of the word vectors and then spliced with the word vectors to obtain the text data matrix. Concretely, assume that each word vector has dimension 1×C, so that the word vectors of all the words in the text content form a word vector matrix of dimension W×C, where the number of rows W equals the number of words contained in the text content and the number of columns C equals the number of columns of the image data matrix corresponding to the book cover image. The target press classification number may then be replicated (C-1) times to form, together with the original number, a 1×C vector of C copies, which is spliced after the word vector matrix to obtain the text data matrix of dimension (W+1)×C.
For example, assuming that the target press is represented by the name of the press, when the word vector corresponding to the text content is converted with the target press, the target press may be converted into the corresponding word vector, and then the word vector corresponding to the text content and the word vector corresponding to the target press are spliced to obtain the text data matrix. In the embodiment of the disclosure, the splicing direction of the word vector corresponding to the text content and the word vector corresponding to the target publishing agency is not limited, and the word vector can be spliced in the row direction or the column direction, so long as the column number of the obtained text data matrix is ensured to be consistent with the column number of the image data matrix corresponding to the book cover image.
And 202, splicing the image data matrix and the text data matrix in the row direction to generate book cover data.
In the embodiment of the disclosure, for the obtained text data matrix, the image data matrix corresponding to the book cover image may be spliced in the row direction to obtain the book cover data. Because the number of columns of the text data matrix is consistent with the number of columns of the image data matrix corresponding to the book cover image, the text data matrix and the image data matrix can be spliced in the row direction, and therefore the number of columns of the spliced matrix is not changed.
For example, assuming that the dimension of the text data matrix is D×C and the dimension of the image data matrix corresponding to the book cover image is H×C, splicing the image data matrix and the text data matrix in the row direction yields book cover data of dimension (H+D)×C. Here H is the height of the image data matrix, i.e. the number of pixel rows in the height direction of the book cover image, and C is its width, i.e. the number of pixel columns in the width direction of the book cover image.
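A small NumPy sketch of the splicing in steps 201 and 202, under the first example's assumption that the target press is a class number replicated across the C columns and that the cover image is represented as an H×C image data matrix.

```python
import numpy as np

def build_cover_data(word_vectors: np.ndarray,   # W x C word-vector matrix for the OCR text
                     press_id: int,              # target press classification number
                     image_matrix: np.ndarray    # H x C image data matrix of the cover
                     ) -> np.ndarray:
    C = word_vectors.shape[1]
    assert image_matrix.shape[1] == C, "column counts must match for row-wise splicing"

    # Step 201: replicate the press class number into a 1 x C row and splice it
    # after the word-vector matrix, giving a (W + 1) x C text data matrix.
    press_row = np.full((1, C), press_id, dtype=word_vectors.dtype)
    text_matrix = np.vstack([word_vectors, press_row])

    # Step 202: splice the image data matrix and the text data matrix in the row
    # direction, giving (H + W + 1) x C book cover data.
    return np.vstack([image_matrix, text_matrix])
```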
Step 203, inputting the book cover data into a feature extraction model trained in advance, and obtaining a feature vector output by the first output branch of the feature extraction model as a feature vector corresponding to the book cover image.
In the embodiment of the disclosure, after the book cover data is obtained based on the text content, the target publishing company and the book cover image, the book cover data may be input into a feature extraction model trained in advance, a feature vector output by a first output branch of the feature extraction model is obtained, and the obtained feature vector is used as a feature vector corresponding to the book cover image.
According to the method for identifying the book cover, the word vector corresponding to the text content is subjected to data conversion with the target publishing agency to obtain the text data matrix, then the image data matrix and the text data matrix are spliced in the row direction to generate the book cover data, the book cover data are input into the feature extraction model trained in advance, the feature vector output by the first output branch of the feature extraction model is obtained to serve as the feature vector corresponding to the book cover image, and therefore the text feature, publishing agency feature and image feature related to the book cover are automatically extracted by the model, and the feature vector learned by the model has good generalization capability.
In an alternative implementation of the present disclosure, as shown in fig. 3, based on the foregoing example, step 105 may include the following sub-steps:
Step 301, obtaining a search result with highest similarity with the text content in the text search results as first candidate book information.
Step 302, determining the first candidate book information as target book information corresponding to the book cover image when the first candidate book information meets a first preset condition.
The first preset condition comprises:
the press information corresponding to the first candidate book information is consistent with the target press;
the text data corresponding to the first candidate book information is consistent with the text content; and
the confidence coefficient corresponding to the text content is larger than a preset confidence coefficient threshold value.
In the embodiment of the disclosure, for the obtained text search result, the book information with the highest similarity to the text content may be obtained from the text search result, and the book information with the highest similarity to the text content may be used as the first candidate book information. Then, whether the first candidate book information meets a first preset condition can be judged, and when the first candidate book information meets the first preset condition, the first candidate book information is determined to be target book information corresponding to the book cover image.
That is, for the first candidate book information determined from the text search result, it is checked whether its press information, such as the press name or classification number, is consistent with the target press obtained by performing press recognition on the book cover image; whether its text data, such as the press information, the grade information and the book name, is consistent with the text content obtained by performing text recognition on the book cover image; and whether the confidence corresponding to the text content is greater than a preset confidence threshold. If all three conditions are satisfied, the first candidate book information is determined to satisfy the first preset condition and is directly returned as the target book information. When the text content comprises a plurality of text blocks, for example one text block each for the press information, the grade information and the book name, the confidence of each text block can be compared with a corresponding confidence threshold, and the confidence of the text content is considered greater than the preset confidence threshold only if every text block exceeds its threshold.
Step 303, acquiring book information with highest similarity with the feature vector in the vector search result as second candidate book information when the first candidate book information does not meet the first preset condition.
Step 304, determining the second candidate book information as target book information corresponding to the book cover image when the second candidate book information meets a second preset condition.
Wherein the second preset condition includes:
the second candidate book information is consistent with the first candidate book information; and
and the press information corresponding to the second candidate book information is consistent with the target press.
In the embodiment of the disclosure, when the first candidate book information does not satisfy the first preset condition, the book information with the highest similarity to the feature vector may be obtained from the vector search result, and the book information with the highest similarity to the feature vector may be used as the second candidate book information. Then, whether the second candidate book information meets a second preset condition can be judged, and when the second candidate book information meets the second preset condition, the second candidate book information is determined to be target book information corresponding to the book cover image.
That is, in the embodiment of the present disclosure, for the second candidate book information determined from the vector search result, it is checked whether the second candidate book information is identical to the first candidate book information, and whether its press information, such as the press name or class number, is identical to the target press obtained by performing press recognition on the book cover image. If both conditions are satisfied, the second candidate book information is determined to satisfy the second preset condition and is directly returned as the target book information.
Step 305, traversing the text search result to judge whether the second candidate book information exists in the text search result, in the case that the second candidate book information does not meet the second preset condition.
Step 306, increasing the similarity of the second candidate book information by a first preset value to obtain a new similarity of the second candidate book information when the second candidate book information exists in the text search result.
The first preset value can be preset according to actual requirements.
It can be understood that the similarity of the second candidate book information is the similarity, calculated when the vector search is performed based on the feature vector, between the feature vector and the index of the second candidate book information in the vector search library.
Step 307, determining the second candidate book information as the target book information corresponding to the book cover image when the new similarity of the second candidate book information is greater than a first threshold and the press information corresponding to the second candidate book information is consistent with the target press.
The first threshold may be preset according to actual requirements, and the specific value of the first threshold is not limited in the disclosure.
In the embodiment of the disclosure, when the second candidate book information does not meet the second preset condition, the text search result may be traversed to determine whether the second candidate book information exists in the text search result. If no second book candidate information is found from the text search results, step 308 is performed. If the second candidate book information is found from the text search result, determining that the second candidate book information exists in the text search result, and further increasing the similarity of the second candidate book information by a first preset value to obtain new similarity of the second candidate book information. Next, whether the new similarity of the second candidate book information is greater than a first threshold value is determined, whether the press information corresponding to the second candidate book information (for example, a name of the press, a classification number of the press, etc.) is consistent with the target press is determined, and if the new similarity of the second candidate book information is greater than the first threshold value, and the press information corresponding to the second candidate book information is consistent with the target press, the second candidate book information is determined as the target book information corresponding to the book cover image. If the new similarity of the second candidate book information is not greater than the first threshold, or if the publishing house information corresponding to the second candidate book information is inconsistent with the target publishing house, step 308 is performed.
In the embodiment of the disclosure, when the second candidate book information does not meet the second preset condition, whether the second candidate book information exists in the text search result is judged, and if so, the similarity of the second candidate book information is increased by a first preset value, so that the probability that the book information with the highest similarity in the vector search result is determined as the target book information can be increased, and the accuracy of the returned result is improved.
Step 308, traversing the vector search result to determine whether the first candidate book information exists in the vector search result when the second candidate book information does not exist in the text search result, or the new similarity of the second candidate book information is not greater than the first threshold, or the publishing company information corresponding to the second candidate book information is inconsistent with the target publishing company.
Step 309, increasing the similarity of the first candidate book information by a second preset value to obtain a new similarity of the first candidate book information when the first candidate book information exists in the vector search result.
The second preset value can be preset according to actual requirements.
It can be understood that the similarity of the first candidate book information is the similarity between the index of the first candidate book information in the text search library and the text content when the text search is performed based on the text content to obtain the text search result.
Step 310, determining the first candidate book information as the target book information corresponding to the book cover image when the new similarity of the first candidate book information is greater than a second threshold and the press information corresponding to the first candidate book information is consistent with the target press.
The second threshold may be preset according to actual requirements, and the specific value of the second threshold is not limited in the disclosure.
In the embodiment of the disclosure, when the second candidate book information does not exist in the text search result, or the new similarity of the second candidate book information is not greater than the first threshold, or the press information corresponding to the second candidate book information is inconsistent with the target press, the vector search result may be traversed to determine whether the first candidate book information exists in the vector search result. If the first candidate book information is found in the vector search result, its similarity may be increased by a second preset value to obtain a new similarity, which is compared with a second threshold; if the new similarity of the first candidate book information is greater than the second threshold and the press information (for example, the press name or class number) corresponding to the first candidate book information is consistent with the target press, the first candidate book information may be determined as the target book information corresponding to the book cover image.
In the embodiment of the disclosure, the vector search result is traversed to judge whether the first candidate book information exists in the vector search result, and the similarity of the first candidate book information is increased by a second preset value when the first candidate book information exists, so that the probability that the book information with the highest similarity in the text search result is determined as the target book information can be increased, and the accuracy of the returned result is improved.
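Steps 301 to 310 can be condensed into the following sketch of judgment policies one to four. Each search result entry is assumed to be a dict with `book`, `press`, `text` and `similarity` fields, the per-text-block confidence check is collapsed into a single confidence value, and all thresholds and preset increments are placeholders.

```python
def decide_policies_one_to_four(text_results, vector_results, ocr_text, ocr_conf,
                                target_press, conf_thr=0.9,
                                delta1=0.1, delta2=0.1, thr1=0.8, thr2=0.8):
    """Cascaded judgment policies 1-4; returns the target book info, or None for policy 5."""
    cand1 = max(text_results, key=lambda r: r["similarity"])    # first candidate (text search)
    cand2 = max(vector_results, key=lambda r: r["similarity"])  # second candidate (vector search)

    # Policy 1: text top hit matches the classified press and the OCR text, with high confidence.
    if (cand1["press"] == target_press and cand1["text"] == ocr_text
            and ocr_conf > conf_thr):
        return cand1["book"]

    # Policy 2: vector top hit equals the text top hit and matches the classified press.
    if cand2["book"] == cand1["book"] and cand2["press"] == target_press:
        return cand2["book"]

    # Policy 3: vector top hit also appears in the text results; boost its similarity and re-check.
    if any(r["book"] == cand2["book"] for r in text_results):
        if cand2["similarity"] + delta1 > thr1 and cand2["press"] == target_press:
            return cand2["book"]

    # Policy 4: text top hit also appears in the vector results; boost its similarity and re-check.
    if any(r["book"] == cand1["book"] for r in vector_results):
        if cand1["similarity"] + delta2 > thr2 and cand1["press"] == target_press:
            return cand1["book"]

    return None  # fall through to judgment policy 5
```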
In the embodiment of the disclosure, steps 301 to 302 constitute judgment policy one for determining the target book information from the text search result and the vector search result. When the two search results are combined to determine the target book information, judgment policy one is evaluated first; if it is satisfied, the first candidate book information is directly returned as the target book information, and if it is not satisfied, judgment policy two (corresponding to steps 303 to 304) is evaluated. If judgment policy two is satisfied, the target book information is returned; otherwise judgment policy three (corresponding to steps 305 to 307) is evaluated, and when evaluating judgment policy three, if the second candidate book information is not found in the text search result, judgment policy three is also determined not to be satisfied. If judgment policy three is satisfied, the target book information is returned; otherwise judgment policy four (corresponding to steps 308 to 310) is evaluated. If judgment policy four is satisfied, the target book information is returned; if it is not satisfied, and in particular for the case where the press information corresponding to the first candidate book information is inconsistent with the target press, the disclosure provides judgment policy five, which comprises the following steps:
Under the condition that the press information corresponding to the first candidate book information is inconsistent with the target press, replacing the press in the text content with the target press to obtain new text content;
performing the text search again based on the new text content, and sorting the search results from high to low similarity to the new text content to obtain the top-ranked M pieces of book information, wherein M is a positive integer;
generating new text search results based on the M pieces of book information and the text search results;
based on a preset similarity adjustment rule, adjusting the similarity of the new text search result and the vector search result;
and according to the adjusted similarity, determining the book information with the highest similarity from the new text search result and the vector search result as target book information corresponding to the book cover image.
In an optional embodiment of the present disclosure, when adjusting the similarities of the new text search result and the vector search result based on the preset similarity adjustment rule, at least one of the following manners may be used: (1) increasing by a third preset value the similarity of book information in the new text search result whose press information is consistent with the target press, and increasing by a fourth preset value the similarity of book information in the vector search result whose press information is consistent with the target press; (2) traversing the new text search result and the vector search result, and, when third candidate book information in the new text search result is consistent with fourth candidate book information in the vector search result, increasing the similarity corresponding to the third candidate book information by a fifth preset value and increasing the similarity corresponding to the fourth candidate book information by a sixth preset value.
The third preset value, the fourth preset value, the fifth preset value and the sixth preset value can be preset according to actual requirements.
It can be appreciated that the implementation of adjusting the similarity of the new text search results and the vector search results is not limited to the two ways provided by the above embodiments of the present disclosure, and the similarity of the search results may be adjusted in other ways.
In the embodiment of the disclosure, when the press information corresponding to the first candidate book information is inconsistent with the target press, the press in the text content can be replaced with the target press to obtain new text content, and the text search is performed again in the text search library based on the new text content. The new search results are sorted from high to low according to the similarity calculated between the new text content and each index in the text search library, and the top-ranked M pieces of book information are obtained. A new text search result is then generated from the newly searched M pieces of book information and the original text search result: for example, assuming the original text search result contains N pieces of book information sorted from high to low similarity, the M lowest-ranked pieces are removed and the remaining (N-M) pieces are combined with the newly searched M pieces, giving N pieces of book information as the new text search result, where N and M are both positive integers and N is greater than M. Next, the similarities of the new text search result and of the vector search result are adjusted based on the preset similarity adjustment rule. For example, book information whose press information is consistent with the target press can be found in the new text search result and its similarity increased by a third preset value, and book information whose press information is consistent with the target press can be found in the vector search result and its similarity increased by a fourth preset value; and/or the new text search result and the vector search result are traversed, and when one or more pieces of book information in the new text search result (called third candidate book information for convenience) coincide with one or more pieces of book information in the vector search result (called fourth candidate book information for convenience), the similarity of the third candidate book information is increased by a fifth preset value and the similarity of the fourth candidate book information is increased by a sixth preset value. In other words, if the same book information exists in both the new text search result and the vector search result, its similarity in the new text search result is increased by the fifth preset value and its similarity in the vector search result is increased by the sixth preset value. Finally, according to the adjusted similarities, the single piece of book information with the highest similarity across the new text search result and the vector search result is determined as the target book information corresponding to the book cover image.
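A hedged sketch of judgment policy five's press-guided re-ranking, in the same dict representation as the previous sketch; `search_fn` stands for any text-search callable, the replacement of the press string and the choice to drop the lowest-ranked M original entries are interpretive assumptions, and all preset increments are placeholders.

```python
def policy_five(ocr_text, recognized_press, target_press,
                text_results, vector_results, search_fn,
                m=5, delta3=0.1, delta4=0.1, delta5=0.1, delta6=0.1):
    """Judgment policy five: re-search with the classified press, then re-rank and merge."""
    # Replace the (possibly misrecognized) press in the OCR text with the target press
    # and run the text search again for the top-M entries.
    new_query = ocr_text.replace(recognized_press, target_press)
    new_hits = search_fn(new_query, m)

    # Merge: keep the original results minus the M lowest-ranked entries (an interpretive
    # choice) and append the M newly searched entries, giving a new text search result.
    kept = sorted(text_results, key=lambda r: -r["similarity"])[:-m] if m else list(text_results)
    new_text_results = kept + new_hits

    # Adjustment rule (1): boost entries whose press matches the classification result.
    for r in new_text_results:
        if r["press"] == target_press:
            r["similarity"] += delta3
    for r in vector_results:
        if r["press"] == target_press:
            r["similarity"] += delta4

    # Adjustment rule (2): boost entries that appear in both result lists.
    for t in new_text_results:
        for v in vector_results:
            if t["book"] == v["book"]:
                t["similarity"] += delta5
                v["similarity"] += delta6

    # Return the single highest-similarity entry across both adjusted lists.
    best = max(new_text_results + vector_results, key=lambda r: r["similarity"])
    return best["book"]
```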
Because books are numerous and varied, a book cover cannot be accurately assigned to a specific book by classification alone. When the image quality is poor, text recognition is inaccurate, so the text search result is unreliable, and the vector search result is likewise unreliable; nevertheless, the obtained text search result and vector search result contain the accurate result with a high probability. Therefore, the present disclosure provides multiple judgment strategies so as to find more accurate target book information from the text search result and the vector search result.
In addition, although books are of many types, the number of publishers is limited. The scheme provided by the present disclosure exploits this characteristic: the target publisher contained in the book cover image is obtained through the publisher classification model. In practical applications the publisher classification model yields very accurate classification results and has strong anti-interference capability with respect to the image, so that even when the image quality is relatively poor the classification result still has very high accuracy. Therefore, publisher recognition can be performed with the publisher classification model to obtain a trusted publisher recognition result.
In the embodiment of the present disclosure, the first to fourth judgment strategies are precise judgments. For book cover images that they cannot handle, simply merging the text search result and the vector search result without introducing the result of the press classification model still leaves a relatively high probability of error. The scheme of the present disclosure therefore provides the fifth judgment strategy, in which the press classification result is introduced to guide the reordering of the text search and the vector search, so that book cover images that cannot be identified by the first four judgment strategies can be identified by the fifth judgment strategy provided by the present disclosure, and a correct recognition result can be obtained with a high probability. When the press classification result is not introduced, it is difficult to obtain an accurate return result by simply merging the text search result and the vector search result, because for the book cover images that reach the fifth judgment strategy, the top-ranked book information in the text search result and the top-ranked book information in the vector search result are both untrustworthy, while the correct book information is, with a high probability, hidden in the remaining book information (excluding the book information with the highest similarity) of the text search result and the vector search result. The scheme of the present disclosure therefore uses the trusted press classification result to guide the reordering of the remaining book information in the text search result and the vector search result, thereby obtaining the correct recognition result. In the fifth judgment strategy of the present disclosure, by increasing the similarity of the remaining book information whose press information is consistent with the press classification result, book information ranked lower regains the opportunity to be ranked higher, while book information inconsistent with the press classification result moves backwards. After being guided by the press classification result, the text search result and the vector search result are merged again, the correct result moves forward, the correct recognition result is finally obtained, and the accuracy of the returned target book information is improved.
In an optional embodiment of the present disclosure, if the fourth judgment strategy is not satisfied, that is, the first candidate book information does not exist in the vector search result, or the new similarity of the first candidate book information is not greater than the second threshold, the text search result and the vector search result may be merged, and the book information with the highest similarity may be selected from the merged results as the target book information corresponding to the book cover image. When the text search result and the vector search result are merged, for book information that exists in both the text search result and the vector search result, the larger of its two similarities may be taken as the similarity of the merged book information, or the average of the two similarities may be taken as the similarity of the merged book information. Alternatively, all the book information that exists in both the text search result and the vector search result may be found, and then one piece of book information whose press information is consistent with the target press and whose similarity is high may be selected as the target book information corresponding to the book cover image.
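A minimal sketch of such a merge, assuming each result entry carries book_id, press and score fields, is given below; both the maximum and the average variants mentioned above are shown, and the field names are assumptions for illustration only.

    # Hypothetical sketch of merging the text and vector search results when the
    # fourth judgment strategy is not satisfied: book information present in both
    # lists keeps the larger (or average) of its two similarities, then the
    # highest-scoring entry is returned as the target book information.

    def merge_and_select(text_results, vector_results, use_average=False):
        merged = {}
        for item in text_results + vector_results:
            book_id = item["book_id"]
            if book_id not in merged:
                merged[book_id] = dict(item)
            else:
                a, b = merged[book_id]["score"], item["score"]
                merged[book_id]["score"] = (a + b) / 2 if use_average else max(a, b)
        return max(merged.values(), key=lambda r: r["score"])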
The exemplary embodiment of the disclosure also provides a device for identifying book covers.
Fig. 4 shows a schematic block diagram of an identification device of a book cover according to an exemplary embodiment of the present disclosure, and as shown in fig. 4, the identification device 40 of a book cover includes: a first recognition module 410, a second recognition module 420, a feature extraction module 430, a search module 440, and a determination module 450.
The first recognition module 410 is configured to perform text recognition on a book cover image to be recognized, so as to obtain text content included in the book cover image;
the second recognition module 420 is configured to input the book cover image into a pre-trained press classification model to obtain a target press included in the book cover image;
the feature extraction module 430 is configured to perform feature extraction by using a feature extraction model trained in advance based on the text content, the target publishing company and the book cover image, so as to obtain feature vectors corresponding to the book cover image;
the search module 440 is configured to perform a text search based on the text content to obtain a text search result, and perform a vector search based on the feature vector to obtain a vector search result, where the text search result and the vector search result respectively include at least one book information;
And the determining module 450 is configured to determine target book information corresponding to the book cover image according to the text search result and the vector search result.
Optionally, the feature extraction model includes a first output branch and a second output branch, where the first output branch is used to output a feature vector, and the second output branch is used to output a publishing company recognition result, and when the feature extraction model is obtained by training, iterative training is performed based on the feature vector output by the first output branch and the publishing company recognition result output by the second output branch.
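Purely as an illustrative sketch, a feature extraction model with the two output branches described above might be organized as follows in PyTorch; the backbone, layer sizes, number of publishers, and the particular losses used for joint iterative training are assumptions made for this example and are not prescribed by the present disclosure.

    # Minimal PyTorch-style sketch of a feature extraction model with two output
    # branches: one emits the feature vector, the other a publisher prediction.
    # Backbone shape, dimensions and losses are illustrative assumptions.

    import torch
    import torch.nn as nn

    class CoverFeatureExtractor(nn.Module):
        def __init__(self, in_dim=1024, feat_dim=256, num_publishers=500):
            super().__init__()
            self.backbone = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU())
            self.feature_branch = nn.Linear(512, feat_dim)           # first output branch
            self.publisher_branch = nn.Linear(512, num_publishers)   # second output branch

        def forward(self, cover_data):
            # cover_data is assumed to be flattened book cover data of shape (batch, in_dim).
            h = self.backbone(cover_data)
            return self.feature_branch(h), self.publisher_branch(h)

    def training_step(model, anchor, positive, negative, press_label, optimizer):
        # Joint iterative training: a metric-style loss on the feature vector plus a
        # cross-entropy loss on the publisher recognition result (assumed losses).
        a_feat, a_press = model(anchor)
        p_feat, _ = model(positive)
        n_feat, _ = model(negative)
        loss = nn.functional.triplet_margin_loss(a_feat, p_feat, n_feat) \
             + nn.functional.cross_entropy(a_press, press_label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()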
Optionally, the feature extraction module 430 is further configured to:
carrying out data conversion on the word vector corresponding to the text content and the target publishing company to obtain a text data matrix, wherein the column number of the text data matrix is consistent with the column number of the image data matrix corresponding to the book cover image;
splicing the image data matrix and the text data matrix in the row direction to generate book cover data;
and inputting the book cover data into a pre-trained feature extraction model, and acquiring a feature vector output by the first output branch of the feature extraction model as a feature vector corresponding to the book cover image.
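By way of a hedged example, the data conversion and row-direction splicing could look like the following NumPy sketch; the embedding, truncation and zero-padding choices are assumptions made only so that the text data matrix has the same number of columns as the image data matrix.

    # Illustrative sketch of turning the word vectors for the text content and the
    # target publisher into a text data matrix whose column count matches the image
    # data matrix, then stacking the two along the row direction to form cover data.

    import numpy as np

    def build_cover_data(image_matrix, word_vectors):
        """image_matrix: (H, W) array; word_vectors: list of 1-D word embeddings
        covering both the text content and the target publisher."""
        width = image_matrix.shape[1]
        rows = []
        for vec in word_vectors:
            if vec.shape[0] >= width:
                rows.append(vec[:width])                              # truncate to image width
            else:
                rows.append(np.pad(vec, (0, width - vec.shape[0])))   # zero-pad to image width
        text_matrix = np.stack(rows)                                  # same column count as image
        return np.concatenate([image_matrix, text_matrix], axis=0)    # splice along rows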
Optionally, the determining module 450 is further configured to:
acquiring book information with highest similarity with the text content in the text search result as first candidate book information;
determining the first candidate book information as target book information corresponding to the book cover image under the condition that the first candidate book information meets a first preset condition;
wherein the first preset condition includes:
the press information corresponding to the first candidate book information is consistent with the target press;
the text data corresponding to the first candidate book information is consistent with the text content; and
the confidence coefficient corresponding to the text content is larger than a preset confidence coefficient threshold value.
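For illustration, a check of the first preset condition might be written as the following sketch; the confidence threshold value and field names are assumptions chosen for this example.

    # Hypothetical check of the first preset condition: the top text-search hit is
    # accepted only if its press matches the target press, its text matches the
    # recognized text content, and the recognition confidence exceeds a threshold.

    def first_condition_met(first_candidate, target_press, text_content,
                            text_confidence, confidence_threshold=0.9):
        return (first_candidate["press"] == target_press
                and first_candidate["text"] == text_content
                and text_confidence > confidence_threshold)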
Optionally, the determining module 450 is further configured to:
acquiring book information with highest similarity with the feature vector in the vector search result as second candidate book information under the condition that the first candidate book information does not meet the first preset condition;
determining the second candidate book information as target book information corresponding to the book cover image under the condition that the second candidate book information meets a second preset condition;
Wherein the second preset condition includes:
the second candidate book information is consistent with the first candidate book information; and
and the press information corresponding to the second candidate book information is consistent with the target press.
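Similarly, a sketch of the second preset condition check, under the same assumed data layout, could be:

    # Hypothetical check of the second preset condition: the top vector-search hit
    # must coincide with the top text-search hit and match the target press.

    def second_condition_met(second_candidate, first_candidate, target_press):
        return (second_candidate["book_id"] == first_candidate["book_id"]
                and second_candidate["press"] == target_press)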
Optionally, the determining module 450 is further configured to:
traversing the text search result to judge whether the second candidate book information exists in the text search result or not under the condition that the second candidate book information does not meet a second preset condition;
increasing the similarity of the second candidate book information by a first preset value under the condition that the second candidate book information exists in the text search result, so as to obtain new similarity of the second candidate book information;
and determining the second candidate book information as target book information corresponding to the book cover image under the condition that the new similarity of the second candidate book information is larger than a first threshold value and the publishing agency information corresponding to the second candidate book information is consistent with the target publishing agency.
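The following sketch illustrates this judgment path; the first preset value and first threshold shown are placeholder assumptions, not values prescribed by the present disclosure.

    # Sketch of the third judgment strategy: if the top vector-search hit also
    # appears in the text search result, boost its similarity by a first preset
    # value and accept it when the boosted score clears the first threshold and
    # its press matches the target press.

    def third_strategy(second_candidate, text_results, target_press,
                       first_preset=0.1, first_threshold=0.8):
        in_text = any(r["book_id"] == second_candidate["book_id"] for r in text_results)
        if not in_text:
            return None
        new_score = second_candidate["score"] + first_preset
        if new_score > first_threshold and second_candidate["press"] == target_press:
            return second_candidate   # accepted as the target book information
        return None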
Optionally, the determining module 450 is further configured to:
traversing the vector search result to judge whether the first candidate book information exists in the vector search result when the second candidate book information does not exist in the text search result, or the new similarity of the second candidate book information is not greater than the first threshold value, or the publishing agency information corresponding to the second candidate book information is inconsistent with the target publishing agency;
Under the condition that the first candidate book information exists in the vector search result, increasing the similarity of the first candidate book information by a second preset value to obtain new similarity of the first candidate book information;
and determining the first candidate book information as target book information corresponding to the book cover image under the condition that the new similarity of the first candidate book information is larger than a second threshold value and the publishing agency information corresponding to the first candidate book information is consistent with the target publishing agency.
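A symmetric sketch for this judgment path, again with placeholder values, could be:

    # Sketch of the fourth judgment strategy, symmetric to the third: if the top
    # text-search hit also appears in the vector search result, boost its similarity
    # by a second preset value and accept it when the boosted score exceeds the
    # second threshold and its press matches the target press.

    def fourth_strategy(first_candidate, vector_results, target_press,
                        second_preset=0.1, second_threshold=0.8):
        in_vector = any(r["book_id"] == first_candidate["book_id"] for r in vector_results)
        if not in_vector:
            return None
        new_score = first_candidate["score"] + second_preset
        if new_score > second_threshold and first_candidate["press"] == target_press:
            return first_candidate    # accepted as the target book information
        return None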
Optionally, the determining module 450 is further configured to:
under the condition that the press information corresponding to the first candidate book information is inconsistent with the target press, replacing the press in the text content with the target press to obtain new text content;
performing the text search again based on the new text content, and sorting the search results in descending order of similarity with the new text content to obtain the top-ranked M pieces of book information, wherein M is a positive integer;
generating new text search results based on the M pieces of book information and the text search results;
Based on a preset similarity adjustment rule, adjusting the similarity of the new text search result and the vector search result;
and according to the adjusted similarity, determining the book information with the highest similarity from the new text search result and the vector search result as target book information corresponding to the book cover image.
Optionally, the determining module 450 is further configured to:
increasing, by a third preset value, the similarity of book information in the new text search result whose press information is consistent with the target press, and increasing, by a fourth preset value, the similarity of book information in the vector search result whose press information is consistent with the target press;
and/or,
traversing the new text search result and the vector search result, and increasing the similarity corresponding to the third candidate book information by a fifth preset value and increasing the similarity corresponding to the fourth candidate book information by a sixth preset value when the third candidate book information in the new text search result is consistent with the fourth candidate book information in the vector search result.
The device for identifying the book covers provided by the embodiment of the disclosure can execute any method for identifying the book covers applicable to the electronic equipment, and has the corresponding functional modules and beneficial effects of the executing method. Details of the embodiments of the apparatus of the present disclosure that are not described in detail may refer to descriptions of any of the embodiments of the method of the present disclosure.
The exemplary embodiments of the present disclosure also provide an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor for causing the electronic device to perform a method of identifying book covers according to embodiments of the present disclosure when executed by the at least one processor.
The present disclosure also provides a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, is for causing the computer to perform a method of identifying book covers according to an embodiment of the present disclosure.
The exemplary embodiments of the present disclosure also provide a computer program product comprising a computer program, wherein the computer program, when being executed by a processor of a computer, is for causing the computer to perform a method of identifying book covers according to an embodiment of the present disclosure.
Referring to fig. 5, a block diagram of an electronic device 1100 that may serve as a server or a client of the present disclosure will now be described; it is an example of a hardware device that may be applied to aspects of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the electronic device 1100 includes a computing unit 1101 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data required for the operation of the device 1100 can also be stored. The computing unit 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.
A number of components in the electronic device 1100 are connected to the I/O interface 1105, including: an input unit 1106, an output unit 1107, a storage unit 1108, and a communication unit 1109. The input unit 1106 may be any type of device capable of inputting information to the electronic device 1100, and the input unit 1106 may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device. The output unit 1107 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. Storage unit 1108 may include, but is not limited to, magnetic disks, optical disks. The communication unit 1109 allows the electronic device 1100 to exchange information/data with other devices through computer networks such as the internet and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth™ devices, Wi-Fi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 1101 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1101 performs the respective methods and processes described above. For example, in some embodiments, the method of identifying book covers may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 1108. In some embodiments, some or all of the computer programs may be loaded and/or installed onto electronic device 1100 via ROM 1102 and/or communication unit 1109. In some embodiments, the computing unit 1101 may be configured to perform the method of identifying book covers by any other suitable means (e.g., by means of firmware).
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The terms "machine-readable medium" and "computer-readable medium" as used in this disclosure refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) for providing machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Claims (12)

1. A method of identifying book covers, wherein the method comprises:
performing text recognition on a book cover image to be recognized to obtain text content contained in the book cover image;
inputting the book cover image into a pre-trained press classification model to obtain a target press contained in the book cover image;
based on the text content, the target publishing company and the book cover image, performing feature extraction by utilizing a pre-trained feature extraction model to obtain a feature vector corresponding to the book cover image;
text searching is carried out based on the text content to obtain text searching results, vector searching is carried out based on the feature vector to obtain vector searching results, and the text searching results and the vector searching results respectively contain at least one book information;
And determining target book information corresponding to the book cover image according to the text search result and the vector search result.
2. The book cover recognition method of claim 1, wherein the feature extraction model comprises a first output branch and a second output branch, the first output branch is used for outputting feature vectors, the second output branch is used for outputting a publisher recognition result, and when the feature extraction model is obtained through training, iterative training is performed based on the feature vectors output by the first output branch and the publisher recognition result output by the second output branch.
3. The method for identifying book covers according to claim 2, wherein the feature extraction based on the text content, the target publishing company and the book cover image by using a pre-trained feature extraction model to obtain feature vectors corresponding to the book cover image comprises:
carrying out data conversion on the word vector corresponding to the text content and the target publishing company to obtain a text data matrix, wherein the column number of the text data matrix is consistent with the column number of the image data matrix corresponding to the book cover image;
Splicing the image data matrix and the text data matrix in the row direction to generate book cover data;
and inputting the book cover data into a pre-trained feature extraction model, and acquiring a feature vector output by the first output branch of the feature extraction model as a feature vector corresponding to the book cover image.
4. The book cover identification method as claimed in any one of claims 1 to 3, wherein said determining target book information corresponding to the book cover image based on the text search result and the vector search result comprises:
acquiring book information with highest similarity with the text content in the text search result as first candidate book information;
determining the first candidate book information as target book information corresponding to the book cover image under the condition that the first candidate book information meets a first preset condition;
wherein the first preset condition includes:
the press information corresponding to the first candidate book information is consistent with the target press;
the text data corresponding to the first candidate book information is consistent with the text content; and
The confidence coefficient corresponding to the text content is larger than a preset confidence coefficient threshold value.
5. The book cover identification method of claim 4, wherein said determining target book information corresponding to the book cover image based on the text search result and the vector search result further comprises:
acquiring book information with highest similarity with the feature vector in the vector search result as second candidate book information under the condition that the first candidate book information does not meet the first preset condition;
determining the second candidate book information as target book information corresponding to the book cover image under the condition that the second candidate book information meets a second preset condition;
wherein the second preset condition includes:
the second candidate book information is consistent with the first candidate book information; and
and the press information corresponding to the second candidate book information is consistent with the target press.
6. The book cover identification method of claim 5, wherein said determining target book information corresponding to the book cover image based on the text search result and the vector search result further comprises:
Traversing the text search result to judge whether the second candidate book information exists in the text search result or not under the condition that the second candidate book information does not meet a second preset condition;
increasing the similarity of the second candidate book information by a first preset value under the condition that the second candidate book information exists in the text search result, so as to obtain new similarity of the second candidate book information;
and determining the second candidate book information as target book information corresponding to the book cover image under the condition that the new similarity of the second candidate book information is larger than a first threshold value and the publishing agency information corresponding to the second candidate book information is consistent with the target publishing agency.
7. The book cover identification method of claim 6, wherein said determining target book information corresponding to the book cover image based on the text search result and the vector search result further comprises:
traversing the vector search result to judge whether the first candidate book information exists in the vector search result when the second candidate book information does not exist in the text search result, or the new similarity of the second candidate book information is not greater than the first threshold value, or the publishing agency information corresponding to the second candidate book information is inconsistent with the target publishing agency;
Under the condition that the first candidate book information exists in the vector search result, increasing the similarity of the first candidate book information by a second preset value to obtain new similarity of the first candidate book information;
and determining the first candidate book information as target book information corresponding to the book cover image under the condition that the new similarity of the first candidate book information is larger than a second threshold value and the publishing agency information corresponding to the first candidate book information is consistent with the target publishing agency.
8. The book cover identification method of claim 7, wherein the determining target book information corresponding to the book cover image according to the text search result and the vector search result further comprises:
under the condition that the press information corresponding to the first candidate book information is inconsistent with the target press, replacing the press in the text content with the target press to obtain new text content;
performing the text search again based on the new text content, and sorting the search results in descending order of similarity with the new text content to obtain the top-ranked M pieces of book information, wherein M is a positive integer;
Generating new text search results based on the M pieces of book information and the text search results;
based on a preset similarity adjustment rule, adjusting the similarity of the new text search result and the vector search result;
and according to the adjusted similarity, determining the book information with the highest similarity from the new text search result and the vector search result as target book information corresponding to the book cover image.
9. The book cover identification method of claim 8, wherein the adjusting the similarity of the new text search result and the vector search result based on a preset similarity adjustment rule comprises:
increasing, by a third preset value, the similarity of book information in the new text search result whose press information is consistent with the target press, and increasing, by a fourth preset value, the similarity of book information in the vector search result whose press information is consistent with the target press;
and/or,
traversing the new text search result and the vector search result, and increasing the similarity corresponding to the third candidate book information by a fifth preset value and increasing the similarity corresponding to the fourth candidate book information by a sixth preset value when the third candidate book information in the new text search result is consistent with the fourth candidate book information in the vector search result.
10. An identification device for book covers, wherein the device comprises:
the first recognition module is used for carrying out text recognition on the book cover image to be recognized to obtain text content contained in the book cover image;
the second recognition module is used for inputting the book cover image into a pre-trained press classification model so as to obtain a target press contained in the book cover image;
the feature extraction module is used for carrying out feature extraction by utilizing a pre-trained feature extraction model based on the text content, the target publishing company and the book cover image to obtain feature vectors corresponding to the book cover image;
the searching module is used for carrying out text searching based on the text content to obtain text searching results, carrying out vector searching based on the feature vectors to obtain vector searching results, wherein the text searching results and the vector searching results respectively contain at least one book information;
and the determining module is used for determining target book information corresponding to the book cover image according to the text search result and the vector search result.
11. An electronic device, comprising:
A processor; and
a memory in which a program is stored,
wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the method of identifying book covers according to any one of claims 1-9.
12. A non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method of identifying book covers according to any one of claims 1-9.
CN202310603847.8A 2023-05-25 2023-05-25 Book cover identification method and device, electronic equipment and storage medium Pending CN116612488A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310603847.8A CN116612488A (en) 2023-05-25 2023-05-25 Book cover identification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310603847.8A CN116612488A (en) 2023-05-25 2023-05-25 Book cover identification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116612488A true CN116612488A (en) 2023-08-18

Family

ID=87677835

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310603847.8A Pending CN116612488A (en) 2023-05-25 2023-05-25 Book cover identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116612488A (en)

Similar Documents

Publication Publication Date Title
CN111625635B (en) Question-answering processing method, device, equipment and storage medium
US11455831B2 (en) Method and apparatus for face classification
CN110321537B (en) Method and device for generating file
CN109583389B (en) Drawing recognition method and device
CN110347866B (en) Information processing method, information processing device, storage medium and electronic equipment
CN113159013A (en) Paragraph identification method and device based on machine learning, computer equipment and medium
CN114429635A (en) Book management method
US11348254B2 (en) Visual search method, computer device, and storage medium
CN111008624A (en) Optical character recognition method and method for generating training sample for optical character recognition
CN111104572A (en) Feature selection method and device for model training and electronic equipment
CN112528799B (en) Teaching live broadcast method and device, computer equipment and storage medium
US20220058214A1 (en) Document information extraction method, storage medium and terminal
CN113850238A (en) Document detection method and device, electronic equipment and storage medium
CN111027533B (en) Click-to-read coordinate transformation method, system, terminal equipment and storage medium
CN113850239B (en) Multi-document detection method and device, electronic equipment and storage medium
CN113850805B (en) Multi-document detection method and device, electronic equipment and storage medium
CN115273057A (en) Text recognition method and device, dictation correction method and device and electronic equipment
CN116612488A (en) Book cover identification method and device, electronic equipment and storage medium
CN110827261A (en) Image quality detection method and device, storage medium and electronic equipment
US20230393873A1 (en) Value-Added Content Providing Method and Computer System thereof
CN116737981A (en) Method and device for searching for picture topics, electronic equipment and storage medium
CN117173721A (en) Question correcting method, device, equipment and medium
Li et al. Research on Test Score Entry App Based on DL
CN116469121A (en) Learning object recognition method, device, equipment and storage medium
CN116595213A (en) Method and device for searching geometric figure questions, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination