CN114120346A

CN114120346A - Content identification method and device, electronic equipment and storage medium

Info

Publication number: CN114120346A
Application number: CN202111229111.6A
Authority: CN
Inventors: 恩孟一; 谢群义; 钦夏孟; 李世虎; 张栋栋; 姚锟
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-10-21
Filing date: 2021-10-21
Publication date: 2022-03-01

Abstract

The present disclosure provides a content recognition method and apparatus, an electronic device, and a storage medium. It relates to the technical field of artificial intelligence, specifically to computer vision and deep learning, and can be applied to scenarios such as optical character recognition (OCR). The specific implementation scheme is as follows: an image to be recognized is acquired; the content type of the image is determined; a content recognition model associated with the content type is determined; and the content recognition model is used to perform content recognition on the image to obtain target content. A matching content recognition model can thus be selected flexibly according to the content type, so that the image receives targeted recognition processing, which improves the efficiency, flexibility, and effect of content recognition.

Description

Content identification method and device, electronic equipment and storage medium

Technical Field

The disclosure relates to the technical field of artificial intelligence, in particular to computer vision and deep learning, is applicable to scenarios such as optical character recognition (OCR), and specifically relates to a content recognition method and apparatus, an electronic device, and a storage medium.

Background

Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it involves technologies at both the hardware and software levels. AI hardware technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, and big data processing; AI software technologies mainly include computer vision, speech recognition, natural language processing, machine learning, deep learning, big data processing, and knowledge graph technologies.

In the related art, all models required for images of a given vertical class (a specific document category, such as a particular type of ticket or card) are initialized one by one and kept on standby. When an image of that vertical class enters the recognition service, its content is recognized by the content recognition models bound to a predefined flow. When a new vertical class appears, its content recognition models must be developed and maintained separately, and a new recognition flow must be built.

Disclosure of Invention

The disclosure provides a content identification method, a content identification device, an electronic device, a storage medium and a computer program product.

According to a first aspect of the present disclosure, a content recognition method is provided, including: acquiring an image to be recognized; determining the content type of the image to be recognized; determining a content recognition model associated with the content type; and performing content recognition on the image to be recognized with the content recognition model to obtain target content.

According to a second aspect of the present disclosure, a content recognition apparatus is provided, including: an acquisition module configured to acquire an image to be recognized; a first determination module configured to determine the content type of the image to be recognized; a second determination module configured to determine a content recognition model associated with the content type; and a recognition module configured to perform content recognition on the image to be recognized with the content recognition model to obtain target content.

According to a third aspect of the present disclosure, an electronic device is provided, including: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed, enable the at least one processor to perform the content recognition method of the embodiments of the present disclosure.

According to a fourth aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, the computer instructions being used to cause a computer to perform the content recognition method of the embodiments of the present disclosure.

According to a fifth aspect of the present disclosure, a computer program product is presented, comprising a computer program which, when executed by a processor, implements the content recognition method of embodiments of the present disclosure.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are provided for a better understanding of the present solution and do not limit the present disclosure. In the drawings:

FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;

FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;

FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure;

FIG. 4 is a schematic diagram of a recognition model topology network according to an embodiment of the present disclosure;

FIG. 5 is a schematic diagram according to a fourth embodiment of the present disclosure;

FIG. 6 is a schematic diagram according to a fifth embodiment of the present disclosure;

FIG. 7 is a schematic block diagram of an example electronic device that may be used to implement the content recognition method of the embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure.

It should be noted that the executing entity of the content recognition method of this embodiment is a content recognition apparatus, which may be implemented in software and/or hardware and may be configured in an electronic device; the electronic device may include, but is not limited to, a terminal, a server, and the like.

The embodiments of the present disclosure relate to the technical field of artificial intelligence, in particular to computer vision and deep learning, and can be applied to scenarios such as optical character recognition (OCR).

Artificial intelligence, abbreviated as AI, is a new technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence.

Deep learning learns the intrinsic regularities and representation levels of sample data, and the information obtained during learning helps interpret data such as text, images, and sounds. Its ultimate goal is to give machines human-like analytical learning ability so that they can recognize data such as text, images, and sounds.

Computer vision refers to machine vision in which cameras and computers replace human eyes to identify, track, and measure targets, with further image processing so that the computer produces images better suited to human observation or to transmission to instruments for inspection.

An OCR scenario may be an application scenario in which the characters in an image are recognized by optical character recognition. OCR refers to a process in which an electronic device (for example, a scanner or a digital camera) examines the characters in an image, determines their shapes by detecting dark and light patterns, and then translates those shapes into computer text by a character recognition method.

When the content recognition method provided by the embodiments of the present disclosure is applied to an OCR scenario, the accuracy of content extraction from images can be effectively improved. Because OCR is integrated into the targeted recognition processing of the image to be recognized, the method remains broadly applicable while the flexibility and effect of content recognition are effectively ensured.

As shown in fig. 1, the content identification method includes:

S101: Acquire an image to be recognized.

The image to be recognized is the image whose content is to be recognized; it may be, for example, an image containing ticket (bill) information, an image containing certificate information, and the like, which is not limited herein.

To acquire the image to be recognized, an image acquisition device may be configured in advance for the content recognition apparatus. The image acquisition device may capture images containing bill information, certificate information, and the like, and preprocess the captured images to obtain the image to be recognized.

In the embodiments of the present disclosure, besides using the image acquisition device to capture and process an image, any other feasible manner may be used. For example, a data interface may be configured in advance for the content recognition apparatus; a content recognition request is received through the data interface and parsed to obtain the image to be recognized. This is not limited herein.

S102: Determine the content type of the image to be recognized.

The content type represents the main content category contained in the image to be recognized, for example, a bill category or a two-dimensional code category; this is not limited.

The content type may also characterize the type of any other element in the image to be recognized, for example, a scene content type, a background content type, or an image feature type; this is not limited.

In some embodiments, after the image to be recognized is acquired, it may be preprocessed accordingly, for example, by image color normalization; this is not limited herein.
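The description does not say how color normalization is performed. As an assumed but commonly used choice, the sketch below standardizes each RGB channel to zero mean and unit variance; the function name `normalize_colors` and the nested-list image representation are illustrative only.

```python
import math
from typing import List, Tuple

Pixel = Tuple[float, float, float]
Image = List[List[Pixel]]


def normalize_colors(image: Image) -> Image:
    """Standardize each RGB channel of an image to zero mean, unit variance."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    stds = []
    for c in range(3):
        var = sum((p[c] - means[c]) ** 2 for p in pixels) / n
        # A flat channel has zero variance; fall back to 1.0 to avoid dividing by zero.
        stds.append(math.sqrt(var) or 1.0)
    return [
        [tuple((p[c] - means[c]) / stds[c] for c in range(3)) for p in row]
        for row in image
    ]
```

Per-image statistics keep the sketch self-contained; a deployed system might instead use fixed statistics computed over a training set.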

In other embodiments, the content type may be determined by recognizing the image to be recognized with an image recognition algorithm and taking the algorithm's output as the content type.

In still other embodiments, the image to be recognized may be input into a feature recognition model that outputs the image features of the image, and the content type is then determined from the extracted image features.
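A minimal sketch of the feature-based route, assuming the content type is chosen by nearest centroid over feature vectors. `extract_features` is a hypothetical stand-in for the feature recognition model (it only computes two grayscale statistics), and the reference centroids in the usage note are made-up values, not data from the disclosure.

```python
import math
from typing import Dict, List, Sequence


def extract_features(gray: Sequence[Sequence[int]]) -> List[float]:
    """Hypothetical stand-in for the feature recognition model: returns
    [mean brightness scaled to 0..1, fraction of dark pixels]."""
    values = [v for row in gray for v in row]
    mean = sum(values) / len(values)
    dark_ratio = sum(1 for v in values if v < 128) / len(values)
    return [mean / 255.0, dark_ratio]


def classify_content_type(features: List[float],
                          centroids: Dict[str, List[float]]) -> str:
    """Return the content type whose reference centroid is nearest (Euclidean)."""
    return min(centroids, key=lambda t: math.dist(features, centroids[t]))
```

For example, with `centroids = {"bill": [0.9, 0.1], "two_dimensional_code": [0.5, 0.5]}`, a mostly white image lands closer to the `"bill"` centroid. A real system would use learned features and a trained classifier in place of these hand-made statistics.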

S103: Determine a content recognition model associated with the content type.

The content recognition model is used to recognize the content of the image to be recognized. Different content types may correspond to the same or different content recognition models, and a content recognition model may recognize images of its corresponding content type to obtain the target content.

The content recognition model may be an artificial intelligence model, for example, a neural network model or a machine learning model, obtained by pre-training. It may support a corresponding content recognition function, such as recognizing a two-dimensional code, a face, or a tag in the image to be recognized; this is not limited.

The data contained in the image to be recognized is obtained in compliance with relevant laws and regulations; for example, it may come from a public data set or be obtained from an authorized institution with that institution's authorization.

It should be noted that the data contained in the image to be recognized is not collected for any specific user and cannot reflect the personal information of any specific user.

In the embodiments of the present disclosure, the content recognition model associated with the content type may be determined in several ways. For example, a content recognition model for each relevant content type may be trained in advance according to the content of images to be recognized. Alternatively, general content recognition models may be trained for the corresponding content types to obtain a plurality of candidate content recognition models, each marked with its candidate content type; the candidate content type matching the content type is then determined, and the candidate content recognition model marked with that candidate content type is taken as the content recognition model associated with the content type. This is not limited herein.
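The second option above (candidate models marked with candidate content types, then matched against the determined type) amounts to a lookup table. The sketch below is one way to hold it; `ModelRegistry` and the type names are hypothetical, and matching is assumed to be exact string equality. The same model may be registered under several types, since different content types can share a model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Model = Callable[[Any], Any]


@dataclass
class ModelRegistry:
    """Candidate content recognition models keyed by the candidate content
    type each one is marked with (all names hypothetical)."""
    by_type: Dict[str, List[Model]] = field(default_factory=dict)

    def register(self, candidate_type: str, model: Model) -> None:
        # The same model may be registered under several content types.
        self.by_type.setdefault(candidate_type, []).append(model)

    def models_for(self, content_type: str) -> List[Model]:
        """Return the models marked with the candidate type matching content_type."""
        try:
            return list(self.by_type[content_type])
        except KeyError:
            raise LookupError(f"no content recognition model for {content_type!r}") from None
```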

S104: Perform content recognition on the image to be recognized with the content recognition model to obtain target content.

The content obtained by recognizing the image to be recognized is referred to as the target content. The target content may be structured text information or take any other feasible form, such as image blocks, characters, or encrypted information; this is not limited herein.

In the embodiments of the present disclosure, when the content recognition model is used to recognize the image to be recognized, it may detect the target area where the characters in the image are located, after which OCR extracts the characters in that area to obtain structured text information about the characters in the image; this structured text information is the target content.

In other embodiments, when there are multiple content recognition models, each may perform a different content recognition task. In this case, the image to be recognized may be input into the models one after another according to their execution order to obtain the target content output by each model; this is not limited.
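A sketch of the multi-model case, assuming each model is a callable that receives the image and that the execution order is given as an integer per model; the model names are illustrative.

```python
from typing import Any, Callable, Dict, List, Tuple

Model = Callable[[Any], Any]


def run_in_order(image: Any, models: List[Tuple[int, str, Model]]) -> Dict[str, Any]:
    """Input the same image into each model in ascending execution order and
    collect every model's output under the model's name."""
    outputs: Dict[str, Any] = {}
    for _, name, model in sorted(models, key=lambda entry: entry[0]):
        outputs[name] = model(image)
    return outputs
```

Because `sorted` is stable, models sharing an order value run in the order they were listed.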

In this embodiment, an image to be recognized is acquired, its content type is determined, a content recognition model associated with the content type is determined, and that model is used to recognize the content of the image to obtain the target content. A matching content recognition model is thus selected flexibly according to the content type to assist image recognition, so that the image receives targeted recognition processing, which improves the efficiency, flexibility, and effect of content recognition.
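Steps S101 to S104 can be composed as in the sketch below. It rests on stated assumptions: the request is a dict with an `image` key, and `classify` and the model table are hypothetical callables standing in for the content type determination and the associated content recognition models.

```python
from typing import Any, Callable, Dict


def recognize_content(
    request: Dict[str, Any],
    classify: Callable[[Any], str],
    models: Dict[str, Callable[[Any], Any]],
) -> Any:
    """S101-S104 as one function; all callables are hypothetical stand-ins."""
    image = request["image"]          # S101: parse the request (assumed key)
    content_type = classify(image)    # S102: determine the content type
    model = models[content_type]      # S103: pick the associated model
    return model(image)               # S104: recognize -> target content
```

Keeping the classifier and the model table as parameters is what lets a new content type be supported by registering a model instead of writing a new flow.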

Fig. 2 is a schematic diagram according to a second embodiment of the present disclosure.

As shown in fig. 2, the content identification method includes:

S201: Acquire an image to be recognized.

For the description of S201, reference may be made to the description of the foregoing embodiments, which are not repeated herein.

S202: Recognize the subject content in the image to be recognized.

The content of the image to be recognized that serves as the recognition subject is referred to as the subject content; it may be, for example, a ticket or a two-dimensional code in the image, which is not limited herein.

In the embodiments of the present disclosure, the image to be recognized may include content serving as the recognition subject as well as content serving as the recognition background; the content serving as the recognition subject may be extracted from the image and used as the subject content, which is not limited.

In the embodiments of the present disclosure, the subject content may be detected and recognized with a pre-established detection model, or the image may be matched against a detection template and the subject content located based on that template; this is not limited herein.

S203: Determine the subject category of the subject content.

For example, a plurality of candidate subject categories may be labeled in advance. The semantics of the subject content and of each candidate subject category are determined, the similarity between the subject content and each candidate subject category is computed, the resulting similarities are ranked, and the candidate subject category with the highest similarity is taken as the subject category of the subject content; this is not limited herein.
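The similarity ranking could look like the sketch below. The disclosure does not name a similarity measure; cosine similarity over semantic vectors is assumed here, and the vectors themselves would come from an embedding model that is not specified.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_subject_category(
    subject_vec: List[float], category_vecs: Dict[str, List[float]]
) -> Tuple[str, List[Tuple[str, float]]]:
    """Score every candidate subject category, rank the scores in descending
    order, and return the top category together with the full ranking."""
    ranking = sorted(
        ((name, cosine(subject_vec, vec)) for name, vec in category_vecs.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranking[0][0], ranking
```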

The subject category may be, for example, a bill category or a two-dimensional code category; this is not limited.

S204: When the subject category is a target vertical class, acquire a plurality of candidate content types related to the target vertical class.

The target vertical class may be, for example, a bill category or a two-dimensional code category.

In the embodiments of the present disclosure, after the subject category of the subject content is determined, it may be matched against a preset subject category library. When the subject category is determined to be the target vertical class, a detection model may be used to obtain a plurality of candidate content types related to the target vertical class, and the content type of the image to be recognized is then determined from these candidates.

The content recognition method in the embodiments of the present disclosure is compatible with most ticket and card vertical classes, each of which may serve as a target vertical class. The target vertical class is of course not limited to these; it may be any vertical class arising in an actual application scenario, for example, an image vertical class or an information vertical class; this is not limited.

In the embodiments of the present disclosure, a global flow may be designed for a plurality of possible vertical classes. The global flow contains the recognition process that most vertical classes can use and includes a plurality of optional nodes; each vertical class may pass through or skip these nodes according to its own recognition requirements. For each vertical class, the flow it needs to execute can then be found within this graph.

The core main flow for most vertical classes can be described as:

image preprocessing -> ticket body detection -> ticket body classification -> ticket four-direction correction -> field detection -> field recognition -> post-processing strategy module -> return results
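One way to represent the global flow and its optional nodes is sketched below. The mandatory node names paraphrase the main flow above, and the optional nodes are borrowed from the branches described for the ticket vertical class; placing them inline in a single list (rather than as parallel branches) is a simplifying assumption.

```python
from typing import FrozenSet, List, Tuple

# (node name, is_optional). Mandatory nodes follow the core main flow; the
# optional ones are illustrative and placed inline for simplicity.
GLOBAL_FLOW: List[Tuple[str, bool]] = [
    ("image_preprocessing", False),
    ("ticket_body_detection", False),
    ("ticket_body_classification", False),
    ("four_direction_correction", False),
    ("quality_control", True),
    ("seal_detection_recognition", True),
    ("avatar_detection", True),
    ("field_detection", False),
    ("qr_code_recognition", True),
    ("field_recognition", False),
    ("post_processing", False),
    ("return_results", False),
]


def flow_for_vertical(enabled_optional: FrozenSet[str]) -> List[str]:
    """Keep every mandatory node plus the optional nodes a vertical enables."""
    optional_nodes = {name for name, is_optional in GLOBAL_FLOW if is_optional}
    unknown = enabled_optional - optional_nodes
    if unknown:
        raise ValueError(f"not optional nodes of the global flow: {sorted(unknown)}")
    return [name for name, is_optional in GLOBAL_FLOW
            if not is_optional or name in enabled_optional]
```

An identity-card vertical class, for instance, might enable `avatar_detection`, while an invoice vertical class might enable `seal_detection_recognition` instead; both share the same mandatory nodes.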

Image preprocessing, ticket body detection, ticket body classification, ticket four-direction correction, field detection, field recognition, and the post-processing strategy module may each correspond to a content recognition model, so that these models provide the corresponding content recognition services.

For example, if the target vertical class is a ticket/card vertical class, the corresponding candidate content types may be ticket body classification types, specifically a face type, a fingerprint type, an iris type, a field type, a two-dimensional code type, and the like; this is not limited.

Any person-related data contained in the image to be recognized is obtained in compliance with relevant laws and regulations; for example, it may come from a public data set or be obtained from an authorized institution with that institution's authorization.

S205: Determine the content type corresponding to the image to be recognized from the plurality of candidate content types.

After the plurality of candidate content types related to the target vertical class are obtained, the content type corresponding to the image to be recognized may be determined from them.

In the embodiment of the disclosure, when determining the content type corresponding to the image to be recognized from the multiple candidate content types, a detection algorithm may be used to determine a candidate content type matching the content of the image to be recognized from the multiple candidate content types, and the matching candidate content type may be used as the content type of the image to be recognized.

The content type of the image to be recognized may be any one or a combination of several types, such as a face type, fingerprint type, iris type, field type, and two-dimensional code type; this is not limited.

In this way, the subject content is recognized from the image to be recognized, its subject category is determined, a plurality of candidate content types related to the target vertical class are acquired when the subject category is the target vertical class, and the content type of the image is determined from those candidates. The content type can therefore be determined according to the subject category of the image's subject content, which improves the efficiency and flexibility of content type recognition for images.

Optionally, in some embodiments, when determining the content type of the image to be recognized from the plurality of candidate content types, the image may first be preliminarily recognized to obtain initial content; the candidate content type matching the initial content is then determined from the candidates and taken as the content type of the image. This improves the matching of candidate content types and helps quickly determine the candidate type that fits the image. Because the matching is based on the initial content of the image, the result agrees with the actual content of the image, which improves the matching accuracy of the candidate content types and helps ensure that the content type of the image is recognized accurately.

The initial content may be content contained in an image region, a text region, or the like within the subject content of the image to be recognized, for example, two-dimensional code information in the image, a seal (stamp) image in a ticket image, or date or remark information on the ticket; this is not limited.

In the embodiments of the present disclosure, the image to be recognized may be preliminarily recognized with a pre-established initial recognition model to obtain the initial content; the candidate content type matching the initial content is then determined from the plurality of candidate content types and taken as the content type of the image.
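The matching between initial content and candidate content types might be sketched as follows, assuming the preliminary recognition reports a set of element tags (such as a detected portrait or a date). The cue table is hypothetical, and because a content type may be a combination of several types, every matching candidate is returned.

```python
from typing import Dict, List, Set

# Hypothetical cues: which elements found by preliminary recognition point to
# which candidate content type of a ticket/card vertical class.
CUE_TABLE: Dict[str, Set[str]] = {
    "face": {"portrait"},
    "two_dimensional_code": {"qr_pattern"},
    "field": {"date", "remark", "text_line"},
}


def match_content_types(initial_content: Set[str], candidates: List[str]) -> List[str]:
    """Return, in candidate order, every candidate type whose cues overlap the
    initial content; types without cues in the table never match."""
    return [t for t in candidates if CUE_TABLE.get(t, set()) & initial_content]
```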

S206: Determine content recognition models associated with the content type.

For example, the content recognition models associated with the content type may be several models that respectively provide content recognition services such as image preprocessing, ticket body detection, ticket body classification, ticket four-direction correction, field detection, field recognition, and post-processing strategy; this is not limited.

S207: Perform content recognition on the image to be recognized with each of the plurality of content recognition models to obtain a plurality of reference contents.

The content recognition models can respectively provide different content recognition services, and can be used for recognizing initial contents of different content types.

In the embodiments of the present disclosure, after the content type of the image to be recognized is determined from the candidates, the regions of the image containing initial content of the various content types are detected, and the image is then input into the plurality of content recognition models for content recognition. For example, OCR may be applied to a text region of the image to extract text information as reference content.

In the embodiments of the present disclosure, when there are multiple content recognition models, they may have corresponding execution orders and lie on the same or different execution branches. Examples are as follows:

After ticket four-direction correction (the image to be recognized being a ticket image), three optional branches may be configured in addition to field detection: seal detection and recognition (extracting seal information from the ticket); avatar detection (for example, extracting portrait information in a vertical class such as identity cards); and quality control (judging whether the picture is of low quality, for example blurred, reflective, or incomplete).

After field detection (which yields each detected field box and the category of each box), two optional branches may be configured in addition to normal field recognition: two-dimensional code recognition (for detection boxes of the two-dimensional code category); and field-existence return, which directly reports whether a field exists (in this scenario, for some fields the user cares only whether the field is present, not what it says).

Within field recognition, three optional branches may be configured: two single-model recognition branches, in which the field is recognized with only one recognition model; and a dual-model recognition branch, in which the field is recognized twice, first by one recognition model and then by another (typically for mixed Chinese-English fields: a Chinese model recognizes first, and an English model then supplements and corrects the result).
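The dual-model branch leaves the merge rule open. The sketch below assumes both models return per-position (character, confidence) pairs over the same segmentation and keeps the more confident character at each position, with ties going to the Chinese model that ran first. `recognize_field` and its branch names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple, Union

CharResult = List[Tuple[str, float]]   # (character, confidence) per position
Recognizer = Callable[[str], CharResult]


def dual_model_recognize(field_img: str, zh_model: Recognizer, en_model: Recognizer) -> str:
    """Assumed merge rule: Chinese model first, English model second; keep the
    more confident character per position (ties go to the Chinese result)."""
    zh, en = zh_model(field_img), en_model(field_img)
    merged = [a if a[1] >= b[1] else b for a, b in zip(zh, en)]
    # If one model returned more positions, keep its extra characters as-is.
    longer = zh if len(zh) >= len(en) else en
    merged.extend(longer[len(merged):])
    return "".join(ch for ch, _ in merged)


def recognize_field(field_img: str, branch: str,
                    models: Dict[str, Recognizer]) -> Union[str, bool]:
    """Dispatch one detected field to its configured branch (hypothetical names)."""
    if branch == "exists_only":
        return True  # the field box was detected, so the field exists
    if branch == "dual":
        return dual_model_recognize(field_img, models["zh"], models["en"])
    # Single-model branch: the branch name selects the only model used.
    return "".join(ch for ch, _ in models[branch](field_img))
```

A production system would more likely align the two outputs by character boxes rather than by index, since the models may segment a mixed-language field differently.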

In the embodiments of the present disclosure, when the image is recognized with the plurality of content recognition models, the recognition logic may be executed according to the preconfigured execution order of each model and the branch on which it lies; this is not limited.

S208: Generate the target content from the plurality of reference contents.

In the embodiments of the present disclosure, after the plurality of reference contents are obtained by recognizing the image with the plurality of content recognition models, the target content is generated from them.

In the embodiments of the present disclosure, when the target content is generated from the plurality of reference contents, post-processing may be applied to the reference contents after model recognition; for example, some manual operations may be introduced to adjust and collate part of the recognized content, so as to obtain structured text information as the target content.
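A sketch of the post-processing step, assuming field recognition yields (field name, text) pairs and that the manual adjust-and-collate operations are supplied as a mapping of corrected values; all key names are illustrative.

```python
from typing import Any, Dict


def build_target_content(reference: Dict[str, Any],
                         corrections: Dict[str, str]) -> Dict[str, Any]:
    """Merge reference contents into one structured result.

    reference["fields"] is assumed to be a list of (field name, text) pairs
    from field recognition; other keys (e.g. seal text) are copied through.
    corrections stands in for the manual adjust-and-collate step: a corrected
    value, when present, replaces the recognized text.
    """
    fields: Dict[str, str] = {}
    for name, text in reference.get("fields", []):
        fields[name] = corrections.get(name, text.strip())
    target = {key: value for key, value in reference.items() if key != "fields"}
    target["fields"] = fields
    return target
```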

By recognizing the image to be recognized with each of a plurality of content recognition models to obtain a plurality of reference contents and then generating the target content from them, the various contents in the image are recognized separately, yielding reference contents of different content types. Diverse content can thus be recognized from the image, ensuring complete and comprehensive content recognition, and post-processing can be applied to the preliminarily recognized reference contents, which improves the accuracy and effect of content recognition.

In this embodiment, the image to be recognized is acquired; the subject content is recognized from it and its subject category is determined; when the subject category is the target vertical class, a plurality of candidate content types related to that class are acquired and the content type of the image is determined from them. The content type is therefore determined according to the subject category of the image's subject content, which improves the efficiency and flexibility of content type recognition. The image is then recognized with each of a plurality of content recognition models to obtain a plurality of reference contents, from which the target content is generated. Because the various contents in the image are recognized separately, reference contents of different content types are obtained and diverse content is recognized, ensuring complete and comprehensive content recognition; applying post-processing to the preliminarily recognized reference contents further improves the accuracy and effect of content recognition.

Fig. 3 is a schematic diagram according to a third embodiment of the present disclosure.

As shown in fig. 3, the content identification method includes:

S301: Acquire an image to be recognized.

S302: Recognize the subject content from the image to be recognized.

S303: Determine the subject category of the subject content.

S304: When the subject category is a target vertical class, acquire a plurality of candidate content types related to the target vertical class.

S305: Determine the content type corresponding to the image to be recognized from the plurality of candidate content types.

For the description of S301 to S305, reference may be made to the description of the above embodiments, which is not repeated herein.
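Steps S301 to S305, combined with the initial-recognition matching of the earlier embodiment, can be sketched as follows. This is a hedged illustration, not the disclosed implementation: the table `CANDIDATE_CONTENT_TYPES`, the keyword lists, and the injected callables `classify_subject` and `initial_recognition` are all hypothetical, and keyword counting is just one simple way to match initial content against candidate content types.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical table: target vertical class -> candidate content types,
# each described by keywords expected in the initially recognized text.
CANDIDATE_CONTENT_TYPES: Dict[str, Dict[str, List[str]]] = {
    "id_card": {
        "id_card_front": ["name", "address"],
        "id_card_back": ["authority", "valid"],
    },
    "invoice": {
        "vat_invoice": ["invoice", "tax"],
        "taxi_receipt": ["taxi", "mileage"],
    },
}


def determine_content_type(
    image: bytes,
    classify_subject: Callable[[bytes], str],
    initial_recognition: Callable[[bytes], str],
) -> Optional[str]:
    """S302-S305: subject category -> candidate types -> matched content type."""
    # S302/S303: recognize the subject content and return its subject category.
    subject_category = classify_subject(image)
    # S304: only a target vertical class has candidate content types.
    candidates = CANDIDATE_CONTENT_TYPES.get(subject_category)
    if candidates is None:
        return None
    # S305: match the initial content against each candidate's keywords.
    initial_content = initial_recognition(image).lower()
    best_type: Optional[str] = None
    best_hits = 0
    for content_type, keywords in candidates.items():
        hits = sum(1 for keyword in keywords if keyword in initial_content)
        if hits > best_hits:
            best_type, best_hits = content_type, hits
    return best_type
```

For example, an image classified as `"id_card"` whose initial text reads "Issuing Authority X, Valid until 2030" would be matched to the hypothetical `"id_card_back"` type, while an image whose subject category is not a target vertical class yields `None`.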

S306: Determine a plurality of reference recognition models matching the target vertical class from a recognition model topology network, where the recognition model topology network is composed of a plurality of candidate recognition models and the reference recognition models belong to the plurality of candidate recognition models.

The candidate recognition models that make up the recognition model topology network may each be related to one or more vertical classes, and each candidate recognition model can recognize images of its corresponding vertical classes.

For example, as shown in fig. 4, which is a schematic view of a recognition model topology network according to an embodiment of the present disclosure, the candidate recognition models may include a field detection recognition model, an avatar detection recognition model, a stamp detection recognition model, a quality control recognition model, and the like. The avatar detection recognition model may be used to extract the required portrait information from, for example, a ticket image; the stamp detection recognition model may be used to extract stamp information from the ticket image; and the quality control recognition model is used to determine whether the image to be recognized suffers from problems such as blur, reflection, or incompleteness.

Optionally, in some embodiments, to determine the plurality of reference recognition models matching the target vertical class from the recognition model topology network, global configuration information of the target vertical class may be determined, where the global configuration information includes a plurality of model identifiers and a plurality of execution orders respectively corresponding to the model identifiers. The candidate recognition models respectively corresponding to the model identifiers are then determined from the recognition model topology network, their execution order is configured according to the execution orders, and the configured candidate recognition models are used as the reference recognition models. In this way, the candidate recognition models related to the target vertical class are configured adaptively, and both the model configuration required by the target vertical class and the definition of the whole content recognition process are completed from the global configuration information. The image to be recognized can therefore be processed in a targeted manner, and recognition for a newly added vertical class can be developed quickly without building a separate recognition process for it, so that effort can be concentrated on tuning the performance of the recognition models and the cost of image recognition processing is reduced.

The global configuration information represents the processing nodes in the recognition model topology network that the target vertical class passes through and the processing nodes it skips. For example, if some images of a target vertical class contain two-dimensional code information, the configuration of a two-dimensional code detection recognition model may be added to the global configuration information of that vertical class. For another target vertical class, where only the presence of certain fields in its images needs to be reported, a recognition model that returns whether a field exists may be configured, and the configuration of this field detection recognition model may be added to the global configuration information of that vertical class.

The model identifier uniquely identifies a candidate recognition model; it may be, for example, the name of the candidate recognition model, which is not limited herein.

The execution order represents the order in which the image to be recognized of the target vertical class is input to, and output from, each candidate recognition model.

In some embodiments, the execution order may route the image to be recognized through two single-model recognition branches, that is, the image is input into two content recognition models separately for recognition, and the two content recognition models are configured to execute in parallel; this is not limited herein.

In other embodiments, the image to be recognized may pass through a dual-model branch, in which it is first recognized by one recognition model and then by another recognition model for secondary recognition. For example, when a field in an image of the target vertical class is a mixed Chinese and English field, the image may first pass through a Chinese recognition model and then through an English recognition model for auxiliary recognition to obtain the target content.
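The two execution orders just described, parallel single-model branches and a serial dual-model branch, can be sketched as follows. This is a minimal sketch under stated assumptions: recognition results are plain dicts, the second model of a dual-model branch is assumed to receive the first model's preliminary result, and threads are used only to illustrate independent branches. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Content = Dict[str, str]
# Assumption: single-model branches take only the image; the second model of a
# dual-model branch also receives the first model's preliminary result.
Model = Callable[[bytes], Content]
SecondaryModel = Callable[[bytes, Content], Content]


def run_single_model_branches(image: bytes, models: List[Model]) -> List[Content]:
    """Parallel execution order: each model recognizes the image independently."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = [pool.submit(model, image) for model in models]
        # Results are collected in the configured order, not completion order.
        return [future.result() for future in futures]


def run_dual_model_branch(image: bytes, first: Model, second: SecondaryModel) -> Content:
    """Serial execution order: the second model refines the first model's output."""
    preliminary = first(image)
    return second(image, preliminary)
```

A production system might instead batch the parallel branches on a GPU or dispatch them to separate services; the thread pool here only shows that the branches do not depend on each other.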

In the embodiment of the present disclosure, after the global configuration information of the target vertical class is determined, the candidate recognition models respectively corresponding to the model identifiers in the global configuration information are determined from the recognition model topology network, the execution order of these candidate recognition models is configured according to the execution orders in the global configuration information, and the configured candidate recognition models are used as the reference recognition models for recognizing the image to be recognized.
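Resolving the global configuration information against the recognition model topology network can be sketched as follows. This is a hedged illustration: the dict `TOPOLOGY_NETWORK`, the model identifiers, the `GlobalConfig` dataclass, and the convention that models sharing an execution order may run in parallel are all assumptions made for the example; the placeholder lambdas stand in for real recognition models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Model = Callable[[bytes], Dict[str, str]]

# Hypothetical recognition model topology network: model identifier -> candidate
# recognition model (placeholder lambdas stand in for real models).
TOPOLOGY_NETWORK: Dict[str, Model] = {
    "quality_control": lambda img: {"quality": "ok"},
    "field_detection": lambda img: {"fields": "..."},
    "avatar_detection": lambda img: {"avatar": "..."},
    "stamp_detection": lambda img: {"stamp": "..."},
    "qr_code_detection": lambda img: {"qr_code": "..."},
}


@dataclass
class GlobalConfig:
    """Hypothetical global configuration of one target vertical class."""
    vertical_class: str
    # (model identifier, execution order); models sharing an order could run in parallel.
    models: List[Tuple[str, int]] = field(default_factory=list)


def build_reference_models(
    config: GlobalConfig, network: Dict[str, Model]
) -> List[Tuple[int, str, Model]]:
    """Look up each configured identifier and order the models by execution order."""
    missing = [model_id for model_id, _ in config.models if model_id not in network]
    if missing:
        raise KeyError(f"unknown model identifiers for {config.vertical_class}: {missing}")
    configured = [(order, model_id, network[model_id]) for model_id, order in config.models]
    # Sort by (order, identifier) only, so the model callables are never compared.
    return sorted(configured, key=lambda item: (item[0], item[1]))


if __name__ == "__main__":
    invoice_config = GlobalConfig(
        vertical_class="invoice",
        models=[("stamp_detection", 1), ("quality_control", 0), ("field_detection", 1)],
    )
    for order, model_id, _ in build_reference_models(invoice_config, TOPOLOGY_NETWORK):
        print(order, model_id)
```

Adding a new vertical class then amounts to writing a new configuration entry (for instance one that includes `qr_code_detection`) rather than a new recognition pipeline, which mirrors the motivation stated in the description.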

In other embodiments, field-level information may be configured in the global configuration information of some target vertical classes. The field-level configuration is a further refinement of the global configuration information; the field-level information may indicate, for example, whether a field is two-dimensional code information or the type of recognition logic used for the field, which is not limited herein.

S307: Determine, from the plurality of reference recognition models, a reference recognition model related to the content type, and use it as the content recognition model.

In the embodiment of the disclosure, after the candidate recognition models are configured as reference recognition models according to the global configuration information, the reference recognition model related to the content type may be determined from the plurality of reference recognition models according to the model identifiers and used as the content recognition model, which is then used to recognize the image to be recognized.
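Selecting the reference recognition models related to a content type by model identifier can be sketched as follows. This is a minimal sketch that assumes a hypothetical table `CONTENT_TYPE_TO_MODEL_IDS` mapping each content type to the identifiers of its relevant models; the disclosure does not specify how this relation is stored.

```python
from typing import Dict, List

# Hypothetical mapping: content type -> identifiers of the models relevant to it.
CONTENT_TYPE_TO_MODEL_IDS: Dict[str, List[str]] = {
    "id_card_front": ["quality_control", "field_detection", "avatar_detection"],
    "vat_invoice": ["quality_control", "field_detection", "stamp_detection"],
}


def select_content_recognition_models(
    content_type: str, reference_model_ids: List[str]
) -> List[str]:
    """Keep the reference models related to the content type.

    The input list is assumed to be in configured execution order, and that
    order is preserved in the result.
    """
    related = set(CONTENT_TYPE_TO_MODEL_IDS.get(content_type, []))
    return [model_id for model_id in reference_model_ids if model_id in related]
```

Filtering the already-ordered reference list, rather than the mapping, keeps the execution order defined by the global configuration.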

In this embodiment, a plurality of reference recognition models matching the target vertical class are determined from the recognition model topology network, which is composed of a plurality of candidate recognition models to which the reference recognition models belong, and the reference recognition model related to the content type is determined from them and used as the content recognition model. Candidate recognition models can thus be selected from the topology network as reference recognition models for images of different target vertical classes, and the image to be recognized is input into the content recognition models according to the execution order. This realizes adaptive configuration of the content recognition models, improves the flexibility and effect of their configuration, and broadens the application scenarios of the content recognition method.

S308: If the content type is a field type, acquire an auxiliary recognition model corresponding to the field type, where the related reference recognition model and the auxiliary recognition model are jointly used as the content recognition model.

The auxiliary recognition model assists in correcting the content recognized by the reference recognition model.

In the embodiment of the disclosure, when it is detected that the content type is the field type, an auxiliary recognition model corresponding to the field type is obtained, and the reference recognition model and the auxiliary recognition model are used together as the content recognition model.

For example, when the content type is a field type for a mixed Chinese and English field, the reference recognition model may be a Chinese recognition model and the auxiliary recognition model may be an English recognition model; the two are jointly used as the content recognition model to recognize the content of the field.

In this embodiment, if the content type is a field type, the auxiliary recognition model corresponding to the field type is acquired and used jointly with the related reference recognition model as the content recognition model, so that the content recognized by the reference recognition model can be further recognized and corrected by the auxiliary recognition model, which improves the image recognition effect.

S309: Perform content recognition on the image to be recognized by using the content recognition model to obtain the target content.

For the description of S309, reference may be specifically made to the description of the foregoing embodiment, which is not repeated herein.

In this embodiment, the image to be recognized is acquired, the subject content is recognized from it, and the subject category of the subject content is determined. When the subject category is a target vertical class, a plurality of candidate content types related to the target vertical class are acquired, and the content type of the image is determined from them. A plurality of reference recognition models matching the target vertical class are then determined from the recognition model topology network, and the reference recognition model related to the content type is used as the content recognition model. Because candidate recognition models can be selected from the topology network as reference recognition models for images of different target vertical classes, and the image is input into the content recognition models according to the execution order, the content recognition models are configured adaptively, the flexibility and effect of their configuration are improved, and the application scenarios of the content recognition method are broadened. In addition, if the content type is a field type, the auxiliary recognition model corresponding to the field type is acquired, so that the content recognized by the reference recognition model can be further recognized and corrected by the auxiliary recognition model, which improves the image recognition effect.

Fig. 5 is a schematic diagram according to a fourth embodiment of the present disclosure.

As shown in fig. 5, the content recognition apparatus 50 includes:

an obtaining module 501, configured to obtain an image to be identified;

a first determining module 502, configured to determine a content type of an image to be recognized;

a second determining module 503, configured to determine a content identification model related to the content type;

an identifying module 504, configured to perform content recognition on the image to be recognized by using the content recognition model to obtain target content.
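The module structure of apparatus 50 can be sketched as follows. This is a hedged illustration of the wiring only: each module is injected as a hypothetical callable, the image source is assumed to be a string such as a path or URL, and the identifying module 504 is represented by the `identify` method. None of these names come from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Content = Dict[str, str]
Model = Callable[[bytes], Content]


@dataclass
class ContentRecognitionApparatus:
    """Hypothetical wiring of modules 501-504; each module is injected as a callable."""
    obtaining_module: Callable[[str], bytes]           # 501: source -> image
    first_determining_module: Callable[[bytes], str]   # 502: image -> content type
    second_determining_module: Callable[[str], Model]  # 503: content type -> model

    def identify(self, source: str) -> Content:
        """504: recognize the image with the model chosen for its content type."""
        image = self.obtaining_module(source)
        content_type = self.first_determining_module(image)
        model = self.second_determining_module(content_type)
        return model(image)
```

Injecting the modules as callables keeps each one replaceable, which loosely parallels how apparatus 60 refines module 504 into submodules without changing the other modules.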

In some embodiments of the present disclosure, as shown in fig. 6, which is a schematic diagram according to a fifth embodiment of the present disclosure, the content recognition apparatus 60 includes an obtaining module 601, a first determining module 602, a second determining module 603, and an identifying module 604, where the identifying module 604 includes:

an identification submodule 6041, configured to perform content recognition on the image to be recognized by using each of a plurality of content recognition models, so as to obtain a plurality of reference contents; and

a generating submodule 6042, configured to generate the target content from the plurality of reference contents.

In some embodiments of the present disclosure, the first determining module 602 is specifically configured to:

recognize the subject content from the image to be recognized;

determine the subject category of the subject content;

when the subject category is a target vertical class, acquire a plurality of candidate content types related to the target vertical class; and

determine a content type corresponding to the image to be recognized from the plurality of candidate content types.

In some embodiments of the present disclosure, the first determining module 602 is further configured to:

perform initial recognition on the image to be recognized to obtain initial content; and

determine, from the plurality of candidate content types, a candidate content type matching the initial content, and use the matched candidate content type as the content type corresponding to the image to be recognized.

In some embodiments of the present disclosure, the second determining module 603 is specifically configured to:

determine a plurality of reference recognition models matching the target vertical class from a recognition model topology network, where the recognition model topology network is composed of a plurality of candidate recognition models and the reference recognition models belong to the plurality of candidate recognition models; and

determine a reference recognition model related to the content type from the plurality of reference recognition models, and use it as the content recognition model.

In some embodiments of the present disclosure, the second determining module 603 is further configured to:

determine global configuration information of the target vertical class, where the global configuration information includes a plurality of model identifiers and a plurality of execution orders respectively corresponding to the model identifiers;

determine, from the recognition model topology network, a plurality of candidate recognition models respectively corresponding to the model identifiers; and

configure the execution order of the corresponding candidate recognition models according to the execution orders, and use the configured candidate recognition models as the reference recognition models.

The second determining module 603 is further configured to: after the reference recognition model related to the content type is determined from the plurality of reference recognition models and used as the content recognition model, acquire, when the content type is a field type, an auxiliary recognition model corresponding to the field type, where the related reference recognition model and the auxiliary recognition model are jointly used as the content recognition model.

It can be understood that the content recognition apparatus 60 in fig. 6 of this embodiment may have the same functions and structure as the content recognition apparatus 50 in the foregoing embodiment; likewise, the obtaining module 601, the first determining module 602, the second determining module 603, and the identifying module 604 may have the same functions and structures as the obtaining module 501, the first determining module 502, the second determining module 503, and the identifying module 504, respectively.

It should be noted that the foregoing explanation of the content identification method is also applicable to the content identification apparatus of the present embodiment, and is not repeated herein.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. The RAM 703 can also store various programs and data required for the operation of the device 700. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

The computing unit 701 may be any of a variety of general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 executes the methods and processes described above, such as the content recognition method. For example, in some embodiments, the content recognition method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded onto and/or installed on the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the content recognition method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the content recognition method in any other suitable manner (e.g., by means of firmware).

Various implementations of the systems and techniques described herein above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose and which can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, partly on the machine as a stand-alone software package and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), the Internet, and blockchain networks.

The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and VPS ("Virtual Private Server", or simply "VPS") services. The server may also be a server of a distributed system, or a server incorporating a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A content recognition method, comprising:

acquiring an image to be identified;

determining the content type of the image to be recognized;

determining a content recognition model associated with the content type; and

performing content recognition on the image to be recognized by using the content recognition model to obtain target content.

2. The method of claim 1, wherein the number of the content recognition models is multiple, and the performing content recognition on the image to be recognized by using the content recognition models to obtain target content comprises:

respectively adopting a plurality of content identification models to identify the content of the image to be identified so as to obtain a plurality of reference contents;

generating the target content from the plurality of reference contents.

3. The method of claim 1, wherein the determining the content type of the image to be recognized comprises:

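recognizing subject content from the image to be recognized;

determining a subject category of the subject content;

when the subject category is a target vertical class, acquiring a plurality of candidate content types related to the target vertical class; and

determining a content type corresponding to the image to be recognized from the plurality of candidate content types.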

4. The method of claim 3, wherein the determining a content type corresponding to the image to be recognized from the plurality of candidate content types comprises:

performing initial recognition on the image to be recognized to obtain initial content; and

determining a candidate content type matching the initial content from the plurality of candidate content types, and taking the matched candidate content type as the content type corresponding to the image to be recognized.

5. The method of claim 4, wherein the determining a content recognition model related to the content type comprises:

determining a plurality of reference recognition models matched with the target vertical class from a recognition model topological network, wherein the recognition model topological network is composed of a plurality of candidate recognition models, and the reference recognition models belong to the candidate recognition models;

determining a reference recognition model related to the content type from among the plurality of reference recognition models and as the content recognition model.

6. The method of claim 5, wherein said determining a plurality of reference recognition models from a recognition model topology network that match the target vertical class comprises:

determining global configuration information of the target vertical class, wherein the global configuration information comprises: a plurality of model identifications and a plurality of execution orders respectively corresponding to the plurality of model identifications;

determining, from the recognition model topology network, a plurality of candidate recognition models respectively corresponding to the plurality of model identifications; and

configuring the execution order of the corresponding candidate recognition models according to the execution orders, and taking the configured candidate recognition models as the reference recognition models.

7. The method of claim 5, further comprising, after said determining a reference recognition model related to the content type from among the plurality of reference recognition models and as the content recognition model:

if the content type is a field type, acquiring an auxiliary recognition model corresponding to the field type, wherein the related reference recognition model and the auxiliary recognition model are jointly used as the content recognition model.

8. A content recognition apparatus comprising:

the acquisition module is used for acquiring an image to be identified;

the first determination module is used for determining the content type of the image to be identified;

a second determination module for determining a content recognition model associated with the content type;

and the identification module is used for identifying the content of the image to be identified by adopting the content identification model so as to obtain the target content.

9. The apparatus of claim 8, wherein there are a plurality of content recognition models, and the recognition module comprises:

the identification submodule is used for carrying out content identification on the image to be identified by respectively adopting the plurality of content identification models so as to obtain a plurality of reference contents;

and the generation sub-module is used for generating the target content according to the plurality of reference contents.

10. The apparatus of claim 8, wherein the first determining module is specifically configured to:

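recognize subject content from the image to be recognized;

determine a subject category of the subject content;

when the subject category is a target vertical class, acquire a plurality of candidate content types related to the target vertical class; and

determine a content type corresponding to the image to be recognized from the plurality of candidate content types.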

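11. The apparatus of claim 10, wherein the first determining module is further configured to:

perform initial recognition on the image to be recognized to obtain initial content; and

determine a candidate content type matching the initial content from the plurality of candidate content types, and take the matched candidate content type as the content type corresponding to the image to be recognized.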

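12. The apparatus of claim 11, wherein the second determining module is specifically configured to:

determine a plurality of reference recognition models matching the target vertical class from a recognition model topology network, wherein the recognition model topology network is composed of a plurality of candidate recognition models, and the reference recognition models belong to the plurality of candidate recognition models; and

determine a reference recognition model related to the content type from the plurality of reference recognition models, and use it as the content recognition model.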

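13. The apparatus of claim 12, wherein the second determining module is further configured to:

determine global configuration information of the target vertical class, wherein the global configuration information comprises: a plurality of model identifications and a plurality of execution orders respectively corresponding to the plurality of model identifications;

determine, from the recognition model topology network, a plurality of candidate recognition models respectively corresponding to the plurality of model identifications; and

configure the execution order of the corresponding candidate recognition models according to the execution orders, and take the configured candidate recognition models as the reference recognition models.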

14. The apparatus of claim 12, wherein the second determining module is further configured to:

after the reference recognition model related to the content type is determined from the plurality of reference recognition models and used as the content recognition model, acquire, when the content type is a field type, an auxiliary recognition model corresponding to the field type, wherein the related reference recognition model and the auxiliary recognition model are jointly used as the content recognition model.

15. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.

16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7.

17. A computer program product comprising a computer program which, when being executed by a processor, carries out the steps of the method according to any one of claims 1-7.