CN115471779A - Image recognition method and device, computer-readable storage medium and electronic equipment - Google Patents


Info

Publication number
CN115471779A
Authority
CN
China
Prior art keywords
target, image, size, recognition model, recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211274337.2A
Other languages
Chinese (zh)
Inventor
刘乙赛
罗涛
施佳子
李艳宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202211274337.2A priority Critical patent/CN115471779A/en
Publication of CN115471779A publication Critical patent/CN115471779A/en
Pending legal-status Critical Current

Classifications

    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06N3/04 Neural networks; Architecture, e.g. interconnection topology
    • G06N3/08 Neural networks; Learning methods
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image recognition method and device, a computer-readable storage medium and electronic equipment, and relates to the technical field of artificial intelligence. The method comprises the following steps: acquiring at least one video frame, wherein the video frame at least comprises an image to be recognized; inputting the image to be recognized into a target recognition model and outputting a plurality of recognition results, wherein the target recognition model is obtained by adding a target network module to a first recognition model, the target network module is used for expanding the network branches of the first recognition model, and each recognition result represents the probability that the image to be recognized belongs to the category corresponding to that recognition result; and determining a target category to which the image to be recognized belongs according to the plurality of recognition results, and determining the image to be recognized as a target image when the target category meets a preset condition, wherein the target image at least comprises a target graphic logo. The invention solves the technical problem that image recognition models in the prior art recognize graphic logos with low accuracy.

Description

Image recognition method and device, computer-readable storage medium and electronic equipment
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an image recognition method and device, a computer-readable storage medium and electronic equipment.
Background
To increase the appeal of their own application programs, financial institutions have launched a series of activities for users to participate in. For example, a financial institution may launch an Augmented Reality (AR) lottery activity in its own application program (e.g., a mobile banking APP): users participate in the lottery by scanning the financial institution's graphic logo (e.g., the institution's icon) through the application program, which enhances the user experience and improves user retention.
To meet the above requirements, a series of image recognition models for recognizing graphic logos have appeared. However, because these models need to run on mobile terminals, certain requirements are imposed on their parameter count and performance. Although the image recognition models adopted in the prior art have few parameters, their recognition effect is poor and model performance suffers, so they recognize graphic logos with low accuracy.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides an image recognition method and device, a computer-readable storage medium and electronic equipment, which at least solve the technical problem that image recognition models in the prior art recognize graphic logos with low accuracy.
According to an aspect of an embodiment of the present invention, there is provided an image recognition method including: acquiring at least one video frame, wherein the video frame at least comprises an image to be recognized; inputting the image to be recognized into a target recognition model, and outputting a plurality of recognition results, wherein the target recognition model is obtained by adding a target network module to a first recognition model, the target network module is used for expanding the network branches of the first recognition model, and each recognition result represents the probability that the image to be recognized belongs to the category corresponding to that recognition result; and determining a target category to which the image to be recognized belongs according to the plurality of recognition results, and determining the image to be recognized as a target image when the target category meets a preset condition, wherein the target image at least comprises a target graphic logo.
Further, the image recognition method further includes: acquiring a target sample data set; training the initial recognition model according to the target sample data set to obtain a trained recognition model; and carrying out fusion processing on the target layer network structure of the trained recognition model to obtain the target recognition model.
Further, the image recognition method further includes: acquiring a first recognition model, wherein the first recognition model at least comprises an inverted residual structure, and the inverted residual structure at least comprises a residual structure and a series structure; adding a first network module and a second network module between the pointwise convolution layer of the residual structure and the depthwise convolution layer of the residual structure to obtain a first target initial network module, wherein the convolution kernel size of the depthwise convolution layer of the residual structure is a first size, the first network module consists of a batch normalization layer, and the second network module consists of a batch normalization layer and a depthwise convolution layer whose convolution kernel size is a second size; adding a third network module to the series structure to obtain a second target initial network module, wherein the third network module at least comprises a first network branch and a second network branch, the convolution kernel size of the depthwise convolution layer of the first network branch is a third size, the convolution kernel size of the depthwise convolution layer of the second network branch is a fourth size, and the first size, the second size, the third size and the fourth size are all different; and generating an initial recognition model based on the first target initial network module and the second target initial network module.
Further, the image recognition method further includes: converting the batch normalization layer in the first network module to obtain a first depthwise convolution layer, wherein the convolution kernel size of the first depthwise convolution layer is the first size; converting the depthwise convolution layer whose convolution kernel size is the second size to obtain a second depthwise convolution layer, wherein the convolution kernel size of the second depthwise convolution layer is the first size; performing fusion processing on the batch normalization layer and the second depthwise convolution layer in the second network module to obtain a third depthwise convolution layer, wherein the convolution kernel size of the third depthwise convolution layer is the first size; and performing fusion processing on the first depthwise convolution layer, the third depthwise convolution layer and the depthwise convolution layer of the residual structure to obtain the target recognition model.
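The fusion steps above follow standard structural re-parameterization arithmetic: a batch normalization layer is an affine per-channel map, so it can be written as a depthwise kernel whose center tap carries the scale; a 1 × 1 depthwise kernel becomes a 3 × 3 kernel by zero padding; and parallel depthwise branches sum into a single kernel. The following NumPy sketch checks this equivalence under simplifying assumptions (stride 1, no batch normalization after the 3 × 3 branch); it is an illustration of the arithmetic, not the patented implementation:

```python
import numpy as np

def depthwise_conv(x, kernels, bias=None):
    """3x3 depthwise convolution, stride 1, 'same' padding.
    x: (C, H, W); kernels: (C, 3, 3)."""
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((C, H, W))
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + 3, j:j + 3] * kernels[c])
    if bias is not None:
        out += bias[:, None, None]
    return out

def fold_bn(gamma, beta, mean, var, eps=1e-5):
    """Express BN(x) as the per-channel affine map s * x + b."""
    s = gamma / np.sqrt(var + eps)
    return s, beta - mean * s

rng = np.random.default_rng(1)
C, H, W = 4, 5, 5
x = rng.normal(size=(C, H, W))
k3 = rng.normal(size=(C, 3, 3))      # original DW 3x3 branch
k1 = rng.normal(size=(C,))           # DW 1x1 branch weights (one tap per channel)
gamma, beta = rng.normal(size=C), rng.normal(size=C)
mean, var = rng.normal(size=C), rng.uniform(0.5, 1.5, size=C)
s, b = fold_bn(gamma, beta, mean, var)

# Training-time forward: DW3x3(x) + BN(x) + DW1x1(BN(x))
y_branches = (depthwise_conv(x, k3)
              + s[:, None, None] * x + b[:, None, None]
              + k1[:, None, None] * (s[:, None, None] * x + b[:, None, None]))

# Inference-time fusion into one 3x3 depthwise kernel plus bias:
# both extra branches act only on the center tap of a 3x3 kernel.
k_fused = k3.copy()
k_fused[:, 1, 1] += s + k1 * s
b_fused = b + k1 * b
y_fused = depthwise_conv(x, k_fused, b_fused)
```

Because convolution is linear, `y_fused` matches the three-branch sum exactly, which is why the fused model keeps the trained accuracy while paying the inference cost of a single depthwise layer.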
Further, the image recognition method further includes: acquiring a sample data set; and carrying out image enhancement processing on the sample data set to obtain a target sample data set, wherein the image enhancement processing at least comprises random confusion enhancement processing and background replacement enhancement processing.
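The two enhancement operations named above can be sketched roughly as follows. This is a NumPy illustration only; the blend-weight range and the use of a binary logo mask are assumptions, since the text does not fix these details:

```python
import numpy as np

def random_mix(img_a, img_b, rng):
    """Random confusion enhancement: blend two samples with a random weight.
    The (0.3, 0.7) range is an assumption."""
    lam = rng.uniform(0.3, 0.7)
    return lam * img_a + (1.0 - lam) * img_b

def replace_background(img, logo_mask, background):
    """Background replacement enhancement: keep the logo pixels
    (logo_mask == 1) and swap everything else for a new background."""
    m = logo_mask.astype(bool)[..., None]   # broadcast over RGB channels
    return np.where(m, img, background)

rng = np.random.default_rng(2)
img = np.full((4, 4, 3), 200.0)            # toy "logo" image
bg = np.zeros((4, 4, 3))                   # toy replacement background
mask = np.zeros((4, 4)); mask[1:3, 1:3] = 1
aug = replace_background(img, mask, bg)
mixed = random_mix(img, bg, rng)
```

Both operations diversify the backgrounds and contexts the logo appears in, which is what makes the resulting target sample data set more robust than the raw set.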
Further, the image recognition method further includes: and converting the target recognition model into a target file, and integrating the target file into a target platform, wherein the format of the target file is a file format which can be recognized by the target platform.
Further, the image recognition method further includes: responding to a page jump instruction, and displaying a preset page, wherein the preset page is used for guiding the target object to participate in the activity on the preset page.
According to another aspect of the embodiments of the present invention, there is also provided an image recognition apparatus including an acquisition module, a processing module and a determining module. The acquisition module is used for acquiring at least one video frame, wherein the video frame at least comprises an image to be recognized; the processing module is used for inputting the image to be recognized into a target recognition model and outputting a plurality of recognition results, wherein the target recognition model is obtained by adding a target network module to a first recognition model, the target network module is used for expanding the network branches of the first recognition model, and each recognition result represents the probability that the image to be recognized belongs to the category corresponding to that recognition result; and the determining module is used for determining a target category to which the image to be recognized belongs according to the plurality of recognition results, and determining the image to be recognized as a target image when the target category meets a preset condition, wherein the target image at least comprises a target graphic logo.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program is configured to execute the above-mentioned image recognition method when running.
According to another aspect of embodiments of the present invention, there is also provided an electronic device including one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to carry out the image recognition method described above.
In the embodiment of the invention, a mode of improving a first recognition model to recognize a target image is adopted, at least one video frame is firstly obtained, then an image to be recognized is input into the target recognition model, a plurality of recognition results are output, a target class to which the image to be recognized belongs is determined according to the plurality of recognition results, and the image to be recognized is determined as the target image under the condition that the target class meets a preset condition. The target image at least comprises a target graphic logo, the video frame at least comprises an image to be recognized, the target recognition model is obtained by adding a target network module into the first recognition model, the target network module is used for expanding network branches of the first recognition model, and the recognition result represents the probability that the image to be recognized belongs to the category corresponding to the recognition result.
In the process, an accurate data basis is provided for the subsequent identification of the target image by acquiring at least one video frame; by adding the target network module into the first recognition model, the first recognition model is improved, and the target recognition model can be obtained; the target image can be recognized through the target recognition model, so that the target graphic logo can be recognized, the recognition precision of the model can be improved on the premise of reducing the size of the model, and the performance of the model can be improved.
Therefore, the technical scheme of the invention achieves the aim of accurately identifying the target graphic logo, thereby realizing the technical effect of improving the identification accuracy of the image identification model on the graphic logo, and further solving the technical problem of low identification accuracy of the image identification model on the graphic logo in the prior art.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow diagram of an alternative image recognition method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an alternative residual structure according to an embodiment of the invention;
FIG. 3 is a schematic diagram of an alternative series configuration in accordance with embodiments of the present invention;
FIG. 4 is a training diagram of an alternative modified residual structure according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an alternative modified residual structure according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an alternative modified series configuration in accordance with embodiments of the present invention;
FIG. 7 is a schematic diagram of an alternative image recognition arrangement according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of an alternative electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the related information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) related to the present invention are information and data authorized by the user or sufficiently authorized by each party. For example, an interface is provided between the system and the relevant user or institution, and before obtaining the relevant information, an obtaining request needs to be sent to the user or institution through the interface, and after receiving the consent information fed back by the user or institution, the relevant information needs to be obtained.
Example 1
In accordance with an embodiment of the present invention, there is provided a method embodiment of an image recognition method, it being noted that the steps illustrated in the flowchart of the drawings may be carried out in a computer system such as a set of computer-executable instructions, and that while a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be carried out in an order different than here.
Fig. 1 is a flow chart of an alternative image recognition method according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:
step S101, at least one video frame is obtained, wherein the video frame at least comprises an image to be identified.
In the above steps, at least one video frame may be obtained by an application system, a processor, an electronic device, or the like. Alternatively, the at least one video frame may be obtained by the image recognition system, for example, when the user participates in the lottery activity through an application program of a financial institution, after clicking a scanning function of the application program, the image recognition system may obtain the at least one video frame in real time through a mobile phone camera of the user.
It should be noted that, in the above process, by acquiring at least one video frame, an accurate data basis is provided for the subsequent identification of the target image.
Step S102, inputting the image to be recognized into a target recognition model, and outputting a plurality of recognition results, wherein the target recognition model is obtained by adding a target network module into a first recognition model, the target network module is used for expanding network branches of the first recognition model, and the recognition results represent the probability that the image to be recognized belongs to the category corresponding to the recognition results.
In the above step, the image to be recognized may be an image in at least one video frame obtained in real time by the user's mobile phone camera, and the recognition results may correspond to multiple categories, for example, a text category, an animal category, and the like. Alternatively, the first recognition model may be a MobileNet v2 model. The basic unit of the MobileNet model is the depthwise separable convolution, which comprises a depthwise convolution (DW) and a pointwise convolution (PW). The MobileNet v2 model adds an inverted residual structure on the basis of the MobileNet model: a PW convolution first performs a dimension-raising operation, then a 3 × 3 DW convolution extracts the features of each channel, and finally a PW convolution performs feature dimension reduction. Alternatively, fig. 2 is a schematic diagram of an alternative residual structure according to an embodiment of the present invention, and fig. 3 is a schematic diagram of an alternative series structure according to an embodiment of the present invention. Optionally, the inverted residual structure at least includes a residual structure and a series structure, as shown in fig. 2 and fig. 3, where the stride of the residual structure is 1 and the stride of the series structure is 2.
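The depthwise separable convolution and the inverted residual structure described above can be sketched as follows. This is a minimal NumPy illustration under assumed shapes and stride 1, not the patented implementation; the real model also applies batch normalization and ReLU6 activations around these layers:

```python
import numpy as np

def depthwise_conv(x, kernels):
    """Per-channel 3x3 depthwise (DW) convolution, stride 1, 'same' padding.
    x: (C, H, W); kernels: (C, 3, 3)."""
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((C, H, W))
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + 3, j:j + 3] * kernels[c])
    return out

def pointwise_conv(x, w):
    """1x1 pointwise (PW) convolution mixing channels; w: (C_out, C_in)."""
    return np.tensordot(w, x, axes=([1], [0]))

def inverted_residual(x, w_expand, dw_kernels, w_project):
    """PW dimension raising -> 3x3 DW per-channel features -> PW reduction,
    with a skip connection when input and output shapes match (stride 1)."""
    h = np.maximum(pointwise_conv(x, w_expand), 0.0)    # expand + activation
    h = np.maximum(depthwise_conv(h, dw_kernels), 0.0)  # per-channel features
    h = pointwise_conv(h, w_project)                    # linear projection
    return x + h if h.shape == x.shape else h

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 6, 6))            # 8 channels, 6x6 feature map
w_exp = rng.normal(size=(16, 8)) * 0.1    # expand 8 -> 16 channels
k_dw = rng.normal(size=(16, 3, 3)) * 0.1
w_proj = rng.normal(size=(8, 16)) * 0.1   # project back to 8 channels
y = inverted_residual(x, w_exp, k_dw, w_proj)
```

With matching input/output channels and stride 1 the block behaves as the residual structure of fig. 2; the stride-2 series structure of fig. 3 omits the skip connection because the shapes no longer match.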
Optionally, in this embodiment, after the user clicks the scanning function of the application program, the image to be recognized may be input into the target recognition model through the image recognition system, and a plurality of recognition results are output. Alternatively, the recognition results correspond to a plurality of categories. For example, financial institution A launches an Augmented Reality (AR) lottery activity in its own application program (e.g., a mobile banking APP), a user scans a nearby graphic logo of financial institution A (e.g., the institution's logo) through the application program, and the image recognition system inputs the image to be recognized into the target recognition model and outputs recognition results for 4 categories, namely, the logo category of financial institution A, the Chinese character category corresponding to the logo of financial institution A, the logo category of other financial institutions, and other categories. Specifically, the output recognition results are probability values for the 4 categories; for example, after recognition by the target recognition model, the output probability that the image to be recognized belongs to the logo category of financial institution A is 0.7, the probability that it belongs to the Chinese character category corresponding to the logo of financial institution A is 0.1, the probability that it belongs to the logo category of other financial institutions is 0.1, and the probability that it belongs to the other categories is 0.1.
It should be noted that, in the above process, an improvement process of the first recognition model is implemented; the target image can be recognized through the target recognition model, so that the target graphic logo can be recognized, and the recognition precision of the model can be improved on the premise of reducing the size of the model.
Step S103, determining a target class to which the image to be recognized belongs according to the multiple recognition results, and determining the image to be recognized as a target image under the condition that the target class meets a preset condition, wherein the target image at least comprises a target graphic logo.
In the foregoing step, the target category may be a logo category of the financial institution a, the target graphic logo may be a logo of the financial institution a, and the preset condition may be that a probability and a frequency of the target recognition model recognizing that the image to be recognized belongs to the target category exceed a preset threshold, for example, when the probability of recognizing that the image to be recognized belongs to the logo category of the financial institution a is continuously greater than 0.8 for ten times, it is determined that the image to be recognized is the target image, that is, the logo of the financial institution a is recognized.
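The frame-level decision described above (probability above a threshold for a number of consecutive frames) can be sketched as follows; the 0.8 threshold and the ten-frame streak come from the example in the text, and the class and parameter names are illustrative:

```python
class LogoDetector:
    """Declares a target-image match once the probability of the target
    category stays above `threshold` for `consecutive` frames in a row."""

    def __init__(self, threshold=0.8, consecutive=10):
        self.threshold = threshold
        self.consecutive = consecutive
        self.streak = 0

    def update(self, target_prob):
        """Feed the target-category probability for one video frame.
        Returns True once the streak condition is met."""
        if target_prob > self.threshold:
            self.streak += 1
        else:
            self.streak = 0   # any low-confidence frame resets the count
        return self.streak >= self.consecutive

detector = LogoDetector()
probs = [0.9] * 9 + [0.5] + [0.9] * 10   # a dip at frame 10 resets the streak
results = [detector.update(p) for p in probs]
```

Requiring a streak of confident frames rather than a single one suppresses spurious matches from motion blur or partial views of the logo.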
Based on the schemes defined in steps S101 to S103, it can be known that, in the embodiment of the present invention, a manner of modifying the first recognition model to recognize the target image is adopted, at least one video frame is obtained first, then the image to be recognized is input into the target recognition model, a plurality of recognition results are output, the target category to which the image to be recognized belongs is determined according to the plurality of recognition results, and the image to be recognized is determined as the target image when the target category meets the preset condition. The target image at least comprises a target graphic logo, the video frame at least comprises an image to be recognized, the target recognition model is obtained by adding a target network module into the first recognition model, the target network module is used for expanding network branches of the first recognition model, and the recognition result represents the probability that the image to be recognized belongs to the category corresponding to the recognition result.
It is easy to notice that in the above process, by acquiring at least one video frame, an accurate data base is provided for the subsequent identification of the target image; by adding the target network module into the first recognition model, the first recognition model is improved, and the target recognition model can be obtained; the target image can be recognized through the target recognition model, so that the target graphic logo can be recognized, the recognition precision of the model can be improved on the premise of reducing the size of the model, and the performance of the model can be improved.
Therefore, the technical scheme of the invention achieves the aim of accurately identifying the target graphic logo, thereby realizing the technical effect of improving the identification accuracy of the image identification model on the graphic logo, and further solving the technical problem of low identification accuracy of the image identification model on the graphic logo in the prior art.
In an alternative embodiment, the object recognition model is generated by: acquiring a target sample data set; training the initial recognition model according to the target sample data set to obtain a trained recognition model; and carrying out fusion processing on the target layer network structure of the trained recognition model to obtain the target recognition model.
Optionally, the target sample data set may be obtained by performing image enhancement on a sample data set, and the initial recognition model may be a model obtained by improving the first recognition model, i.e., the MobileNet v2 model. Optionally, the initial recognition model is trained on the target sample data set until it converges, yielding the trained recognition model; the developer may select any model training method widely applied in the field, which is not described here again.
Specifically, in an optional embodiment, before the initial recognition model is trained according to the target sample data set to obtain the trained recognition model, a first recognition model is obtained; then a first network module and a second network module are added between the pointwise convolution layer of the residual structure and the depthwise convolution layer of the residual structure to obtain a first target initial network module; a third network module is then added to the series structure to obtain a second target initial network module; and the initial recognition model is generated based on the first target initial network module and the second target initial network module. The first recognition model at least comprises an inverted residual structure, the inverted residual structure at least comprises a residual structure and a series structure, the convolution kernel size of the depthwise convolution layer of the residual structure is a first size, the first network module consists of a batch normalization layer, the second network module consists of a batch normalization layer and a depthwise convolution layer whose convolution kernel size is a second size, the third network module at least comprises a first network branch and a second network branch, the convolution kernel size of the depthwise convolution layer of the first network branch is a third size, the convolution kernel size of the depthwise convolution layer of the second network branch is a fourth size, and the first size, the second size, the third size and the fourth size are all different.
Alternatively, the first size may refer to a convolution kernel size of 3 × 3, the second size to a convolution kernel size of 1 × 1, the third size to a convolution kernel size of 5 × 5, and the fourth size to a convolution kernel size of 7 × 7. Specifically, the improvement on the first recognition model, namely the MobileNet v2 model, mainly concerns the residual structure and the series structure.
Optionally, fig. 4 is a training schematic diagram of an optional improved residual structure according to an embodiment of the present invention, and as shown in fig. 4, the first network module may be a batch normalization layer, that is, a BN layer, and the second network module may be composed of a BN layer and a depth convolution layer with a convolution kernel size of a second size, that is, a DW1 × 1 layer.
Optionally, a BN layer and a DW1 × 1 layer are added between the point-by-point convolution layer of the residual structure of the first recognition model, that is, the PW layer, and the depth convolution layer of the residual structure, that is, the DW layer; the DW convolution layer (with a convolution kernel size of 3 × 3), the BN layer, and the DW1 × 1 layer shown in fig. 4 then serve as the first target initial network module, thereby implementing the improvement on the residual structure of the MobileNet v2 model.
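The improved residual structure described above can be sketched in PyTorch as follows. This is a minimal illustration, not the patent's code: the class name and parameters are invented, and the exact placement of normalization around the original DW3 × 3 layer is an assumption.

```python
import torch
import torch.nn as nn

class RepDWBlock(nn.Module):
    """Training-time depthwise stage with three parallel branches:
    the original DW3x3 convolution, a BN-only (identity) branch, and
    a DW1x1 convolution followed by BN. The branch outputs are summed."""

    def __init__(self, channels: int):
        super().__init__()
        # original depthwise 3x3 convolution of the residual structure
        self.dw3 = nn.Conv2d(channels, channels, 3, padding=1,
                             groups=channels, bias=False)
        # second network module: DW1x1 convolution plus BN
        self.dw1 = nn.Sequential(
            nn.Conv2d(channels, channels, 1, groups=channels, bias=False),
            nn.BatchNorm2d(channels),
        )
        # first network module: batch normalization on the identity path
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        return self.dw3(x) + self.dw1(x) + self.bn(x)
```

All three branches preserve the spatial size, so their outputs can be added elementwise.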
Optionally, fig. 6 is a schematic diagram of an optional improved series structure according to an embodiment of the present invention. As shown in fig. 6, the third network module at least comprises a first network branch and a second network branch, where the first network branch may be the branch in which DW5 × 5 is located, and the second network branch may be the branch in which DW7 × 7 is located. Optionally, the branch in which DW3 × 3 is located, the branch in which DW5 × 5 is located, and the branch in which DW7 × 7 is located, as shown in fig. 6, are used as the second target initial network module, thereby implementing the improvement on the series structure of the MobileNet v2 model. Specifically, 3 parallel branches may be used to perform feature extraction, that is, multi-scale feature extraction at 3 × 3, 5 × 5, and 7 × 7 may be performed separately; an output consistency mechanism (for example, appropriate padding) may then be used to ensure that the output shapes are consistent; and finally the multi-scale feature extraction results are additively fused to obtain the final result.
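The three-branch multi-scale extraction with additive fusion can be sketched as follows. This is a minimal illustration with invented names, assuming "same" padding as the output consistency mechanism.

```python
import torch
import torch.nn as nn

class MultiScaleDW(nn.Module):
    """Three parallel depthwise branches (3x3, 5x5, 7x7). Padding of
    k // 2 keeps every branch's output shape equal to the input shape,
    so the multi-scale features can be fused by elementwise addition."""

    def __init__(self, channels: int):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, k, padding=k // 2,
                      groups=channels, bias=False)
            for k in (3, 5, 7)
        ])

    def forward(self, x):
        # additive fusion of the multi-scale feature extraction results
        return sum(branch(x) for branch in self.branches)
```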
It should be noted that, in the above process, by adding network branches, feature maps of different scales can be fused, so as to improve the learning ability of the model.
Specifically, in an optional embodiment, in the process of fusing the target layer network structure of the trained recognition model to obtain the target recognition model, the batch normalization layer in the first network module is converted to obtain a first depth convolution layer; the depth convolution layer whose convolution kernel size is the second size is converted to obtain a second depth convolution layer; the batch normalization layer in the second network module and the second depth convolution layer are fused to obtain a third depth convolution layer; and the first depth convolution layer, the third depth convolution layer, and the depth convolution layer of the residual structure are fused to obtain the target recognition model. The convolution kernel sizes of the first depth convolution layer, the second depth convolution layer, and the third depth convolution layer are all the first size.
Optionally, in this embodiment, the model structure adopted in the training process is as shown in fig. 4, and after training, the model is subjected to fusion processing to obtain the improved residual structure shown in fig. 5. Specifically, the BN layer shown in fig. 4 is first converted into one DW3 × 3 layer, i.e., the first depth convolution layer; then the DW1 × 1 layer shown in fig. 4 is converted into one DW3 × 3 layer, i.e., the second depth convolution layer, and the BN layer connected to the original DW1 × 1 layer shown in fig. 4 is fused with that DW3 × 3 layer into one DW3 × 3 layer, i.e., the third depth convolution layer. At this point, 3 parallel DW3 × 3 layers are obtained, respectively: the depth convolution layer of the residual structure, i.e., the DW convolution shown in fig. 4; the first depth convolution layer, i.e., the DW3 × 3 layer converted from the BN layer shown in fig. 4; and the third depth convolution layer, i.e., the DW3 × 3 layer converted and fused from the BN layer and the DW1 × 1 layer shown in fig. 4. Further, the 3 DW3 × 3 layers are fused into one DW3 × 3 convolution layer, that is, the DW convolution shown in fig. 5, to obtain the target recognition model used in actual deployment. The specific network layer conversion and fusion methods can be selected by developers, and are not described herein again.
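The conversion and fusion steps above follow standard structural re-parameterization arithmetic: fold each eval-mode BN into the kernel that precedes it, lift the BN-only branch to an identity DW3 × 3 kernel and the DW1 × 1 kernel to a zero-padded DW3 × 3 kernel, then sum the kernels and biases into a single convolution. A hedged PyTorch sketch (function names are invented, not the patent's):

```python
import torch
import torch.nn.functional as F

def fuse_bn(weight, bn):
    """Fold an eval-mode BatchNorm into the preceding depthwise kernel,
    returning an equivalent (kernel, bias) pair."""
    std = (bn.running_var + bn.eps).sqrt()
    scale = bn.weight / std
    return weight * scale.reshape(-1, 1, 1, 1), bn.bias - bn.running_mean * scale

def bn_to_dw3(bn, channels):
    """Express a BN-only (identity) branch as an equivalent DW3x3 kernel."""
    ident = torch.zeros(channels, 1, 3, 3)
    ident[:, 0, 1, 1] = 1.0  # depthwise identity kernel
    return fuse_bn(ident, bn)

def dw1_to_dw3(weight):
    """Zero-pad a DW1x1 kernel to DW3x3 (the value sits in the centre tap)."""
    return F.pad(weight, [1, 1, 1, 1])
```

Because convolution is linear in its kernel, adding the three DW3 × 3 kernels (and their biases) yields one layer whose output equals the sum of the three branch outputs.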
It should be noted that, in the above process, a network model with a multi-branch topological structure is used during training, and a model after fusion processing is used during actual deployment, so that the network size can be reduced, the calculation amount in the actual recognition process is reduced, and the target recognition model is more suitable for the mobile terminal without losing the network precision.
Optionally, in this embodiment, a transfer learning method may be adopted to avoid training the model from scratch, so that the model obtains better initial weight parameters. Transfer learning is a method widely applied in the art, and is not described herein again.
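A generic sketch of such transfer-learning initialization, assuming a pretrained checkpoint is available whose parameter names partially match the new model (the function name is invented for illustration):

```python
import torch
from torch import nn

def transfer_weights(model: nn.Module, pretrained_state: dict) -> int:
    """Copy every parameter whose name and shape match the pretrained
    checkpoint; layers that differ (e.g. a new classifier head for the
    logo classes) keep their fresh initialization. Returns the number
    of tensors copied."""
    own = model.state_dict()
    matched = {k: v for k, v in pretrained_state.items()
               if k in own and v.shape == own[k].shape}
    own.update(matched)
    model.load_state_dict(own)
    return len(matched)
```

In practice the checkpoint would be, for example, ImageNet-pretrained MobileNet v2 weights, and only the final classification layer would be left to train from fresh initialization.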
In an optional embodiment, before the target sample data set is obtained, the sample data set is obtained; and carrying out image enhancement processing on the sample data set to obtain a target sample data set, wherein the image enhancement processing at least comprises random confusion enhancement processing and background replacement enhancement processing.
Optionally, in addition to common image enhancement modes such as rotation, translation, cropping, and noise addition, a random confusion enhancement processing mode and a background replacement enhancement processing mode are also adopted in this embodiment. Random confusion enhancement refers to the non-overlapping random combination of images in the sample data set into a plurality of new images, thereby expanding the data set, enriching the distribution of training images, and giving the model stronger generalization capability. Specifically, the number n of original images per row of the generated image may be set in a program script, so that n × n original images are needed in total; n × n images are then randomly selected from the sample data set, and each image is sequentially subjected to enhancement such as rotation and whitening and pasted onto a blank drawing board to generate a new image, which is saved for training. The background replacement enhancement processing may be to manually mark the mask coordinates of the logo of financial institution A. Optionally, an individual logo is cut out according to the mask coordinates, and the cut-out logo is then used for random background replacement; specifically, the logo is copied to a random position on a random background, and a series of operations such as random size transformation, random illumination enhancement, and random scaling and cropping are performed on it, so that an arbitrarily large enhanced sample data set is generated for training, thereby obtaining the target sample data set.
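The random confusion enhancement can be sketched as follows. This is an invented minimal illustration on NumPy arrays: a naive nearest-neighbour resize stands in for the per-image rotation and whitening steps mentioned above, and all names are hypothetical.

```python
import random
import numpy as np

def random_confusion(images, n, tile=64, seed=None):
    """Tile n x n randomly chosen images (H x W x 3 uint8 arrays) onto a
    blank (n*tile) x (n*tile) canvas, producing one new training image."""
    rng = random.Random(seed)
    canvas = np.full((n * tile, n * tile, 3), 255, dtype=np.uint8)
    for row in range(n):
        for col in range(n):
            img = rng.choice(images)
            # naive nearest-neighbour resize of img to tile x tile
            ys = np.arange(tile) * img.shape[0] // tile
            xs = np.arange(tile) * img.shape[1] // tile
            canvas[row * tile:(row + 1) * tile,
                   col * tile:(col + 1) * tile] = img[ys][:, xs]
    return canvas
```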
It should be noted that, through the above process, the training sample data set is expanded, so that the diversity and randomness of the samples are enriched, and the identification accuracy of the model is further improved.
In an optional embodiment, after a target layer network structure of the trained recognition model is subjected to fusion processing to obtain a target recognition model, the target recognition model is converted into a target file, and the target file is integrated into a target platform, wherein the format of the target file is a file format which can be recognized by the target platform.
Optionally, the target platform may be an application program in a mobile terminal such as a mobile phone, for example, the mobile banking APP of financial institution A. Optionally, the target recognition model may be converted into a lite-format file, i.e., the target file, which is then integrated into the target platform.
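The "lite-format" target file plausibly refers to a TensorFlow Lite file, though the patent does not say so explicitly. As an analogous, self-contained illustration of packaging a trained model into a single deployable file, the sketch below uses TorchScript; the function name and input shape are assumptions, not the patent's procedure.

```python
import torch

def export_for_mobile(model: torch.nn.Module, path: str,
                      input_shape=(1, 3, 224, 224)):
    """Trace the trained model and serialize it to one file that a
    mobile runtime can load (analogous to converting to a lite file)."""
    model.eval()
    example = torch.randn(*input_shape)
    traced = torch.jit.trace(model, example)
    traced.save(path)
    return traced
```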
It should be noted that the target recognition model can be flexibly applied to each platform by converting the target recognition model into the target file, so that the universality of the model is improved.
In an optional embodiment, after the image to be recognized is determined to be the target image, responding to a page jump instruction, and displaying a preset page, wherein the preset page is used for guiding the target object to participate in activities on the preset page.
Optionally, after the image to be recognized is determined to be the target image, for example, after the image scanned by the user through the application program (that is, the image to be recognized) is determined to contain the logo of financial institution A, the application program may jump to a lottery page, that is, the preset page, so that the user can participate in the lottery activity. It should be noted that this process enhances the user experience and can therefore improve user retention.
Therefore, the technical scheme of the invention achieves the aim of accurately identifying the target graphic logo, thereby realizing the technical effect of improving the identification accuracy of the image identification model on the graphic logo, and further solving the technical problem of low identification accuracy of the image identification model on the graphic logo in the prior art.
Example 2
According to an embodiment of the present invention, an embodiment of an image recognition apparatus is provided, where fig. 7 is a schematic diagram of an alternative image recognition apparatus according to an embodiment of the present invention, as shown in fig. 7, the apparatus includes: an obtaining module 701, configured to obtain at least one video frame, where the video frame at least includes an image to be identified; a processing module 702, configured to input an image to be recognized into a target recognition model, and output a plurality of recognition results, where the target recognition model is obtained by adding a target network module to a first recognition model, the target network module is configured to expand network branches of the first recognition model, and the recognition result represents a probability that the image to be recognized belongs to a category corresponding to the recognition result; the determining module 703 is configured to determine a target category to which the image to be recognized belongs according to the multiple recognition results, and determine that the image to be recognized is a target image when the target category meets a preset condition, where the target image at least includes a target graphic logo.
It should be noted that the acquiring module 701, the processing module 702, and the determining module 703 correspond to steps S101 to S103 in the foregoing embodiment; the examples and application scenarios implemented by the three modules are the same as those of the corresponding steps, but are not limited to the disclosure in embodiment 1.
Optionally, the image recognition apparatus further includes: the first acquisition module is used for acquiring a target sample data set; the training module is used for training the initial recognition model according to the target sample data set to obtain a trained recognition model; and the first processing module is used for carrying out fusion processing on the target layer network structure of the trained recognition model to obtain the target recognition model.
Optionally, the image recognition apparatus further includes: the second obtaining module is used for obtaining a first identification model, wherein the first identification model at least comprises an inverse residual error structure, and the inverse residual error structure at least comprises a residual error structure and a serial structure; the second processing module is used for adding a first network module and a second network module between the point-by-point convolution layer of the residual error structure and the depth convolution layer of the residual error structure to obtain a first target initial network module, wherein the convolution kernel size of the depth convolution layer of the residual error structure is a first size, the first network module is composed of a batch normalization layer, and the second network module is composed of a batch normalization layer and a depth convolution layer with the convolution kernel size of a second size; the third processing module is used for adding a third network module in the series structure to obtain a second target initial network module, wherein the third network module at least comprises a first network branch and a second network branch, the size of a convolution kernel of a depth convolution layer of the first network branch is a third size, the size of a convolution kernel of a depth convolution layer of the second network branch is a fourth size, and the first size, the second size, the third size and the fourth size are different; and the generating module is used for generating an initial recognition model based on the first target initial network module and the second target initial network module.
Optionally, the first processing module includes: the first conversion module is used for carrying out conversion processing on the batch normalization layer in the first network module to obtain a first depth convolution layer, wherein the convolution kernel size of the first depth convolution layer is a first size; the second conversion module is used for carrying out conversion processing on the depth convolution layer with the convolution kernel size of a second size to obtain a second depth convolution layer, wherein the convolution kernel size of the second depth convolution layer is the first size; the first fusion module is used for fusing the batch normalization layer and the second depth convolution layer in the second network module to obtain a third depth convolution layer, wherein the convolution kernel size of the third depth convolution layer is the first size; and the second fusion module is used for carrying out fusion processing on the first depth convolution layer, the third depth convolution layer and the depth convolution layer of the residual error structure to obtain the target identification model.
Optionally, the image recognition apparatus further includes: the third acquisition module is used for acquiring the sample data set; and the fourth processing module is used for carrying out image enhancement processing on the sample data set to obtain a target sample data set, wherein the image enhancement processing at least comprises random confusion enhancement processing and background replacement enhancement processing.
Optionally, the image recognition apparatus further includes: and the fifth processing module is used for converting the target recognition model into a target file and integrating the target file into the target platform, wherein the format of the target file is a file format which can be recognized by the target platform.
Optionally, the image recognition apparatus further includes: and the response module is used for responding to the page jump instruction and displaying a preset page, wherein the preset page is used for guiding the target object to participate in the activity on the preset page.
Example 3
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the above-mentioned image recognition method when running.
Example 4
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, wherein fig. 8 is a schematic diagram of an alternative electronic device according to an embodiment of the present invention. As shown in fig. 8, the electronic device includes one or more processors and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the image recognition method described above. The processor executes the program to implement the following steps: acquiring at least one video frame, wherein the video frame at least comprises an image to be recognized; inputting the image to be recognized into a target recognition model, and outputting a plurality of recognition results, wherein the target recognition model is obtained by adding a target network module to a first recognition model, the target network module is used for expanding the network branches of the first recognition model, and each recognition result represents the probability that the image to be recognized belongs to the category corresponding to that recognition result; and determining a target category to which the image to be recognized belongs according to the plurality of recognition results, and determining that the image to be recognized is a target image when the target category meets a preset condition, wherein the target image at least comprises a target graphic logo.
Optionally, the following steps are also implemented when the processor executes the program: acquiring a target sample data set; training the initial recognition model according to the target sample data set to obtain a trained recognition model; and carrying out fusion processing on the target layer network structure of the trained recognition model to obtain the target recognition model.
Optionally, the processor executes the program to further implement the following steps: acquiring a first identification model, wherein the first identification model at least comprises an inverse residual error structure, and the inverse residual error structure at least comprises a residual error structure and a series connection structure; adding a first network module and a second network module between the point-by-point convolution layer of the residual error structure and the depth convolution layer of the residual error structure to obtain a first target initial network module, wherein the convolution kernel size of the depth convolution layer of the residual error structure is a first size, the first network module consists of a batch normalization layer, and the second network module consists of a batch normalization layer and a depth convolution layer of which the convolution kernel size is a second size; adding a third network module in the series structure to obtain a second target initial network module, wherein the third network module at least comprises a first network branch and a second network branch, the size of a convolution kernel of a depth convolution layer of the first network branch is a third size, the size of a convolution kernel of a depth convolution layer of the second network branch is a fourth size, and the first size, the second size, the third size and the fourth size are different in size; and generating an initial recognition model based on the first target initial network module and the second target initial network module.
Optionally, the following steps are also implemented when the processor executes the program: converting the batch normalization layer in the first network module to obtain a first depth convolution layer, wherein the convolution kernel size of the first depth convolution layer is a first size; converting the depth convolution layer with the convolution kernel size of the second size to obtain a second depth convolution layer, wherein the convolution kernel size of the second depth convolution layer is the first size; performing fusion processing on the batch normalization layers and the second depth convolution layers in the second network module to obtain a third depth convolution layer, wherein the convolution kernel size of the third depth convolution layer is the first size; and performing fusion processing on the first depth convolution layer, the third depth convolution layer and the depth convolution layer of the residual error structure to obtain the target identification model.
Optionally, the processor executes the program to further implement the following steps: acquiring a sample data set; and performing image enhancement processing on the sample data set to obtain a target sample data set, wherein the image enhancement processing at least comprises random confusion enhancement processing and background replacement enhancement processing.
Optionally, the processor executes the program to further implement the following steps: and converting the target recognition model into a target file, and integrating the target file into a target platform, wherein the format of the target file is a file format which can be recognized by the target platform.
Optionally, the following steps are also implemented when the processor executes the program: responding to a page jump instruction, and displaying a preset page, wherein the preset page is used for guiding the target object to participate in the activity on the preset page.
The device herein may be a server, a PC, a tablet (PAD), a mobile phone, or the like.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis, and reference may be made to the related description of other embodiments for parts that are not described in detail in a certain embodiment.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or may not be executed. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, which is substantially or partly contributed by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and these modifications and improvements should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. An image recognition method, comprising:
acquiring at least one video frame, wherein the video frame at least comprises an image to be identified;
inputting the image to be recognized into a target recognition model, and outputting a plurality of recognition results, wherein the target recognition model is obtained by adding a target network module into a first recognition model, the target network module is used for expanding network branches of the first recognition model, and the recognition results represent the probability that the image to be recognized belongs to the category corresponding to the recognition results;
and determining a target class to which the image to be recognized belongs according to the recognition results, and determining the image to be recognized as a target image under the condition that the target class meets a preset condition, wherein the target image at least comprises a target graphic logo.
2. The method of claim 1, wherein the target recognition model is generated by:
acquiring a target sample data set;
training an initial recognition model according to the target sample data set to obtain a trained recognition model;
and carrying out fusion processing on the target layer network structure of the trained recognition model to obtain the target recognition model.
3. The method of claim 2, wherein before training an initial recognition model according to the target sample data set, resulting in a trained recognition model, the method further comprises:
acquiring the first identification model, wherein the first identification model at least comprises an inverse residual error structure, and the inverse residual error structure at least comprises a residual error structure and a series connection structure;
adding a first network module and a second network module between the point-by-point convolution layer of the residual error structure and the depth convolution layer of the residual error structure to obtain a first target initial network module, wherein the convolution kernel size of the depth convolution layer of the residual error structure is a first size, the first network module consists of a batch normalization layer, and the second network module consists of the batch normalization layer and the depth convolution layer with the convolution kernel size of a second size;
adding a third network module to the series structure to obtain a second target initial network module, wherein the third network module at least comprises a first network branch and a second network branch, the size of a convolution kernel of a depth convolution layer of the first network branch is a third size, the size of a convolution kernel of a depth convolution layer of the second network branch is a fourth size, and the first size, the second size, the third size and the fourth size are different in size;
generating the initial recognition model based on the first target initial network module and the second target initial network module.
4. The method according to claim 3, wherein fusing the target layer network structure of the trained recognition model to obtain the target recognition model comprises:
performing conversion processing on the batch normalization layer in the first network module to obtain a first depth convolution layer, wherein the convolution kernel size of the first depth convolution layer is the first size;
performing conversion processing on the depth convolution layer with the convolution kernel size of a second size to obtain a second depth convolution layer, wherein the convolution kernel size of the second depth convolution layer is the first size;
performing fusion processing on the batch normalization layer and the second depth convolution layer in the second network module to obtain a third depth convolution layer, wherein the convolution kernel size of the third depth convolution layer is the first size;
and performing fusion processing on the first depth convolution layer, the third depth convolution layer and the depth convolution layer of the residual error structure to obtain the target identification model.
5. The method of claim 2, wherein prior to obtaining the target sample data set, the method further comprises:
acquiring a sample data set;
and performing image enhancement processing on the sample data set to obtain the target sample data set, wherein the image enhancement processing at least comprises random confusion enhancement processing and background replacement enhancement processing.
6. The method of claim 2, wherein after fusing the target layer network structure of the trained recognition model to obtain the target recognition model, the method further comprises:
and converting the target recognition model into a target file, and integrating the target file into a target platform, wherein the format of the target file is a file format which can be recognized by the target platform.
7. The method of claim 1, wherein after determining that the image to be recognized is a target image, the method further comprises:
responding to a page jump instruction, and displaying a preset page, wherein the preset page is used for guiding a target object to participate in the activity on the preset page.
8. An image recognition apparatus, characterized by comprising:
the device comprises an acquisition module, a recognition module and a recognition module, wherein the acquisition module is used for acquiring at least one video frame, and the video frame at least comprises an image to be recognized;
the processing module is used for inputting the image to be recognized into a target recognition model and outputting a plurality of recognition results, wherein the target recognition model is obtained by adding a target network module into a first recognition model, the target network module is used for expanding network branches of the first recognition model, and the recognition results represent the probability that the image to be recognized belongs to the category corresponding to the recognition results;
and the determining module is used for determining a target class to which the image to be recognized belongs according to the plurality of recognition results, and determining the image to be recognized as a target image under the condition that the target class meets a preset condition, wherein the target image at least comprises a target graphic logo.
9. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is arranged to execute the image recognition method as claimed in any one of claims 1 to 7 when executed.
10. An electronic device, wherein the electronic device comprises one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the image recognition method of any of claims 1 to 7.
CN202211274337.2A 2022-10-18 2022-10-18 Image recognition method and device, computer-readable storage medium and electronic equipment Pending CN115471779A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211274337.2A CN115471779A (en) 2022-10-18 2022-10-18 Image recognition method and device, computer-readable storage medium and electronic equipment


Publications (1)

Publication Number Publication Date
CN115471779A true CN115471779A (en) 2022-12-13

Family

ID=84336695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211274337.2A Pending CN115471779A (en) 2022-10-18 2022-10-18 Image recognition method and device, computer-readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN115471779A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination