CN116863116A - Image recognition method, device, equipment and medium based on artificial intelligence - Google Patents

Image recognition method, device, equipment and medium based on artificial intelligence

Publication number
CN116863116A
Authority
CN
China
Prior art keywords
image
target
preset
information
identified
Prior art date
Legal status
Pending
Application number
CN202310798703.2A
Other languages
Chinese (zh)
Inventor
张倩
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202310798703.2A
Publication of CN116863116A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/10 Image acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/68 Food, e.g. fruit or vegetables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03 Recognition of patterns in medical or anatomical images
    • G06V2201/034 Recognition of patterns in medical or anatomical images of medical instruments

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the invention relate to the technical fields of artificial intelligence and intelligent medical treatment, and disclose an image recognition method, device, equipment and medium based on artificial intelligence. The method comprises: acquiring image information and source information of an image to be identified, determining a target image type of the image to be identified based on the source information, and determining a target tag set of the image to be identified according to the target image type and preset identification information; inputting the image information and the target tag set into a first preset model to determine a similarity value between the image information and each tag in the target tag set, obtaining a similarity set; inputting the image information into a second preset model to obtain the number of types of targets to be identified in the image to be identified; and selecting target similarity values according to the number of types to complete the identification of the image to be identified. Because the number of types of targets to be identified in each image is determined and the number of target tags is set according to that number, the identification is tailored to each image to be identified, making the identification process more intelligent and more effective.

Description

Image recognition method, device, equipment and medium based on artificial intelligence
Technical Field
The invention relates to the technical fields of artificial intelligence and intelligent medical treatment, and in particular to an image recognition method, device, equipment and medium based on artificial intelligence.
Background
With the development of artificial intelligence, image recognition models are applied ever more widely, and more and more work that requires recognizing images can be completed by artificial intelligence. For example, when the food materials contained in a picture need to be identified from a picture containing one or more foods (e.g., corn, rice), this can be accomplished by artificial intelligence. As another example, in the field of intelligent medical treatment, when the medical devices contained in a picture need to be distinguished from a picture containing one or more medical devices (e.g., a scalpel, a tourniquet), the task of managing medical devices can be accomplished by artificial intelligence. As a further example, in the field of equipment maintenance, when the maintenance instruments contained in a picture need to be identified from a picture containing one or more maintenance instruments (e.g., a stylus, a screwdriver), the supervision and logistics work of equipment maintenance can be accomplished by artificial intelligence. The current approach is to treat image recognition as a target detection process.
Current general object detection frameworks first generate candidate boxes that mark possible regions of interest (ROI), then delete and recombine the candidate boxes containing objects so that each object is delimited by a single box, and finally extract features from the regions of interest and perform subsequent classification or regression through various neural networks. However, because features must be computed explicitly within each region, object detection in the prior art often places requirements on picture resolution. In addition, generating candidate regions is often time-consuming, so training and detection are slow and demand substantial computing resources. Furthermore, generating candidate boxes and then deleting and recombining them often introduces a series of errors, and the ROI features used to characterize the image are features of region granularity, which may introduce some noise.
Disclosure of Invention
In view of the above, the invention provides an image recognition method, device, equipment and medium based on artificial intelligence, which are used for solving the problems of complex recognition process, large calculation amount and inaccurate recognition in the prior art.
To achieve one or a part or all of the above or other objects, the present invention provides an image recognition method based on artificial intelligence, comprising: acquiring image information and source information of an image to be identified, determining a target image type of the image to be identified based on the source information, and determining a target tag set of the image to be identified according to the target image type and preset identification information, wherein the preset identification information comprises different image types and preset tag sets corresponding to the image types;
inputting the image information and the target tag set into a first preset model to determine the similarity value of each tag in the image information and the target tag set, and obtaining a similarity set, wherein the first preset model comprises a preset coding algorithm, an image coding algorithm and a similarity calculation algorithm;
inputting the image information into a second preset model to obtain the number of types of targets to be identified in the images to be identified, wherein the second preset model comprises a first basic model for obtaining deep semantic information of the images to be identified and a second basic model for classifying;
selecting, from the similarity set, the largest similarity values, the number selected being equal to the number of types; taking the selected similarity values as target similarity values; taking the tags corresponding to the target similarity values as target tags; and completing the identification of the image to be identified based on the target tags.
In another aspect, the present application provides an artificial intelligence based image recognition apparatus, the apparatus comprising:
the data acquisition module is used for acquiring image information and source information of an image to be identified, determining a target image type of the image to be identified based on the source information, and determining a target tag set of the image to be identified according to the target image type and preset identification information, wherein the preset identification information comprises different image types and preset tag sets corresponding to the image types;
the first computing module is used for inputting the image information and the target tag set into a first preset model to determine the similarity value of each tag in the image information and the target tag set, so as to obtain a similarity set, wherein the first preset model comprises a preset encoding algorithm, an image encoding algorithm and a similarity computing algorithm;
the second calculation module is used for inputting the image information into a second preset model to obtain the number of types of targets to be identified in the images to be identified, and the second preset model comprises a first basic model used for obtaining deep semantic information of the images to be identified and a second basic model used for classifying;
the identification module is used for selecting, from the similarity set, the largest similarity values, the number selected being equal to the number of types, taking the selected similarity values as target similarity values, taking the tags corresponding to the target similarity values as target tags, and completing the identification of the image to be identified based on the target tags.
In another aspect, the present application provides an electronic device, including: a processor, a memory, and a bus, the memory storing machine-readable instructions executable by the processor, the processor in communication with the memory via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing: acquiring image information and source information of an image to be identified, determining a target image type of the image to be identified based on the source information, and determining a target tag set of the image to be identified according to the target image type and preset identification information, wherein the preset identification information comprises different image types and preset tag sets corresponding to the image types; inputting the image information and the target tag set into a first preset model to determine the similarity value of each tag in the image information and the target tag set, and obtaining a similarity set, wherein the first preset model comprises a preset coding algorithm, an image coding algorithm and a similarity calculation algorithm; inputting the image information into a second preset model to obtain the number of types of targets to be identified in the images to be identified, wherein the second preset model comprises a first basic model for obtaining deep semantic information of the images to be identified and a second basic model for classifying; selecting the similarity value with the largest similarity value and the number of the similarity values being the category number from the similarity set, taking the selected similarity value as a target similarity value, taking a label corresponding to the target similarity value as a target label, and completing the identification of the image to be identified based on the target label.
In another aspect, the present application provides a computer readable storage medium having a computer program stored thereon, the computer program when executed by a processor performing: acquiring image information and source information of an image to be identified, determining a target image type of the image to be identified based on the source information, and determining a target tag set of the image to be identified according to the target image type and preset identification information, wherein the preset identification information comprises different image types and preset tag sets corresponding to the image types; inputting the image information and the target tag set into a first preset model to determine the similarity value of each tag in the image information and the target tag set, and obtaining a similarity set, wherein the first preset model comprises a preset coding algorithm, an image coding algorithm and a similarity calculation algorithm; inputting the image information into a second preset model to obtain the number of types of targets to be identified in the images to be identified, wherein the second preset model comprises a first basic model for obtaining deep semantic information of the images to be identified and a second basic model for classifying; selecting the similarity value with the largest similarity value and the number of the similarity values being the category number from the similarity set, taking the selected similarity value as a target similarity value, taking a label corresponding to the target similarity value as a target label, and completing the identification of the image to be identified based on the target label.
The implementation of the embodiment of the invention has the following beneficial effects:
By acquiring the image information and source information of an image to be identified, a target image type of the image to be identified is determined based on the source information, and a target tag set of the image to be identified is determined according to the target image type and preset identification information, wherein the preset identification information comprises different image types and the preset tag sets corresponding to the image types; the image information and the target tag set are input into a first preset model to determine the similarity value between the image information and each tag in the target tag set, obtaining a similarity set, wherein the first preset model comprises a preset coding algorithm, an image coding algorithm and a similarity calculation algorithm; the image information is input into a second preset model to obtain the number of types of targets to be identified in the image to be identified, wherein the second preset model comprises a first basic model for obtaining deep semantic information of the image to be identified and a second basic model for classification; the largest similarity values are selected from the similarity set, the number selected being equal to the number of types, the selected values are taken as target similarity values, the tags corresponding to the target similarity values are taken as target tags, and the identification of the image to be identified is completed based on the target tags. The similarity value between the image to be identified and each preset tag is obtained through encoding, which avoids complex interaction between the image and the preset tags; the number of types of targets to be identified in the image is determined, and the number of target tags is determined according to that number.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Wherein:
FIG. 1 is an application scenario diagram of an image recognition method based on artificial intelligence provided by an embodiment of the present application;
FIG. 2 is a flow chart of an image recognition method based on artificial intelligence provided by an embodiment of the application;
FIG. 3 is a schematic structural diagram of an image recognition device based on artificial intelligence according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a computer device according to an embodiment of the present application;
FIG. 5 is a schematic diagram of another configuration of a computer device according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a storage medium according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The image recognition method based on artificial intelligence provided by the embodiment of the invention can be applied to an application environment as shown in fig. 1, wherein a client communicates with a server through a network. The server side can acquire image information and source information of an image to be identified, determine a target image type of the image to be identified based on the source information, and determine a target tag set of the image to be identified according to the target image type and preset identification information, wherein the preset identification information comprises different image types and preset tag sets corresponding to the image types; inputting the image information and the target tag set into a first preset model to determine the similarity value of each tag in the image information and the target tag set, and obtaining a similarity set, wherein the first preset model comprises a preset coding algorithm, an image coding algorithm and a similarity calculation algorithm; inputting the image information into a second preset model to obtain the number of types of targets to be identified in the images to be identified, wherein the second preset model comprises a first basic model for obtaining deep semantic information of the images to be identified and a second basic model for classifying; selecting the similarity value with the largest similarity value and the number of the similarity values being the category number from the similarity set, taking the selected similarity value as a target similarity value, taking a label corresponding to the target similarity value as a target label, and completing the identification of the image to be identified based on the target label. 
In the invention, the similarity value between the image to be identified and each preset tag is obtained through encoding, which avoids complex interaction between the image and the preset tags; meanwhile, the number of types of targets to be identified in the image is determined, and the number of target tags is determined according to that number. The client may be, but is not limited to, a personal computer, a notebook computer, a smart phone, a tablet computer or a portable wearable device. The server may be implemented as a stand-alone server or as a server cluster formed by a plurality of servers. The present invention is described in detail below with reference to specific embodiments.
In order to reduce the calculation pressure of the server, the image recognition method based on artificial intelligence provided by the embodiment of the invention can also be applied to the client in fig. 1, namely, the image information and the source information of the image to be recognized are obtained, the target image type of the image to be recognized is determined based on the source information, and the target tag set of the image to be recognized is determined according to the target image type and preset recognition information, wherein the preset recognition information comprises different image types and preset tag sets corresponding to various image types; inputting the image information and the target tag set into a first preset model to determine the similarity value of each tag in the image information and the target tag set, and obtaining a similarity set, wherein the first preset model comprises a preset coding algorithm, an image coding algorithm and a similarity calculation algorithm; inputting the image information into a second preset model to obtain the number of types of targets to be identified in the images to be identified, wherein the second preset model comprises a first basic model for obtaining deep semantic information of the images to be identified and a second basic model for classifying; selecting the similarity value with the largest similarity value and the number of the similarity values being the category number from the similarity set, taking the selected similarity value as a target similarity value, taking a label corresponding to the target similarity value as a target label, and completing the identification of the image to be identified based on the target label.
As shown in fig. 2, an embodiment of the present application provides an image recognition method based on artificial intelligence, including:
S101, acquiring image information and source information of an image to be identified, determining a target image type of the image to be identified based on the source information, and determining a target tag set of the image to be identified according to the target image type and preset identification information, wherein the preset identification information comprises different image types and preset tag sets corresponding to the image types;
The image recognition method provided by the embodiment of the application can be applied to image recognition devices or image recognition engines in various scenes. The image recognition process is usually implemented by a server, and the server used for image recognition can exchange data with a user's client in real time. For example, after receiving image recognition request information from a client, the server obtains the image information of the image to be identified according to the request information and records the source information of the image to be identified. Identifying the image requires the tags corresponding to the image, and images to be identified of different image types correspond to different tags, so the image type of the image to be identified is determined according to its source information. For example, when the source information indicates the medical field, that is, when the image to be identified is obtained through a camera arranged in a region belonging to the medical field or from a database of the medical field, the target tag set corresponding to the medical field is selected.
For example, the preset identification information is constructed according to different services and the scenes corresponding to the services: for the food material identification service, a first tag set is constructed according to the types of food materials; for the medical instrument identification service, a second tag set is constructed according to the types of medical instruments; for the maintenance tool identification service, a third tag set is constructed according to the types of maintenance tools; initial identification information is constructed from the first tag set, the second tag set and the third tag set; and the initial identification information is associated with the scenes corresponding to the services to obtain the preset identification information.
The source information comprises identification information of an image acquisition device for acquiring the image to be identified, identification information of a database for storing the image to be identified and the like, and when the source information is the identification information of the database for storing the image to be identified, the image acquisition device for acquiring the image to be identified can be determined according to the storage rule of the database.
The target image types of the image to be identified include, but are not limited to, food materials and equipment, and the preset identification information includes a tag set for food materials, namely the first tag set, and a tag set for equipment, namely the second tag set.
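The lookup described in S101 can be sketched as two table lookups. This is only an illustration under stated assumptions: the scene names, the image type names, the dictionary layout and the function name `target_tag_set` are hypothetical, and only the example tags (corn, rice, scalpel, tourniquet, screwdriver) come from the description.

```python
# Hypothetical preset identification information: image type -> preset tag set.
# The type names are illustrative assumptions, not taken from the description.
PRESET_IDENTIFICATION_INFO = {
    "food_material": ["corn", "rice"],
    "medical_instrument": ["scalpel", "tourniquet"],
    "maintenance_tool": ["screwdriver"],
}

# Hypothetical association between a scene (derived from source information)
# and the image type used for images coming from that scene.
SCENE_TO_IMAGE_TYPE = {
    "kitchen": "food_material",
    "medical": "medical_instrument",
    "equipment_maintenance": "maintenance_tool",
}


def target_tag_set(scene: str) -> list[str]:
    """Return the preset tag set for the image type implied by the scene."""
    image_type = SCENE_TO_IMAGE_TYPE[scene]
    # Copy the list so callers cannot modify the preset information.
    return list(PRESET_IDENTIFICATION_INFO[image_type])


medical_tags = target_tag_set("medical")
```

Restricting the candidate tags by image type before any model is run keeps the later similarity computation limited to tags that are plausible for the scene.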
S102, inputting the image information and the target tag set into a first preset model to determine a similarity value of each tag in the image information and the target tag set, and obtaining a similarity set, wherein the first preset model comprises a preset coding algorithm, an image coding algorithm and a similarity calculation algorithm;
The image information of the image to be identified and the target tag set corresponding to the image are input into the first preset model. The first preset model represents the image information as first dimension data (an embedding) and represents each tag in the target tag set as second dimension data (an embedding), and then calculates the similarity between the first dimension data and the second dimension data. The similarity represents the degree of association between the first dimension data and the second dimension data, that is, the degree of correspondence between the image information of the image to be identified and each tag in the target tag set.
The image information of the image to be identified is an actual picture of the image to be identified, for example, a picture containing one or more foods or a picture containing one or more medical instruments. The preset coding algorithm comprises a text coding algorithm and an image coding algorithm. The preset coding algorithm and the image coding algorithm are implemented by a preset encoder and an image encoder, respectively.
The similarity set includes a similarity value between the image information and each tag in the target tag set.
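The similarity step in S102 can be illustrated with a short sketch. The description does not name the similarity calculation algorithm, so cosine similarity is used here purely as an assumption, and the embeddings are hand-written stand-ins for the outputs of the encoders; the function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_set(image_embedding: list[float],
                   tag_embeddings: dict[str, list[float]]) -> dict[str, float]:
    """Map each tag to the similarity between the image and that tag."""
    return {
        tag: cosine_similarity(image_embedding, embedding)
        for tag, embedding in tag_embeddings.items()
    }


# Toy embeddings standing in for encoder outputs (assumed values).
image_embedding = [1.0, 0.0]
tag_embeddings = {"corn": [1.0, 0.0], "rice": [0.0, 1.0]}
sims = similarity_set(image_embedding, tag_embeddings)
```

Because the image and the tags are encoded independently, the tag embeddings can be computed once per tag set and reused across images.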
S103, inputting the image information into a second preset model to obtain the number of types of objects to be identified in the image to be identified, wherein the second preset model comprises a first basic model for acquiring deep semantic information of the image to be identified and a second basic model for classifying;
Deep semantic information of the image to be identified is obtained by the first basic model in the second preset model, and the second basic model in the second preset model classifies the targets to be identified in the image according to that deep semantic information, so as to obtain the number of types of targets to be identified in the image to be identified.
Taking a picture containing food as an example, the category number represents the number of categories of food materials contained in the picture to be identified.
Texture and color of the image content in the image to be identified are shallow semantic information, whereas the deep semantic information of the image to be identified is the category information of the image content.
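The two-stage structure of the second preset model can be sketched as a feature extractor followed by a classification head whose classes are candidate counts. Everything concrete below is a toy assumption for illustration: the hand-written features, the weights in `HEAD`, and the function names do not come from the description, and a real system would use trained neural networks for both stages.

```python
def first_basic_model(pixels: list[float]) -> list[float]:
    """Toy stand-in for the feature extractor: returns [mean, spread]."""
    mean = sum(pixels) / len(pixels)
    spread = max(pixels) - min(pixels)
    return [mean, spread]


# Hypothetical linear classification head: candidate count -> (weights, bias).
HEAD = {
    1: ([0.0, -1.0], 0.5),
    2: ([0.0, 0.0], 0.0),
    3: ([0.0, 1.0], -0.5),
}


def second_basic_model(features: list[float]) -> int:
    """Score every candidate count and return the highest-scoring one."""
    scores = {
        count: sum(w * f for w, f in zip(weights, features)) + bias
        for count, (weights, bias) in HEAD.items()
    }
    return max(scores, key=scores.get)


def number_of_types(pixels: list[float]) -> int:
    """Chain the two stages: features first, then count classification."""
    return second_basic_model(first_basic_model(pixels))


uniform_count = number_of_types([0.5, 0.5, 0.5])
varied_count = number_of_types([0.0, 1.0])
```

Framing the count as a classification target means the model outputs one of a fixed set of candidate counts rather than an unbounded number.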
S104, selecting, from the similarity set, the largest similarity values as target similarity values, the number selected being equal to the number of types, taking the tags corresponding to the target similarity values as target tags, and completing the identification of the image to be identified based on the target tags.
Illustratively, a corresponding number of target tags is selected from the similarity set according to the number of types of targets to be identified determined by the second preset model.
For example, when selecting the corresponding number of target tags from the similarity set, the similarity values may be sorted in descending order and the top values selected, their number being the number of types; if the number of types is 4, the first 4 similarity values are selected. Alternatively, the target tags may be selected by repeatedly taking the maximum: the largest similarity value in the similarity set is selected as a target similarity value and removed from the set, the largest value in the remaining set is then selected and removed, and the selection is repeated until the number of target similarity values equals the number of types.
The similarity value between the image to be identified and each preset tag is obtained through encoding, which avoids complex interaction between the image and the preset tags; the number of types of targets to be identified in the image is determined, and the number of target tags is determined according to that number.
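Both selection strategies mentioned here, sorting in descending order and repeatedly removing the maximum, can be sketched as follows. The function names and the sample similarity values are hypothetical; the sketch assumes the similarity set is held as a mapping from tag to similarity value.

```python
def select_by_sorting(similarities: dict[str, float], num_types: int) -> list[str]:
    """Sort tags by similarity in descending order and keep the first num_types."""
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [tag for tag, _ in ranked[:num_types]]


def select_by_repeated_max(similarities: dict[str, float], num_types: int) -> list[str]:
    """Repeatedly take the tag with the largest similarity and remove it."""
    remaining = dict(similarities)  # work on a copy of the similarity set
    selected = []
    for _ in range(min(num_types, len(remaining))):
        best = max(remaining, key=remaining.get)
        selected.append(best)
        del remaining[best]
    return selected


# Assumed sample values for illustration only.
sample = {"corn": 0.9, "rice": 0.7, "wheat": 0.2, "bean": 0.4}
top_sorted = select_by_sorting(sample, 2)
top_max = select_by_repeated_max(sample, 2)
```

The two strategies return the same tags when the similarity values are distinct; sorting does the work in one pass over the whole set, while repeated maximum selection avoids a full sort when the number of types is small.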
In a possible implementation manner, the step of determining the image type of the image to be identified based on the source information includes:
determining identification information of a target data acquisition device for acquiring the image to be identified based on the source information;
determining a data acquisition area corresponding to the target data acquisition device based on the identification information and the distribution data of the preset data acquisition device, and determining the area type of the data acquisition area according to the functional department to which the data acquisition area belongs;
and taking the region type as the image type of the image to be identified.
The image to be identified may be an image sent directly to the server by a data acquisition device, or an image extracted from a database. When the image is sent directly to the server, the sending device is the target data acquisition device, and its identification information is obtained. When the image is extracted from a database, attribute information of the image is obtained, and the identification information of the target data acquisition device that collected the image is determined from the attribute information.
The position of the target data acquisition device is determined based on the identification information and the distribution data of the preset data acquisition devices, and the data acquisition area corresponding to the target data acquisition device is then determined from that position. The distribution data of the preset data acquisition devices may be, for example, a distribution map or an installation plan of the data acquisition devices.
Illustratively, the region type of the data acquisition area is determined according to the functional department to which the area belongs. For example, if the area belongs to a department responsible for medical instrument management, the region type is the medical instrument type, and the image type of the image to be identified is therefore the medical instrument type; if the area belongs to a department responsible for food material management, the region type is the food material type, and the image type of the image to be identified is therefore the food material type.
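The chain from source information to image type can be pictured as three lookups. The sketch below assumes the distribution data and department assignments are available as simple tables; all identifiers, table contents and the `device_id` field are hypothetical.

```python
from typing import Dict

# Hypothetical distribution data: device identification -> data acquisition area.
DEVICE_DISTRIBUTION: Dict[str, str] = {
    "cam-001": "instrument-warehouse",
    "cam-002": "kitchen-storeroom",
}

# Hypothetical assignment: data acquisition area -> functional department.
AREA_DEPARTMENT: Dict[str, str] = {
    "instrument-warehouse": "medical-instrument-management",
    "kitchen-storeroom": "food-material-management",
}

# Hypothetical mapping: functional department -> region type.
DEPARTMENT_REGION_TYPE: Dict[str, str] = {
    "medical-instrument-management": "medical instrument",
    "food-material-management": "food material",
}


def image_type_from_source(source_info: Dict[str, str]) -> str:
    """Resolve device -> area -> department -> region type, used as the image type."""
    device_id = source_info["device_id"]        # identification information
    area = DEVICE_DISTRIBUTION[device_id]       # data acquisition area
    department = AREA_DEPARTMENT[area]          # functional department
    return DEPARTMENT_REGION_TYPE[department]   # region type == image type


image_type = image_type_from_source({"device_id": "cam-002"})
```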
In one possible implementation manner, the step of determining the region type of the data acquisition region according to the functional department to which the data acquisition region belongs includes:
when at least two functional departments to which the data acquisition area belongs exist, acquiring target request information, wherein the target request information is used for initiating the step of acquiring the image information and the source information of the image to be identified;
determining account information corresponding to the target request information according to the target request information and log data of an image recognition process;
determining a target function department to which the target request information belongs based on the account information;
and determining the region type of the data acquisition region according to the target functional department.
For example, when a data acquisition area such as a warehouse stores both medical equipment and maintenance tools, the functional departments to which the area belongs are the medical department and the maintenance department. In this case, the target request information that initiated the current image recognition process is acquired; when the target request information was initiated from an account of the medical department, the image type of the image to be identified generated in that data acquisition area is determined to be the medical equipment type. This avoids inaccurate identification caused by different departments influencing one another when identifying the same image.
The log data is the record of process events generated by the system. In this example, it is the record, generated by the system in which the image recognition process runs, of an image recognition event from the start of identification to its end.
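A minimal sketch of this disambiguation step follows, assuming the log data can be queried by a request identifier and that each account maps to one department. The record layout, account names and table contents are hypothetical.

```python
from typing import Dict, List

# Hypothetical: a shared warehouse belongs to two functional departments.
AREA_DEPARTMENTS: Dict[str, List[str]] = {"warehouse-A": ["medical", "maintenance"]}
DEPARTMENT_REGION_TYPE: Dict[str, str] = {
    "medical": "medical equipment",
    "maintenance": "maintenance tool",
}

# Hypothetical log data of the image recognition process.
LOG_DATA: List[Dict[str, str]] = [
    {"request_id": "req-17", "account": "alice"},
    {"request_id": "req-18", "account": "bob"},
]
ACCOUNT_DEPARTMENT: Dict[str, str] = {"alice": "medical", "bob": "maintenance"}


def resolve_region_type(area: str, request_id: str) -> str:
    """Pick the region type; use the requesting account when the area is shared."""
    departments = AREA_DEPARTMENTS[area]
    if len(departments) == 1:
        return DEPARTMENT_REGION_TYPE[departments[0]]
    # Find the account that initiated this request in the log data.
    account = next(rec["account"] for rec in LOG_DATA if rec["request_id"] == request_id)
    target_department = ACCOUNT_DEPARTMENT[account]
    if target_department not in departments:
        raise ValueError(f"{account!r} does not belong to a department of {area!r}")
    return DEPARTMENT_REGION_TYPE[target_department]


region_type = resolve_region_type("warehouse-A", "req-17")
```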
In one possible implementation manner, before the step of inputting the image information and the target tag set into a first preset model to obtain the similarity value of each tag in the image information and the target tag set, the method further includes:
acquiring a data form of the target tag set;
and determining a preset coding algorithm in the first preset model aiming at the data form of the target tag set.
For example, the tag set constructed in practice may be in text form, image form, or another form, and the preset encoding algorithm in the first preset model is determined according to that data form. When the tag set is in text form, a text encoding algorithm is selected as the preset encoding algorithm in the first preset model.
The applicability of the image recognition method of the present application is enhanced by selecting a preset encoding algorithm in the first preset model to adapt to the tag sets of different data forms.
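One way to picture this selection is a small registry keyed by data form. The sketch below is illustrative only: the two encoders are toy stand-ins rather than real text or image encoders, and the rule used to detect the data form is an assumption.

```python
from typing import Any, Callable, Dict, List

Encoder = Callable[[Any], List[float]]


def toy_text_encoder(label: Any) -> List[float]:
    """Stand-in for a real text encoder: crude character statistics only."""
    text = str(label)
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def toy_image_encoder(label: Any) -> List[float]:
    """Stand-in for a real image encoder: expects a flat list of pixel values."""
    pixels = list(label)
    return [float(len(pixels)), float(sum(pixels)) / max(len(pixels), 1)]


# Hypothetical registry: data form -> preset encoding algorithm.
ENCODERS: Dict[str, Encoder] = {"text": toy_text_encoder, "image": toy_image_encoder}


def data_form_of(tag_set: List[Any]) -> str:
    """Assumed rule: a tag set made only of strings is text, anything else is image."""
    return "text" if all(isinstance(tag, str) for tag in tag_set) else "image"


def choose_preset_encoder(tag_set: List[Any]) -> Encoder:
    return ENCODERS[data_form_of(tag_set)]


encoder = choose_preset_encoder(["tomato", "egg"])
```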
In a possible implementation manner, the first preset model includes a preset encoding algorithm, an image encoding algorithm and a similarity calculation algorithm, and the step of inputting the image information and the target tag set into the first preset model to determine the similarity value between the image information and each tag in the target tag set, so as to obtain a similarity set, includes:
inputting the target tag set into a preset coding algorithm in the first preset model so that the preset coding algorithm carries out coding characterization on each tag in the target tag set to obtain a first low-dimensional vector;
inputting the image information into an image coding algorithm in the first preset model so that the image coding algorithm carries out coding characterization on the image information to obtain a second low-dimensional vector;
inputting the first low-dimensional vector and the second low-dimensional vector into a similarity calculation algorithm in the first preset model to calculate cosine similarity of the first low-dimensional vector and the second low-dimensional vector through the similarity calculation algorithm;
and taking the cosine similarity as a similarity value of the image information and the labels in the target label set.
Illustratively, the first preset model adopts a large-scale image-text pre-training model based on contrastive learning (CLIP), in which a text encoder encodes the text of each label in the target tag set (labels) into a first low-dimensional vector;
the image information, i.e. the actual picture, is encoded by the image encoder into a second low-dimensional vector.
The cosine similarity of the second low-dimensional vector and the first low-dimensional vector is then calculated.
For example, the first low-dimensional vector A is a 512-dimensional vector and the second low-dimensional vector B is also a 512-dimensional vector; the inner product of the two vectors is a single scalar, which is the cosine similarity when both vectors are normalized to unit length. Using 3-dimensional vectors for brevity:
A = [0.3, 0.4, 0.5], B = [0.2, 0.1, 0.6]
the inner product of A and B is 0.3×0.2 + 0.4×0.1 + 0.5×0.6 = 0.4. Because these example vectors are not of unit length, dividing by the product of their norms (about 0.707 × 0.640) gives a cosine similarity of about 0.88.
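The calculation can be checked with a short sketch. It uses the 3-dimensional example vectors from the text; the helper names and the `build_similarity_set` function are illustrative assumptions. The inner product equals the cosine similarity only for unit-length vectors, so the general form divides by the norms.

```python
import math
from typing import Dict, Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product divided by the product of the Euclidean norms."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / norm if norm else 0.0


def build_similarity_set(
    image_vector: Sequence[float], label_vectors: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """One similarity value per label, keyed by label."""
    return {label: cosine_similarity(image_vector, vec) for label, vec in label_vectors.items()}


A = [0.3, 0.4, 0.5]  # label vector from the example
B = [0.2, 0.1, 0.6]  # image vector from the example

dot = inner_product(A, B)      # approximately 0.40
cos = cosine_similarity(A, B)  # approximately 0.883
```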
The large-scale image-text pre-training model based on contrastive learning adopts the open-source Chinese Taiyi multimodal model; the preset encoding algorithm is implemented with the text encoder Taiyi-326M, and the image encoding algorithm is implemented with the image encoder clip-vit-large-patch14.
In a possible implementation manner, the second preset model includes a first basic model for obtaining deep semantic information of the image to be identified and a second basic model for classification, and before the step of inputting the image information into the second preset model to obtain the number of types of objects to be identified in the image to be identified, the method further includes:
constructing a second basic model for classification based on the full connection layer and a preset classifier;
connecting the full connection layer of the second basic model with a first basic model for acquiring deep semantic information of the image to be identified to obtain an initial model;
training the initial model according to a preset data set to obtain the second preset model.
Illustratively, the second basic model is built from two fully connected layers (MLP) followed by a classifier (softmax) that performs the classification; the two fully connected layers use an activation function (ReLU), specifically:
wherein W represents the weight, T represents the current round, x represents the input data, b represents the bias, and 1 and 2 represent the first fully connected layer and the second fully connected layer.
The first basic model is a Vision Transformer (ViT), whose encoder can be trained in parallel and captures global information.
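As a rough illustration of such a classification head, the sketch below runs a forward pass through two fully connected layers with a ReLU between them, followed by softmax. It is an assumed, common arrangement and not the formula of the application; the weights, the feature vector standing in for the ViT output, and the convention that class index k means k + 1 categories are all hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def linear(x: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values: List[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def softmax(values: List[float]) -> List[float]:
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def classification_head(
    features: List[float], w1: Matrix, b1: List[float], w2: Matrix, b2: List[float]
) -> List[float]:
    hidden = relu(linear(features, w1, b1))  # first fully connected layer + ReLU
    logits = linear(hidden, w2, b2)          # second fully connected layer
    return softmax(logits)                   # classifier


# Toy values; `features` stands in for the deep semantic features from the first basic model.
features = [0.5, -1.0, 2.0]
w1, b1 = [[0.1, 0.2, 0.3], [-0.3, 0.1, 0.2]], [0.0, 0.1]
w2, b2 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]], [0.0, 0.0, 0.0]

probabilities = classification_head(features, w1, b1, w2, b2)
predicted_category_count = probabilities.index(max(probabilities)) + 1  # assumed class convention
```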
In a possible implementation manner, the step of training the initial model according to a preset data set to obtain the second preset model includes:
acquiring first initial parameters of the first basic model and second initial parameters of the second basic model;
acquiring data pairs obtained through different labeling modes according to a preset proportion, constructing a training set according to the data pairs, and training the second basic model in the initial model based on the training set to obtain second target parameters of the second basic model;
updating the second basic model according to the second target parameters, and keeping the first initial parameters of the first basic model unchanged;
and obtaining the second preset model based on the updated second basic model and the first basic model.
For example, taking food material recognition as an example of training the initial model, nearly 20000 pictures containing food are collected from the network, the labels of these pictures are annotated by Named Entity Recognition (NER) to obtain image-label data pairs, and the number of labels of each picture is counted. In addition, 5000 further pictures are labeled manually to obtain their numbers of labels. The resulting 25000 pairs of pictures and label counts are used as the training set to train the initial model.
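Combining data pairs from the two labeling modes according to a preset proportion might look like the sketch below. The 0.8 share mirrors the 20000 : 5000 split in the example, but the function, file names and sampling strategy are assumptions made for illustration.

```python
import random
from typing import List, Tuple

DataPair = Tuple[str, int]  # (picture identifier, number of labels)


def build_training_set(
    ner_pairs: List[DataPair],
    manual_pairs: List[DataPair],
    ner_share: float,
    total: int,
    seed: int = 0,
) -> List[DataPair]:
    """Sample from both labeling modes so that NER pairs make up `ner_share` of the set."""
    rng = random.Random(seed)
    ner_count = round(total * ner_share)
    manual_count = total - ner_count
    if ner_count > len(ner_pairs) or manual_count > len(manual_pairs):
        raise ValueError("not enough data pairs for the requested proportion")
    training_set = rng.sample(ner_pairs, ner_count) + rng.sample(manual_pairs, manual_count)
    rng.shuffle(training_set)  # mix the two sources
    return training_set


# Small hypothetical pools standing in for the NER-labeled and manually labeled pictures.
ner_pairs = [(f"ner_{i}.jpg", 1 + i % 5) for i in range(200)]
manual_pairs = [(f"manual_{i}.jpg", 1 + i % 5) for i in range(50)]

training_set = build_training_set(ner_pairs, manual_pairs, ner_share=0.8, total=100)
```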
Illustratively, the Vision Transformer portion is frozen during training, i.e. the first initial parameters of the first basic model are kept unchanged, and only the second basic model is trained and has its parameters updated.
Illustratively, after the two fully connected layers (MLP), the final score of each label is obtained through the normalized exponential (softmax) function.
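Freezing the first basic model while updating only the second can be illustrated with a toy parameter store and a gradient step that skips frozen entries. This is a schematic sketch, not a training loop from the application; the parameter names, values and gradients are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Parameter:
    values: List[float]
    trainable: bool = True


def sgd_step(params: Dict[str, Parameter], grads: Dict[str, List[float]], lr: float) -> None:
    """Apply one gradient descent step, leaving frozen parameters untouched."""
    for name, param in params.items():
        if not param.trainable:
            continue  # frozen: the first basic model keeps its initial parameters
        param.values = [v - lr * g for v, g in zip(param.values, grads[name])]


model = {
    "vit.encoder": Parameter([0.5, -0.2], trainable=False),  # first basic model (frozen)
    "head.fc1": Parameter([0.1, 0.3]),                       # second basic model
    "head.fc2": Parameter([0.0, 0.4]),
}
gradients = {name: [1.0, 1.0] for name in model}  # made-up gradients

sgd_step(model, gradients, lr=0.1)
```

After the step, only the `head.*` entries have changed, which corresponds to updating the second basic model with its second target parameters while the first basic model stays fixed.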
In one possible embodiment, as shown in fig. 3, the present application provides an artificial intelligence based image recognition apparatus, the apparatus comprising:
the data acquisition module 201 is configured to acquire image information and source information of an image to be identified, determine a target image type of the image to be identified based on the source information, and determine a target tag set of the image to be identified according to the target image type and preset identification information, where the preset identification information includes different image types and preset tag sets corresponding to the image types;
a first calculation module 202, configured to input the image information and the target tag set into a first preset model, to determine a similarity value of each tag in the image information and the target tag set, to obtain a similarity set, where the first preset model includes a preset encoding algorithm, an image encoding algorithm, and a similarity calculation algorithm;
the second computing module 203 is configured to input the image information into a second preset model to obtain the number of types of objects to be identified in the image to be identified, where the second preset model includes a first basic model for obtaining deep semantic information of the image to be identified and a second basic model for classifying the image to be identified;
the identifying module 204 is configured to select, from the similarity set, the similarity values with the largest values, the number of selected values being equal to the number of categories, take the selected similarity values as target similarity values, take the labels corresponding to the target similarity values as target labels, and complete the identification of the image to be identified based on the target labels.
In a possible implementation manner, the data acquisition module 201 is configured to:
determining identification information of a target data acquisition device for acquiring the image to be identified based on the source information;
determining a data acquisition area corresponding to the target data acquisition device based on the identification information and the distribution data of the preset data acquisition device, and determining the area type of the data acquisition area according to the functional department to which the data acquisition area belongs;
and taking the region type as the image type of the image to be identified.
In a possible implementation manner, the data acquisition module 201 is configured to:
when at least two functional departments to which the data acquisition area belongs exist, acquiring target request information, wherein the target request information is used for initiating the step of acquiring the image information and the source information of the image to be identified;
determining account information corresponding to the target request information according to the target request information and log data of an image recognition process;
determining a target function department to which the target request information belongs based on the account information;
and determining the region type of the data acquisition region according to the target functional department.
In a possible implementation manner, the first computing module 202 is configured to:
acquiring a data form of the target tag set;
and determining a preset coding algorithm in the first preset model aiming at the data form of the target tag set.
In a possible implementation manner, the first computing module 202 is configured to:
inputting the target tag set into a preset coding algorithm in the first preset model so that the preset coding algorithm carries out coding characterization on each tag in the target tag set to obtain a first low-dimensional vector;
inputting the image information into an image coding algorithm in the first preset model so that the image coding algorithm carries out coding characterization on the image information to obtain a second low-dimensional vector;
inputting the first low-dimensional vector and the second low-dimensional vector into a similarity calculation algorithm in the first preset model to calculate cosine similarity of the first low-dimensional vector and the second low-dimensional vector through the similarity calculation algorithm;
and taking the cosine similarity as a similarity value of the image information and the labels in the target label set.
In a possible implementation manner, the second computing module 203 is configured to:
constructing a second basic model for classification based on the full connection layer and a preset classifier;
connecting the full connection layer of the second basic model with a first basic model for acquiring deep semantic information of the image to be identified to obtain an initial model;
training the initial model according to a preset data set to obtain the second preset model.
In a possible implementation manner, the second computing module 203 is configured to:
acquiring first initial parameters of the first basic model and second initial parameters of the second basic model;
acquiring data pairs obtained through different labeling modes according to a preset proportion, constructing a training set according to the data pairs, and training the second basic model in the initial model based on the training set to obtain second target parameters of the second basic model;
updating the second basic model according to the second target parameters, and keeping the first initial parameters of the first basic model unchanged;
and obtaining the second preset model based on the updated second basic model and the first basic model.
The invention provides an image recognition device, which obtains a similarity value of an image to be recognized and a preset label through encoding, avoids complex interaction between the image and the preset label, determines the number of types of objects to be recognized in the image to be recognized, and determines the number of the object labels according to the number of types.
For specific limitations of the image recognition apparatus, reference may be made to the above limitations of the image recognition method, which are not repeated here. The modules of the image recognition apparatus described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 4. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes non-volatile and/or volatile storage media and internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is for communicating with an external client via a network connection. The computer program, when executed by a processor, performs functions or steps of a server side of an image recognition method based on artificial intelligence.
In one embodiment, a computer device is provided, which may be a client, the internal structure of which may be as shown in FIG. 5. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is for communicating with an external server via a network connection. The computer program is executed by a processor to perform functions or steps of a client side of an artificial intelligence based image recognition method.
In one possible implementation, as shown in fig. 6, an embodiment of the present application provides an electronic device 300, comprising a memory 310, a processor 320 and a computer program 311 stored on the memory 310 and executable on the processor 320, the processor 320 implementing, when executing the computer program 311: acquiring image information and source information of an image to be identified, determining a target image type of the image to be identified based on the source information, and determining a target tag set of the image to be identified according to the target image type and preset identification information, wherein the preset identification information comprises different image types and preset tag sets corresponding to the image types; inputting the image information and the target tag set into a first preset model to determine the similarity value of each tag in the image information and the target tag set, and obtaining a similarity set, wherein the first preset model comprises a preset coding algorithm, an image coding algorithm and a similarity calculation algorithm; inputting the image information into a second preset model to obtain the number of types of targets to be identified in the images to be identified, wherein the second preset model comprises a first basic model for obtaining deep semantic information of the images to be identified and a second basic model for classifying; selecting, from the similarity set, the similarity values with the largest values, the number of selected values being equal to the number of categories, taking the selected similarity values as target similarity values, taking the labels corresponding to the target similarity values as target labels, and completing the identification of the image to be identified based on the target labels.
The similarity value of the image to be identified and the preset label is obtained through encoding, complex interaction between the image and the preset label is avoided, the number of types of objects to be identified in the image to be identified is determined, and the number of the object labels is determined according to the number of types.
In one possible implementation, as shown in fig. 7, an embodiment of the present application provides a computer-readable storage medium 400 having a computer program 411 stored thereon, the computer program 411, when executed by a processor, implementing: acquiring image information and source information of an image to be identified, determining a target image type of the image to be identified based on the source information, and determining a target tag set of the image to be identified according to the target image type and preset identification information, wherein the preset identification information comprises different image types and preset tag sets corresponding to the image types; inputting the image information and the target tag set into a first preset model to determine the similarity value of each tag in the image information and the target tag set, and obtaining a similarity set, wherein the first preset model comprises a preset coding algorithm, an image coding algorithm and a similarity calculation algorithm; inputting the image information into a second preset model to obtain the number of types of targets to be identified in the images to be identified, wherein the second preset model comprises a first basic model for obtaining deep semantic information of the images to be identified and a second basic model for classifying; selecting, from the similarity set, the similarity values with the largest values, the number of selected values being equal to the number of categories, taking the selected similarity values as target similarity values, taking the labels corresponding to the target similarity values as target labels, and completing the identification of the image to be identified based on the target labels.
The similarity value of the image to be identified and the preset label is obtained through encoding, complex interaction between the image and the preset label is avoided, the number of types of objects to be identified in the image to be identified is determined, and the number of the object labels is determined according to the number of types.
The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium may be, for example, but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
It will be appreciated by those of ordinary skill in the art that the modules or steps of the invention described above may be implemented in a general purpose computing device, they may be centralized on a single computing device, or distributed over a network of computing devices, or they may alternatively be implemented in program code executable by a computer device, such that they are stored in a memory device and executed by the computing device, or they may be separately fabricated as individual integrated circuit modules, or multiple modules or steps within them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of numerous obvious changes, rearrangements and substitutions without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.
The foregoing disclosure is illustrative of the present invention and is not to be construed as limiting the scope of the invention, which is defined by the appended claims.

Claims (10)

1. An image recognition method based on artificial intelligence, comprising:
acquiring image information and source information of an image to be identified, determining a target image type of the image to be identified based on the source information, and determining a target tag set of the image to be identified according to the target image type and preset identification information, wherein the preset identification information comprises different image types and preset tag sets corresponding to the image types;
inputting the image information and the target tag set into a first preset model to determine the similarity value of each tag in the image information and the target tag set, and obtaining a similarity set, wherein the first preset model comprises a preset coding algorithm, an image coding algorithm and a similarity calculation algorithm;
inputting the image information into a second preset model to obtain the number of types of targets to be identified in the images to be identified, wherein the second preset model comprises a first basic model for obtaining deep semantic information of the images to be identified and a second basic model for classifying;
selecting the similarity value with the largest similarity value and the number of the similarity values being the category number from the similarity set, taking the selected similarity value as a target similarity value, taking a label corresponding to the target similarity value as a target label, and completing the identification of the image to be identified based on the target label.
2. The artificial intelligence based image recognition method of claim 1, wherein the step of determining the image type of the image to be recognized based on the source information comprises:
determining identification information of a target data acquisition device for acquiring the image to be identified based on the source information;
determining a data acquisition area corresponding to the target data acquisition device based on the identification information and the distribution data of the preset data acquisition device, and determining the area type of the data acquisition area according to the functional department to which the data acquisition area belongs;
and taking the region type as the image type of the image to be identified.
3. The image recognition method according to claim 2, wherein the step of determining the region type of the data acquisition region according to the functional department to which the data acquisition region belongs comprises:
when the data acquisition area belongs to at least two functional departments, acquiring target request information, wherein the target request information is used to initiate the step of acquiring the image information and the source information of the image to be identified;
determining account information corresponding to the target request information according to the target request information and log data of an image recognition process;
determining a target functional department to which the target request information belongs based on the account information;
and determining the region type of the data acquisition region according to the target functional department.
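When an area is shared by several departments, claim 3 resolves the ambiguity through the account that issued the recognition request. A minimal sketch follows, assuming the log data can be reduced to a request-to-account mapping; the function name, the argument names, and the check that the resolved department is one of the area's departments are assumptions added for illustration.

```python
from typing import Dict, List


def resolve_target_department(
    request_id: str,
    request_log: Dict[str, str],
    account_departments: Dict[str, str],
    area_departments: List[str],
) -> str:
    """Pick the department of the requesting account among an area's departments."""
    # Log data of the recognition process: which account issued the request.
    account = request_log[request_id]
    # Account information: which functional department the account belongs to.
    department = account_departments[account]
    # Assumption: the requester's department must be one that shares the area.
    if department not in area_departments:
        raise ValueError(f"{department!r} does not share this acquisition area")
    return department


if __name__ == "__main__":
    log = {"req-1": "alice"}
    accounts = {"alice": "radiology"}
    print(resolve_target_department("req-1", log, accounts, ["radiology", "oncology"]))
```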
4. The image recognition method based on artificial intelligence according to claim 1, further comprising, before the step of inputting the image information and the target tag set into a first preset model to obtain a similarity value between the image information and each tag in the target tag set:
acquiring a data form of the target tag set;
and determining the preset coding algorithm in the first preset model according to the data form of the target tag set.
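Choosing the coding algorithm from the data form of the tag set, as in claim 4, can be sketched as a small dispatch table. The two data forms ("text" and "vector") and both toy encoders are assumptions made for illustration; the claims do not enumerate the supported forms or name concrete encoders.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def encode_text(label: str) -> Vector:
    """Toy text encoder: a 26-dimensional letter-count vector."""
    vec = [0.0] * 26
    for ch in label.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def encode_vector(label: Sequence[float]) -> Vector:
    """Toy encoder for labels that already arrive as numeric vectors."""
    return [float(x) for x in label]


ENCODERS: Dict[str, Callable] = {"text": encode_text, "vector": encode_vector}


def pick_encoder(target_labels: list) -> Callable:
    """Inspect the data form of the tag set and return a matching encoder."""
    if not target_labels:
        raise ValueError("target tag set is empty")
    if all(isinstance(label, str) for label in target_labels):
        return ENCODERS["text"]
    if all(isinstance(label, (list, tuple)) for label in target_labels):
        return ENCODERS["vector"]
    raise ValueError("unsupported or mixed data form in target tag set")
```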
5. The image recognition method based on artificial intelligence according to claim 1, wherein the step of inputting the image information and the target tag set into a first preset model to determine a similarity value of each tag in the image information and the target tag set, and obtaining a similarity set, the first preset model including a preset encoding algorithm, an image encoding algorithm and a similarity calculating algorithm includes:
inputting the target tag set into a preset coding algorithm in the first preset model so that the preset coding algorithm carries out coding characterization on each tag in the target tag set to obtain a first low-dimensional vector;
inputting the image information into an image coding algorithm in the first preset model so that the image coding algorithm carries out coding characterization on the image information to obtain a second low-dimensional vector;
inputting the first low-dimensional vector and the second low-dimensional vector into a similarity calculation algorithm in the first preset model to calculate cosine similarity of the first low-dimensional vector and the second low-dimensional vector through the similarity calculation algorithm;
and taking the cosine similarity as a similarity value of the image information and the labels in the target label set.
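The similarity calculation of claim 5 is the standard cosine similarity, the dot product of two vectors divided by the product of their Euclidean norms. A minimal pure-Python sketch is given below; the helper `build_similarity_set` and the choice to return 0.0 for a zero vector are illustrative assumptions.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar
    return dot / (norm_a * norm_b)


def build_similarity_set(
    image_vector: Sequence[float], label_vectors: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Compare the encoded image with every encoded label of the tag set."""
    return {
        label: cosine_similarity(image_vector, vec)
        for label, vec in label_vectors.items()
    }
```

Both vectors must share one embedding space of the same dimension for the comparison to be meaningful, which is why the sketch rejects mismatched lengths.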
6. The image recognition method based on artificial intelligence according to claim 1, wherein the second preset model includes a first basic model for obtaining deep semantic information of the image to be recognized and a second basic model for classification, and wherein, before the step of inputting the image information into the second preset model to obtain the number of kinds of objects to be recognized in the image to be recognized, the method further includes:
constructing the second basic model for classification based on a fully connected layer and a preset classifier;
connecting the fully connected layer of the second basic model with the first basic model for acquiring deep semantic information of the image to be identified to obtain an initial model;
training the initial model according to a preset data set to obtain the second preset model.
7. The artificial intelligence based image recognition method of claim 6, wherein the training the initial model according to a preset data set to obtain the second preset model comprises:
acquiring first initial parameters of the first basic model and second initial parameters of the second basic model;
acquiring data pairs obtained through different labeling modes according to a preset proportion, constructing a training set according to the data pairs, and training the second basic model in the initial model based on the training set to obtain second target parameters of the second basic model;
updating the second basic model according to the second target parameters, and keeping the first initial parameters of the first basic model unchanged;
and obtaining the second preset model based on the updated second basic model and the first basic model.
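Claims 6 and 7 describe a feature extractor whose parameters stay fixed while only the classification head (a fully connected layer plus a classifier) is trained. The following pure-Python sketch illustrates that training pattern on toy data. The linear-plus-ReLU `backbone`, the softmax classifier, the learning rate, and the epoch count are all assumptions; the claims name no concrete architecture or optimizer.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def backbone(image: Sequence[float], frozen_weights: Matrix) -> List[float]:
    """Stand-in for the first basic model: fixed linear projection with ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, image))) for row in frozen_weights]


def softmax(logits: Sequence[float]) -> List[float]:
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def head_forward(features, head_w: Matrix, head_b: List[float]) -> List[float]:
    """Second basic model: fully connected layer followed by a softmax classifier."""
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(head_w, head_b)]
    return softmax(logits)


def train_head(data: List[Tuple[Sequence[float], int]], frozen_weights: Matrix,
               num_classes: int, epochs: int = 200, lr: float = 0.5):
    """Train only the head with SGD on cross-entropy; the backbone is never updated."""
    feat_dim = len(frozen_weights)
    head_w = [[0.0] * feat_dim for _ in range(num_classes)]
    head_b = [0.0] * num_classes
    for _ in range(epochs):
        for image, target in data:
            feats = backbone(image, frozen_weights)
            probs = head_forward(feats, head_w, head_b)
            for c in range(num_classes):
                # Gradient of cross-entropy with respect to logit c.
                grad = probs[c] - (1.0 if c == target else 0.0)
                head_b[c] -= lr * grad
                for j in range(feat_dim):
                    head_w[c][j] -= lr * grad * feats[j]
    return head_w, head_b


def predict(image, frozen_weights: Matrix, head_w: Matrix, head_b: List[float]) -> int:
    probs = head_forward(backbone(image, frozen_weights), head_w, head_b)
    return max(range(len(probs)), key=probs.__getitem__)
```

Here each class index stands for one possible count of target types. Keeping the backbone fixed means only the small head has to be fitted to the labeled data pairs.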
8. An artificial intelligence based image recognition device, the device comprising:
the data acquisition module is used for acquiring image information and source information of an image to be identified, determining a target image type of the image to be identified based on the source information, and determining a target tag set of the image to be identified according to the target image type and preset identification information, wherein the preset identification information comprises different image types and preset tag sets corresponding to the image types;
the first computing module is used for inputting the image information and the target tag set into a first preset model to determine the similarity value of each tag in the image information and the target tag set, so as to obtain a similarity set, wherein the first preset model comprises a preset encoding algorithm, an image encoding algorithm and a similarity computing algorithm;
the second calculation module is used for inputting the image information into a second preset model to obtain the number of types of targets to be identified in the images to be identified, and the second preset model comprises a first basic model used for obtaining deep semantic information of the images to be identified and a second basic model used for classifying;
the identification module is used for selecting, from the similarity set, the largest similarity values, the number of selected similarity values being equal to the number of types, taking the selected similarity values as target similarity values, taking the labels corresponding to the target similarity values as target labels, and completing the identification of the image to be identified based on the target labels.
9. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory in communication via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the artificial intelligence based image recognition method of any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, performs the steps of the artificial intelligence based image recognition method according to any one of claims 1 to 7.
CN202310798703.2A 2023-06-30 2023-06-30 Image recognition method, device, equipment and medium based on artificial intelligence Pending CN116863116A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310798703.2A CN116863116A (en) 2023-06-30 2023-06-30 Image recognition method, device, equipment and medium based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310798703.2A CN116863116A (en) 2023-06-30 2023-06-30 Image recognition method, device, equipment and medium based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN116863116A true CN116863116A (en) 2023-10-10

Family

ID=88233374

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310798703.2A Pending CN116863116A (en) 2023-06-30 2023-06-30 Image recognition method, device, equipment and medium based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN116863116A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117373016A (en) * 2023-10-20 2024-01-09 农芯(南京)智慧农业研究院有限公司 Tobacco leaf baking state judging method, device, equipment and storage medium
CN117373016B (en) * 2023-10-20 2024-04-30 农芯(南京)智慧农业研究院有限公司 Tobacco leaf baking state judging method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109104620B (en) Short video recommendation method and device and readable medium
US11341186B2 (en) Cognitive video and audio search aggregation
WO2021139191A1 (en) Method for data labeling and apparatus for data labeling
CN111324774B (en) Video duplicate removal method and device
CN108776808A (en) A kind of method and apparatus for detecting ladle corrosion defect
US10679054B2 (en) Object cognitive identification solution
US20170185913A1 (en) System and method for comparing training data with test data
CN110708285B (en) Flow monitoring method, device, medium and electronic equipment
CN111522979B (en) Picture sorting recommendation method and device, electronic equipment and storage medium
CN116863116A (en) Image recognition method, device, equipment and medium based on artificial intelligence
CN112508078A (en) Image multitask multi-label identification method, system, equipment and medium
CN115905528A (en) Event multi-label classification method and device with time sequence characteristics and electronic equipment
CN114418124A (en) Method, device, equipment and storage medium for generating graph neural network model
CN114372532A (en) Method, device, equipment, medium and product for determining label marking quality
CN113850077A (en) Topic identification method, device, server and medium based on artificial intelligence
CN114329004A (en) Digital fingerprint generation method, digital fingerprint generation device, data push method, data push device and storage medium
CN116578925B (en) Behavior prediction method, device and storage medium based on feature images
CN113705293A (en) Image scene recognition method, device, equipment and readable storage medium
CN114241411B (en) Counting model processing method and device based on target detection and computer equipment
CN115116080A (en) Table analysis method and device, electronic equipment and storage medium
CN113792569A (en) Object identification method and device, electronic equipment and readable medium
CN113723093B (en) Personnel management policy recommendation method and device, computer equipment and storage medium
CN113723554B (en) Model scheduling method, device, electronic equipment and computer readable storage medium
CN113850207B (en) Micro-expression classification method and device based on artificial intelligence, electronic equipment and medium
WO2024088031A1 (en) Data acquisition method and apparatus, and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination