CN115546566A - Intelligent body interaction method, device, equipment and storage medium based on article identification - Google Patents

Intelligent agent interaction method, device, equipment and storage medium based on article identification

Info

Publication number
CN115546566A
CN115546566A (application CN202211483266.7A)
Authority
CN
China
Prior art keywords
article
result
identification
intelligent agent
detection
Prior art date
Legal status: Pending
Application number
CN202211483266.7A
Other languages
Chinese (zh)
Inventor
陈阳
石翔飞
舒会
顾宇鑫
鲍方毅
Current Assignee
Hangzhou Xinzhi Cosmos Technology Co ltd
Original Assignee
Hangzhou Xinzhi Cosmos Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Xinzhi Cosmos Technology Co ltd
Priority to CN202211483266.7A
Publication of CN115546566A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Abstract

The embodiment of the invention discloses an intelligent agent interaction method, device, equipment and storage medium based on article identification. The method comprises the following steps: acquiring an article image; performing target detection on the article image to obtain a detection result; performing article type identification on the detection result to obtain an article identification result; and injecting an intelligent agent according to the article identification result to obtain an article injected with the intelligent agent, so that the user interacts with the article injected with the intelligent agent. By implementing the method of the embodiment of the invention, the article can interact with the user from the article's own perspective.

Description

Intelligent body interaction method, device, equipment and storage medium based on article identification
Technical Field
The invention relates to AR technology, in particular to an intelligent agent interaction method, device, equipment and storage medium based on article identification.
Background
In animated films, an article can remember what activities you have taken part in together and what you like, give you advice when you hesitate, and comfort you in its own way when you are upset. In real life, two key problems must be solved to realize this function: first, how to identify and interact with real objects by means of existing hardware; second, how to let the article hold an open-domain conversation with a person from the article's own perspective, that is, a conversation not limited to a specific professional field. Existing examples of articles that participate in daily life are interactive agents such as customer service robots.
At present, common interactive agents such as customer service robots are generally embedded directly into mobile phone applications and web pages to interact with users. A customer service robot is usually implemented with a knowledge graph, intention-recognition-based natural language processing and similar techniques, so it can only respond within a limited knowledge range and cannot respond beyond that range; that is, it cannot hold an open-domain conversation.
Therefore, there is a need to design a new method that lets an article interact with the user from the article's own perspective.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an intelligent agent interaction method, an intelligent agent interaction device, intelligent agent interaction equipment and a storage medium based on article identification.
In order to realize the purpose, the invention adopts the following technical scheme: an intelligent agent interaction method based on article identification comprises the following steps:
acquiring an article image;
carrying out target detection on the article image to obtain a detection result;
performing article type identification on the detection result to obtain an article identification result;
and injecting an intelligent agent according to the article identification result to obtain an article injected with the intelligent agent, so that the user interacts with the article injected with the intelligent agent.
The further technical scheme is as follows: the target detection of the article image to obtain a detection result includes:
and carrying out target detection on the article image by adopting a target detection model so as to obtain a detection result.
The further technical scheme is as follows: the target detection model is implemented on the basis of a YOLOv5 model through model pruning, channel pruning and Anchor reduction, combined with a non-maximum suppression algorithm.
The further technical scheme is as follows: the identifying the type of the article to the detection result to obtain the article identification result includes:
extracting visual features of the articles from the detection result to obtain an extraction result;
performing feature retrieval according to the extraction result to obtain a retrieval result;
and determining the type of the article according to the retrieval result to obtain an article identification result.
The further technical scheme is as follows: the extracting the visual features of the article to the detection result to obtain an extraction result comprises:
and extracting the article characteristics of the detection result by adopting an identification network based on MobileNetv2 to obtain an extraction result.
The further technical scheme is as follows: the performing feature retrieval according to the extraction result to obtain a retrieval result includes:
and establishing a feature retrieval engine through an open source FAISS framework, selecting a product quantization optimization index, and retrieving a target article similar to the extraction result to obtain a retrieval result.
The further technical scheme is as follows: determining the type of the article according to the retrieval result to obtain an article identification result, wherein the method comprises the following steps:
screening the search results meeting the requirements to obtain screening results;
and eliminating the boundary box by adopting a non-maximum suppression algorithm on the screening result to obtain an article identification result.
The invention also provides an intelligent agent interaction device based on article identification, which comprises:
an image acquisition unit for acquiring an article image;
the object detection unit is used for carrying out object detection on the object image to obtain a detection result;
the identification unit is used for identifying the type of the article on the detection result to obtain an article identification result;
and the injection unit is used for injecting the intelligent agent according to the article identification result to obtain an article injected with the intelligent agent, so that the user can interact with the article injected with the intelligent agent.
The invention also provides a computer device, which comprises a memory and a processor, wherein the memory is stored with a computer program, and the processor executes the computer program to realize the method.
The invention also provides a storage medium storing a computer program which, when executed by a processor, implements the method described above.
Compared with the prior art, the invention has the beneficial effects that: by performing target detection and article type identification on the article image and injecting the intelligent agent according to the identified type, the article obtains the functions of an intelligent agent, and the article interacts with the user from its own perspective.
The invention is further described below with reference to the accompanying drawings and specific embodiments.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic view of an application scenario of an intelligent agent interaction method based on article identification according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of an intelligent agent interaction method based on article identification according to an embodiment of the present invention;
Fig. 3 is a schematic sub-flow diagram of an intelligent agent interaction method based on article identification according to an embodiment of the present invention;
Fig. 4 is a schematic sub-flow diagram of an intelligent agent interaction method based on article identification according to an embodiment of the present invention;
Fig. 5 is a schematic flowchart of a target detection model according to an embodiment of the present invention;
Fig. 6 is a schematic block diagram of an intelligent agent interaction device based on article identification according to an embodiment of the present invention;
Fig. 7 is a schematic block diagram of an identification unit of an intelligent agent interaction device based on article identification according to an embodiment of the present invention;
Fig. 8 is a schematic block diagram of certain sub-units of an intelligent agent interaction device based on article identification according to an embodiment of the present invention;
Fig. 9 is a schematic block diagram of a computer device provided in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic view of an application scenario of an intelligent agent interaction method based on article identification according to an embodiment of the present invention. Fig. 2 is a schematic flowchart of an intelligent agent interaction method based on article identification according to an embodiment of the present invention. The intelligent agent interaction method based on article identification is applied to a server. The server exchanges data with a terminal: an image of an article is shot through the terminal, article detection and type identification are performed on the image, and the corresponding intelligent agent is injected according to the identified type, so that the article interacts with the user from its own perspective.
Fig. 2 is a schematic flow chart of an intelligent agent interaction method based on item identification according to an embodiment of the present invention. As shown in fig. 2, the method includes the following steps S110 to S140.
And S110, acquiring an article image.
In this embodiment, the article image refers to an image formed by shooting an article to be interacted by using a terminal or a device with a camera.
And S120, carrying out target detection on the article image to obtain a detection result.
In this embodiment, the detection result refers to the bounding box and position information formed by detecting the position of the article in the article image.
Specifically, a target detection model is adopted to perform target detection on the article image so as to obtain a detection result.
Specifically, the target detection model is realized by combining a non-maximum suppression algorithm through model pruning, channel pruning and Anchor reduction on the basis of a YOLOV5 model.
In this embodiment, as shown in fig. 5, the input picture passes through the model backbone network and, after processing by the model neck module, detection frame prediction is performed to obtain the final predicted detection frame. In this example, a single-stage target detection model based on YOLOv5 is selected. On the basis of YOLOv5, model pruning, channel pruning and Anchor reduction are applied: channel pruning uses pruning ratios of 0.5 and 0.33 in the two dimensions of channel depth and width respectively, which greatly reduces model inference time while keeping the performance loss as small as possible. In the application scenario of the product, the object to be detected is usually a natural object of common size, and very large or very small objects appear with low probability; the network structure is therefore trimmed for this scenario: of the three different-size feature maps output by the original YOLOv5, only the intermediate one is kept, and the Anchors on the largest and smallest feature maps of the feature pyramid are cut, which effectively improves model inference efficiency. With precision guaranteed, the model inference speed is greatly improved. Finally, the commonly used NMS (Non-Max Suppression) algorithm is used as the post-processing step to eliminate redundant bounding boxes and find the most appropriate article box with the highest confidence.
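The NMS post-processing step described above can be sketched as a short NumPy routine. This is a minimal illustration, not the patent's implementation; the (x1, y1, x2, y2) box format and the 0.5 IoU threshold are assumptions chosen for the example:

```python
import numpy as np

def iou(box, boxes):
    """Intersection-over-union of one box against an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-confidence box, drop boxes that overlap it too much,
    then repeat on the survivors -- eliminating redundant bounding boxes."""
    order = np.argsort(scores)[::-1]          # indices, best score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]  # suppress heavy overlaps
    return keep
```

For two heavily overlapping detections of the same article, only the higher-confidence one survives, while a distant detection is untouched.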
And S130, identifying the type of the article according to the detection result to obtain an article identification result.
In this embodiment, the item identification result refers to a result of type identification of an item.
In an embodiment, referring to fig. 3, the step S130 may include steps S131 to S133.
S131, extracting visual features of the articles according to the detection result to obtain an extraction result.
In this embodiment, the extraction result refers to the visual features of the article.
Specifically, the article visual features of the detection result are extracted with an identification network based on MobileNetv2 to obtain the extraction result. MobileNetv2 is a highly effective feature extractor, commonly used in industry for device-side object recognition and detection tasks, that maintains excellent model performance with relatively few parameters.
In this embodiment, based on the COCO data set and a self-built data set, data enhancement operations (rotation, scene mapping, noise and mirroring) are applied to the self-built data set, and a MobileNetv2 network structure is used as the Backbone network to train the object recognition network, so that a feature extraction model with high inference performance and strong representation ability is obtained.
Based on the MobileNetv2 network structure, the network is appropriately modified and optimized to realize the article type identification task.
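The role of the recognition network's output as a retrieval feature can be illustrated with a minimal stand-in: collapsing a backbone feature map into an L2-normalized embedding and comparing embeddings by cosine similarity. The real pipeline would use MobileNetv2 activations; the pooling head and similarity measure here are illustrative assumptions:

```python
import numpy as np

def embed(feature_map):
    """Collapse a (C, H, W) backbone feature map into an L2-normalized
    embedding vector. This global-average-pooling head is a stand-in for
    the MobileNetv2-based identification network's feature output."""
    v = feature_map.mean(axis=(1, 2))         # global average pooling -> (C,)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def cosine_similarity(a, b):
    """Similarity between two unit-norm embeddings (dot product)."""
    return float(np.dot(a, b))
```

Embeddings produced this way can be compared directly: identical inputs give similarity 1.0, and the retrieval stage ranks stored article features by this score.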
And S132, performing feature retrieval according to the extraction result to obtain a retrieval result.
In this embodiment, the search result refers to a result formed by comparing the extraction result with the corresponding features of each type.
Specifically, a feature retrieval engine is established through an open source FAISS framework, a product quantization optimization index is selected, and a target object similar to the extraction result is retrieved to obtain a retrieval result.
Specifically, the retrieval result includes several candidate articles whose features are similar to the extraction result.
Based on the visual features extracted by the MobileNetv2 network structure, and in order to balance the requirements of retrieval speed and accuracy, an inverted list plus product quantization is selected as the index strategy, and a feature retrieval engine is built on the open-source FAISS framework. In the retrieval stage, the topK (top K by score) candidate classification results are queried by computing feature similarity, and the article class that occurs most often among the K candidates is selected as the classification result; if several classes occur the same number of times, the class with the highest similarity is selected as the result.
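The topK voting rule just described (most frequent class wins; ties broken by the highest similarity) can be sketched directly. The function name and the (label, similarity) input shape are illustrative:

```python
from collections import Counter

def classify_topk(candidates):
    """candidates: list of (label, similarity) pairs for the topK retrieved
    articles, sorted by similarity descending. Returns the label occurring
    most often; ties are broken in favor of the highest similarity."""
    counts = Counter(label for label, _ in candidates)
    best_count = max(counts.values())
    tied = {label for label, c in counts.items() if c == best_count}
    # The list is sorted by similarity, so the first tied label encountered
    # is the one with the highest similarity among the tied classes.
    for label, _ in candidates:
        if label in tied:
            return label
```

For example, if "mouse" appears twice among the top 3 retrieved items it wins outright; with one vote each, the candidate retrieved with the higher similarity wins.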
FAISS is a clustering and similarity search library open-sourced by the Facebook AI team, providing efficient similarity search and clustered vector search capabilities. This method mainly uses the optimized query method of product quantization, whose principle is to decompose the original vector into a Cartesian product of several low-dimensional vectors and quantize each of the resulting low-dimensional sub-vectors; in performance and efficiency it is far superior to clustering and direct quantization indexes.
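The product-quantization idea described here (decompose the vector into sub-vectors and quantize each one against its own small codebook) can be sketched in a few lines of NumPy. In practice FAISS trains the codebooks and builds the inverted index; this hand-rolled version, with tiny hard-coded codebooks, is only a conceptual illustration:

```python
import numpy as np

def pq_encode(x, codebooks):
    """Encode vector x as one centroid index per sub-vector.
    codebooks: list of m arrays, each of shape (k, d_sub) -- one small
    codebook per low-dimensional sub-space of the decomposition."""
    subs = np.split(x, len(codebooks))        # Cartesian-product split
    return [int(np.argmin(np.linalg.norm(cb - s, axis=1)))
            for s, cb in zip(subs, codebooks)]

def pq_decode(codes, codebooks):
    """Reconstruct the quantized approximation from the stored indices."""
    return np.concatenate([cb[c] for c, cb in zip(codes, codebooks)])
```

Storing only the short index lists (rather than full float vectors) is what makes large-scale similarity search with this index cheap in both memory and distance computation.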
And S133, determining the type of the article according to the retrieval result to obtain an article identification result.
In one embodiment, referring to FIG. 4, the step S133 can include steps S1331 to S1332.
And S1331, screening the search results meeting the requirements to obtain a screening result.
In this embodiment, the screening result refers to the search results whose feature similarity ranks in the top N, where N may be determined according to the actual situation.
And S1332, eliminating the bounding box of the screening result by adopting a non-maximum suppression algorithm to obtain an article identification result.
In this embodiment, a topK sorting rule is designed according to the relationship between the article category and its root category, so that the label of the most similar picture is obtained as the final recognition result; then a non-maximum suppression algorithm is used to eliminate redundant bounding boxes and find the most suitable article recognition result with the highest confidence.
And S140, injecting the intelligent agent according to the article identification result to obtain an article injected with the intelligent agent, so that the user can interact with the article injected with the intelligent agent.
In this embodiment, self-awareness is injected into the article according to the type of the article, allowing the article to interact with the user from its own perspective, based on a brain-inspired mental framework and a base model (Foundation Model). The base model is a large artificial intelligence model trained at scale on a large amount of unlabeled data, forming a model that can adapt to a wide range of downstream tasks. Early examples of base models were large pre-trained language models, including BERT (Bidirectional Encoder Representations from Transformers, a pre-trained model proposed by the Google AI institute in October 2018) and GPT-3 (Generative Pre-trained Transformer 3, an autoregressive language model proposed by the OpenAI team). Some multimodal base models were subsequently produced, including DALL-E (an image generation model proposed by OpenAI in January 2021), Flamingo (a visual language model based on few-shot learning proposed by the DeepMind team in April 2022) and Florence (a visual base model newly proposed by Microsoft in November 2021). The brain-inspired framework uses the base model as a general computing unit, incorporates the design of brain regions (including memory, perception, cognition, etc.), and can give different intelligent agents self-cognition.
In particular, the base model is a large artificial intelligence model trained on a large amount of unlabeled data, forming a model that can accommodate a wide range of downstream tasks. Early examples of base models were large pre-trained language models, including BERT and GPT-3; a number of multimodal base models were subsequently produced, including DALL-E, Flamingo, and Florence. The base model is pre-trained autoregressively on open-source internet data and the like, using a decoder-only Transformer model structure (using only the Transformer decoder module). The brain-inspired mental framework takes the base model as a general computing unit and constructs several brain areas: a memory brain area, a perception brain area and a cognitive brain area. The memory brain area is responsible for storing and using the agent's knowledge, settings, interaction process and so on. It includes a long-term memory module and a short-term memory module, and each module further includes relationship memory, episodic memory, semantic memory and so on. Through the fusion of the base model and prompt sentences, the memory module can accumulate memories during the interaction between the user and the agent and recall memories related to the conversation from long-term memory.
The perception brain area is mainly responsible for receiving visual information as input and understanding it, so that it can serve as content material for subsequent information integration. Specifically, the perception brain area first maps the visual space to a unified semantic space through the image encoder of a CLIP model, and then obtains, through a multilayer perceptron, a symbolized result that the subsequent process can receive. The cognitive brain area receives this symbolic information and, combined with the article type identification result obtained in the previous step, forms a composite prompt sentence.
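The composite prompt that the cognitive brain area assembles from the perception symbols and the article type can be sketched as simple template filling. The template wording and field names below are illustrative assumptions, not taken from the patent:

```python
def build_prompt(item_type, perception_symbols, memories):
    """Assemble a composite prompt for the base model from: the identified
    article type, the symbolized perception results, and memories recalled
    by the memory brain area. All template text here is illustrative."""
    lines = [
        f"You are a {item_type}. Speak from the {item_type}'s own perspective.",
        "You currently perceive: " + ", ".join(perception_symbols) + ".",
    ]
    if memories:
        lines.append("You remember: " + " ".join(memories))
    lines.append("Reply to the user in an open-domain conversation.")
    return "\n".join(lines)
```

The resulting string is what a decoder-only base model would condition on, giving the agent its article-perspective self-cognition for the turn.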
In addition, the agent may also have knowledge about the item, such as price, place of purchase, etc. The user may query for information related to the merchandise by interacting with the agent.
During interaction, the user can interact with the intelligent agent by voice or text, and can detect and identify any article captured by the camera. The agent's self-cognition is shaped by the article identification result, and the agent interacts with the user from the article's perspective, bringing the user a novel conversation experience. By means of article identification, a merchant can inject key commodity information into an article through a back-end system; when a user interacts with the article, this key commodity information can be conveyed to the user to guide a purchase. The user can hold a conversational interaction with any nearby article, and the article can hold an open-domain conversation with the user from its own perspective.
For the actual interaction process between the article injected with the agent and the user, for example: the first step, AR scan. In the AR scan page, the applet's local AI target detection model continuously detects the subject object in the camera window. When the user clicks the "connect the different dimension" button, a conversation opens and the user can chat with the AI virtual character.
The second step: opening a conversation. The article in the yellow detection frame from the previous step is passed through a background matting service to remove the cluttered background, and the salient subject image is accurately cut out and displayed on the applet front end. Meanwhile, the article image in the yellow detection frame is finely classified; in this example it is accurately recognized as a "mouse".
When the session is opened, the service backend configures, according to the rules configured in the background, the personality category of the AI virtual character (in this case "do work"), the rarity (in this case "R"), the background picture of the session, and the facial features of the virtual character (eyes, eyebrows, mouth). The session opens after all resources are ready.
After the session opens, the AI gives an opening line based on the article classification (in this case "mouse") together with a random element. The user can chat with the virtual character by typing or by voice input.
The third step: sharing. When the user takes a screenshot of the conversation page, the App automatically generates a poster from the current conversation content for the user to share. The QR code in the lower right corner of the picture lets other users, after scanning it with the WeChat App, enter the applet and converse with the shared virtual character.
The fourth step: exploration record. Clicking the "exploration record" button in the lower right corner of the AR scan page jumps to the "exploration record" page, which records the last four chat histories with virtual characters. Clicking the corresponding article picture reopens the conversation with that virtual character.
According to the intelligent agent interaction method based on article identification described above, target detection and article type identification are performed on the article image, and the intelligent agent is injected according to the identified type, so that the article obtains the functions of an intelligent agent and interacts with the user from the article's own perspective.
Fig. 6 is a schematic block diagram of an article identification-based agent interaction device 300 according to an embodiment of the present invention. As shown in fig. 6, the present invention also provides an intelligent agent interaction apparatus 300 based on item identification, corresponding to the above intelligent agent interaction method based on item identification. The intelligent agent interaction device 300 based on item identification comprises a unit for executing the intelligent agent interaction method based on item identification, and the device can be configured in a server. Specifically, referring to fig. 6, the intelligent agent interaction device 300 based on item identification includes an image acquisition unit 301, an object detection unit 302, an identification unit 303, and an injection unit 304.
An image acquisition unit 301 for acquiring an article image; a target detection unit 302, configured to perform target detection on the article image to obtain a detection result; an identifying unit 303, configured to perform article type identification on the detection result to obtain an article identification result; and an injection unit 304, configured to inject the intelligent agent according to the article identification result to obtain an article injected with the intelligent agent, so that the user interacts with the article injected with the intelligent agent.
In an embodiment, the object detection unit 302 is configured to perform object detection on the article image by using an object detection model to obtain a detection result.
In one embodiment, as shown in fig. 7, the recognition unit 303 includes a feature extraction sub-unit 3031, a feature retrieval sub-unit 3032, and a determination sub-unit 3033.
A feature extraction subunit 3031, configured to perform article visual feature extraction on the detection result to obtain an extraction result; a feature retrieval subunit 3032, configured to perform feature retrieval according to the extraction result to obtain a retrieval result; a determining subunit 3033, configured to determine the type of the item according to the search result, so as to obtain an item identification result.
In an embodiment, the feature extraction subunit 3031 is configured to perform article feature extraction on the detection result by using an identification network based on MobileNetv2 to obtain an extraction result.
The feature retrieval subunit 3032 is configured to establish a feature retrieval engine through an open source FAISS framework, select a product quantization optimization index, and retrieve a target article similar to the extraction result to obtain a retrieval result.
In one embodiment, as shown in fig. 8, the determining subunit 3033 includes a screening module 30331 and a cancellation module 30332.
A screening module 30331, configured to screen a search result that meets the requirement, so as to obtain a screening result; a eliminating module 30332, configured to eliminate the bounding box by using a non-maximum suppression algorithm for the screening result, so as to obtain an item identification result.
It should be noted that, as will be clearly understood by those skilled in the art, the detailed implementation process of the above-mentioned intelligent agent interaction device 300 and each unit based on article identification may refer to the corresponding description in the foregoing method embodiments, and for convenience and brevity of description, no further description is provided herein.
The above-mentioned intelligent agent interaction device 300 based on article identification may be implemented in the form of a computer program that can be run on a computer device as shown in fig. 9.
Referring to fig. 9, fig. 9 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 may be a server, wherein the server may be an independent server or a server cluster composed of a plurality of servers.
Referring to fig. 9, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and computer programs 5032. The computer program 5032 comprises program instructions that, when executed, cause the processor 502 to perform an item identification based agent interaction method.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides an environment for the operation of the computer program 5032 in the non-volatile storage medium 503, and when the computer program 5032 is executed by the processor 502, the processor 502 can be enabled to perform an article identification-based agent interaction method.
The network interface 505 is used for network communication with other devices. Those skilled in the art will appreciate that the configuration shown in fig. 9 is a block diagram of only a portion of the configuration associated with the present application and does not constitute a limitation of the computer device 500 to which the present application may be applied; a particular computer device 500 may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
Wherein the processor 502 is configured to run the computer program 5032 stored in the memory to perform the steps of:
acquiring an article image; performing target detection on the article image to obtain a detection result; performing article type identification on the detection result to obtain an article identification result; and injecting the article into the intelligent agent according to the article identification result to obtain the article injected into the intelligent agent, so that the user interacts with the article injected into the intelligent agent.
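For illustration only, the four steps executed by the processor can be sketched as a small pipeline. Every function body, data shape, and label below is a hypothetical stand-in, since the patent specifies no concrete implementation:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    box: tuple          # (x1, y1, x2, y2) in image coordinates (assumed format)
    score: float

@dataclass
class Identification:
    box: tuple
    label: str

def detect_objects(image) -> List[Detection]:
    # Stand-in for the pruned YOLOv5-based detector described in the patent.
    return [Detection(box=(10, 10, 50, 50), score=0.9)]

def identify_items(image, detections) -> List[Identification]:
    # Stand-in for feature extraction + retrieval + type determination.
    return [Identification(box=d.box, label="cup") for d in detections]

def inject_into_agent(identifications) -> List[str]:
    # Stand-in for injecting the recognized articles into the intelligent
    # agent so the user can interact with them.
    return [f"agent now knows about: {i.label}" for i in identifications]

def interact_pipeline(image):
    detections = detect_objects(image)                    # target detection
    identifications = identify_items(image, detections)   # article identification
    return inject_into_agent(identifications)             # agent injection

print(interact_pipeline(object()))
```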
In an embodiment, when the processor 502 implements the step of performing the target detection on the article image to obtain the detection result, the following steps are specifically implemented:
and carrying out target detection on the article image by adopting a target detection model to obtain a detection result.
The target detection model is obtained on the basis of a YOLOv5 model through model pruning, channel pruning, and anchor reduction, combined with a non-maximum suppression algorithm.
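As an illustrative sketch of the channel-pruning idea only (not the patent's actual procedure), one common heuristic ranks a convolution layer's output channels by L1 weight norm and keeps only the strongest; the keep ratio and tensor shapes here are assumptions:

```python
import numpy as np

def prune_channels(conv_weight: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Keep the output channels with the largest L1 weight norm.

    conv_weight has shape (out_channels, in_channels, kh, kw), as in a
    typical convolution layer. This is a toy sketch: a real pipeline would
    also remove the matching input channels of the following layer and
    fine-tune the pruned network afterwards.
    """
    norms = np.abs(conv_weight).reshape(conv_weight.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(conv_weight.shape[0] * keep_ratio)))
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])  # indices of kept channels
    return conv_weight[keep]

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))          # 8 output channels
pruned = prune_channels(w, keep_ratio=0.5)
print(pruned.shape)  # (4, 3, 3, 3)
```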
In an embodiment, when implementing the step of performing item type identification on the detection result to obtain an item identification result, the processor 502 specifically implements the following steps:
extracting visual features of the articles from the detection result to obtain an extraction result; performing feature retrieval according to the extraction result to obtain a retrieval result; and determining the type of the article according to the retrieval result to obtain an article identification result.
In an embodiment, when the step of performing the visual feature extraction on the detection result to obtain the extraction result is implemented by the processor 502, the following steps are specifically implemented:
and extracting the article characteristics of the detection result by adopting an identification network based on MobileNetv2 to obtain an extraction result.
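The patent does not detail the extraction step. As an assumption-laden sketch, a MobileNetv2-style backbone is commonly followed by global average pooling and L2 normalization to produce a retrieval embedding; the backbone itself is replaced here by a random feature map:

```python
import numpy as np

def embed(feature_map: np.ndarray) -> np.ndarray:
    """Turn a backbone feature map of shape (C, H, W) into an embedding.

    Global average pooling followed by L2 normalization; in the patent's
    pipeline the feature map would come from a MobileNetv2-based
    recognition network.
    """
    v = feature_map.mean(axis=(1, 2))         # global average pool -> (C,)
    return v / (np.linalg.norm(v) + 1e-12)    # L2-normalize for retrieval

rng = np.random.default_rng(1)
fmap = rng.normal(size=(1280, 7, 7))          # 1280 is MobileNetv2's final width
e = embed(fmap)
print(e.shape)  # (1280,)
```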
In an embodiment, when implementing the step of performing feature retrieval according to the extraction result to obtain a retrieval result, the processor 502 specifically implements the following steps:
and establishing a feature retrieval engine through the open-source FAISS framework, selecting a product-quantization-optimized index, and retrieving target articles similar to the extraction result to obtain a retrieval result.
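FAISS itself is not reproduced here. As a conceptual stand-in, a brute-force cosine search over normalized embeddings shows what the retrieval engine returns; a product-quantization index in FAISS approximates this same search over compressed vectors. The gallery contents and labels are invented for illustration:

```python
import numpy as np

def build_gallery(vectors):
    """Gallery of reference article embeddings, one L2-normalized row per item."""
    g = np.asarray(vectors, dtype=np.float64)
    return g / np.linalg.norm(g, axis=1, keepdims=True)

def search(gallery, labels, query, k=1):
    """Return the k gallery items most similar to the query embedding.

    Brute-force cosine similarity; FAISS with a product-quantization index
    gives a memory-compressed, approximate version of this search.
    """
    q = query / np.linalg.norm(query)
    sims = gallery @ q
    order = np.argsort(sims)[::-1][:k]
    return [(labels[i], float(sims[i])) for i in order]

gallery = build_gallery([[1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0],
                         [0.7, 0.7, 0.0]])
labels = ["cup", "book", "bottle"]
print(search(gallery, labels, np.array([0.9, 0.1, 0.0]), k=2))
```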
In an embodiment, when the processor 502 implements the step of determining the type of the item according to the retrieval result to obtain the item identification result, the following steps are specifically implemented:
screening the retrieval results that meet the requirements to obtain a screening result; and eliminating redundant bounding boxes from the screening result using a non-maximum suppression algorithm to obtain an article identification result.
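A minimal sketch of non-maximum suppression as it might be applied in the elimination step; the (x1, y1, x2, y2) box format and the IoU threshold are illustrative assumptions:

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_threshold: float = 0.5):
    """Return indices of boxes kept after non-maximum suppression.

    Boxes are (x1, y1, x2, y2). A higher-scoring box suppresses any
    lower-scoring box that overlaps it above iou_threshold.
    """
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the top box with all remaining boxes.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_threshold]  # drop suppressed boxes
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # box 1 heavily overlaps box 0 and is suppressed
```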
It should be understood that, in the embodiments of the present application, the processor 502 may be a Central Processing Unit (CPU); the processor 502 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
It will be understood by those skilled in the art that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program instructing associated hardware. The computer program includes program instructions, and the computer program may be stored in a storage medium, which is a computer-readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.
Accordingly, the present invention also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program, wherein the computer program, when executed by a processor, causes the processor to perform the steps of:
acquiring an article image; performing target detection on the article image to obtain a detection result; performing article type identification on the detection result to obtain an article identification result; and injecting the article into the intelligent agent according to the article identification result to obtain the article injected into the intelligent agent, so that the user interacts with the article injected into the intelligent agent.
In an embodiment, when the processor executes the computer program to implement the step of performing the target detection on the article image to obtain the detection result, the following steps are specifically implemented:
and carrying out target detection on the article image by adopting a target detection model to obtain a detection result.
The target detection model is obtained on the basis of a YOLOv5 model through model pruning, channel pruning, and anchor reduction, combined with a non-maximum suppression algorithm.
In an embodiment, when the processor executes the computer program to implement the step of performing the article type identification on the detection result to obtain the article identification result, the following steps are specifically implemented:
extracting visual features of the articles from the detection result to obtain an extraction result; performing feature retrieval according to the extraction result to obtain a retrieval result; and determining the type of the article according to the retrieval result to obtain an article identification result.
In an embodiment, when the processor executes the computer program to implement the step of performing the visual feature extraction on the detection result to obtain the extraction result, the following steps are specifically implemented:
and extracting the article characteristics of the detection result by adopting an identification network based on MobileNetv2 to obtain an extraction result.
In an embodiment, when the processor executes the computer program to implement the step of performing the feature search according to the extraction result to obtain the search result, the following steps are specifically implemented:
and establishing a feature retrieval engine through an open source FAISS framework, selecting a product quantization optimization index, and retrieving a target article similar to the extraction result to obtain a retrieval result.
In an embodiment, when the processor executes the computer program to implement the step of determining the type of the item according to the search result to obtain the item identification result, the processor specifically implements the following steps:
screening the retrieval results that meet the requirements to obtain a screening result; and eliminating redundant bounding boxes from the screening result using a non-maximum suppression algorithm to obtain an article identification result.
The storage medium may be a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, an optical disk, or any other computer-readable storage medium capable of storing program code.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two; to illustrate this interchangeability of hardware and software clearly, the components and steps of the examples have been described above in general functional terms. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other division manners are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
The steps in the method of the embodiment of the invention can be sequentially adjusted, combined and deleted according to actual needs. The units in the device of the embodiment of the invention can be merged, divided and deleted according to actual needs. In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention essentially or partially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An intelligent agent interaction method based on article identification is characterized by comprising the following steps:
acquiring an article image;
carrying out target detection on the article image to obtain a detection result;
carrying out article type identification on the detection result to obtain an article identification result;
and injecting the article into the intelligent agent according to the article identification result to obtain the article injected into the intelligent agent, so that the user interacts with the article injected into the intelligent agent.
2. The intelligent agent interaction method based on item identification according to claim 1, wherein the performing target detection on the item image to obtain a detection result comprises:
and carrying out target detection on the article image by adopting a target detection model so as to obtain a detection result.
3. The method of claim 2, wherein the target detection model is obtained on the basis of a YOLOv5 model through model pruning, channel pruning, and anchor reduction, combined with a non-maximum suppression algorithm.
4. The method for interacting with agent based on item identification according to claim 1, wherein the identifying the type of the item to the detection result to obtain the item identification result comprises:
extracting visual features of the articles from the detection result to obtain an extraction result;
performing feature retrieval according to the extraction result to obtain a retrieval result;
and determining the type of the article according to the retrieval result to obtain an article identification result.
5. The intelligent agent interaction method based on item identification according to claim 4, wherein the item visual feature extraction is performed on the detection result to obtain an extraction result, and the method comprises the following steps:
and extracting the article characteristics of the detection result by adopting an identification network based on MobileNetv2 to obtain an extraction result.
6. The intelligent agent interaction method based on item identification according to claim 4, wherein the performing feature search according to the extraction result to obtain a search result comprises:
and establishing a feature retrieval engine through an open source FAISS framework, selecting a product quantization optimization index, and retrieving a target object similar to the extraction result to obtain a retrieval result.
7. The intelligent agent interaction method based on item identification according to claim 4, wherein the determining the item type according to the retrieval result to obtain an item identification result comprises:
screening the retrieval results that meet the requirements to obtain a screening result;
and eliminating redundant bounding boxes from the screening result using a non-maximum suppression algorithm to obtain an article identification result.
8. An intelligent agent interaction device based on article identification, characterized by comprising:
an image acquisition unit for acquiring an article image;
the target detection unit is used for carrying out target detection on the article image to obtain a detection result;
the identification unit is used for identifying the type of the article on the detection result to obtain an article identification result;
and the injection unit is used for injecting the article into the intelligent agent according to the article identification result to obtain the article injected into the intelligent agent, so that the user can interact with the article injected into the intelligent agent.
9. A computer device, characterized in that it comprises a memory, on which a computer program is stored, and a processor, which when executing the computer program, implements the method according to any one of claims 1 to 7.
10. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 7.
CN202211483266.7A 2022-11-24 2022-11-24 Intelligent body interaction method, device, equipment and storage medium based on article identification Pending CN115546566A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211483266.7A CN115546566A (en) 2022-11-24 2022-11-24 Intelligent body interaction method, device, equipment and storage medium based on article identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211483266.7A CN115546566A (en) 2022-11-24 2022-11-24 Intelligent body interaction method, device, equipment and storage medium based on article identification

Publications (1)

Publication Number Publication Date
CN115546566A true CN115546566A (en) 2022-12-30

Family

ID=84720066

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211483266.7A Pending CN115546566A (en) 2022-11-24 2022-11-24 Intelligent body interaction method, device, equipment and storage medium based on article identification

Country Status (1)

Country Link
CN (1) CN115546566A (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180165518A1 (en) * 2016-12-12 2018-06-14 X Development Llc Object recognition tool
CN110084253A (en) * 2019-05-05 2019-08-02 厦门美图之家科技有限公司 A method of generating object detection model
WO2020164282A1 (en) * 2019-02-14 2020-08-20 平安科技(深圳)有限公司 Yolo-based image target recognition method and apparatus, electronic device, and storage medium
CN111739525A (en) * 2019-03-25 2020-10-02 本田技研工业株式会社 Agent device, control method for agent device, and storage medium
US20210089040A1 (en) * 2016-02-29 2021-03-25 AI Incorporated Obstacle recognition method for autonomous robots
CN113674341A (en) * 2021-08-20 2021-11-19 深圳技术大学 Robot visual identification and positioning method, intelligent terminal and storage medium
EP3933698A1 (en) * 2020-07-03 2022-01-05 Honda Research Institute Europe GmbH Method and control unit for generating stylized motion of an object, such as a robot or a virtual avatar
CN113963197A (en) * 2021-09-29 2022-01-21 北京百度网讯科技有限公司 Image recognition method and device, electronic equipment and readable storage medium
CN114049557A (en) * 2021-11-10 2022-02-15 中国天楹股份有限公司 Garbage sorting robot visual identification method based on deep learning
CN114494857A (en) * 2021-12-30 2022-05-13 中航华东光电(上海)有限公司 Indoor target object identification and distance measurement method based on machine vision
CN114821272A (en) * 2022-06-28 2022-07-29 上海蜜度信息技术有限公司 Image recognition method, image recognition system, image recognition medium, electronic device, and target detection model
CN114898054A (en) * 2022-04-12 2022-08-12 上海交通大学 Visual positioning method and system
CN115302411A (en) * 2022-05-05 2022-11-08 长沙矿冶研究院有限责任公司 Surface cleaning system based on image recognition and control method thereof


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
MINDVERSE心识宇宙: "New AI character generation engine MindOS released", WeChat official account "MINDVERSE心识宇宙" *
YU, TAO et al.: "Image orientation recognition based on hidden Markov model", Journal of Northeastern University (Natural Science) *
心识宇宙MINDVERSE: "Offended a boss over a glass of water! Seriously, thank you, folks!", Douyin short video *
CHEN, YANG: "A survey of dual-robot cooperative control", Computer Systems & Applications *

Similar Documents

Publication Publication Date Title
US9965717B2 (en) Learning image representation by distilling from multi-task networks
JP6662876B2 (en) Avatar selection mechanism
CN110209897B (en) Intelligent dialogue method, device, storage medium and equipment
CN114556333A (en) Smart camera enabled by assistant system
US9754585B2 (en) Crowdsourced, grounded language for intent modeling in conversational interfaces
US8131750B2 (en) Real-time annotator
US9798949B1 (en) Region selection for image match
CN113656582B (en) Training method of neural network model, image retrieval method, device and medium
CN104537341B (en) Face picture information getting method and device
CN110580516B (en) Interaction method and device based on intelligent robot
US20210185273A1 (en) Personalized Automatic Video Cropping
US10770072B2 (en) Cognitive triggering of human interaction strategies to facilitate collaboration, productivity, and learning
CN113806588B (en) Method and device for searching video
CN112912873A (en) Dynamically suppressing query replies in a search
CN114341839A (en) Interactive visual search engine
CN110648170A (en) Article recommendation method and related device
CN114268747A (en) Interview service processing method based on virtual digital people and related device
CN111383138B (en) Restaurant data processing method, device, computer equipment and storage medium
CN110910898B (en) Voice information processing method and device
CN116501960B (en) Content retrieval method, device, equipment and medium
CN112528004A (en) Voice interaction method, voice interaction device, electronic equipment, medium and computer program product
CN112446214A (en) Method, device and equipment for generating advertisement keywords and storage medium
CN107688623A (en) A kind of search method in kind, device, equipment and storage medium
CN113868453B (en) Object recommendation method and device
US20210166685A1 (en) Speech processing apparatus and speech processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination