CN112464894A - Interaction method and device and computer equipment - Google Patents

Interaction method and device and computer equipment Download PDF

Info

Publication number
CN112464894A
CN112464894A (application CN202011474943.XA)
Authority
CN
China
Prior art keywords
initial image
recognized
model
point position
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011474943.XA
Other languages
Chinese (zh)
Other versions
CN112464894B (en)
Inventor
顾在旺
程骏
庞建新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ubtech Robotics Corp
Original Assignee
Ubtech Robotics Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ubtech Robotics Corp filed Critical Ubtech Robotics Corp
Priority to CN202011474943.XA priority Critical patent/CN112464894B/en
Publication of CN112464894A publication Critical patent/CN112464894A/en
Application granted granted Critical
Publication of CN112464894B publication Critical patent/CN112464894B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/35 Categorising the entire scene, e.g. birthday party or wedding scene
    • G06V20/36 Indoor scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/68 Food, e.g. fruit or vegetables

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention discloses an interaction method, an interaction device and computer equipment. The interaction method comprises the following steps: outputting prompt information of an object to be recognized; calling a corresponding object recognition sub-model according to the category of the object to be recognized; and inputting the initial image obtained by shooting into the object recognition sub-model to detect whether the initial image contains the object to be recognized. In the interaction scheme provided by the embodiment, the corresponding object recognition sub-model is called according to the category of the object to be recognized, and the captured initial image is input into that sub-model, which detects whether the initial image contains the object to be recognized; each category of object is thus recognized by the sub-model of that category. This greatly reduces the amount of computation needed to recognize objects of the relevant categories in pictures, improves recognition efficiency, and simplifies the overall process.

Description

Interaction method and device and computer equipment
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an interaction method, an interaction device, and a computer device.
Background
In recent years, with the rapid development of Artificial Intelligence (AI), a number of AI-based applications have emerged, including web-based learn-by-picture applications. Specifically, the user opens a browser and enters the application's web address to interact. The application gives a prompt word drawn from a system lexicon, such as "mouse", and the user then searches for that object. If the user finds it, the user holds the found mouse in front of the screen camera, and the application judges in real time, from the picture captured by the camera, whether the object in the picture matches the prompt word given at the start. If the algorithm judges that the picture captured by the camera shows the given keyword, the user is deemed to have found the object, another word is drawn from the system lexicon, and the user stays engaged in the challenge. The application can also let users share their scores on social networking sites, drawing in more participants. A detection algorithm with high accuracy that can recognize many object types is therefore the key to such picture-recognition applications.
Such an application typically recognizes objects from images as follows: the application first starts a camera, the image captured by the camera is passed into the application, and an object detection algorithm is called to output the category and coordinate information of the objects in the image. The application then processes this information further, for example determining whether it is consistent with the keyword given at the start, and continues the subsequent judgment accordingly.
To give users a better experience, the algorithm is set up to recognize as many types as possible, so that the user can choose to skip when an object is not at hand. To obtain better training results over so many types, the depth of the model generally needs to be increased; increasing the model depth, however, lengthens the model's prediction time, increases the user's waiting time, and reduces recognition accuracy.
Existing picture-recognition applications therefore suffer from the technical problems of low recognition accuracy and long recognition waiting times.
Disclosure of Invention
The embodiment of the disclosure provides an interaction method, an interaction device and computer equipment, so as to solve at least part of technical problems.
In a first aspect, an embodiment of the present disclosure provides an interaction method, where the method includes:
outputting prompt information of an object to be recognized;
calling a corresponding object recognition sub-model according to the category of the object to be recognized;
and inputting the initial image obtained by shooting into the object recognition sub-model, and detecting whether the initial image contains the object to be recognized.
According to a specific embodiment of the present disclosure, before the step of outputting the prompt information of the object to be recognized, the method further includes:
acquiring a basic image of a current scene;
identifying feature data of all objects contained in the basic image, wherein the object to be recognized is any one of all the objects;
and correspondingly storing the prompt information and the category of each object.
According to a specific embodiment of the present disclosure, the step of inputting the initial image obtained by shooting into the object recognition sub-model and detecting whether the initial image includes the object to be recognized includes:
inputting the initial image into the object recognition sub-model, and acquiring a central point position thermodynamic diagram and an object height and width characteristic diagram corresponding to the initial image;
searching for a candidate object matched with the central point position thermodynamic diagram corresponding to the initial image according to the prestored central point position thermodynamic diagrams of all the objects;
judging whether the height and width characteristic diagram of the candidate object is matched with the object height and width characteristic diagram corresponding to the initial image;
and if the height-width feature map of the candidate object is matched with the object height-width feature map corresponding to the initial image, determining that the initial image contains the object to be recognized.
According to a specific embodiment of the present disclosure, the step of searching for the candidate object matching the central point position thermodynamic diagrams corresponding to the initial image according to the central point position thermodynamic diagrams of all objects stored in advance includes:
traversing each pixel point in the central point position thermodynamic diagram corresponding to the initial image according to the central point position thermodynamic diagrams of all the objects, and determining the confidence coefficient of the central point position thermodynamic diagram of each object;
finding out the candidate object with the highest confidence coefficient;
and if the confidence coefficient of the candidate object is higher than a preset threshold value, determining that the candidate object is a matched candidate object.
According to a specific embodiment of the present disclosure, the step of inputting the initial image into the object recognition sub-model and obtaining a central point position thermodynamic diagram and an object height and width characteristic diagram corresponding to the initial image includes:
carrying out standardization preprocessing on the initial image;
coding the initial image after the standardization preprocessing to obtain a corresponding characteristic matrix;
and decoding the feature matrix to obtain a central point position thermodynamic diagram and an object height and width feature diagram corresponding to the initial image.
According to a specific embodiment of the present disclosure, the step of subjecting the initial image to a normalization preprocessing includes:
cutting the initial image into a preset size;
and normalizing the cut initial image.
According to a specific embodiment of the present disclosure, the object recognition sub-model is obtained in advance by:
acquiring a preset number of object sample images of each category;
and respectively inputting the object sample images of all categories into a basic neural network, and training to obtain the object recognition sub-models of the corresponding categories.
According to a specific embodiment of the present disclosure, after the step of detecting whether the object to be recognized is included in the initial image, the method further includes:
and if the initial image contains the object to be recognized, determining that the object to be recognized is recognized successfully.
In a second aspect, an embodiment of the present disclosure provides an interaction apparatus, including:
the output module is used for outputting prompt information of the object to be recognized;
the calling module is used for calling the corresponding object recognition submodel according to the category of the object to be recognized;
and the detection module is used for inputting the initial image obtained by shooting into the object recognition sub-model and detecting whether the initial image contains the object to be recognized.
In a third aspect, the disclosed embodiments provide a computer device, including a memory and a processor, where the memory is connected to the processor, the memory is used for storing a computer program, and the processor runs the computer program to make the computer device execute the interaction method of any one of the first aspect.
In a fourth aspect, the present disclosure provides a computer-readable storage medium storing a computer program for use in the computer device according to the third aspect.
According to the interaction method, the interaction device and the computer device provided by the embodiments of the present disclosure, object recognition sub-models for recognizing different categories of objects are pre-loaded in the computer device. During interaction, prompt information of an object to be recognized is output first; the corresponding object recognition sub-model can then be called according to the category of the object to be recognized, and the captured initial image is input into the called sub-model to detect whether it contains the object to be recognized, completing an interaction round in which the sub-model of the corresponding category recognizes objects of that category. The amount of computation needed for picture-based recognition across multiple categories of objects is thus greatly reduced, the interaction efficiency is improved, and the overall process is simplified.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings required to be used in the embodiments will be briefly described below, and it should be understood that the following drawings only illustrate some embodiments of the present invention, and therefore should not be considered as limiting the scope of the present invention. Like components are numbered similarly in the various figures.
Fig. 1 is a flowchart illustrating an interaction method provided by an embodiment of the present disclosure;
FIG. 2 is a partial flow diagram illustrating an interaction method provided by an embodiment of the present disclosure;
fig. 3 and fig. 4 are schematic diagrams illustrating features involved in an interaction method provided by an embodiment of the present disclosure;
FIG. 5 is a process diagram illustrating an interaction method provided by an embodiment of the present disclosure;
fig. 6 shows a block diagram of an interaction device according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Hereinafter, the terms "including", "having", and their derivatives, which may be used in various embodiments of the present invention, are only intended to indicate specific features, numbers, steps, operations, elements, components, or combinations of the foregoing, and should not be construed as first excluding the existence of, or adding to, one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which various embodiments of the present invention belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their contextual meaning in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in various embodiments of the present invention.
A picture-recognition application should satisfy two requirements: high recognition accuracy and many recognizable types. A conventional learn-by-picture algorithm simply collects a training data set and trains a common target detection algorithm on it. Such applications have the following problems:
1. The recognition accuracy is not high. For a good user experience, a picture-recognition application should recognize as many categories as possible, so that the user can choose to skip when an object is not at hand. But training a model on such a large number of objects is no easy matter. If a better training result is desired, the depth of the model is generally increased; the increased depth lengthens the model's prediction time and the user's waiting time, giving the user a poor experience.
2. The objects to be recognized need to be updated regularly to retain users. The algorithm model then has to be retrained every time the set of recognizable objects is updated, and as the number of object types grows, the difficulty and duration of model training grow geometrically.
3. Common object detection models are large, occupying considerable storage space and computing resources at run time; because the present algorithm divides one complete large model into several sub-models, the object detection models must be optimized to reduce their size and their run-time resource usage. To detect the position of an object, a conventional object detection algorithm first extracts features from the image, then treats each pixel on the feature map as an anchor point; each anchor generates several rectangular boxes (generally 9) with different aspect ratios, and the Regions of Interest (ROI) represented by all the rectangular boxes are fed to a classifier for training and judgment, yielding the category and position information of the objects in the image. The disadvantage of this method is that the aspect ratios of the rectangular boxes must be set manually: too few boxes can cause objects to be missed, while too many boxes make the computation heavy and the detection too slow, harming the user experience.
Example 1
Referring to fig. 1, a schematic flowchart of an interaction method provided in the embodiment of the present disclosure is shown. As shown in fig. 1, the method mainly comprises the following steps:
s101, outputting prompt information of an object to be identified;
the interaction method provided by the embodiment can be applied to an image recognition application of a terminal or a webpage end, for example, an image recognition game application. The implementation of the entire scheme will be described below mainly from the perspective of application to a computer device.
Identification information associated with all recognizable objects, such as object names, categories, images, colors and shapes, is stored in the computer device in advance. When an interaction round starts, the computer device selects an object to participate in the interaction, defined as the object to be recognized; the object to be recognized is any one of all interactable objects, and the interactable objects are generally objects present in the current scene.
The computer device outputs prompt information for the object to be recognized, which prompts the user with relevant characteristic information of that object; the prompt is usually at least one item of the object's identification information, such as its name. The prompt information may be output by voice playback, by image display, or by other means, without limitation.
S102, calling a corresponding object recognition sub-model according to the category of the object to be recognized;
the computer device is pre-loaded with a plurality of object recognition sub-models, the plurality of object recognition sub-models can be respectively used for recognizing different classes of objects, the classes involved in the method can comprise fruits, office supplies, plants, furniture, stationery, animals, flowers, vehicles and the like, and each class can contain dozens or hundreds of objects. Each object identifying sub-model is a neural network which is obtained through training and is provided with objects of the corresponding category, and the object identifying sub-models can be light-weight small models and are used for identifying various objects in one type. It should be noted that the object recognition submodels described herein are only used to distinguish an overall model containing multiple classes of object recognition submodels, and each submodel can be invoked and operated independently.
After the computer equipment outputs the prompt information of the object to be recognized, the corresponding object recognition sub-model can be called according to the category of the object to be recognized. For example, if the output prompt information of the object to be recognized is "apple", the category corresponding to the object to be recognized is fruit, and the computer device calls the object recognition sub-model corresponding to the fruit.
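As an illustration of this dispatch step, the following Python sketch maps a prompt word to its category and loads the matching sub-model. All names, paths and the loader function are hypothetical, since the embodiment does not prescribe an implementation.

```python
# Hypothetical dispatch for step S102; the tables and paths are illustrative.
CATEGORY_OF = {
    "apple": "fruit",
    "mouse": "office_supplies",
    "rose": "flower",
}

SUBMODEL_PATHS = {
    "fruit": "models/fruit.pt",
    "office_supplies": "models/office_supplies.pt",
    "flower": "models/flower.pt",
}

_loaded = {}  # cache: each sub-model is loaded at most once

def get_submodel(prompt_word, load_model):
    """Return the recognition sub-model for the prompt word's category.

    `load_model` is whatever function deserializes a trained sub-model
    (an assumption; e.g. torch.load for PyTorch checkpoints).
    """
    category = CATEGORY_OF[prompt_word]
    if category not in _loaded:
        _loaded[category] = load_model(SUBMODEL_PATHS[category])
    return _loaded[category]
```

For example, a prompt of "apple" resolves to the fruit category, so only the small fruit sub-model is loaded and run.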
S103, inputting the initial image obtained by shooting into the object recognition sub-model, and detecting whether the initial image contains the object to be recognized.
The computer device inputs the captured initial image into the invoked object recognition sub-model so that the sub-model detects, from the initial image, whether the object to be recognized is present. The detection process consists mainly of feature detection: it checks whether the initial image contains feature information matching the object to be recognized, and if such feature information exists, the object to be recognized is deemed detected.
According to a specific embodiment of the present disclosure, after the step of detecting whether the object to be recognized is included in the initial image, the method may further include:
and if the initial image contains the object to be recognized, determining that the object to be recognized is recognized successfully.
In an image recognition process, if the initial image is detected to contain the object to be recognized, the result of the image recognition is determined to be successful, otherwise, the result of the image recognition is determined to be failed.
According to a specific embodiment of the present disclosure, before the step of outputting the prompt information of the object to be recognized, the method may further include:
acquiring a basic image of a current scene;
identifying feature data of all objects contained in the basic image, wherein the object to be identified is any one of the all objects;
and correspondingly storing the prompt information and the category of each object.
This embodiment defines the preparation performed before interaction. The computer device first acquires a base image of the current scene, such as an image of the current room. It then identifies the base image, recognizing all objects contained in it, and stores the prompt information and category of each of those objects. In this way, the computer device can run an interaction round with any one of those objects as the object to be recognized.
By first collecting images of the current scene and identifying all objects they contain, the object to be recognized can be chosen from the objects actually present in the scene, which guarantees the success rate and convenience of the interaction.
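A minimal sketch of this preparation step is given below, assuming a generic scene-detector interface; the embodiment does not fix one.

```python
import random

def build_prompt_table(base_image, scene_detector):
    """Scan the scene once and store prompt information and category for
    every object found. `scene_detector` is any detector returning
    (name, category) pairs for the objects visible in `base_image`
    (an assumed interface)."""
    table = {}
    for name, category in scene_detector(base_image):
        table[name] = {"prompt": name, "category": category}
    return table

def pick_target(table):
    # Any stored object can serve as the object to be recognized.
    return random.choice(list(table.values()))
```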
The interaction method provided by this embodiment mainly proceeds as follows. The application first initializes N trained object recognition sub-models. The application then randomly generates a word from the system lexicon, such as "mouse". The interaction algorithm first judges which sub-category the generated word belongs to, and then calls the object recognition sub-model corresponding to that sub-category. The algorithm continuously processes the images captured by the camera and outputs its recognition results for those images, chiefly which category each detected object belongs to and the coordinates of its bounding box. The results returned by the background algorithm are compared with the word generated when the round started; if that word appears among the background results, the user has found the correct object and progresses to the next stage.
In the interaction method provided by the embodiment of the present disclosure, object recognition sub-models for recognizing different categories of objects are pre-loaded in the computer device. During interaction, prompt information of an object to be recognized is output first; the corresponding object recognition sub-model is called according to the category of the object to be recognized, and the captured initial image is input into the called sub-model to detect whether it contains the object to be recognized, completing an interaction round in which the sub-model of the corresponding category recognizes objects of that category. The amount of computation needed for picture-based recognition across multiple categories of objects is thus greatly reduced, the interaction efficiency is improved, and the overall process is simplified.
On the basis of the above embodiment, this embodiment further defines the process by which the object recognition sub-model detects objects in the image. The detection process differs from existing object detection algorithms in that those algorithms must preset a number of anchor-point detection boxes, and the setting of these boxes affects both the accuracy and the speed of recognition. The object detection method provided by this embodiment does not depend on anchor boxes, avoiding having to set the number and aspect ratios of anchors by experience.
Specifically, as shown in fig. 2, the step of inputting the initial image obtained by shooting into the object recognition sub-model and detecting whether the initial image includes the object to be recognized may specifically include:
s201, inputting the initial image into the object recognition sub-model, and acquiring a central point position thermodynamic diagram and an object height and width characteristic diagram corresponding to the initial image;
In this embodiment, the initial image is input into the object recognition sub-model corresponding to the category, and the center-point position thermodynamic diagram and the object height-width characteristic diagram corresponding to the initial image are obtained, so that the features of the object in the image can be accurately represented by comparing these two feature maps.
The center-point position thermodynamic diagram represents, for any point in the image, the probability that the point is the center of an object box; its values lie in the range (0, 1). The higher the probability that a point is the center of an object box, the larger the value at that position in the thermodynamic diagram and the darker its color; conversely, the lower the probability, the smaller the value and the lighter the color.
The object height-width feature map records, for each pixel that may be the center of an object detection box, the height and width of the corresponding detection box at the same position of the feature map.
Further, the step of inputting the initial image into the object recognition sub-model to obtain a central point position thermodynamic diagram and an object height-width characteristic diagram corresponding to the initial image may include:
carrying out standardization preprocessing on the initial image;
coding the initial image after the standardization preprocessing to obtain a corresponding characteristic matrix;
and decoding the feature matrix to obtain a central point position thermodynamic diagram and an object height and width feature diagram corresponding to the initial image.
Considering that the initial image is captured by a camera, its size differs between cameras, for example 640 × 480 or 1920 × 1080, depending on the camera's resolution. To ensure the accuracy of subsequent model detection, the initial image is given a standardization preprocessing and converted into an image of the preset specification.
In a specific implementation, the step of performing normalization preprocessing on the initial image may include:
cutting the initial image into a preset size;
and normalizing the cut initial image.
Initial images of different specifications can first be cropped to a square; after cropping, the image is changed from its original size into a 480 × 480 square image.
In addition, the algorithm obtains each frame of image through the camera, and each frame of the RGB three-channel image has values in the range 0-255. For ease of calculation, the initial image may first be normalized. Normalization means mapping the original RGB channel values, i.e. values in 0-255, into the range 0-1 according to a uniform rule, so that the algorithm can process the image more conveniently and quickly. The specific operation is to subtract the minimum pixel value of the image from each pixel and divide by the maximum pixel value of the image, i.e. (I_i − I_i^min) / I_i^max, applied to the input image I ∈ R^{N×h×w}, where N denotes the number of channels of the image, i.e. its dimensional information (e.g. N = 3 for an RGB image), h denotes the height of the picture, and w denotes its width.
The initial image is subjected to standard processing such as cutting and normalization and then is used as the input of the object recognition sub-model, so that the model operation amount can be reduced, and the detection precision is improved.
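The cropping and normalization just described can be sketched as follows; OpenCV is an assumed choice of library, and the normalization follows the formula given above.

```python
import cv2
import numpy as np

def preprocess(frame, size=480):
    # Center-crop the camera frame to a square, then scale to size x size.
    h, w = frame.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    square = cv2.resize(frame[top:top + side, left:left + side], (size, size))
    # Normalize per the formula above: subtract the image's minimum pixel
    # value, then divide by its maximum pixel value.
    img = square.astype(np.float32)
    return (img - img.min()) / img.max()
```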
After the normalization preprocessing, a lightweight neural network N_E may be used to extract features from the image I ∈ R^{N×h×w}, obtaining a set of high-dimensional features F = N_E(I).
A decoder network N_D is then used to decode the high-dimensional feature F obtained above into two groups of outputs: the center-point position thermodynamic diagram H ∈ R^{h×w×c} and the object height-width feature map S ∈ R^{h×w×2}, where c denotes the number of kinds of objects to be recognized.
A lightweight neural network N_E is used to extract the high-dimensional features in the image; N_E is not limited to a specific neural network model and may be, for example, MobileNet or ShuffleNet. The algorithm uses this neural network to extract the high-dimensional feature F within the image, which is essentially a matrix carrying image features. The decoder network N_D is likewise not limited to a specific neural network model; any network capable of recovering and extracting the information in the high-dimensional feature F can be used in the algorithm.
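To make the data flow concrete, here is a minimal PyTorch sketch of such an encoder-decoder with the two output heads; the layer sizes and head design are illustrative assumptions, not the architecture fixed by the embodiment.

```python
import torch
import torch.nn as nn

class CenterDetector(nn.Module):
    """Toy encoder-decoder: N_E extracts the feature F, N_D decodes F into a
    center-point heatmap (C channels) and a height-width map (2 channels).
    The layers are illustrative; any backbone such as MobileNet or
    ShuffleNet could play the role of N_E."""

    def __init__(self, num_classes):
        super().__init__()
        self.encoder = nn.Sequential(                       # N_E
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(                       # N_D
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 32, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.heatmap_head = nn.Conv2d(32, num_classes, 1)   # H: h x w x C
        self.size_head = nn.Conv2d(32, 2, 1)                # S: h x w x 2

    def forward(self, image):
        feat = self.decoder(self.encoder(image))
        # Per-pixel class probabilities summing to 1 over the C classes,
        # as the description below states.
        heatmap = torch.softmax(self.heatmap_head(feat), dim=1)
        sizes = self.size_head(feat)                        # (w, h) per pixel
        return heatmap, sizes
```

Because the encoder downsamples by 4 and the decoder upsamples by 4, the heatmap keeps the spatial size of the input, consistent with the statement below that its size matches the image.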
S202, searching for a candidate object matched with the central point position thermodynamic diagrams corresponding to the initial image according to the prestored central point position thermodynamic diagrams of all the objects;
the computer device is stored with a central point position thermodynamic diagram and a height-width characteristic diagram of all objects capable of participating in interaction in advance. In view of the fact that the central point position thermodynamic diagram can reflect the characteristics of the object more accurately, when detection is carried out, the central point position thermodynamic diagram corresponding to the initial image is compared with the central point position thermodynamic diagrams of other objects in a characteristic mode, the object matched with the central point position thermodynamic diagram of the initial image is found out, and the object is defined as a candidate object.
S203, judging whether the height and width characteristic diagram of the candidate object is matched with the object height and width characteristic diagram corresponding to the initial image;
and S204, if the height-width characteristic diagram of the candidate object is matched with the object height-width characteristic diagram corresponding to the initial image, determining that the initial image comprises the object to be recognized.
And comparing the height-width characteristic diagram of the candidate object with the object height-width characteristic diagram corresponding to the initial image, and judging whether the height-width characteristic diagram of the candidate object is matched with the object height-width characteristic diagram corresponding to the initial image. If the height-width feature map of the candidate object is matched with the object height-width feature map corresponding to the initial image, it can be determined that the initial image contains the object to be recognized output in the previous step.
Further, the step of searching for the candidate object matched with the central point position thermodynamic diagram corresponding to the initial image according to the prestored central point position thermodynamic diagrams of all the objects may specifically include:
traversing each pixel point in the central point position thermodynamic diagram corresponding to the initial image according to the central point position thermodynamic diagrams of all the objects, and determining the confidence coefficient of the central point position thermodynamic diagram of each object;
finding out the candidate object with the highest confidence coefficient;
and if the confidence coefficient of the candidate object is higher than a preset threshold value, determining that the candidate object is a matched candidate object.
The center-point position thermodynamic diagram H is traversed over its width and height positions to find the category with the highest confidence. If that category's confidence at some position (x, y) is higher than the preset threshold, an object box exists at that coordinate; the height and width (w_xy, h_xy) of the corresponding object box at that coordinate are then found in the height-width feature map S, so the coordinates of the final object box are (x, y, w_xy, h_xy).
In the center-point position thermodynamic diagram H ∈ R^{h×w×c}, h denotes the height of the image, w denotes the width of the image, and c denotes the number of classes of objects to be recognized. The traversal proceeds as follows: h × w is the size of the image, so every pixel is examined over the whole image from top to bottom and from left to right. When each pixel is examined, its position carries C channels, representing the possibilities of C objects; the class with the highest probability is selected as the class with the highest confidence.
Since every pixel in the image may be the center point of an object, the size of the center-point position thermodynamic diagram is consistent with the input image. Moreover, if some pixel is the center point of an object detection box, that box may belong to any one of the C classes of objects the algorithm recognizes; that is, one point has C possibilities, so C numbers are needed to represent the pixel's prediction results, each prediction probability lies between 0 and 1, and the probabilities of the C class predictions sum to 1. The matrix of the center-point thermodynamic diagram therefore has dimension H × W × C. As for the height-width feature map S, if some pixel in the image is the center point of an object box, the height and width of the corresponding object detection box can be found at that pixel position in the height-width feature map.
As shown in fig. 3 and fig. 4: in the center-point thermodynamic diagram H, each pixel has C possibilities, i.e. C numbers represent the pixel's prediction results. The algorithm finds the largest prediction score among the C results; suppose the category corresponding to that largest result is category C_i. Next, this maximum prediction score is compared with a threshold set in advance, generally 0.5. If it is greater than the threshold, the algorithm considers the point to be the center of a detection box of class C_i. The height and width (w_xy, h_xy) of the corresponding object box at coordinate (x, y) are then found in the height-width feature map S. The coordinates of the final object box are thus (x, y, w_xy, h_xy): the first two values give the coordinates of the center point, the last two give the height and width of the detection box, and this detection box represents category C_i with prediction confidence score.
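The traversal and thresholding just described can be sketched as follows, assuming the heatmap and height-width map are laid out H × W × C and H × W × 2 respectively.

```python
import numpy as np

def decode_boxes(heatmap, sizes, threshold=0.5):
    """Decode center-point predictions into object boxes.

    heatmap: (H, W, C) per-pixel class probabilities.
    sizes:   (H, W, 2) predicted (w, h) of the box centered at each pixel.
    Returns (x, y, w, h, class_id, score) for every pixel whose best class
    score exceeds the threshold (0.5 in the description above).
    """
    best_class = heatmap.argmax(axis=2)   # C_i at each pixel
    best_score = heatmap.max(axis=2)      # its confidence score
    boxes = []
    for y, x in zip(*np.where(best_score > threshold)):
        w, h = sizes[y, x]
        boxes.append((x, y, w, h, int(best_class[y, x]), float(best_score[y, x])))
    return boxes
```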
Another embodiment of the present disclosure specifically defines the training process of the object recognition sub-models. That is, each recognition sub-model can be obtained in advance as follows:
acquiring a preset number of object sample images of each category;
and respectively inputting the object sample images of all categories into a basic neural network, and training to obtain the object identification submodels of corresponding categories.
This embodiment defines the training process of an object recognition sub-model for recognizing a given class of objects. The sample data set for training is divided into N different sub-classes, for example office supplies, plants, animals, fruits, flowers and vehicles; each sub-class can contain about 40 objects. The sample images of each sub-class are then input into the basic neural network for training, yielding N object recognition models.
The neural network structure used for training is an encoder-decoder. The encoder is a neural network obtained by combining a series of convolution and pooling operations; it is trained to extract features from the image, such as color, contour and texture. These features are further processed and combined into a high-dimensional feature F, which theoretically contains the information of the objects in the image, including their position information and category information.
The decoder is a neural network obtained by combining a series of deconvolution and linear-interpolation operations; it is trained to decode the high-dimensional feature F obtained in the encoder into the results, namely a center-point position thermodynamic diagram and a width-height feature map, from which the center point and the width and height of each object detection box are obtained.
The training process is consistent with the normal neural network algorithm training process, as shown in fig. 5, and mainly includes:
firstly, inputting an image, an encoder of a Convolutional Neural Network (CNN) acquires features, and the features pass through a decoder to obtain a set of results predicted by an algorithm, namely an object detection frame in the image. The algorithm then calculates the error between these predicted object detection boxes and the actual artificially marked object detection boxes, also known as the loss function loss. Parameters in the iterative encoder-decoder are updated through a back propagation algorithm, so that the result of the next prediction of the encoder-decoder is more accurate.
In the interaction method provided by this embodiment of the present disclosure, sub-models for a number of object categories are trained first. When the interactive application then runs and gives a prompt, the algorithm first judges which category the prompt belongs to and then calls the small sub-model corresponding to that category. A lightweight small model takes less time and effort to train, and its accuracy on its own sub-class can be higher than that of a large model. Judging the category of the prompt first and then calling the sub-model also runs much faster than directly calling a large model, reducing the user's wait. And in the model update iteration process, if for example a new fruit type must be recognized, only the small fruit model needs to be retrained, reducing the difficulty of subsequent iteration and maintenance.
Example 2
Corresponding to the above method embodiment, as shown in fig. 6, an interactive apparatus 600 is provided in the embodiment of the present disclosure. As shown in fig. 6, the interaction device 600 includes:
the output module 601 is used for outputting prompt information of an object to be recognized;
the calling module 602 is configured to call a corresponding object recognition sub-model according to the category of the object to be recognized;
the detecting module 603 is configured to input the captured initial image into the object recognizing sub-model, and detect whether the initial image includes the object to be recognized.
In addition, the embodiment of the present disclosure further provides a computer device, which includes a memory and a processor, where the memory is connected to the processor, the memory is used to store a computer program, and the processor runs the computer program to make the computer device execute the interaction method in any one of the above embodiments.
In addition, the present disclosure provides a computer-readable storage medium storing a computer program used in the computer device described above.
In summary, the interaction method, interaction device and computer device provided by the embodiments of the present disclosure use the object recognition sub-model of the corresponding category to recognize objects of that category during interaction. The amount of computation needed for picture-based recognition across multiple categories of objects is thus greatly reduced, the interaction efficiency is improved, and the overall process is simplified. For the specific implementation of the interaction device and the computer device, reference may be made to the implementation of the interaction method provided in the foregoing embodiments, which is not repeated here.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, each functional module or unit in each embodiment of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention or a part of the technical solution that contributes to the prior art in essence can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a smart phone, a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention.

Claims (11)

1. An interactive method, characterized in that the method comprises:
outputting prompt information of an object to be recognized;
calling a corresponding object recognition sub-model according to the category of the object to be recognized;
and inputting the initial image obtained by shooting into the object recognition sub-model, and detecting whether the initial image contains the object to be recognized.
2. The method of claim 1, wherein the step of outputting the prompt for the object to be recognized is preceded by the method further comprising:
acquiring a basic image of a current scene;
identifying feature data of all objects contained in the basic image, wherein the object to be recognized is any one of all the objects;
and correspondingly storing the prompt information and the category of each object.
3. The method of claim 2, wherein the step of inputting the initial image obtained by shooting into the object recognition sub-model and detecting whether the object to be recognized is included in the initial image comprises:
inputting the initial image into the object recognition sub-model, and acquiring a central point position thermodynamic diagram and an object height and width characteristic diagram corresponding to the initial image;
searching for a candidate object matched with the central point position thermodynamic diagram corresponding to the initial image according to the prestored central point position thermodynamic diagrams of all the objects;
judging whether the height and width characteristic diagram of the candidate object is matched with the object height and width characteristic diagram corresponding to the initial image;
and if the height-width feature map of the candidate object is matched with the object height-width feature map corresponding to the initial image, determining that the initial image contains the object to be recognized.
4. The method according to claim 3, wherein the step of searching for the candidate object matching the center point position thermodynamic diagrams corresponding to the initial image according to the prestored center point position thermodynamic diagrams of all the objects comprises:
traversing each pixel point in the central point position thermodynamic diagram corresponding to the initial image according to the central point position thermodynamic diagrams of all the objects, and determining the confidence coefficient of the central point position thermodynamic diagram of each object;
finding out the candidate object with the highest confidence coefficient;
and if the confidence coefficient of the candidate object is higher than a preset threshold value, determining that the candidate object is a matched candidate object.
5. The method of claim 3, wherein the step of inputting the initial image into the object recognition sub-model to obtain a center point position thermodynamic diagram and an object height-width characteristic diagram corresponding to the initial image comprises:
carrying out standardization preprocessing on the initial image;
coding the initial image after the standardization preprocessing to obtain a corresponding characteristic matrix;
and decoding the feature matrix to obtain a central point position thermodynamic diagram and an object height and width feature diagram corresponding to the initial image.
6. The method of claim 5, wherein the step of subjecting the initial image to a normalization pre-processing comprises:
cutting the initial image into a preset size;
and normalizing the cut initial image.
7. The method of claim 1, wherein the object recognition sub-model is obtained in advance by:
acquiring a preset number of object sample images of each category;
and respectively inputting the object sample images of all categories into a basic neural network, and training to obtain the object recognition sub-models of the corresponding categories.
8. The method according to claim 1, wherein after the step of detecting whether the object to be recognized is included in the initial image, the method further comprises:
and if the initial image contains the object to be recognized, determining that the object to be recognized is recognized successfully.
9. An interactive apparatus, characterized in that the apparatus comprises:
the output module is used for outputting prompt information of the object to be recognized;
the calling module is used for calling the corresponding object recognition submodel according to the category of the object to be recognized;
and the detection module is used for inputting the initial image obtained by shooting into the object recognition sub-model and detecting whether the initial image contains the object to be recognized.
10. A computer device comprising a memory and a processor, the memory being connected to the processor, the memory being configured to store a computer program, the processor being configured to execute the computer program to cause the computer device to perform the interaction method of any one of claims 1 to 8.
11. A computer-readable storage medium, characterized in that it stores a computer program for use in the computer device of claim 10.
CN202011474943.XA 2020-12-14 2020-12-14 Interaction method and device and computer equipment Active CN112464894B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011474943.XA CN112464894B (en) 2020-12-14 2020-12-14 Interaction method and device and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011474943.XA CN112464894B (en) 2020-12-14 2020-12-14 Interaction method and device and computer equipment

Publications (2)

Publication Number Publication Date
CN112464894A true CN112464894A (en) 2021-03-09
CN112464894B CN112464894B (en) 2023-09-01

Family

ID=74804208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011474943.XA Active CN112464894B (en) 2020-12-14 2020-12-14 Interaction method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN112464894B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105469116A (en) * 2015-12-01 2016-04-06 深圳市图灵机器人有限公司 Picture recognition and data extension method for infants based on man-machine interaction
CN106529460A (en) * 2016-11-03 2017-03-22 贺江涛 Object classification identification system and identification method based on robot side
US20190095730A1 (en) * 2017-09-25 2019-03-28 Beijing University Of Posts And Telecommunications End-To-End Lightweight Method And Apparatus For License Plate Recognition
CN108647588A (en) * 2018-04-24 2018-10-12 广州绿怡信息科技有限公司 Goods categories recognition methods, device, computer equipment and storage medium
CN108550107A (en) * 2018-04-27 2018-09-18 Oppo广东移动通信有限公司 A kind of image processing method, picture processing unit and mobile terminal
CN109993138A (en) * 2019-04-08 2019-07-09 北京易华录信息技术股份有限公司 A kind of car plate detection and recognition methods and device
CN110647844A (en) * 2019-09-23 2020-01-03 深圳一块互动网络技术有限公司 Shooting and identifying method for articles for children
CN111783675A (en) * 2020-07-03 2020-10-16 郑州迈拓信息技术有限公司 Intelligent city video self-adaptive HDR control method based on vehicle semantic perception

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG Meijian et al.: "An Interactive Multiple Model Algorithm for Maneuvering Target Tracking", Computer Applications and Software, vol. 34, no. 5, pp. 211-216 *

Also Published As

Publication number Publication date
CN112464894B (en) 2023-09-01

Similar Documents

Publication Publication Date Title
WO2020221298A1 (en) Text detection model training method and apparatus, text region determination method and apparatus, and text content determination method and apparatus
CN110738207B (en) Character detection method for fusing character area edge information in character image
CN111428723B (en) Character recognition method and device, electronic equipment and storage medium
EP2124159A1 (en) Image learning, automatic annotation, retrieval method, and device
CN111814902A (en) Target detection model training method, target identification method, device and medium
JP5361524B2 (en) Pattern recognition system and pattern recognition method
CN109033955B (en) Face tracking method and system
US20230044146A1 (en) Video processing method, video searching method, terminal device, and computer-readable storage medium
CN109886223B (en) Face recognition method, bottom library input method and device and electronic equipment
CN112836625A (en) Face living body detection method and device and electronic equipment
CN111815687A (en) Point cloud matching method, positioning method, device and storage medium
CN111291887A (en) Neural network training method, image recognition method, device and electronic equipment
US20240203097A1 (en) Method and apparatus for training image processing model, and image classifying method and apparatus
CN111612004A (en) Image clipping method and device based on semantic content
CN111814846B (en) Training method and recognition method of attribute recognition model and related equipment
CN112818949A (en) Method and system for identifying delivery certificate characters
CN109741380B (en) Textile picture fast matching method and device
CN111340213A (en) Neural network training method, electronic device, and storage medium
CN114187558A (en) Video scene recognition method and device, computer equipment and storage medium
CN113657370A (en) Character recognition method and related equipment thereof
CN112257628A (en) Method, device and equipment for identifying identities of outdoor competition athletes
CN112464894B (en) Interaction method and device and computer equipment
CN112084874A (en) Object detection method and device and terminal equipment
CN111539390A (en) Small target image identification method, equipment and system based on Yolov3
CN108021918B (en) Character recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant