CN112464894B - Interaction method and device and computer equipment - Google Patents

Interaction method and device and computer equipment

Info

Publication number
CN112464894B
CN112464894B CN202011474943.XA CN202011474943A
Authority
CN
China
Prior art keywords
initial image
identified
image
model
height
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011474943.XA
Other languages
Chinese (zh)
Other versions
CN112464894A (en)
Inventor
顾在旺
程骏
庞建新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ubtech Robotics Corp
Original Assignee
Ubtech Robotics Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ubtech Robotics Corp filed Critical Ubtech Robotics Corp
Priority to CN202011474943.XA priority Critical patent/CN112464894B/en
Publication of CN112464894A publication Critical patent/CN112464894A/en
Application granted granted Critical
Publication of CN112464894B publication Critical patent/CN112464894B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/35 Categorising the entire scene, e.g. birthday party or wedding scene
    • G06V20/36 Indoor scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/68 Food, e.g. fruit or vegetables

Abstract

The embodiment of the application discloses an interaction method, an interaction device and computer equipment, wherein the interaction method comprises the following steps: outputting prompt information of an object to be identified; calling the corresponding recognition sub-model according to the category of the object to be identified; and inputting the initial image obtained by shooting into the recognition sub-model to detect whether the initial image contains the object to be identified. In the interaction scheme provided by the embodiment, the recognition sub-model corresponding to the category of the object to be identified is called and the initial image obtained by shooting is input into it, so that whether the initial image contains the object to be identified can be detected, completing a process in which objects of a given category are recognized by the recognition sub-model of that category. Therefore, the computation required for recognizing objects of multiple categories in images is greatly reduced, the efficiency of image recognition is improved, and the whole flow is simplified.

Description

Interaction method and device and computer equipment
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an interaction method, an interaction device, and a computer device.
Background
In recent years, with the rapid development of artificial intelligence (Artificial Intelligence, abbreviated as AI), many AI-based applications have emerged, including web-based object recognition games. Specifically, the user opens a browser and enters a website to interact. The application draws a prompt word from its system word library, such as "mouse"; the user then starts looking for a mouse, and once the object is found, holds it in front of the screen camera. By processing the pictures captured by the camera, the application determines in real time whether the object in the picture matches the prompt word given at the start. If the algorithm judges that the prompted object appears in the pictures captured by the camera, the user is deemed to have found the object, and another word is drawn from the system word library, giving the user the fun of clearing levels. Meanwhile, the application lets users share their scores on social networking sites, drawing more people in. A detection algorithm with high accuracy and many recognizable types is therefore the key to such image-based object-recognition applications.
A common implementation first starts the camera, then passes each image captured by the camera into the application, which invokes an object detection algorithm to output the categories and coordinate information of the objects in the image. The application then processes this information further, for example checking whether it matches the prompt word given at the start, before making the subsequent judgment.
To give such an object-recognition application a good user experience, the algorithm should recognize as many types as possible, so that the user can choose to skip a prompt when the object is not at hand. Obtaining a good training result over so many types generally requires increasing the depth of the model, but a deeper model increases prediction time and user waiting, while a shallower model sacrifices recognition accuracy.
Therefore, existing image-based object-recognition applications suffer from the technical problems of low recognition accuracy and long recognition waiting time.
Disclosure of Invention
The embodiment of the disclosure provides an interaction method, an interaction device and computer equipment, so as to solve at least part of the technical problems.
In a first aspect, an embodiment of the present disclosure provides an interaction method, including:
outputting prompt information of an object to be identified;
invoking a corresponding recognition sub-model according to the category of the object to be recognized;
inputting the initial image obtained by shooting into the recognition sub-model, and detecting whether the initial image contains the object to be recognized.
According to a specific embodiment of the present disclosure, before the step of outputting the prompt information of the object to be identified, the method further includes:
collecting a basic image of a current scene;
identifying characteristic data of all objects contained in the basic image, wherein the object to be identified is any one of all the objects;
and correspondingly storing the prompt information and the category of each object.
According to one specific embodiment of the disclosure, the step of inputting the initial image obtained by shooting into the recognition sub-model and detecting whether the initial image contains the object to be recognized includes:
inputting the initial image into the object recognition sub-model, and acquiring a central point position thermodynamic diagram and an object height-width characteristic diagram corresponding to the initial image;
searching for a candidate object matching the central point thermodynamic diagram corresponding to the initial image according to the prestored central point thermodynamic diagrams of all objects;
judging whether the height-width characteristic diagram of the candidate object is matched with the object height-width characteristic diagram corresponding to the initial image;
and if the height-width characteristic diagram of the candidate object is matched with the object height-width characteristic diagram corresponding to the initial image, determining that the initial image contains the object to be identified.
According to one specific embodiment of the disclosure, the step of searching for a candidate object matching the central point thermodynamic diagram corresponding to the initial image according to the prestored central point thermodynamic diagrams of all objects includes:
traversing each pixel point in the central point thermodynamic diagram corresponding to the initial image according to the central point position thermodynamic diagram of all the objects, and determining the confidence coefficient of the central point position thermodynamic diagram of each object;
searching the candidate object with the highest confidence;
and if the confidence coefficient of the candidate object is higher than a preset threshold value, determining the candidate object as a matched candidate object.
According to a specific embodiment of the disclosure, the step of inputting the initial image into the recognition sub-model and obtaining a thermodynamic diagram of a center point position and an object height-width characteristic diagram corresponding to the initial image includes:
carrying out standardized preprocessing on the initial image;
coding the initial image after standardized preprocessing to obtain a corresponding feature matrix;
and decoding the feature matrix to obtain a central point position thermodynamic diagram and an object height-width feature diagram corresponding to the initial image.
According to one embodiment of the disclosure, the step of performing standardized preprocessing on the initial image includes:
cutting the initial image into a preset size;
and normalizing the cut initial image.
According to one specific embodiment of the disclosure, the recognition sub-model is obtained in advance by:
acquiring a predetermined number of object sample images of each category;
and respectively inputting object sample images of each class into a basic neural network, and training to obtain the recognition sub-model of the corresponding class.
According to one embodiment of the present disclosure, after the step of detecting whether the initial image includes the object to be identified, the method further includes:
and if the initial image contains the object to be identified, determining that the object to be identified is successfully identified.
In a second aspect, embodiments of the present disclosure provide an interaction apparatus, the apparatus comprising:
the output module is used for outputting prompt information of the object to be identified;
the calling module is used for calling the corresponding recognition sub-model according to the category of the object to be recognized;
the detection module is used for inputting the initial image obtained by shooting into the recognition sub-model and detecting whether the initial image contains the object to be recognized.
In a third aspect, embodiments of the present disclosure provide a computer device, including a memory and a processor, the memory being connected to the processor, the memory being configured to store a computer program, the processor being configured to cause the computer device to perform the interaction method of any one of the first aspects.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium storing a computer program for use in a computer device according to the third aspect.
In the interaction method, the interaction device and the computer equipment provided by the embodiments of the disclosure, the computer equipment is preloaded with recognition sub-models for recognizing objects of different categories. During interaction, prompt information of an object to be identified is output first; the recognition sub-model corresponding to the category of the object to be identified is then called, the initial image obtained by shooting is input into the called sub-model, and whether the initial image contains the object to be identified is detected, completing an interaction flow in which objects of a given category are recognized by the recognition sub-model of that category. Therefore, the computation required for recognizing objects of multiple categories in images is greatly reduced, the interaction efficiency is improved, and the whole flow is simplified.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the drawings that are required for the embodiments will be briefly described, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope of the present application. Like elements are numbered alike in the various figures.
Fig. 1 shows a flow diagram of an interaction method provided by an embodiment of the disclosure;
FIG. 2 illustrates a partial flow diagram of an interaction method provided by an embodiment of the present disclosure;
fig. 3 and fig. 4 show feature diagrams related to an interaction method provided by an embodiment of the present disclosure;
FIG. 5 shows a process schematic of an interaction method provided by an embodiment of the present disclosure;
fig. 6 shows a block diagram of an interaction device provided by an embodiment of the present disclosure.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments.
The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
The terms "comprises," "comprising," "including," or any other variation thereof, are intended to cover a specific feature, number, step, operation, element, component, or combination of the foregoing, which may be used in various embodiments of the present application, and are not intended to first exclude the presence of or increase the likelihood of one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the various embodiments of the application belong. Terms such as those defined in commonly used dictionaries will be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in connection with the various embodiments of the application.
An application of image-based object recognition should satisfy two requirements: high recognition accuracy and many recognizable types. However, the current common practice is simply to collect a training data set and train on it with a general-purpose target detection algorithm, and such recognizers mainly suffer from the following problems:
1. The recognition accuracy is not high. For a good user experience, the algorithm should recognize as many types as possible, so that the user can choose to skip when the object is not available. But covering so many objects is not an easy matter for model training. Obtaining a better training result requires increasing the depth of the model, and one problem caused by increasing model depth is that the prediction time of the model increases, lengthening user waiting and degrading the user experience.
2. To retain users as long as possible, the object types to be recognized must be updated regularly. The algorithm model then needs to be retrained every time the recognized types are updated, and the difficulty and duration of model training grow geometrically as the number of types increases.
3. The common object detection model is large and occupies substantial storage space and computing resources at run time; because this scheme splits one complete large model into several sub-models, the object detection model must be optimized to reduce its size and the resources consumed during model operation. To detect the position of an object, a conventional object detection algorithm first extracts features from the image, takes each pixel point on the feature map as an anchor point, generates several rectangular frames of different aspect ratios (generally 9) at each point, and feeds the regions of interest (Region Of Interest, abbreviated as ROI) represented by all these rectangular frames into a classifier for training and judgment, yielding the category and position information of the objects to be detected in the image. The drawback of this method is that the aspect ratios of the rectangular frames must be set manually: if the number of frames is too small, objects are missed, while if the number of frames is too large, the computation becomes heavy and detection takes too long, affecting the user experience.
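To get a feel for the scale of that computation, here is a quick back-of-the-envelope count (the feature-map size below is an illustrative assumption; the nine anchors per point follows the "generally 9" figure mentioned above):

```python
# Illustrative candidate-box count for a conventional anchor-based detector.
feature_h, feature_w = 60, 60   # assumed spatial size of the feature map
anchors_per_point = 9           # rectangles of different aspect ratios per pixel

total_rois = feature_h * feature_w * anchors_per_point
print(total_rois)               # 32400 regions of interest to classify per frame
```

Every one of those ROIs must pass through the classifier, which is why the anchor-free detection scheme described in Example 1 below avoids the per-pixel box multiplier entirely.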
Example 1
Referring to fig. 1, a flow chart of an interaction method provided by an embodiment of the disclosure is shown. As shown in fig. 1, the method mainly comprises the following steps:
s101, outputting prompt information of an object to be identified;
the interaction method provided by the embodiment can be applied to a view object recognition application, such as a view object recognition game application, of a terminal or a webpage end. The implementation of the entire scheme will be described below mainly from the point of view of application to computer devices.
Identification information associated with all identifiable objects, such as object names, categories, images, colors and shapes, is prestored in the computer equipment. When the interaction flow starts, the computer device first selects an object to participate in the interaction, defined as the object to be identified. The object to be identified is any one of all the objects available for interaction, and those objects are usually the ones present in the current scene.
The computer device outputs prompt information of the object to be identified, which prompts the user with relevant characteristic information of the object; the prompt is usually at least one item of the object's identification information, such as the object name. The computer device may output the prompt information by voice playback or image display, or in other ways, which is not limited here.
S102, calling a corresponding recognition sub-model according to the category of the object to be recognized;
the computer device is preloaded with a plurality of object recognition sub-models, and the plurality of object recognition sub-models can be respectively used for recognizing different types of objects, wherein the types can comprise fruits, office supplies, plants, furniture, stationery, animals, flowers, vehicles and the like, and each type can comprise tens or hundreds of objects. Each recognition sub-model is a neural network which is obtained through training and is provided with a recognition object of a corresponding class, and the recognition sub-model can be a lightweight small model and is used for recognizing various objects in one type. It should be noted that, the object model is described herein only for distinguishing the overall model including multiple classes of object models, and each sub model may be invoked and run separately.
After the computer equipment outputs the prompt information of the object to be identified, the recognition sub-model corresponding to the category of the object to be identified can be called. For example, if the output prompt information of the object to be identified is "apple", the corresponding category is fruit, and the computer equipment invokes the recognition sub-model corresponding to fruit.
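A minimal sketch of this dispatch step follows; the category names, the `RecognitionSubModel` stub and the lookup tables are illustrative assumptions, not the patent's actual interfaces:

```python
from typing import Dict, List

class RecognitionSubModel:
    """Stand-in for one trained lightweight recognition sub-model."""
    def __init__(self, category: str):
        self.category = category

    def detect(self, frame) -> List[str]:
        """Would run inference and return the object names found in the frame."""
        raise NotImplementedError

# Hypothetical mapping from prompt word to its category, built when the
# identifiable objects are stored (see the pre-preparation step above).
OBJECT_CATEGORY: Dict[str, str] = {"apple": "fruit", "mouse": "office supplies"}

# One lightweight sub-model per category, loaded up front.
SUB_MODELS = {cat: RecognitionSubModel(cat) for cat in set(OBJECT_CATEGORY.values())}

def model_for(prompt_word: str) -> RecognitionSubModel:
    """Call only the sub-model matching the prompted object's category."""
    return SUB_MODELS[OBJECT_CATEGORY[prompt_word]]
```

Keeping the mapping as a plain dictionary means adding a new category only requires registering one more lightweight sub-model, without touching the others.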
S103, inputting the initial image obtained through shooting into the recognition sub-model, and detecting whether the initial image contains the object to be recognized.
The computer equipment inputs the obtained initial image into the called recognition sub-model so that the sub-model detects whether the object to be identified is contained in the initial image. The detection process is mainly feature detection: it checks whether feature information matching the object to be identified exists in the initial image, and if such feature information exists, the object to be identified can be determined to be detected.
According to a specific embodiment of the present disclosure, after the step of detecting whether the initial image includes the object to be identified, the method may further include:
and if the initial image contains the object to be identified, determining that the object to be identified is successfully identified.
In one round of the recognition process, if the initial image is detected to contain the object to be identified, the recognition result can be marked as successful; otherwise, the recognition result is marked as failed.
According to a specific embodiment of the present disclosure, before the step of outputting the prompt information of the object to be identified, the method may further include:
collecting a basic image of a current scene;
identifying characteristic data of all objects contained in the basic image, wherein the object to be identified is any one of all the objects;
and correspondingly storing the prompt information and the category of each object.
This embodiment defines the preparation process before interaction. The computer device first captures a base image of the current scene, such as an image of the current room, identifies all the objects contained in it, and stores the prompt information and category of each object correspondingly. The computer equipment can then run the interaction flow with any one of these objects as the object to be identified.
Collecting images of the current scene first and identifying all objects contained in it allows the object to be identified to be selected from objects actually present in the scene, which guarantees the success rate and convenience of the interaction.
In a specific implementation, the interaction method provided by the embodiment of the disclosure mainly includes the following steps. The application first initializes the N trained recognition sub-models; it then randomly draws a word, such as "mouse", from the system word library. The interaction algorithm first judges which sub-category the word generated by the system belongs to, and then calls the recognition sub-model corresponding to that sub-category. The algorithm continuously processes the images captured by the camera and outputs its recognition results, mainly which categories of objects the images contain and the coordinates of each object frame. The result obtained by the background algorithm is compared with the word generated by the application at the start; if the word appears in the returned result, the user has found the correct object and advances to the next round.
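Condensed into code, one round of that flow might look like the following sketch; `capture_frame` and the word library are assumed placeholders, and `model_for` is the dispatch helper sketched earlier:

```python
import random

WORD_LEXICON = ["mouse", "apple", "pen"]   # illustrative system word library

def play_round(capture_frame) -> None:
    target = random.choice(WORD_LEXICON)   # 1. draw and announce a prompt word
    print(f"Find a: {target}")
    model = model_for(target)              # 2. call the matching sub-model
    while True:                            # 3. keep processing camera frames
        frame = capture_frame()
        found = model.detect(frame)        # object names recognized in frame
        if target in found:                # 4. compare with the prompt word
            print("Correct! Advancing to the next round.")
            return
```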
In the interaction method provided by the embodiment of the disclosure, recognition sub-models for recognizing objects of different categories are preloaded in the computer equipment. During interaction, prompt information of an object to be identified is output first; the recognition sub-model corresponding to the category of the object can then be called, the initial image obtained by shooting is input into the called sub-model, and whether the initial image contains the object to be identified is detected, completing an interaction flow in which objects of a given category are recognized by the recognition sub-model of that category. Therefore, the computation required for recognizing the prompted objects in images is greatly reduced, the interaction efficiency is improved, and the whole flow is simplified.
On the basis of the above embodiment, this embodiment further defines the process of detecting an object in an image with the recognition sub-model. The detection process differs from existing object detection algorithms in that those algorithms require a number of preset detection-frame anchor points, and the anchor settings affect both the accuracy and the speed of object recognition. The object detection method provided in this embodiment does not depend on anchor settings, so the number of anchors and their aspect ratios no longer have to be chosen empirically.
Specifically, as shown in fig. 2, the step of inputting the initial image obtained by shooting into the recognition sub-model and detecting whether the initial image includes the object to be recognized may specifically include:
s201, inputting the initial image into the object recognition sub-model, and acquiring a central point position thermodynamic diagram and an object height-width characteristic diagram corresponding to the initial image;
In this embodiment, the initial image is input into the recognition sub-model of the corresponding category, and a central point position thermodynamic diagram and an object height-width feature map corresponding to the initial image are obtained; by comparing these two feature maps, the features of the objects in the image can be represented accurately.
The central point position thermodynamic diagram represents the probability, valued between 0 and 1, that any point in the image is the center point of an object frame. The larger the probability that a point is the center point of an object frame, the larger its value in the thermodynamic diagram and the darker its color; conversely, the smaller the probability, the smaller the value and the lighter the color.
The object height-width feature map records, for each pixel point in the image that may be the center point of an object detection frame, the length and width of the corresponding detection frame at that position in the feature map.
Further, the step of inputting the initial image into the recognition sub-model to obtain a central point position thermodynamic diagram and an object height-width characteristic diagram corresponding to the initial image may include:
carrying out standardized preprocessing on the initial image;
coding the initial image after standardized preprocessing to obtain a corresponding feature matrix;
and decoding the feature matrix to obtain a central point position thermodynamic diagram and an object height-width feature diagram corresponding to the initial image.
Considering that the initial image is acquired by a camera, and different cameras produce images of different sizes, such as 640×480 or 1920×1080 depending on the camera resolution, the initial image is subjected to standardized preprocessing and converted into an image of a preset specification to ensure the accuracy of subsequent model detection.
In a specific implementation, the step of performing standardized preprocessing on the initial image may include:
cutting the initial image into a preset size;
and normalizing the cut initial image.
First, initial images of different specifications can be cut into a square; after cutting, the initial image is changed from its original size into a square image of size 480×480.
In addition, the algorithm obtains each frame through the camera, and each RGB three-channel frame has values in the range 0-255. For ease of calculation, the initial image can be normalized. Normalization maps the original RGB channel values of 0-255 into the range 0-1 according to a unified rule, so that the algorithm can process images more conveniently and quickly. The specific operation is to subtract from each pixel the smallest pixel value in the image and divide by the largest pixel value in the image, i.e. (I_i - I_i^min) / I_i^max, where i runs over the N channels of the image (its dimension information; e.g. N = 3 for an RGB image), h denotes the image height, and w denotes the image width.
The initial image, after standardization such as cropping and normalization, is used as the input of the recognition sub-model, which reduces the computation of the model and improves detection precision.
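A sketch of this preprocessing follows; the center-crop policy is an assumption (the text only requires cutting to a 480×480 square), and the normalization follows the per-channel formula given above:

```python
import numpy as np

def preprocess(image: np.ndarray, size: int = 480) -> np.ndarray:
    """Center-crop an HxWx3 frame to size x size, then min-max normalize."""
    h, w, _ = image.shape                   # assumes the frame is at least 480x480
    top, left = (h - size) // 2, (w - size) // 2
    crop = image[top:top + size, left:left + size].astype(np.float32)
    # (I_i - I_i^min) / I_i^max, computed per channel as described above
    i_min = crop.min(axis=(0, 1), keepdims=True)
    i_max = crop.max(axis=(0, 1), keepdims=True)
    return (crop - i_min) / np.maximum(i_max, 1e-6)
```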
After normalization, a lightweight neural network N_E can first be used to extract a set of high-dimensional features F from the image.
A decoder network N_D is then used to decode the obtained high-dimensional features F into two groups of outputs: a central point position thermodynamic diagram of size h×w×c and an object height-width feature map, where h and w are the image height and width and c denotes the number of categories of objects to be identified.
The lightweight neural network N_E used to extract the high-dimensional features of the image is not limited to one fixed neural network model; it may be, for example, MobileNet or ShuffleNet. F is essentially a matrix carrying the image features. The decoder network N_D is likewise not limited to a particular neural network model: any network able to restore information from the high-dimensional features F may be used in the algorithm.
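The following PyTorch sketch shows one possible shape of the N_E / N_D pair; the layer sizes and the tiny hand-rolled encoder are assumptions for illustration, since the text allows any lightweight backbone such as MobileNet and any decoder that restores spatial detail:

```python
import torch
import torch.nn as nn

class CenterDetector(nn.Module):
    """Toy N_E encoder + N_D decoder emitting the two output maps."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.encoder = nn.Sequential(          # N_E: downsample and extract F
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(          # N_D: deconvolve back up
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 32, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.heatmap_head = nn.Conv2d(32, num_classes, 1)  # center-point scores
        self.size_head = nn.Conv2d(32, 2, 1)               # (height, width) per pixel

    def forward(self, x: torch.Tensor):
        f = self.encoder(x)                    # high-dimensional feature matrix F
        d = self.decoder(f)
        return torch.sigmoid(self.heatmap_head(d)), self.size_head(d)

# Example: a 480x480 RGB frame in, two 480x480 maps out.
model = CenterDetector(num_classes=40)
heatmap, sizes = model(torch.randn(1, 3, 480, 480))
```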
S202, searching candidate objects matched with the central point thermodynamic diagram corresponding to the initial image according to the prestored central point thermodynamic diagram of all the objects;
The central point position thermodynamic diagrams and height-width feature maps of all objects that can participate in the interaction are prestored in the computer equipment. Since the central point thermodynamic diagram reflects the features of an object more accurately, detection first compares the central point thermodynamic diagram corresponding to the initial image with those of the stored objects, and the object matching that of the initial image is found and defined as the candidate object.
S203, judging whether the height-width characteristic diagram of the candidate object is matched with the object height-width characteristic diagram corresponding to the initial image;
s204, if the height-width feature map of the candidate object is matched with the object height-width feature map corresponding to the initial image, determining that the initial image contains the object to be identified.
The height-width feature map of the candidate object is then compared with the object height-width feature map corresponding to the initial image to judge whether the two match. If they match, it can be determined that the initial image contains the object to be identified that was output in the previous step.
Further, the step of searching for a candidate object matching the central point thermodynamic diagram corresponding to the initial image according to the prestored central point thermodynamic diagrams of all objects may specifically include:
traversing each pixel point in the central point thermodynamic diagram corresponding to the initial image according to the central point position thermodynamic diagram of all the objects, and determining the confidence coefficient of the central point position thermodynamic diagram of each object;
searching the candidate object with the highest confidence;
and if the confidence coefficient of the candidate object is higher than a preset threshold value, determining the candidate object as a matched candidate object.
The central point position thermodynamic diagram, of size h×w×c (where h is the image height, w the image width and c the number of object categories to be recognized), is traversed over all of its width and height positions to find the category with the highest confidence. If at some position (x, y) the confidence of a category exceeds a preset threshold, an object frame centered at that coordinate exists; the height and width (h_box, w_box) of the corresponding object frame are then read from the height-width feature map at the same coordinate, and the final object frame is (x, y, h_box, w_box). The traversal proceeds over the H×W positions of the image, judging each pixel from top to bottom and left to right; at each pixel position there are C channels, one per recognizable object category, and the category with the largest score is selected as the highest-confidence category.
Since every pixel in the image may be the center point of an object, the spatial size of the central point position thermodynamic diagram is consistent with that of the input image. Meanwhile, if a certain pixel is the center point of an object detection frame, that frame may belong to any of the C categories of objects the algorithm can identify; a point therefore has C possibilities, i.e. C numbers are needed to represent the prediction at that pixel. Each predicted probability lies between 0 and 1, and the probabilities of the C predictions sum to 1, so the central point thermodynamic diagram has size H×W×C. The height-width feature map gives, for a pixel assumed to be the center point of an object frame, the height and width of the detection frame corresponding to that pixel.
As shown in fig. 3 and 4: in the central point thermodynamic diagram, each pixel carries C numbers representing its prediction. The algorithm finds the largest prediction score among the C results and takes the corresponding category as class C_i. It then compares this maximum score with a preset threshold, generally set to 0.5. If the score exceeds the threshold, the algorithm regards this point as the center of a detection frame of class C_i, finds the height and width (h_box, w_box) of the frame at coordinate (x, y) in the height-width feature map, and obtains the final object frame (x, y, h_box, w_box): the first two values are the center coordinates, the last two the height and width of the detection frame; the frame belongs to class C_i, and the confidence of the prediction is the score.
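The traversal described above can be sketched directly as a straightforward reference implementation, under the assumption that the maps arrive as NumPy arrays of shape H×W×C and H×W×2 (a real deployment would vectorize this loop):

```python
import numpy as np

def decode_boxes(heatmap: np.ndarray, size_map: np.ndarray, thresh: float = 0.5):
    """heatmap: HxWxC center scores; size_map: HxWx2 (height, width) per pixel."""
    boxes = []
    H, W, C = heatmap.shape
    for y in range(H):                         # top to bottom
        for x in range(W):                     # left to right
            c_i = int(np.argmax(heatmap[y, x]))     # class with the largest score
            score = float(heatmap[y, x, c_i])
            if score > thresh:                 # an object frame center exists here
                box_h, box_w = size_map[y, x]  # frame height/width at (x, y)
                boxes.append((x, y, box_h, box_w, c_i, score))
    return boxes
```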
According to another embodiment of the present disclosure, the training process of the recognition sub-models is specifically defined. That is, the recognition sub-models can be obtained in advance in the following way:
acquiring a predetermined number of object sample images of each category;
and respectively inputting object sample images of each class into a basic neural network, and training to obtain the recognition sub-model of the corresponding class.
This embodiment specifically defines the training process of a recognition sub-model for a given category of objects. The sample data set used for training is divided into N different sub-categories, for example office supplies, plants, animals, fruits, flowers and vehicles, each sub-category containing about 40 objects; sample images of each sub-category are then input into the basic neural network for training, resulting in N recognition sub-models.
The neural network structure used for training is an encoder-decoder. The encoder is a neural network built by combining a series of convolution and pooling operations; it is trained to extract features from the image, such as color, contour and texture. These features are further processed and combined to obtain the high-dimensional features F, which theoretically contain the information of the objects in the image, including position information and category information.
The decoder is a neural network built by combining a series of deconvolution and linear interpolation operations; through training, it decodes the high-dimensional features F obtained by the encoder to produce the results, namely the central point position thermodynamic diagram and the height-width feature map, from whose analysis the center point, width and height of the object detection frame are obtained.
The training process is consistent with the normal neural network algorithm training process, as shown in fig. 5, and mainly comprises the following steps:
First, an image is input, an encoder built from convolutional neural networks (Convolutional Neural Networks, abbreviated as CNN) extracts the features, and the features pass through the decoder to produce the results predicted by the algorithm, namely the object detection frames in the image. The algorithm then calculates the error, also known as the loss function (loss), between these predicted detection frames and the manually annotated ground-truth frames. The parameters of the encoder-decoder are updated iteratively through a back-propagation algorithm, so that the next prediction of the encoder-decoder becomes more accurate.
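One training step under that description might look like the following sketch, which reuses the `CenterDetector` sketch above; the specific loss terms are assumptions, since the text only requires an error between predicted and manually marked detection frames:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, images, gt_heatmap, gt_size) -> float:
    """One forward/backward pass: predict, measure loss, back-propagate."""
    pred_heatmap, pred_size = model(images)
    loss = (F.binary_cross_entropy(pred_heatmap, gt_heatmap)  # center-point error
            + F.l1_loss(pred_size, gt_size))                  # height/width error
    optimizer.zero_grad()
    loss.backward()          # back-propagation through decoder and encoder
    optimizer.step()         # update the encoder-decoder parameters
    return float(loss)
```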
In the interaction method provided by the embodiment of the disclosure, the sub-models for the several object categories are trained first. When the interactive application starts and a prompt is given, the algorithm first judges the category to which the prompt belongs, and then calls the small sub-model corresponding to that category. Such a lightweight small model is quick and easy to train, and its accuracy on its sub-category can exceed that of a large model. After judging which category the prompt belongs to and calling the sub-model, the recognition algorithm runs much faster than directly calling a large model, reducing user waiting. During model update iterations, if, for example, a new fruit variety is added to the recognizable set, only the fruit sub-model needs to be retrained iteratively, reducing the difficulty of subsequent iteration and maintenance.
Example 2
Corresponding to the above-described method embodiments, as shown in fig. 6, an embodiment of the present disclosure provides an interaction device 600. As shown in fig. 6, the interaction device 600 includes:
the output module 601 is configured to output a prompt message of an object to be identified;
a calling module 602, configured to call the corresponding recognition sub-model according to the category of the object to be identified;
the detection module 603 is configured to input an initial image obtained by capturing into the recognition sub-model, and detect whether the initial image includes the object to be recognized.
In addition, an embodiment of the disclosure further provides a computer device, including a memory and a processor, wherein the memory is connected to the processor and is used to store a computer program, and the processor runs the computer program to make the computer device execute the interaction method in any of the foregoing embodiments.
In addition, the embodiment of the present disclosure provides a computer-readable storage medium storing a computer program used in the above-described computer device.
In summary, the interaction method, the interaction device and the computer equipment provided by the embodiments of the present disclosure complete the interaction flow of recognizing objects of a given category with the recognition sub-model of that category. Therefore, the computation required for recognizing the prompted objects in images is greatly reduced, the interaction efficiency is improved, and the whole flow is simplified. For the specific implementation of the interaction device and the computer device, reference may be made to the specific implementation of the interaction method provided in the foregoing embodiments, which is not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative, for example, of the flow diagrams and block diagrams in the figures, which illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules or units in various embodiments of the application may be integrated together to form a single part, or the modules may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a smart phone, a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application.

Claims (10)

1. A method of interaction, the method comprising:
outputting prompt information of an object to be identified;
invoking a corresponding recognition sub-model according to the category of the object to be recognized;
inputting an initial image into the recognition sub-model, and acquiring a central point position thermodynamic diagram and an object height-width feature map corresponding to the initial image; the central point position thermodynamic diagram is used for representing the probability that any point in the image is the center point of an object frame; the object height-width feature map records, for each pixel point in the image taken as the center point of an object detection frame, the length and width of the corresponding detection frame at that position in the feature map;
searching for a candidate object matching the central point thermodynamic diagram corresponding to the initial image according to the prestored central point thermodynamic diagrams of all objects;
judging whether the height-width characteristic diagram of the candidate object is matched with the object height-width characteristic diagram corresponding to the initial image;
and if the height-width characteristic diagram of the candidate object is matched with the object height-width characteristic diagram corresponding to the initial image, determining that the initial image contains the object to be identified.
2. The method of claim 1, wherein prior to the step of outputting a hint of the object to be identified, the method further comprises:
collecting a basic image of a current scene;
identifying characteristic data of all objects contained in the basic image, wherein the object to be identified is any one of all the objects;
and correspondingly storing the prompt information and the category of each object.
3. The method of claim 1, wherein the step of searching for a candidate object matching the central point position thermodynamic diagram corresponding to the initial image according to the prestored central point position thermodynamic diagrams of all objects comprises:
traversing each pixel point in the central point thermodynamic diagram corresponding to the initial image according to the central point position thermodynamic diagram of all the objects, and determining the confidence coefficient of the central point position thermodynamic diagram of each object;
searching the candidate object with the highest confidence;
and if the confidence coefficient of the candidate object is higher than a preset threshold value, determining the candidate object as a matched candidate object.
4. The method of claim 1, wherein the step of inputting an initial image into the recognition sub-model to obtain a central point position thermodynamic diagram and an object height-width feature map corresponding to the initial image comprises:
carrying out standardized preprocessing on the initial image;
coding the initial image after standardized preprocessing to obtain a corresponding feature matrix;
and decoding the feature matrix to obtain a central point position thermodynamic diagram and an object height-width feature diagram corresponding to the initial image.
5. The method of claim 4, wherein the step of carrying out standardized preprocessing on the initial image comprises:
cutting the initial image into a preset size;
and normalizing the cut initial image.
6. The method according to claim 1, wherein the recognition sub-model is obtained in advance by:
acquiring a predetermined number of object sample images of each category;
and respectively inputting object sample images of each class into a basic neural network, and training to obtain the recognition sub-model of the corresponding class.
7. The method according to claim 1, wherein after the step of detecting whether the object to be identified is contained in the initial image, the method further comprises:
and if the initial image contains the object to be identified, determining that the object to be identified is successfully identified.
8. An interactive apparatus, the apparatus comprising:
the output module is used for outputting prompt information of the object to be identified;
the calling module is used for calling the corresponding recognition sub-model according to the category of the object to be recognized;
the detection module is used for inputting an initial image into the recognition sub-model and acquiring a central point position thermodynamic diagram and an object height-width feature map corresponding to the initial image; the central point position thermodynamic diagram is used for representing the probability that any point in the image is the center point of an object frame; the object height-width feature map records, for each pixel point in the image taken as the center point of an object detection frame, the length and width of the corresponding detection frame at that position in the feature map;
searching for a candidate object matching the central point thermodynamic diagram corresponding to the initial image according to the prestored central point thermodynamic diagrams of all objects;
judging whether the height-width characteristic diagram of the candidate object is matched with the object height-width characteristic diagram corresponding to the initial image;
and if the height-width characteristic diagram of the candidate object is matched with the object height-width characteristic diagram corresponding to the initial image, determining that the initial image contains the object to be identified.
9. A computer device comprising a memory and a processor, the memory being connected to the processor, the memory being for storing a computer program, the processor being operative to cause the computer device to perform the interaction method of any one of claims 1 to 7.
10. A computer readable storage medium, characterized in that it stores a computer program for use in the computer device of claim 9.
CN202011474943.XA 2020-12-14 2020-12-14 Interaction method and device and computer equipment Active CN112464894B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011474943.XA CN112464894B (en) 2020-12-14 2020-12-14 Interaction method and device and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011474943.XA CN112464894B (en) 2020-12-14 2020-12-14 Interaction method and device and computer equipment

Publications (2)

Publication Number Publication Date
CN112464894A CN112464894A (en) 2021-03-09
CN112464894B true CN112464894B (en) 2023-09-01

Family

ID=74804208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011474943.XA Active CN112464894B (en) 2020-12-14 2020-12-14 Interaction method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN112464894B (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704857B (en) * 2017-09-25 2020-07-24 北京邮电大学 End-to-end lightweight license plate recognition method and device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105469116A (en) * 2015-12-01 2016-04-06 深圳市图灵机器人有限公司 Picture recognition and data extension method for infants based on man-machine interaction
CN106529460A (en) * 2016-11-03 2017-03-22 贺江涛 Object classification identification system and identification method based on robot side
CN108647588A (en) * 2018-04-24 2018-10-12 广州绿怡信息科技有限公司 Goods categories recognition methods, device, computer equipment and storage medium
CN108550107A (en) * 2018-04-27 2018-09-18 Oppo广东移动通信有限公司 A kind of image processing method, picture processing unit and mobile terminal
CN109993138A (en) * 2019-04-08 2019-07-09 北京易华录信息技术股份有限公司 A kind of car plate detection and recognition methods and device
CN110647844A (en) * 2019-09-23 2020-01-03 深圳一块互动网络技术有限公司 Shooting and identifying method for articles for children
CN111783675A (en) * 2020-07-03 2020-10-16 郑州迈拓信息技术有限公司 Intelligent city video self-adaptive HDR control method based on vehicle semantic perception

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
An interactive multiple-model algorithm for maneuvering target tracking (一种面向机动目标跟踪的交互式多模型算法); 王美健 et al.; Computer Applications and Software (《计算机应用与软件》); Vol. 34, No. 5; pp. 211-216 *

Also Published As

Publication number Publication date
CN112464894A (en) 2021-03-09

Similar Documents

Publication Publication Date Title
WO2020221298A1 (en) Text detection model training method and apparatus, text region determination method and apparatus, and text content determination method and apparatus
CN109146892B (en) Image clipping method and device based on aesthetics
CN111814902A (en) Target detection model training method, target identification method, device and medium
CN107832700A (en) A kind of face identification method and system
CN110263215B (en) Video emotion positioning method and system
CN109033955B (en) Face tracking method and system
CN107133567B (en) woundplast notice point selection method and device
CN111291887A (en) Neural network training method, image recognition method, device and electronic equipment
CN112836625A (en) Face living body detection method and device and electronic equipment
CN111612004A (en) Image clipping method and device based on semantic content
CN111401238A (en) Method and device for detecting character close-up segments in video
CN113837065A (en) Image processing method and device
CN111178146A (en) Method and device for identifying anchor based on face features
CN111814690A (en) Target re-identification method and device and computer readable storage medium
CN114494775A (en) Video segmentation method, device, equipment and storage medium
CN111062388B (en) Advertisement character recognition method, system, medium and equipment based on deep learning
CN111539390A (en) Small target image identification method, equipment and system based on Yolov3
CN112464894B (en) Interaction method and device and computer equipment
CN112257628A (en) Method, device and equipment for identifying identities of outdoor competition athletes
CN111539435A (en) Semantic segmentation model construction method, image segmentation equipment and storage medium
CN112084874B (en) Object detection method and device and terminal equipment
EP4086786A1 (en) Video processing method, video searching method, terminal device, and computer-readable storage medium
CN113139629A (en) Font identification method and device, electronic equipment and storage medium
CN108021918B (en) Character recognition method and device
CN112884866A (en) Coloring method, device, equipment and storage medium for black and white video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant