CN111931762B - AI-based image recognition solution method, device and readable storage medium


Info

Publication number
CN111931762B
Authority
CN
China
Prior art keywords
image
instruction
trigger object
instruction trigger
user
Legal status
Active
Application number
CN202011021413.XA
Other languages
Chinese (zh)
Other versions
CN111931762A (en)
Inventor
Inventor not disclosed
Current Assignee
Guangzhou Bairui Network Technology Co ltd
Original Assignee
Guangzhou Bairui Network Technology Co ltd
Application filed by Guangzhou Bairui Network Technology Co ltd
Priority to CN202011021413.XA
Publication of CN111931762A
Application granted
Publication of CN111931762B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/235 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on user input or interaction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures

Abstract

The invention discloses an AI-based image recognition solution method and device and a readable storage medium, relating to the technical field of image recognition. The AI-based image recognition solution method comprises the following steps: acquiring a user indication image; obtaining corresponding preset instruction trigger identifiers through at least one superior prediction model; updating the feature identifier of the corresponding second instruction trigger object in the user indication image based on the obtained preset instruction trigger identifiers of the at least one superior prediction model; and training an image recognition model based on the user indication image after updating the at least one feature identifier, so that the image recognition model can predict the control instructions corresponding to the first instruction trigger object and the at least one second instruction trigger object based on the user real-time image acquired by the user terminal, and execute the control instruction corresponding to the control instruction prediction result, thereby realizing a scheme of triggering control instructions based on image recognition.

Description

AI-based image recognition solution method, device and readable storage medium
Technical Field
The invention relates to the technical field of image recognition, in particular to an AI-based image recognition solution method, an AI-based image recognition solution device and a readable storage medium.
Background
At present, with the popularization of intelligent mobile terminals and the innovation of technical means, functions such as video calling and live streaming have gradually emerged. Compared with sending text or voice, direct video interaction between users fits the current development trend. During video interaction, when a user needs to invoke certain related instructions, most devices still trigger the related functions through manual key presses or simple voice input. In actual use, it may be inconvenient for the user to operate the device by hand, and voice input may conflict with the function in use (such as a call, video playback or live streaming). In the prior art, there is no mature scheme for triggering a corresponding instruction by recognizing an image containing a user action.
In view of the above, it is necessary for those skilled in the art to provide a scheme that can trigger control instructions based on image recognition.
Disclosure of Invention
The invention aims to provide an AI-based image recognition solution method, an AI-based image recognition solution device and a readable storage medium.
In a first aspect, an embodiment of the present invention provides an AI-based image recognition solution method, where the method includes:
obtaining a user indication image of an image recognition model for image recognition, the user indication image including at least two feature identifiers corresponding to instruction trigger objects, the instruction trigger objects including: a first instruction trigger object and at least one second instruction trigger object, wherein the image acquisition time range of the second instruction trigger object is larger than that of the first instruction trigger object;
respectively inputting the user indication image into at least one superior prediction model, wherein each superior prediction model is used for predicting one second instruction trigger object;
respectively carrying out second instruction trigger object prediction on the user indication image through the at least one superior prediction model to obtain corresponding preset instruction trigger identifiers;
updating the feature identifier of a corresponding second instruction trigger object in the user indication image based on the obtained preset instruction trigger identifier of the at least one superior prediction model to obtain the user indication image with the updated at least one feature identifier;
and training the image recognition model based on the user indication image after updating the at least one feature identifier, so that the image recognition model can predict the control instruction corresponding to the first instruction trigger object and the at least one second instruction trigger object based on the acquired user real-time image, and execute the control instruction corresponding to the control instruction prediction result based on the control instruction prediction result.
Optionally, the acquiring a user indication image of an image recognition model for image recognition includes: acquiring data of the user real-time image corresponding to the first instruction trigger object and data corresponding to the at least one second instruction trigger object based on the image acquisition time range of the first instruction trigger object;
and constructing a user indication image of the image recognition model based on the acquired data.
Optionally, the updating, based on the obtained preset instruction trigger identifiers of the at least one superior prediction model, the feature identifier of the corresponding second instruction trigger object in the user indication image to obtain the user indication image after updating the at least one feature identifier includes:
respectively labeling the preset instruction trigger identifiers of the superior prediction models as the feature identifiers of the corresponding second instruction trigger objects in the user indication image, so as to update the feature identifiers of the corresponding second instruction trigger objects in the user indication image and obtain the user indication image with at least one updated feature identifier.
Optionally, before the user indication image is respectively input into the at least one superior prediction model, the method further includes:
acquiring a user indication image of the at least one superior prediction model, wherein the user indication image of each superior prediction model is obtained by sampling based on the image acquisition time range of the corresponding second instruction trigger object and at least comprises a feature identifier corresponding to the corresponding second instruction trigger object;
respectively inputting the user indication image of each superior prediction model into the corresponding superior prediction model, and predicting the second instruction trigger object through the corresponding superior prediction model to obtain a corresponding preset instruction trigger identifier;
determining the value of a loss function of each superior prediction model based on the obtained preset instruction trigger identifier and the characteristic identifier marked by the user indication image of each superior prediction model;
updating model parameters of the corresponding superior prediction model based on the value of the loss function of each superior prediction model, so that the superior prediction model can predict the corresponding second instruction trigger object based on the acquired user real-time image.
Optionally, the user real-time image includes information of a plurality of vectors of the user, and the user real-time image is a specific motion image;
the method further comprises the step of verifying the real-time image of the user, comprising:
acquiring a plurality of collected user real-time image frames, wherein at least one piece of same vector information exists between any two user real-time image frames, matching the same vector information between any two user real-time image frames, and obtaining at least one group of instruction actions if each piece of same vector information is matched; alternatively,
sending an action verification trigger identifier to an action analysis server, wherein the action verification trigger identifier is used for triggering the action analysis server to authenticate the feature indexes in the information of the plurality of vectors;
when receiving the information that the authentication returned by the action analysis server according to the action verification trigger identifier passes, executing the step of obtaining at least one group of instruction actions; alternatively,
cutting the specific action image according to an image cutting technology to obtain a local action image; identifying the local motion image according to a picture capturing technology to obtain the information of the plurality of structured vectors;
sending a security protocol to a user terminal, wherein the security protocol is used for requesting the user terminal to authorize the action analysis server to acquire the instruction action in a certain action capture server;
and if receiving the confirmation-of-authorization information returned by the user terminal according to the security protocol, respectively obtaining instruction actions consistent with the information of a plurality of vectors among the plurality of vectors from at least one action capture server, and executing the step of obtaining at least one group of instruction actions;
matching the information of each vector in the at least one group of instruction actions with the information of the plurality of vectors respectively to obtain at least one group of action reference data;
counting the confidence coefficient reference coefficient corresponding to each group of action reference data according to the information on whether the information of each vector in each group of action reference data is matched or not and a preset confidence evaluation rule, wherein the preset confidence evaluation rule comprises: an item of matched information in a group of action reference data is correspondingly configured as standard action data; an item of unmatched information is correspondingly configured as abnormal action data; an item of information whose match status cannot be determined is correspondingly configured as undetermined action data; the confidence coefficient reference coefficient corresponding to each group of action reference data is the sum of the reference coefficients configured for the information of each vector in a group of instruction actions, wherein each group of action reference data comprises: information on whether the information of each vector in a group of instruction actions matches the information of the plurality of vectors;
respectively calculating the ratio of the confidence coefficient reference coefficient corresponding to each group of action reference data to the maximum confidence coefficient reference coefficient corresponding to the corresponding group of action reference data;
taking the sum of the ratios corresponding to each group of action reference data as the action confidence, or taking the weighted sum of the ratios corresponding to each group of action reference data as the action confidence;
and if the action confidence is within a preset confidence threshold, the user real-time image passes the verification.
Optionally, the training the image recognition model based on the user indication image with the updated at least one feature identifier includes:
predicting the instruction trigger object for the user indication image with the updated at least one feature identifier through the image recognition model to obtain a control instruction prediction result;
acquiring the difference between the control instruction prediction result of each instruction trigger object and the feature identifier corresponding to the corresponding instruction trigger object;
determining the value of a loss function corresponding to the corresponding instruction trigger object in the image recognition model based on the corresponding difference of each instruction trigger object;
when the value of the loss function corresponding to each instruction trigger object exceeds the corresponding loss threshold value, determining a deviation vector of the corresponding instruction trigger object based on the loss function corresponding to each instruction trigger object;
and propagating each deviation vector in the image recognition model in a reverse direction, and updating model parameters of each neural network layer in the image recognition model in the process of propagation.
Optionally, the image recognition model includes a multi-input layer, an image extraction layer, an image stitching layer, and a prediction layer, and the propagating each deviation vector in the image recognition model in a reverse direction and updating the model parameters of each neural network layer in the image recognition model in the process of propagation includes:
sequentially transmitting the deviation vector of the first instruction trigger object to the prediction layer, the image stitching layer, the image extraction layer and the multi-input layer, so as to realize the backward propagation of the deviation vector of the first instruction trigger object in the image recognition model;
sequentially transmitting the deviation vector of the second instruction trigger object to the prediction layer, the image stitching layer and the image extraction layer;
blocking the deviation vector of the second instruction trigger object so that the deviation vector of the second instruction trigger object cannot be propagated to the multi-input layer;
and updating the model parameters of each layer in the image recognition model in the process of back propagation of the deviation vector of the first instruction trigger object and the deviation vector of the second instruction trigger object.
In a second aspect, an embodiment of the present invention provides an AI-based image recognition solution apparatus, which is applied to a computer device, where the computer device is in communication connection with a user terminal, and the apparatus includes:
an obtaining module, configured to obtain a user indication image of an image recognition model for image recognition, the user indication image including at least two feature identifiers corresponding to instruction trigger objects, the instruction trigger objects including: a first instruction trigger object and at least one second instruction trigger object, wherein the image acquisition time range of the second instruction trigger object is larger than that of the first instruction trigger object;
an updating module, configured to respectively input the user indication image into at least one superior prediction model, where each superior prediction model is used for predicting one second instruction trigger object; respectively carry out second instruction trigger object prediction on the user indication image through the at least one superior prediction model to obtain corresponding preset instruction trigger identifiers; and update the feature identifier of the corresponding second instruction trigger object in the user indication image based on the obtained preset instruction trigger identifiers of the at least one superior prediction model to obtain the user indication image with the updated at least one feature identifier;
and an execution module, configured to train the image recognition model based on the user indication image with the updated at least one feature identifier, so that the image recognition model can perform control instruction prediction corresponding to the first instruction trigger object and the at least one second instruction trigger object based on the acquired user real-time image, and execute the control instruction corresponding to the control instruction prediction result.
In a third aspect, an embodiment of the present invention provides a computer device, where the computer device includes a processor and a non-volatile memory storing computer instructions, and when the computer instructions are executed by the processor, the computer device performs the AI-based image recognition solution method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a readable storage medium, where the readable storage medium includes a computer program, and the computer program controls, when executed, a computer device in which the readable storage medium is located to perform the AI-based image recognition solution method according to the first aspect.
Compared with the prior art, the beneficial effects provided by the invention include the following. With the AI-based image recognition solution method, device and readable storage medium provided by embodiments of the present invention, a user indication image of an image recognition model for image recognition is obtained, the user indication image including at least two feature identifiers corresponding to instruction trigger objects, the instruction trigger objects including a first instruction trigger object and at least one second instruction trigger object, wherein the image acquisition time range of the second instruction trigger object is larger than that of the first instruction trigger object. The user indication image is respectively input into at least one superior prediction model, each superior prediction model being used for predicting one second instruction trigger object; second instruction trigger object prediction is respectively carried out on the user indication image through the at least one superior prediction model to obtain corresponding preset instruction trigger identifiers; then, based on the obtained preset instruction trigger identifiers of the at least one superior prediction model, the feature identifier of the corresponding second instruction trigger object in the user indication image is updated to obtain the user indication image with the at least one updated feature identifier; and the image recognition model is trained based on the user indication image after updating the at least one feature identifier, so that the image recognition model can predict the control instructions corresponding to the first instruction trigger object and the at least one second instruction trigger object based on the user real-time image acquired by the user terminal, and execute the control instruction corresponding to the control instruction prediction result, thereby skillfully solving the problem of triggering the corresponding control instruction based on the user real-time image.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments will be briefly described below. It is appreciated that the following drawings depict only certain embodiments of the invention and are therefore not to be considered limiting of its scope. For a person skilled in the art, it is possible to derive other relevant figures from these figures without inventive effort.
FIG. 1 is an interactive schematic diagram of an AI-based image recognition solution system according to an embodiment of the invention;
FIG. 2 is a flowchart illustrating steps of an AI-based image recognition solution according to an embodiment of the invention;
FIG. 3 is a block diagram schematically illustrating an AI-based image recognition solution apparatus according to an embodiment of the present invention;
fig. 4 is a block diagram schematically illustrating a structure of a computer device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Furthermore, the terms "first," "second," and the like are used merely to distinguish one description from another, and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it is also to be noted that, unless otherwise explicitly stated or limited, the terms "disposed" and "connected" are to be interpreted broadly, and for example, "connected" may be a fixed connection, a detachable connection, or an integral connection; can be mechanically or electrically connected; the connection may be direct or indirect via an intermediate medium, and may be a communication between the two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
The following detailed description of embodiments of the invention refers to the accompanying drawings.
Fig. 1 is an interactive schematic diagram of an AI-based image recognition solution system 10 according to an embodiment of the present disclosure. The AI-based image recognition solution system 10 may include a computer device 100 and a user terminal 200 communicatively connected to the computer device 100. The AI-based image recognition solution system 10 shown in FIG. 1 is but one possible example, and in other possible embodiments, the AI-based image recognition solution system 10 may include only some of the components shown in FIG. 1 or may include additional components.
In this embodiment, the user terminal 200 may comprise a mobile device, a tablet computer, a laptop computer, or the like, or any combination thereof. In some embodiments, the mobile device may include a smart home device, a wearable device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof. In some embodiments, the smart home devices may include control devices of smart electrical appliances, smart monitoring devices, smart televisions, smart cameras, and the like, or any combination thereof. In some embodiments, the wearable device may include a smart bracelet, smart footwear, smart glasses, a smart helmet, a smart watch, smart clothing, a smart backpack, a smart accessory, or the like, or any combination thereof. In some embodiments, the smart mobile device may include a smartphone, a personal digital assistant, a gaming device, or the like, or any combination thereof. In some embodiments, the virtual reality device and/or the augmented reality device may include a virtual reality helmet, virtual reality glasses, a virtual reality visor, an augmented reality helmet, augmented reality glasses, an augmented reality visor, or the like, or any combination thereof. For example, the virtual reality device and/or the augmented reality device may include various virtual reality products and the like.
In this embodiment, the computer device 100 and the user terminal 200 in the AI-based image recognition solution system 10 may cooperatively perform the AI-based image recognition solution method described in the following method embodiments; for the specific execution steps of the computer device 100 and the user terminal 200, refer to the detailed description of the method embodiments below.
To solve the technical problem in the background art, fig. 2 is a flowchart illustrating an AI-based image recognition solution provided by an embodiment of the disclosure, which can be executed by the computer device 100 shown in fig. 1, and the AI-based image recognition solution is described in detail below.
Step 201, acquiring a user indication image of an image recognition model for image recognition.
Wherein the user indication image comprises at least two feature identifiers corresponding to instruction trigger objects, the instruction trigger objects comprising: a first instruction trigger object and at least one second instruction trigger object, wherein the image acquisition time range of the second instruction trigger object is larger than that of the first instruction trigger object.
Step 202, inputting the user indication image into at least one superior prediction model respectively, wherein each superior prediction model is used for predicting one second instruction trigger object.
Step 203, respectively carrying out second instruction trigger object prediction on the user indication image through the at least one superior prediction model to obtain corresponding preset instruction trigger identifiers.
Step 204, updating the feature identifier of the corresponding second instruction trigger object in the user indication image based on the obtained preset instruction trigger identifiers of the at least one superior prediction model, to obtain the user indication image with the updated at least one feature identifier.
Step 205, training an image recognition model based on the user indication image after updating the at least one feature identifier, so that the image recognition model can perform control instruction prediction corresponding to the first instruction trigger object and the at least one second instruction trigger object based on the user real-time image acquired by the user terminal 200, and execute a control instruction corresponding to the control instruction prediction result based on the control instruction prediction result.
In the embodiment of the present invention, the computer device 100 and the user terminal 200 may be matched one to one; in other implementations, one computer device 100 may serve a plurality of user terminals 200. The user indication image of the image recognition model for image recognition may be acquired first; it may be a pre-stored image of an action the user devised himself or an action the user imitates from a reference. The user indication image includes at least two feature identifiers corresponding to instruction trigger objects, the instruction trigger objects including a first instruction trigger object and at least one second instruction trigger object, wherein the image acquisition time range of the second instruction trigger object is larger than that of the first instruction trigger object, so that the user real-time images corresponding to the second instruction trigger object can form a complete action. The user indication image may be input into at least one superior model for training; a superior model may be an initial classification model, and each superior prediction model predicts one second instruction trigger object in order to achieve the required training accuracy. The second instruction trigger object is predicted by the superior model to obtain the preset instruction trigger identifier of the second instruction trigger object. After the preset instruction trigger identifier of the second instruction trigger object is obtained, the feature identifier of the second instruction trigger object in the user indication image can be updated. The user indication image with the updated feature identifiers may then be input into the image recognition model for training. Through the above steps, a model for recognizing the user real-time image acquired by the user terminal 200 is obtained: control instruction prediction for the first instruction trigger object and the at least one second instruction trigger object can be performed according to the recognition result of the user real-time image, and the control instruction corresponding to the control instruction prediction result is executed. The control instruction may be an operation executed by the computer device 100 itself, or an instruction issued by the computer device 100 to control the user terminal 200, which is not limited herein. In this way, control instructions can be triggered based on the user real-time image acquired by the user terminal 200; the sketch below summarizes the flow.
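Expressed procedurally, the overall flow of steps 201 to 205 can be summarized as the following minimal Python sketch. The patent publishes no implementation, so every class, method and field name below (predict, feature_ids, fit, the instruction table) is a hypothetical illustration of the scheme, not the patentee's code.

    # Hypothetical orchestration of steps 201-205; all names are illustrative.
    def build_and_train(user_images, superior_models, recognition_model):
        # Steps 202-203: each superior model predicts its own second
        # instruction trigger object on every user indication image.
        for image in user_images:
            for object_id, superior in superior_models.items():
                trigger_id = superior.predict(image)       # preset instruction trigger identifier
                image.feature_ids[object_id] = trigger_id  # step 204: update the feature identifier
        # Step 205: train the recognition model on the relabeled images.
        recognition_model.fit(user_images)
        return recognition_model

    def on_realtime_frame(recognition_model, frame, instruction_table):
        # Inference: predict the control instruction for a user real-time
        # image and execute the corresponding control instruction.
        prediction = recognition_model.predict(frame)
        instruction_table[prediction]()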
On the basis of the foregoing, as an alternative embodiment, the foregoing step 201 may be implemented by the following specific embodiments.
Sub-step 201-1, collecting data of the user real-time image corresponding to the first instruction trigger object and data corresponding to the at least one second instruction trigger object based on the image acquisition time range of the first instruction trigger object.
Sub-step 201-2, constructing a user indication image of the image recognition model based on the acquired data.
It should be understood that a control instruction can be triggered by a continuous action generated by the user, so the data of the user real-time image corresponding to the first instruction trigger object and the data corresponding to the at least one second instruction trigger object can be collected based on the image acquisition time range of the first instruction trigger object, and the collected data is then used to construct the user indication image of the image recognition model, as the sketch below illustrates.
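A minimal sketch of sub-steps 201-1 and 201-2, assuming timestamped frames and using the first instruction trigger object's (shorter) acquisition window to sample the data; the UserIndicationImage container and all of its field names are assumptions made for illustration.

    from dataclasses import dataclass, field

    @dataclass
    class UserIndicationImage:
        frames: list                                     # frames sampled from the real-time stream
        feature_ids: dict = field(default_factory=dict)  # trigger object id -> feature identifier

    def collect_user_indication_image(timestamped_frames, window_start, window_end):
        """Sub-step 201-1: collect data within the first instruction trigger
        object's acquisition range (the second objects' larger ranges contain
        this window); sub-step 201-2: construct the user indication image."""
        window = [frame for t, frame in timestamped_frames
                  if window_start <= t <= window_end]
        return UserIndicationImage(frames=window)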
In order to more clearly describe the present solution, a specific implementation of the foregoing step 204 is provided below.
Sub-step 204-1, respectively labeling the preset instruction trigger identifiers of the superior prediction models as the feature identifiers of the corresponding second instruction trigger objects in the user indication image, so as to update the feature identifiers of the corresponding second instruction trigger objects in the user indication image and obtain the user indication image with at least one updated feature identifier.
Prior to the foregoing step 202, the embodiment of the present invention may further include the following steps.
Step 206, a user indication image of at least one superior prediction model is obtained.
The user indication images of the superior prediction models are obtained by sampling based on the image acquisition time range of the corresponding second instruction trigger object, and at least comprise the feature identifications corresponding to the corresponding second instruction trigger objects.
Step 207, respectively inputting the user indication image of each superior prediction model into the corresponding superior prediction model, and predicting the second instruction trigger object through the corresponding superior prediction model to obtain the corresponding preset instruction trigger identifier.
Step 208, determining the value of the loss function of each superior prediction model based on the obtained preset instruction trigger identifier and the feature identifier labelled on the user indication image of each superior prediction model.
Step 209, updating the model parameters of the corresponding superior prediction model based on the value of the loss function of each superior prediction model, so that the superior prediction model can predict the corresponding second instruction trigger object based on the acquired user real-time image.
In combination with the above steps, the embodiment of the present invention provides a detailed training procedure for the superior prediction models. A user indication image of at least one superior prediction model may be obtained; as described above, one superior prediction model correspondingly handles one second instruction trigger object. The user indication image of each superior prediction model may be respectively input into the corresponding superior prediction model, and the second instruction trigger object is predicted through the corresponding superior prediction model to obtain the corresponding preset instruction trigger identifier. The value of the loss function (loss) of each superior prediction model is then determined according to the preset instruction trigger identifier and the feature identifier labelled on the user indication image of each superior prediction model. Therefore, the model parameters of the corresponding superior prediction model can be updated based on the value of the loss function of each superior prediction model, so that the superior prediction model can predict the corresponding second instruction trigger object based on the acquired user real-time image. Through these steps, the superior prediction model can be continuously trained with the feature identifiers of the acquired second instruction trigger objects until it can predict the corresponding second instruction trigger object based on the acquired user real-time image, as the sketch below illustrates.
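Steps 206 to 209 describe a standard supervised loop per superior prediction model. Below is a minimal PyTorch sketch under the assumption that each superior model is a classifier over its second object's preset instruction trigger identifiers; the optimizer, the cross-entropy loss and the hyperparameters are illustrative choices, not prescribed by the patent.

    import torch
    import torch.nn as nn

    def train_superior_model(model, loader, epochs=10, lr=1e-3):
        """Steps 207-209: predict the preset instruction trigger identifier,
        compare it with the labelled feature identifier, and update the
        superior model's parameters from the value of the loss function."""
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        criterion = nn.CrossEntropyLoss()              # one plausible loss function
        for _ in range(epochs):
            for images, feature_ids in loader:         # user indication images and labels
                logits = model(images)                 # step 207: prediction
                loss = criterion(logits, feature_ids)  # step 208: value of the loss function
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()                       # step 209: update model parameters
        return model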
On the basis of the above, the user real-time image comprises information of a plurality of vectors of the user, and the user real-time image is a specific motion image. Since user real-time image acquisition is a continuous process, the following steps verify whether a collected user real-time image is qualified, so as to avoid recognizing unintentional or irregular motions of the user and wasting computation and memory of the computer device 100.
Step 301, acquiring a plurality of collected user real-time image frames.
Wherein, at least one piece of same vector information exists between any two user real-time image frames.
Step 302, matching the information of the same vectors between any two user real-time image frames, and obtaining at least one group of instruction actions if the information of each same vector is matched; alternatively, steps 303 to 304 are performed.
Step 303, sending an action verification trigger identifier to the action analysis server, where the action verification trigger identifier is used to trigger the action analysis server to authenticate the feature indexes in the information of the plurality of vectors.
Step 304, when receiving the information that the authentication returned by the action analysis server according to the action verification trigger identifier passes, executing the step of obtaining at least one group of instruction actions; alternatively, steps 305 to 308 are performed.
Step 305, cutting the specific motion image according to an image cutting technology to obtain a local motion image.
Step 306, identifying the local motion image according to a picture capturing technology to obtain the information of a plurality of structured vectors.
Step 307, sending a security protocol to the user terminal 200, where the security protocol is used to request the user terminal 200 to authorize the action analysis server to acquire the instruction action in a certain action capture server.
Step 308, if receiving the information for confirming authorization returned by the user terminal 200 according to the security protocol, obtaining instruction actions consistent with the information of a plurality of vectors in the plurality of vectors from at least one action capture server, and executing the step of obtaining at least one group of instruction actions.
Step 309, matching the information of each vector in at least one group of instruction actions with the information of a plurality of vectors respectively to obtain at least one group of action reference data.
Step 310, counting the confidence coefficient reference coefficient corresponding to each group of action reference data according to the information on whether the information of each vector in each group of action reference data is matched or not and a preset confidence evaluation rule.
Wherein the preset confidence evaluation rule comprises: an item of matched information in a group of action reference data is correspondingly configured as standard action data; an item of unmatched information is correspondingly configured as abnormal action data; an item of information whose match status cannot be determined is correspondingly configured as undetermined action data; the confidence coefficient reference coefficient corresponding to each group of action reference data is the sum of the reference coefficients configured for the information of each vector in a group of instruction actions, wherein each group of action reference data comprises: information on whether the information of each vector in a group of instruction actions matches the information of the plurality of vectors.
Step 311, respectively calculating a ratio of the confidence coefficient reference coefficient corresponding to each group of motion reference data to the maximum confidence coefficient reference coefficient corresponding to the corresponding group of motion reference data.
Step 312, taking the sum of the ratios corresponding to each group of action reference data as the action confidence, or taking the weighted sum of the ratios corresponding to each group of action reference data as the action confidence.
Step 313, if the action confidence is within the preset confidence threshold, the user real-time image passes the verification.
In order to avoid false triggering of instructions caused by inaccurate recognition, the collected user real-time image can be verified. Specifically, a plurality of user real-time image frames may be collected, with at least one piece of same vector information existing between any two user real-time image frames, i.e. the plurality of user real-time image frames point to a continuous action generated by the user (the information of each same vector is matched). In the embodiment of the invention, three ways of verification can be adopted. (1) The same vector information between any two user real-time image frames can be matched locally, and if each piece of same vector information is matched, at least one group of instruction actions is obtained. (2) An action verification trigger identifier can be sent to the action analysis server, so that the action analysis server authenticates the feature indexes in the information of the plurality of vectors; when the information that the authentication passes is returned by the action analysis server according to the action verification trigger identifier, the step of obtaining at least one group of instruction actions is executed. (3) The specific motion image can be cut according to an image cutting technology to obtain a local motion image, and the local motion image can be processed according to a picture capturing technology to obtain the information of a plurality of structured vectors; meanwhile, a security protocol can be sent to the user terminal 200 to inform the user of the action verification, and when the confirmation-of-authorization information returned by the user terminal 200 according to the security protocol is received, instruction actions consistent with the information of a plurality of vectors among the plurality of vectors are respectively acquired from at least one action capture server, and the step of obtaining at least one group of instruction actions is executed. After the step of obtaining at least one group of instruction actions has been triggered in any of these ways, the information of each vector in the at least one group of instruction actions may be respectively matched with the information of the plurality of vectors to obtain at least one group of action reference data. The confidence coefficient reference coefficient corresponding to each group of action reference data is then counted according to the information on whether the information of each vector in each group of action reference data is matched or not and the preset confidence evaluation rule; the ratio of the confidence coefficient reference coefficient corresponding to each group of action reference data to the maximum confidence coefficient reference coefficient corresponding to the corresponding group of action reference data is calculated respectively; the sum of the ratios corresponding to each group of action reference data, or their weighted sum, is taken as the action confidence; and if the action confidence is within the preset confidence threshold, the user real-time image passes the verification. Through the above steps, whether the user real-time image can be used to subsequently trigger the related instructions can be reliably determined; the arithmetic of this check is sketched below.
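The arithmetic of steps 310 to 313 reduces to a per-group scoring scheme. A minimal sketch follows; the concrete reference coefficients, weights and threshold interval are arbitrary assumptions, since the patent leaves them to the preset confidence evaluation rule.

    # Assumed reference coefficients for the three configurations of the rule.
    REFERENCE_COEFF = {"standard": 1.0, "undetermined": 0.5, "abnormal": 0.0}

    def action_confidence(groups, weights=None):
        """groups: one list of per-vector categories per group of action
        reference data, e.g. [["standard", "undetermined"], ["standard"]]."""
        ratios = []
        for categories in groups:
            coeff = sum(REFERENCE_COEFF[c] for c in categories)         # step 310
            max_coeff = len(categories) * max(REFERENCE_COEFF.values())
            ratios.append(coeff / max_coeff)                            # step 311
        if weights is None:
            return sum(ratios)                                          # step 312: plain sum
        return sum(w * r for w, r in zip(weights, ratios))              # step 312: weighted sum

    def verify_realtime_image(groups, low=0.8, high=2.0, weights=None):
        confidence = action_confidence(groups, weights)
        return low <= confidence <= high                                # step 313: within threshold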
Based on the foregoing, in order to more clearly describe the scheme, a specific implementation of the foregoing step 205 is provided below.
Sub-step 205-1, predicting the instruction trigger object for the user indication image with the updated at least one feature identifier through the image recognition model to obtain a control instruction prediction result.
Sub-step 205-2, obtaining the difference between the control instruction prediction result of each instruction trigger object and the feature identifier corresponding to the corresponding instruction trigger object.
Sub-step 205-3, determining the value of the loss function corresponding to the corresponding instruction trigger object in the image recognition model based on the difference corresponding to each instruction trigger object.
Sub-step 205-4, determining a deviation vector for each instruction trigger object based on the loss function corresponding to each instruction trigger object when the value of the loss function corresponding to each instruction trigger object exceeds the corresponding loss threshold.
Sub-step 205-5, propagating each deviation vector in the image recognition model in a reverse direction, and updating the model parameters of each neural network layer in the image recognition model in the process of propagation.
Through these steps, the control instruction prediction result can be obtained based on the image recognition model, and the value of the loss function corresponding to the corresponding instruction trigger object in the image recognition model is then determined based on the difference between the control instruction prediction result and the feature identifier corresponding to that instruction trigger object; from the loss function corresponding to each instruction trigger object, the deviation vector of the corresponding instruction trigger object is determined. Finally, each deviation vector can be back-propagated in the image recognition model, and the model parameters of each neural network layer in the image recognition model are updated during the propagation, which ensures the accuracy of the image recognition model. A sketch of such a training loop follows.
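Sub-steps 205-1 to 205-5 amount to multi-head training in which each instruction trigger object contributes to back-propagation only when its loss exceeds the corresponding loss threshold. A minimal PyTorch sketch, reading the "deviation vector" as the gradient of the per-object loss (one plausible interpretation); the model is assumed to return one set of logits per instruction trigger object, and all names and hyperparameters are illustrative.

    import torch
    import torch.nn as nn

    def train_recognition_model(model, loader, loss_thresholds, epochs=5, lr=1e-3):
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        criterion = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for images, first_ids, second_ids in loader:
                first_logits, second_logits = model(images)       # sub-step 205-1
                losses = {                                        # sub-steps 205-2 / 205-3
                    "first": criterion(first_logits, first_ids),
                    "second": criterion(second_logits, second_ids),
                }
                # Sub-step 205-4: only objects whose loss exceeds the
                # corresponding loss threshold contribute a deviation vector.
                active = [loss for name, loss in losses.items()
                          if loss.item() > loss_thresholds[name]]
                if active:
                    optimizer.zero_grad()
                    sum(active).backward()                        # sub-step 205-5: back-propagate
                    optimizer.step()                              # update layer parameters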
On the basis that the image recognition model comprises a multi-input layer, an image extraction layer, an image splicing layer and a prediction layer, the foregoing sub-step 205-5 may specifically comprise the following specific embodiments.
(1) Sequentially transmitting the deviation vector of the first instruction trigger object to the prediction layer, the image stitching layer, the image extraction layer and the multi-input layer, so as to realize the backward propagation of the deviation vector of the first instruction trigger object in the image recognition model.
(2) Sequentially transmitting the deviation vector of the second instruction trigger object to the prediction layer, the image stitching layer and the image extraction layer.
(3) Blocking the deviation vector of the second instruction trigger object so that the deviation vector of the second instruction trigger object cannot propagate to the multi-input layer.
(4) Updating the model parameters of each layer in the image recognition model in the process of back propagation of the deviation vector of the first instruction trigger object and the deviation vector of the second instruction trigger object.
Through the above steps, the model parameters of each layer in the image recognition model can be updated in a targeted manner; one possible realization of the gradient blocking is sketched below.
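One way to realize step (3) is to stop gradient flow at the multi-input layer's output for the second-object branch; in PyTorch, tensor.detach() does exactly this. The sketch below mirrors the four named parts with deliberately simple layers; every layer type, size and head count is an assumption, and only the detach-based blocking corresponds to steps (1) to (4).

    import torch
    import torch.nn as nn

    class BlockedRecognitionModel(nn.Module):
        def __init__(self, in_dim=1024, hidden=256, n_first=8, n_second=8):
            super().__init__()
            self.multi_input = nn.Linear(in_dim, hidden)                        # multi-input layer
            self.extract = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())  # image extraction layer
            self.stitch = nn.Linear(hidden, hidden)                             # image stitching layer
            self.predict_first = nn.Linear(hidden, n_first)                     # prediction layer, first object
            self.predict_second = nn.Linear(hidden, n_second)                   # prediction layer, second object

        def forward(self, x):
            base = self.multi_input(x)
            # First object: its deviation vector back-propagates through all
            # four parts, multi-input layer included (step (1)).
            f1 = self.stitch(self.extract(base))
            # Second object: detach() blocks its deviation vector at the
            # multi-input layer's output, so it still updates the extraction,
            # stitching and prediction layers but never reaches the
            # multi-input layer (steps (2) and (3)).
            f2 = self.stitch(self.extract(base.detach()))
            return self.predict_first(f1), self.predict_second(f2)

During training, the gradients from both branches then update the parameters of each reachable layer as described in step (4).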
An AI-based image recognition solution apparatus 110 according to an embodiment of the present invention is applied to a computer device 100, wherein the computer device 100 is in communication connection with a user terminal 200, and as shown in fig. 3, the AI-based image recognition solution apparatus 110 includes:
an obtaining module 1101, configured to obtain a user indication image of an image recognition model for image recognition, where the user indication image includes at least two feature identifiers corresponding to instruction trigger objects, and the instruction trigger objects include: a first instruction trigger object and at least one second instruction trigger object, wherein the image acquisition time range of the second instruction trigger object is larger than that of the first instruction trigger object;
an updating module 1102, configured to input the user indication image into at least one superior prediction model respectively, where each superior prediction model is used to predict one second instruction trigger object; respectively carry out second instruction trigger object prediction on the user indication image through the at least one superior prediction model to obtain corresponding preset instruction trigger identifiers; and update the feature identifier of the corresponding second instruction trigger object in the user indication image based on the obtained preset instruction trigger identifiers of the at least one superior prediction model to obtain the user indication image with the at least one updated feature identifier;
an executing module 1103, configured to train the image recognition model based on the user indication image with the at least one updated feature identifier, so that the image recognition model can perform control instruction prediction corresponding to the first instruction trigger object and the at least one second instruction trigger object based on the acquired user real-time image, and execute the control instruction corresponding to the control instruction prediction result.
Further, the obtaining module 1101 is specifically configured to:
acquiring data of the user real-time image corresponding to the first instruction trigger object and data corresponding to the at least one second instruction trigger object based on the image acquisition time range of the first instruction trigger object; and constructing a user indication image of the image recognition model based on the acquired data.
Further, the updating module 1102 is specifically configured to:
respectively labeling the preset instruction trigger identifiers of the superior prediction models as the feature identifiers of the corresponding second instruction trigger objects in the user indication image, so as to update the feature identifiers of the corresponding second instruction trigger objects in the user indication image and obtain the user indication image with at least one updated feature identifier.
Further, the obtaining module 1101 is further configured to:
acquiring a user indication image of at least one superior prediction model, wherein the user indication image of each superior prediction model is obtained by sampling based on the image acquisition time range of the corresponding second instruction trigger object and at least comprises the feature identifier corresponding to the corresponding second instruction trigger object; respectively inputting the user indication image of each superior prediction model into the corresponding superior prediction model, and predicting the second instruction trigger object through the corresponding superior prediction model to obtain the corresponding preset instruction trigger identifier; determining the value of the loss function of each superior prediction model based on the obtained preset instruction trigger identifier and the feature identifier labelled on the user indication image of each superior prediction model; and updating the model parameters of the corresponding superior prediction model based on the value of the loss function of each superior prediction model, so that the superior prediction model can predict the corresponding second instruction trigger object based on the acquired user real-time image.
Further, the computer device 100 is also in communication connection with both the action analysis server and the action capture server; the user real-time image comprises information of a plurality of vectors of the user, and the user real-time image is a specific motion image. The AI-based image recognition solution apparatus 110 further includes a verification module configured to:
acquiring a plurality of collected user real-time image frames, wherein at least one piece of same vector information exists between any two user real-time image frames, matching the same vector information between any two user real-time image frames, and executing the step of obtaining at least one group of instruction actions if each piece of same vector information is matched; sending an action verification trigger identifier to the action analysis server, wherein the action verification trigger identifier is used for triggering the action analysis server to authenticate the feature indexes in the information of the plurality of vectors; when receiving the information that the authentication returned by the action analysis server according to the action verification trigger identifier passes, executing the step of obtaining at least one group of instruction actions; cutting the specific motion image according to an image cutting technology to obtain a local motion image, and then identifying the local motion image according to a picture capturing technology to obtain the information of a plurality of structured vectors; sending a security protocol to the user terminal 200, wherein the security protocol is used for requesting the user terminal 200 to authorize the action analysis server to acquire the instruction action in a certain action capture server; if receiving the confirmation-of-authorization information returned by the user terminal 200 according to the security protocol, respectively acquiring instruction actions consistent with the information of a plurality of vectors among the plurality of vectors from at least one action capture server to obtain at least one group of instruction actions; matching the information of each vector in the at least one group of instruction actions with the information of the plurality of vectors respectively to obtain at least one group of action reference data; counting the confidence coefficient reference coefficient corresponding to each group of action reference data according to the information on whether the information of each vector in each group of action reference data is matched or not and the preset confidence evaluation rule, wherein the preset confidence evaluation rule comprises: an item of matched information in a group of action reference data is correspondingly configured as standard action data; an item of unmatched information is correspondingly configured as abnormal action data; an item of information whose match status cannot be determined is correspondingly configured as undetermined action data; the confidence coefficient reference coefficient corresponding to each group of action reference data is the sum of the reference coefficients configured for the information of each vector in a group of instruction actions, wherein each group of action reference data comprises: information on whether the information of each vector in a group of instruction actions matches the information of the plurality of vectors; respectively calculating the ratio of the confidence coefficient reference coefficient corresponding to each group of action reference data to the maximum confidence coefficient reference coefficient corresponding to the corresponding group of action reference data; taking the sum of the ratios corresponding to each group of action reference data, or their weighted sum, as the action confidence; and if the action confidence is within the preset confidence threshold, the user real-time image passes the verification.
Further, the execution module 1103 is specifically configured to:
predicting the instruction trigger object for the user indication image with the updated at least one feature identifier through the image recognition model to obtain a control instruction prediction result; acquiring the difference between the control instruction prediction result of each instruction trigger object and the feature identifier corresponding to the corresponding instruction trigger object; determining the value of the loss function corresponding to the corresponding instruction trigger object in the image recognition model based on the difference corresponding to each instruction trigger object; when the value of the loss function corresponding to each instruction trigger object exceeds the corresponding loss threshold, determining the deviation vector of the corresponding instruction trigger object based on the loss function corresponding to each instruction trigger object; and back-propagating each deviation vector in the image recognition model, and updating the model parameters of each neural network layer in the image recognition model in the process of propagation.
Further, the image recognition model includes a multi-input layer, an image extraction layer, an image stitching layer, and a prediction layer, and the execution module 1103 is further specifically configured to:
sequentially transmit the deviation vector of the first instruction trigger object to the prediction layer, the image stitching layer, the image extraction layer and the multi-input layer, so as to realize the backward propagation of the deviation vector of the first instruction trigger object in the image recognition model; sequentially transmit the deviation vector of the second instruction trigger object to the prediction layer, the image stitching layer and the image extraction layer; block the deviation vector of the second instruction trigger object so that it cannot be propagated to the multi-input layer; and update the model parameters of each layer in the image recognition model in the process of back propagation of the deviation vector of the first instruction trigger object and the deviation vector of the second instruction trigger object.
It should be noted that, for the implementation principle of the AI-based image recognition solution apparatus 110, reference may be made to the implementation principle of the AI-based image recognition solution method described above, and details are not repeated here. It should be understood that the division of the above apparatus into modules is only a logical division; in actual implementation, the modules may be wholly or partially integrated into one physical entity, or may be physically separated. All of these modules may be implemented in the form of software called by a processing element, or all of them in the form of hardware, or some of them in the form of software called by a processing element and the rest in the form of hardware. For example, the obtaining module 1101 may be a separately disposed processing element, may be integrated into a chip of the apparatus, or may be stored in a memory of the apparatus in the form of program code that is called and executed by a processing element of the apparatus to perform the functions of the obtaining module 1101. The other modules are implemented similarly. In addition, all or part of these modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit having signal processing capability. In implementation, each step of the above method, or each of the above modules, may be completed by an integrated logic circuit of hardware in a processor element or by an instruction in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more application specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field programmable gate arrays (FPGAs). For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can call program code. As another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
The embodiment of the present invention provides a computer device 100. The computer device 100 includes a processor and a non-volatile memory storing computer instructions; when the computer instructions are executed by the processor, the computer device 100 executes the aforementioned AI-based image recognition solution method. As shown in fig. 4, which is a block diagram of the computer device 100 according to an embodiment of the present invention, the computer device 100 includes the AI-based image recognition solution apparatus 110, a memory 111, a processor 112 and a communication unit 113.
To facilitate the transfer or interaction of data, the memory 111, the processor 112 and the communication unit 113 are electrically connected to one another, directly or indirectly. For example, these components may be electrically connected to each other via one or more communication buses or signal lines. The AI-based image recognition solution apparatus 110 includes at least one software functional module which may be stored in the memory 111 in the form of software or firmware or solidified in an operating system (OS) of the computer device 100. The processor 112 is configured to execute the executable modules stored in the memory 111, for example, the software functional modules and computer programs included in the obtaining module 1101.
An embodiment of the present invention provides a readable storage medium, where the readable storage medium includes a computer program; when the computer program runs, it controls the computer device 100 in which the readable storage medium is located to execute the foregoing AI-based image recognition solution method.
In summary, with the AI-based image recognition solution method and device and the readable storage medium provided by the embodiments of the present invention, a user indication image of an image recognition model for image recognition is acquired, wherein the user indication image includes at least two feature identifiers corresponding to an instruction trigger object, the instruction trigger object includes a first instruction trigger object and at least one second instruction trigger object, and the image acquisition time range of the second instruction trigger object is larger than that of the first instruction trigger object; the user indication image is respectively input into at least one superior prediction model, wherein each superior prediction model is used for predicting one second instruction trigger object; second instruction trigger object prediction is respectively performed on the user indication image through the at least one superior prediction model to obtain corresponding preset instruction trigger identifiers; then, based on the obtained preset instruction trigger identifiers of the at least one superior prediction model, the feature identifier of the corresponding second instruction trigger object in the user indication image is updated to obtain the user indication image with the updated at least one feature identifier; and the image recognition model is trained based on the user indication image after updating the at least one feature identifier, so that the image recognition model can predict the control instructions corresponding to the first instruction trigger object and the at least one second instruction trigger object based on the user real-time image acquired by the user terminal and execute the control instruction corresponding to the control instruction prediction result, thereby skillfully realizing the triggering of the corresponding control instruction based on the user real-time image.
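Tying these pieces together, the following sketch (all names hypothetical, reusing the two-input model shape from the earlier sketch) shows how the superior prediction models' preset instruction trigger identifiers could act as pseudo-labels that update the feature identifiers before each training step of the image recognition model:

import torch
import torch.nn.functional as F

def train_step(image, labels, upper_models, recognition_model, optimizer):
    # 1. Each superior prediction model predicts its second instruction
    #    trigger object; the result overwrites that object's feature
    #    identifier (label) for this sample.
    with torch.no_grad():
        for obj_name, upper in upper_models.items():
            labels[obj_name] = upper(image).argmax(dim=-1)

    # 2. Train the image recognition model on the updated sample.
    optimizer.zero_grad()
    out_first, out_second = recognition_model(image, image)
    loss = (F.cross_entropy(out_first, labels["first"])
            + F.cross_entropy(out_second, labels["second"]))
    loss.backward()  # deviation vectors propagate as described above
    optimizer.step()
    return loss.item()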
The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the disclosure and its practical applications, to thereby enable others skilled in the art to best utilize the disclosure and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (7)

1. An AI-based image recognition solution method, applied to a computer device, wherein the computer device is in communication connection with a user terminal, the method comprising:
obtaining a user indication image of an image recognition model for image recognition, the user indication image including at least two feature identifiers corresponding to an instruction trigger object, the instruction trigger object including: a first instruction trigger object and at least one second instruction trigger object, wherein the image acquisition time range of the second instruction trigger object is larger than that of the first instruction trigger object;
respectively inputting the user indication image into at least one superior prediction model, wherein each superior prediction model is used for predicting one second instruction trigger object;
respectively carrying out second instruction trigger object prediction on the user indication image through the at least one superior prediction model to obtain corresponding preset instruction trigger identifiers;
updating the feature identifier of a corresponding second instruction trigger object in the user indication image based on the obtained preset instruction trigger identifier of the at least one superior prediction model to obtain the user indication image with the updated at least one feature identifier;
training the image recognition model based on the user indication image after updating the at least one feature identifier, so that the image recognition model can predict control instructions corresponding to the first instruction trigger object and the at least one second instruction trigger object based on a user real-time image acquired by the user terminal, and execute the control instructions corresponding to the control instruction prediction result based on the control instruction prediction result;
the computer device is further in communication connection with both an action analysis server and an action capture server, the user real-time image comprises information of a plurality of vectors of a user, and the user real-time image is a specific action image;
the method further comprises the step of verifying the real-time image of the user, comprising:
cutting the specific action image according to an image cutting technology to obtain a local action image; and identifying the local action image according to a picture capturing technology to obtain the information of the plurality of structured vectors;
sending a security protocol to the user terminal, wherein the security protocol is used for requesting the user terminal to authorize an action analysis server to acquire an instruction action from a certain action capture server;
if information confirming the authorization, returned by the user terminal according to the security protocol, is received, respectively acquiring, from at least one action capture server, instruction actions consistent with the information of the plurality of vectors, and executing the step of acquiring at least one group of instruction actions;
matching the information of each vector in the at least one group of instruction actions with the information of the plurality of vectors respectively to obtain at least one group of action reference data;
counting a confidence reference coefficient corresponding to each group of action reference data according to whether the information of each vector in each group of action reference data is matched or unmatched with the information of the plurality of vectors, and according to a preset confidence evaluation rule, wherein the preset confidence evaluation rule comprises: an item of matched information in a group of action reference data is correspondingly configured as standard action data; an item of unmatched information in a group of action reference data is correspondingly configured as abnormal action data; the confidence reference coefficient corresponding to each group of action reference data is the sum of the reference coefficients corresponding to the action reference data of the information of each vector in a group of instruction actions, wherein each group of action reference data comprises: information on whether the information of each vector in a group of instruction actions matches the information of the plurality of vectors;
respectively calculating the ratio of the confidence reference coefficient corresponding to each group of action reference data to the maximum confidence reference coefficient corresponding to that group of action reference data;
taking the sum of the ratios corresponding to each group of action reference data as an action confidence coefficient, or taking the weighted sum of the ratios corresponding to each group of action reference data as the action confidence coefficient;
if the action confidence coefficient is within a preset confidence threshold, determining that the user real-time image passes verification;
training the image recognition model based on the updated user-indicated image of the at least one feature identifier comprises:
predicting the instruction trigger object for the user indication image with the updated at least one feature identifier through the image recognition model to obtain a control instruction prediction result;
acquiring the difference between the control instruction prediction result of each instruction trigger object and the characteristic identifier corresponding to the corresponding instruction trigger object;
determining the value of a loss function corresponding to the corresponding instruction trigger object in the image recognition model based on the corresponding difference of each instruction trigger object;
when the value of the loss function corresponding to each instruction trigger object exceeds the corresponding loss threshold value, determining a deviation vector of the corresponding instruction trigger object based on the loss function corresponding to each instruction trigger object;
reversely propagating each deviation vector in the image recognition model, and updating model parameters of each neural network layer in the image recognition model in the process of propagation;
the image recognition model comprises a multi-input layer, an image extraction layer, an image splicing layer and a prediction layer, wherein the step of propagating each deviation vector in the image recognition model in a reverse direction and updating model parameters of each neural network layer in the image recognition model in the process of propagation comprises the following steps:
sequentially transmitting the deviation vector of the first instruction trigger object to the prediction layer, the image splicing layer, the image extraction layer and the multi-input layer so as to realize the backward transmission of the deviation vector of the first instruction trigger object in the image recognition model;
transmitting the deviation vector of the second instruction trigger object to the prediction layer, the image splicing layer and the image extraction layer in sequence;
blocking the offset vector of the second instruction trigger object so that the offset vector of the second instruction trigger object cannot be propagated to the multi-input layer;
and updating the model parameters of each layer in the image recognition model in the process of back propagation of the deviation vector of the first instruction trigger object and the deviation vector of the second instruction trigger object.
2. The method of claim 1, wherein obtaining a user indication image of an image recognition model for image recognition comprises:
acquiring data of a user real-time image corresponding to the first instruction trigger object and data corresponding to the at least one second instruction trigger object based on the image acquisition time range of the first instruction trigger object;
constructing a user indication image of the image recognition model based on the acquired data.
3. The method according to claim 1, wherein the updating, based on the obtained preset instruction trigger identifier of the at least one superior prediction model, the feature identifier of the corresponding second instruction trigger object in the user indication image to obtain the user indication image after updating the at least one feature identifier comprises:
respectively marking the preset instruction trigger identifiers of the superior prediction models as the feature identifiers of the corresponding second instruction trigger objects in the user indication image, so as to update the feature identifiers of the corresponding second instruction trigger objects in the user indication image and obtain the user indication image with the updated at least one feature identifier.
4. The method according to claim 1, wherein before the respectively inputting the user indication image into at least one superior prediction model, the method further comprises:
acquiring a user indication image of the at least one superior prediction model, wherein the user indication image of each superior prediction model is obtained by sampling based on the image acquisition time range of the corresponding second instruction trigger object and at least comprises a feature identifier corresponding to the corresponding second instruction trigger object;
respectively inputting the user indication image of each superior prediction model into the corresponding superior prediction model, and predicting the second instruction trigger object through the corresponding superior prediction model to obtain a corresponding preset instruction trigger identifier;
determining the value of a loss function of each superior prediction model based on the obtained preset instruction trigger identifier and the characteristic identifier marked by the user indication image of each superior prediction model;
updating model parameters of the corresponding superior prediction model based on the value of the loss function of each superior prediction model, so that each superior prediction model can predict the corresponding second instruction trigger object based on the acquired user real-time image.
5. An AI-based image recognition solution apparatus applied to a computer device, the computer device being in communication connection with a user terminal, the apparatus comprising:
an obtaining module configured to obtain a user indication image of an image recognition model for image recognition, the user indication image including at least two feature identifiers corresponding to an instruction trigger object, the instruction trigger object including: a first instruction trigger object and at least one second instruction trigger object, wherein the image acquisition time range of the second instruction trigger object is larger than that of the first instruction trigger object;
the updating module is used for respectively inputting the user indication image into at least one superior prediction model, wherein each superior prediction model is used for predicting one second instruction trigger object; respectively carrying out second instruction trigger object prediction on the user indication image through the at least one superior prediction model to obtain corresponding preset instruction trigger identifiers; and updating the feature identifier of a corresponding second instruction trigger object in the user indication image based on the obtained preset instruction trigger identifier of the at least one superior prediction model to obtain the user indication image with the updated at least one feature identifier;
the execution module is used for training the image recognition model based on the user indication image after the at least one feature identifier is updated, so that the image recognition model can predict the control instruction corresponding to the first instruction trigger object and the at least one second instruction trigger object based on the user real-time image acquired by the user terminal, and execute the control instruction corresponding to the control instruction prediction result based on the control instruction prediction result;
the computer device is further in communication connection with both an action analysis server and an action capture server, the user real-time image comprises information of a plurality of vectors of a user, and the user real-time image is a specific action image;
the acquisition module is further configured to:
cutting the specific action image according to an image cutting technology to obtain a local action image; identifying the local action image according to a picture capturing technology to obtain the information of the plurality of structured vectors; sending a security protocol to the user terminal, wherein the security protocol is used for requesting the user terminal to authorize an action analysis server to acquire an instruction action from a certain action capture server; if information confirming the authorization, returned by the user terminal according to the security protocol, is received, respectively acquiring, from at least one action capture server, instruction actions consistent with the information of the plurality of vectors, and executing the step of acquiring at least one group of instruction actions; matching the information of each vector in the at least one group of instruction actions with the information of the plurality of vectors respectively to obtain at least one group of action reference data; counting a confidence reference coefficient corresponding to each group of action reference data according to whether the information of each vector in each group of action reference data is matched or unmatched with the information of the plurality of vectors, and according to a preset confidence evaluation rule, wherein the preset confidence evaluation rule comprises: an item of matched information in a group of action reference data is correspondingly configured as standard action data; an item of unmatched information in a group of action reference data is correspondingly configured as abnormal action data; the confidence reference coefficient corresponding to each group of action reference data is the sum of the reference coefficients corresponding to the action reference data of the information of each vector in a group of instruction actions, wherein each group of action reference data comprises: information on whether the information of each vector in a group of instruction actions matches the information of the plurality of vectors; respectively calculating the ratio of the confidence reference coefficient corresponding to each group of action reference data to the maximum confidence reference coefficient corresponding to that group of action reference data; taking the sum of the ratios corresponding to each group of action reference data as an action confidence coefficient, or taking the weighted sum of the ratios corresponding to each group of action reference data as the action confidence coefficient; and if the action confidence coefficient is within a preset confidence threshold, determining that the user real-time image passes verification;
the execution module is specifically configured to:
predicting the instruction trigger object for the user indication image with the updated at least one feature identifier through the image recognition model to obtain a control instruction prediction result;
acquiring the difference between the control instruction prediction result of each instruction trigger object and the characteristic identifier corresponding to the corresponding instruction trigger object;
determining the value of a loss function corresponding to the corresponding instruction trigger object in the image recognition model based on the corresponding difference of each instruction trigger object;
when the value of the loss function corresponding to each instruction trigger object exceeds the corresponding loss threshold value, determining a deviation vector of the corresponding instruction trigger object based on the loss function corresponding to each instruction trigger object;
reversely propagating each deviation vector in the image recognition model, and updating model parameters of each neural network layer in the image recognition model in the process of propagation;
the image recognition model comprises a multi-input layer, an image extraction layer, an image splicing layer and a prediction layer, and the execution module is further specifically configured to:
sequentially transmitting the deviation vector of the first instruction trigger object to the prediction layer, the image splicing layer, the image extraction layer and the multi-input layer so as to realize the backward transmission of the deviation vector of the first instruction trigger object in the image recognition model;
transmitting the deviation vector of the second instruction trigger object to the prediction layer, the image splicing layer and the image extraction layer in sequence;
blocking the offset vector of the second instruction trigger object so that the offset vector of the second instruction trigger object cannot be propagated to the multi-input layer;
and updating the model parameters of each layer in the image recognition model in the process of back propagation of the deviation vector of the first instruction trigger object and the deviation vector of the second instruction trigger object.
6. A computer device, comprising a processor and a non-volatile memory storing computer instructions, wherein when the computer instructions are executed by the processor, the computer device performs the AI-based image recognition solution method of any one of claims 1 to 4.
7. A readable storage medium, characterized in that the readable storage medium comprises a computer program, and when the computer program runs, the computer program controls a computer device in which the readable storage medium is located to execute the AI-based image recognition solution method of any one of claims 1 to 4.
CN202011021413.XA 2020-09-25 2020-09-25 AI-based image recognition solution method, device and readable storage medium Active CN111931762B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011021413.XA CN111931762B (en) 2020-09-25 2020-09-25 AI-based image recognition solution method, device and readable storage medium


Publications (2)

Publication Number Publication Date
CN111931762A (en) 2020-11-13
CN111931762B (en) 2021-07-30

Family

ID=73334203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011021413.XA Active CN111931762B (en) 2020-09-25 2020-09-25 AI-based image recognition solution method, device and readable storage medium

Country Status (1)

Country Link
CN (1) CN111931762B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114661010B (en) * 2022-03-17 2023-05-09 北京合思信息技术有限公司 Driving detection processing method based on artificial intelligence and cloud platform

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108563321A (en) * 2018-01-02 2018-09-21 联想(北京)有限公司 Information processing method and electronic equipment
CN108829815A (en) * 2018-06-12 2018-11-16 四川希氏异构医疗科技有限公司 A kind of medical image method for screening images
CN108989553A (en) * 2018-06-29 2018-12-11 北京微播视界科技有限公司 The method, apparatus and electronic equipment of scene manipulation
CN109991864A (en) * 2019-03-13 2019-07-09 佛山市云米电器科技有限公司 Home automation scenery control system and its control method based on image recognition
CN110222582A (en) * 2019-05-13 2019-09-10 青岛小鸟看看科技有限公司 A kind of image processing method and camera
US10460169B1 (en) * 2019-01-14 2019-10-29 Sourcewater, Inc. Image processing of aerial imagery for energy infrastructure analysis using joint image identification
CN110611788A (en) * 2019-09-26 2019-12-24 上海赛连信息科技有限公司 Method and device for controlling video conference terminal through gestures
CN110909780A (en) * 2019-11-14 2020-03-24 腾讯科技(深圳)有限公司 Image recognition model training and image recognition method, device and system
CN110908566A (en) * 2018-09-18 2020-03-24 珠海格力电器股份有限公司 Information processing method and device

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9275289B2 (en) * 2014-03-27 2016-03-01 Xerox Corporation Feature- and classifier-based vehicle headlight/shadow removal in video
AU2014218444B2 (en) * 2014-08-29 2017-06-15 Canon Kabushiki Kaisha Dynamic feature selection for joint probabilistic recognition
US10043101B2 (en) * 2014-11-07 2018-08-07 Adobe Systems Incorporated Local feature representation for image recognition
CN107340852A (en) * 2016-08-19 2017-11-10 北京市商汤科技开发有限公司 Gestural control method, device and terminal device
CN107341435A (en) * 2016-08-19 2017-11-10 北京市商汤科技开发有限公司 Processing method, device and the terminal device of video image
US10902243B2 (en) * 2016-10-25 2021-01-26 Deep North, Inc. Vision based target tracking that distinguishes facial feature targets
CN106778607A (en) * 2016-12-15 2017-05-31 国政通科技股份有限公司 A kind of people based on recognition of face and identity card homogeneity authentication device and method
CN107819945B (en) * 2017-10-30 2020-11-03 同济大学 Handheld device browsing behavior authentication method and system integrating multiple factors
CN108537272A (en) * 2018-04-08 2018-09-14 上海天壤智能科技有限公司 Method and apparatus for detection and analysis position in storehouse
KR102605595B1 (en) * 2018-04-24 2023-11-23 현대자동차주식회사 Apparatus, vehicle comprising the same, and control method of the vehicle
KR20200042143A (en) * 2018-10-15 2020-04-23 주식회사 더브이엑스 Dancing room service system and method thereof
US11663405B2 (en) * 2018-12-13 2023-05-30 Microsoft Technology Licensing, Llc Machine learning applications for temporally-related events
CN109886157A (en) * 2019-01-30 2019-06-14 杭州芯影科技有限公司 A kind of face identification method and system based on millimeter-wave image
CN110059661B (en) * 2019-04-26 2022-11-22 腾讯科技(深圳)有限公司 Action recognition method, man-machine interaction method, device and storage medium
CN110135406B (en) * 2019-07-09 2020-01-07 北京旷视科技有限公司 Image recognition method and device, computer equipment and storage medium
CN111091098B (en) * 2019-12-20 2023-08-15 浙江大华技术股份有限公司 Training method of detection model, detection method and related device
CN111401192B (en) * 2020-03-10 2023-07-18 深圳市腾讯计算机系统有限公司 Model training method and related device based on artificial intelligence
CN111626362B (en) * 2020-05-28 2024-02-02 腾讯科技(深圳)有限公司 Image processing method, device, computer equipment and storage medium


Also Published As

Publication number Publication date
CN111931762A (en) 2020-11-13

Similar Documents

Publication Publication Date Title
CN111914812B (en) Image processing model training method, device, equipment and storage medium
WO2022041830A1 (en) Pedestrian re-identification method and device
CN111695495B (en) Face recognition method, electronic equipment and storage medium
WO2020044099A1 (en) Service processing method and apparatus based on object recognition
CN108108711B (en) Face control method, electronic device and storage medium
CN110245645B (en) Face living body identification method, device, equipment and storage medium
CN109872407B (en) Face recognition method, device and equipment, and card punching method, device and system
CN110636315B (en) Multi-user virtual live broadcast method and device, electronic equipment and storage medium
CN110555334B (en) Face feature determination method and device, storage medium and electronic equipment
US20210201532A1 (en) Image processing method and apparatus, and storage medium
US20210201478A1 (en) Image processing methods, electronic devices, and storage media
TWI789128B (en) Face recognition method, device, equipment and storage medium
CN112805722A (en) Method and apparatus for reducing false positives in facial recognition
CN111739027A (en) Image processing method, device and equipment and readable storage medium
CN111091106A (en) Image clustering method and device, storage medium and electronic device
CN113192164A (en) Avatar follow-up control method and device, electronic equipment and readable storage medium
CN111931762B (en) AI-based image recognition solution method, device and readable storage medium
CN108171222B (en) Real-time video classification method and device based on multi-stream neural network
CN111126159A (en) Method, apparatus, electronic device, and medium for tracking pedestrian in real time
CN111881740A (en) Face recognition method, face recognition device, electronic equipment and medium
CN111507268A (en) Alarm method and device, storage medium and electronic device
CN111970539B (en) Data coding method based on deep learning and cloud computing service and big data platform
CN111783677B (en) Face recognition method, device, server and computer readable medium
CN106127166A (en) A kind of augmented reality AR image processing method, device and intelligent terminal
CN109190495B (en) Gender identification method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant