CN117877070A - Infant and pet interaction content assessment method, device, equipment and storage medium - Google Patents

Infant and pet interaction content assessment method, device, equipment and storage medium

Info

Publication number
CN117877070A
Authority
CN
China
Prior art keywords
infant
pet
scoring
preset
interaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410042841.2A
Other languages
Chinese (zh)
Inventor
陈辉
熊章
杜沛力
张智
胡国湖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Xingxun Intelligent Technology Co ltd
Original Assignee
Wuhan Xingxun Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Xingxun Intelligent Technology Co ltd filed Critical Wuhan Xingxun Intelligent Technology Co ltd
Priority to CN202410042841.2A priority Critical patent/CN117877070A/en
Publication of CN117877070A publication Critical patent/CN117877070A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 - Querying
    • G06F16/535 - Filtering based on additional data, e.g. user or group profiles
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/54 - Browsing; Visualisation therefor
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image processing and solves the prior-art problem that a user cannot effectively evaluate the interaction content between an infant and a pet. It provides an infant and pet interaction content evaluation method, device, equipment and storage medium. The method comprises the following steps: acquiring a plurality of frames of first target images including an infant and a pet in an infant care scene; inputting each first target image into a preset monocular depth estimation model based on self-supervised learning, and extracting second target images in which the infant and the pet interact; and scoring the pet features, infant features and image quality features in each second target image using a preset interaction scoring rule, and outputting an interaction scoring result. The invention avoids the resource waste caused by falsely starting the equipment when no scoring target appears, and realizes effective evaluation of the interaction content while the infant and the pet interact.

Description

Infant and pet interaction content assessment method, device, equipment and storage medium
This application is a divisional application of the invention patent application with application number 202310592158.1, filed on May 24, 2023 and entitled "Method, device and equipment for manufacturing an electronic album containing interactive content of people and pets".
Technical Field
The invention relates to the technical field of image processing, and in particular to a method, device, equipment and storage medium for evaluating the interaction content between infants and pets.
Background
With the continuous improvement of living standards and the upgrading of pet-related consumption, more and more people keep pets, mainly dogs and cats, and the average pet ownership rate is rising steadily. Pet owners are often happy for their infants to interact with their pets; they hope to capture the precious moments when infant and pet meet, make an electronic album, and share the best photos with other users on social software. However, current electronic albums are usually generated automatically by the system according to date, the user cannot select photos by quality, and automatically generated albums are often abandoned because their quality is low and they do not match the user's preferences.
The prior Chinese patent CN113420708A provides a pet nursing method, device, electronic equipment and storage medium, and discloses the following features: state information corresponding to the pet is determined based on the position information of the pet object in the target image, the image quality, and the position, posture and expression changes of the pet object, and a pet album is generated after at least one beautifying operation is performed based on that state information. In that scheme the electronic album is generated only from the pet's state information and the image quality, for the purpose of caring for the pet; the state information of the infant and the interaction between infant and pet are not considered, so the interaction content between them cannot be effectively evaluated.
Therefore, how to effectively evaluate the interaction content between an infant and a pet while they interact is an urgent problem to be solved.
Disclosure of Invention
In view of the above, the present invention provides a method, apparatus, device and storage medium for evaluating the interaction content between infants and pets, so as to solve the prior-art problem that users cannot effectively evaluate the interaction content when infants and pets interact.
The technical solution adopted by the invention is as follows:
In a first aspect, the present invention provides a method for evaluating the interaction content between an infant and a pet, the method comprising:
acquiring a plurality of frames of first target images including an infant and a pet in an infant care scene;
inputting each first target image into a preset monocular depth estimation model based on self-supervised learning, and extracting second target images in which the infant and the pet interact;
and scoring the pet features, infant features and image quality features in each second target image using a preset interaction scoring rule, and outputting an interaction scoring result.
Preferably, inputting each first target image into a preset monocular depth estimation model based on self-supervised learning and extracting second target images in which the infant and the pet interact comprises:
detecting each first target image with a preset target position detection model, and identifying the infant position information and pet position information in each first target image;
obtaining, from the infant position information and the pet position information, first center point position information of the infant's minimum circumscribed rectangular frame and second center point position information of the pet's minimum circumscribed rectangular frame;
inputting the first center point position information and the second center point position information into a depth estimation model based on the Monodepth structure to obtain relative position information between the infant and the pet;
and if the distance between the first center point and the second center point is judged, from the relative position information, to be smaller than a preset threshold, considering that the infant and the pet interact, and extracting the second target images of the interaction.
Preferably, scoring the pet features, infant features and image quality features in each second target image using a preset interaction scoring rule and outputting an interaction scoring result comprises:
inputting each second target image into a preset pet action detection model, scoring the pet features in each second target image according to the extracted pet limb key point information and the number of pet limb key points, and outputting a first score;
inputting each second target image into a preset face detection model, judging the occlusion state of the infant's face and the infant's expression, scoring the infant features in each second target image according to the judgment result, and outputting a second score;
scoring the sharpness and light intensity of each second target image, and outputting a third score;
and weighting and summing the first, second and third scores according to preset scoring weights to obtain a total score as the interaction scoring result.
Preferably, inputting each second target image into a preset pet action detection model, scoring the pet features according to the extracted pet limb key point information and the number of pet limb key points, and outputting the first score comprises:
collecting, in advance, a large number of socially disclosed images of household pets at various growth stages;
extracting the pet action images highly favored by users according to user preferences;
and constructing and training a pet action detection model based on the YoloV6s structure, with the highly favored pet action images as the training basis of the deep learning model.
Preferably, inputting each second target image into a preset face detection model, judging the occlusion state of the infant's face and the infant's expression, scoring the infant features according to the judgment result, and outputting the second score comprises:
acquiring the infant key point information and judging the infant's expression;
when the infant's expression is crying, optimizing the second score and obtaining the duration of the infant's crying;
and when the crying duration is longer than a preset duration threshold, sending a danger alarm to the user.
Preferably, acquiring a plurality of frames of first target images including an infant and a pet in an infant care scene comprises:
acquiring a real-time video stream in the infant care scene and decomposing the video stream into multiple frames of images;
and detecting each frame with a preset target detection model based on the YoloV6s structure, and extracting, according to the detection results, the plurality of frames of first target images including the infant and the pet.
Preferably, after scoring the pet features, infant features and image quality features in each second target image using the preset interaction scoring rule, the method further comprises:
extracting, according to the interaction scoring result, the images whose total score is higher than a preset threshold as high-quality images;
comparing the differences between each high-quality image and each preset replacement image in pet features, infant features and image quality features;
adjusting the scoring weights according to the differences, and generating new high-quality images with the adjusted weights;
and outputting each new high-quality image to generate the electronic album.
In a second aspect, the present invention provides an infant and pet interaction content evaluation device, the device comprising:
an image acquisition module, configured to acquire a plurality of frames of first target images including an infant and a pet in an infant care scene;
an interaction identification module, configured to input each first target image into a preset monocular depth estimation model based on self-supervised learning and extract second target images in which the infant and the pet interact;
and an interaction scoring module, configured to score the pet features, infant features and image quality features in each second target image using a preset interaction scoring rule and output an interaction scoring result.
In a third aspect, an embodiment of the present invention further provides an electronic device, including: at least one processor, at least one memory and computer program instructions stored in the memory, which when executed by the processor, implement the method as in the first aspect of the embodiments described above.
In a fourth aspect, embodiments of the present invention also provide a storage medium having stored thereon computer program instructions which, when executed by a processor, implement a method as in the first aspect of the embodiments described above.
In summary, the beneficial effects of the invention are as follows:
the invention provides a method, a device, equipment and a storage medium for evaluating interaction content of infants and pets, wherein the method comprises the following steps: acquiring a first target image of a plurality of frames including infants and pets in an infant care scene; inputting each first target image into a preset monocular depth estimation model based on self-supervision learning, and extracting a second target image of interaction between the infant and the pet; and respectively scoring the pet characteristics, infant characteristics and image quality characteristics in each second target image by using a preset interaction scoring rule, and outputting an interaction scoring result. The monocular depth estimation model based on self-supervision learning is used, so that depth information in an image can be extracted, and the relative position and interaction condition between an infant and a pet can be better understood; a preset interactive scoring rule is formulated, so that objective evaluation of an interactive scene is facilitated, scoring in aspects of pet characteristics, infant characteristics, image quality and the like is facilitated, and the objectivity and consistency of evaluation are improved; the depth estimation model based on self-supervision learning is used for training under the condition of no manually marked depth information, so that the burden of data marking is reduced, and the expandability is improved; therefore, the invention provides timely feedback for caretakers by monitoring and effectively evaluating the interaction content of infants and pets in real time, and the automatic evaluation is helpful for relieving the workload of the caretakers, so that the caretakers are more focused on providing high-quality care; personalized care advice is provided for each infant and pet by evaluating interactive behavior.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below; for a person skilled in the art, other drawings obtained from these drawings without inventive effort also fall within the scope of the present invention.
FIG. 1 is a flowchart illustrating the overall operation of the method for evaluating the interactive contents of infants and pets according to embodiment 1 of the present invention;
FIG. 2 is a flow chart of scoring the interaction condition in embodiment 1 of the present invention;
FIG. 3 is a flow chart of the method for determining whether an infant interacts with a pet according to embodiment 1 of the present invention;
FIG. 4 is a flow chart of determining whether interaction occurs between an infant and a pet according to a depth map in embodiment 1 of the present invention;
FIG. 5 is a flow chart of scoring pet characteristics, infant characteristics and image quality characteristics according to embodiment 1 of the present invention;
FIG. 6 is a schematic flow chart of generating an electronic album in embodiment 1 of the present invention;
FIG. 7 is a flow chart of the preset adjustment in embodiment 1 of the present invention;
FIG. 8 is a block diagram of the infant and pet interaction content evaluation device in embodiment 3 of the present invention;
FIG. 9 is a schematic structural diagram of an electronic device in embodiment 4 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between those entities or actions. In the description of the present invention, terms such as "center," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," and "outer" indicate orientations or positional relationships based on the drawings, merely to facilitate and simplify the description; they do not indicate or imply that the devices or elements referred to must have a specific orientation or be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Moreover, the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article or apparatus that comprises it. If not conflicting, the embodiments of the present invention and the features of the embodiments may be combined with each other, and all such combinations are within the protection scope of the present invention.
Example 1
Referring to fig. 1, embodiment 1 of the invention discloses a method for evaluating interaction content between an infant and a pet, which comprises the following steps:
S1: acquiring a real-time video stream in an infant care scene, and decomposing the video stream into multi-frame images;
Specifically, a real-time video stream collected by a camera on the care equipment is obtained. The stream comprises a daytime color video stream and a night infrared video stream, so uninterrupted twenty-four-hour care of the infant can be realized. The video stream is decomposed into multiple frames of images, which may include infant information, pet information, parent information and other target information, and therefore need further detection and analysis.
S2: detecting each frame of image by using a preset target detection model, and extracting a plurality of frames of first target images including infants and pets according to detection results;
Specifically, before the care equipment leaves the factory, a large number of socially disclosed images of infants and common household pets (such as cats and dogs) at each growth stage are collected and analyzed, focusing mainly on the daily activities of infants and pets. With these images as training data for a deep learning model, a neural network detection model based on the YoloV6s structure is constructed and trained. This model detects the multi-frame images decomposed from the real-time video stream, identifies whether an infant and a pet appear in each frame, and extracts the frames in which both appear as first target images. Because frames without an infant and a pet are neither extracted nor analyzed before interactive scoring, unnecessary resource waste is avoided and the efficiency of the subsequent interaction analysis is improved.
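As an illustration of steps S1-S2, the frame extraction can be sketched in Python with OpenCV. The sketch assumes a stream URL and a caller-supplied `detect_labels` function wrapping the trained YoloV6s detector; both names are illustrative, not part of the disclosure.

```python
import cv2

def extract_first_target_images(stream_url: str, detect_labels):
    """Decompose a real-time video stream into frames and keep only the frames
    in which both an infant and a pet are detected (the first target images).
    detect_labels: any callable mapping a frame to the set of detected class
    labels, e.g. a wrapper around the trained YoloV6s model (assumption)."""
    cap = cv2.VideoCapture(stream_url)
    first_target_images = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break  # stream ended or dropped
        if {"infant", "pet"} <= detect_labels(frame):  # both targets present
            first_target_images.append(frame)
    cap.release()
    return first_target_images
```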
S3: and scoring the interaction condition of the infants and the pets in each first target image according to a preset interaction scoring rule, and extracting each frame of high-quality image to generate an electronic album according to a scoring result.
Specifically, after the first target images containing the infant and the pet are obtained, the interaction between them in each first target image is further scored, and the high-quality frames are selected by comprehensive evaluation under the preset scoring rules. This satisfies the need of parents and other users to record the precious moments of infant-pet interaction and improves the user experience.
In one embodiment, referring to fig. 2, the step S3 includes:
S31: inputting each first target image into a preset monocular depth estimation model based on self-supervised learning, and judging whether the infant and the pet interact;
Specifically, because of the visual difference between the two targets in an image and in the real scene, the longitudinal distance in particular is difficult to judge accurately, and the apparent distance in the image does not truly reflect the real distance between the two targets. Processing the first target images with the monocular depth estimation model based on self-supervised learning reflects the distance between infant and pet more truthfully, and from this it is judged whether interaction occurs. This effectively avoids wasting processing resources on frames in which no interaction takes place and improves working efficiency.
In one embodiment, referring to fig. 3, the step S31 includes:
S311: detecting each first target image by using a preset target position detection model, and identifying infant position information and pet position information in each first target image;
Specifically, after detecting the infant and the pet with the YoloV6s-based neural network detection model, their respective position information is further detected and represented by coordinates. For example, the top-left vertex of the infant's minimum circumscribed rectangular frame is marked A(x1, y1) and the bottom-right vertex B(x2, y2); the top-left vertex of the pet's minimum circumscribed rectangular frame is marked C(x3, y3) and the bottom-right vertex D(x4, y4).
S312: obtaining first central point position information of the minimum circumscribed rectangular frame of the infant and second central point position information of the minimum circumscribed rectangular frame of the pet according to the infant position information and the pet position information;
Specifically, from the coordinate information of the four points A, B, C and D, the first center point of the infant's minimum circumscribed rectangular frame is calculated as O1(x5, y5), where x5 = (x1 + x2)/2 and y5 = (y1 + y2)/2; the second center point of the pet's minimum circumscribed rectangular frame is O2(x6, y6), where x6 = (x3 + x4)/2 and y6 = (y3 + y4)/2.
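For concreteness, the center-point computation of S311-S312 looks like this; the box coordinates are illustrative values, not taken from the patent.

```python
def box_center(top_left, bottom_right):
    """Center of a minimum circumscribed rectangular frame."""
    (xa, ya), (xb, yb) = top_left, bottom_right
    return ((xa + xb) / 2.0, (ya + yb) / 2.0)

# infant box A(x1, y1)-B(x2, y2) and pet box C(x3, y3)-D(x4, y4)
o1 = box_center((120, 80), (260, 300))   # O1 = (190.0, 190.0)
o2 = box_center((300, 150), (420, 320))  # O2 = (360.0, 235.0)
```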
S313: inputting the first central point position information and the second central point position information into a depth estimation model based on a Monodepth structure to obtain relative position information between the infant and the pet;
Specifically, the coordinate information of the first center point O1 and the second center point O2 is input into the monocular depth estimation model based on the Monodepth structure to obtain the distance from each center point to the camera, i.e., a first depth map and a second depth map; the relative position between the infant and the pet is then obtained from the two depth maps. Monodepth is a self-supervised depth estimation model that better judges the longitudinal relative position between infant and pet, so the longitudinal distance between them can be judged accurately.
S314: and judging whether interaction occurs between the infant and the pet according to the relative position information. In one embodiment, referring to fig. 4, S314 includes:
S3141: if the distance between the first center point and the second center point is judged to be smaller than the preset threshold value according to the relative position information, the interaction between the infant and the pet is considered to occur;
S3142: and if the distance between the first center point and the second center point is judged to be not smaller than the preset threshold value according to the relative position information, the infant and the pet are considered to have no interaction.
Specifically, suppose a distance threshold d = 4 cm is preset. If the real longitudinal depth distance between the infant and the pet, obtained from the first and second depth maps, is 3 cm, which is smaller than the threshold, the two are considered very close: interaction is occurring, and the interaction needs to be evaluated so that images can be selected for the electronic album. Conversely, if the real longitudinal depth distance is 5 cm, which is larger than the threshold, the infant and the pet are far apart, no interaction occurs, and no evaluation is needed.
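A minimal sketch of the S313-S314 decision, assuming the Monodepth-style model yields a dense per-pixel depth map (a NumPy array) that can be indexed at O1 and O2; the patent describes two per-point depth maps, which this single-map lookup approximates.

```python
import numpy as np

def interaction_occurred(depth_map: np.ndarray, o1, o2,
                         threshold_cm: float = 4.0) -> bool:
    """Compare the longitudinal depth distance between the two center points
    against the preset threshold (d = 4 cm in the text's example)."""
    d1 = depth_map[int(o1[1]), int(o1[0])]  # depth at the infant center point O1
    d2 = depth_map[int(o2[1]), int(o2[0])]  # depth at the pet center point O2
    return abs(d1 - d2) < threshold_cm
```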
S32: extracting the second target images whose judgment result is that the infant and the pet interact, and scoring the pet features, infant features and image quality features in each second target image using the interaction scoring rule;
Specifically, each frame of second target image in which the infant and the pet interact is obtained, and each frame is scored with the preset interaction scoring rule from three aspects: the pet element, the infant element and the image quality element. This ensures the comprehensiveness of the scoring and better helps the user record the beautiful moments of infant-pet interaction.
In one embodiment, referring to fig. 5, the step S32 includes:
S321: inputting each second target image into a preset pet action detection model, extracting pet limb key point information in the second target images, and counting the number of the pet limb key points;
Specifically, a large number of socially disclosed images of common household pets (such as cats and dogs) at each growth stage are collected in advance and analyzed. Images containing pet actions highly favored by users (such as tail wagging or rubbing against the owner, actions that express affection for the owner) are extracted according to user preferences and used as the training basis of a deep learning model, and a pet action detection model based on the YoloV6s structure is constructed and trained. This model detects each second target image, extracts the key points in each frame that match the features in the pet action detection model, outputs the pet limb key point information, and counts the number of pet limb key points. Because YoloV6s offers high detection precision and speed, the pet limb key points can be detected accurately and efficiently.
S322: scoring pet features in each second target image according to the pet limb key point information and the key point number, and outputting a first score;
Specifically, a key point matching level q and a key point count num are preset, where 0 ≤ q ≤ 1 and num is a non-negative integer. For example: 0 ≤ q < 0.4 corresponds to the first level, with a score of 1; 0.4 ≤ q < 0.7 corresponds to the second level, with a score of 2; and 0.7 ≤ q ≤ 1 corresponds to the third level, with a score of 3. Similarly, 0 ≤ num ≤ 2 scores 1; 2 < num ≤ 4 scores 2; 4 < num ≤ 6 scores 3; and num > 6 scores 4. The pet features in each second target image are scored from the pet limb key point information and the key point count. For example, if the matching degree between the key point features detected in the second target image and those in the pet action detection model is 0.8, the score is m1 = 3; if 8 pet limb key points are detected, the score is m2 = 4; the first score is then P1 = m1 + m2 = 7. Obtaining the first score from both the quality and the number of pet key points ensures the comprehensiveness of the pet-element evaluation.
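The S322 rules transcribe directly into code; the thresholds and the worked example (q = 0.8, num = 8 giving P1 = 7) are the ones stated above.

```python
def first_score(q: float, num: int) -> int:
    """Pet-feature score from keypoint matching level q (0 <= q <= 1)
    and detected pet limb keypoint count num."""
    if q < 0.4:
        m1 = 1      # first level
    elif q < 0.7:
        m1 = 2      # second level
    else:
        m1 = 3      # third level, 0.7 <= q <= 1
    if num <= 2:
        m2 = 1
    elif num <= 4:
        m2 = 2
    elif num <= 6:
        m2 = 3
    else:
        m2 = 4      # num > 6
    return m1 + m2

assert first_score(0.8, 8) == 7  # the worked example: m1 = 3, m2 = 4
```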
S323: inputting each second target image into a preset face detection model, and extracting infant face key point information in the second target images;
Specifically, a large number of socially disclosed images of infants at each growth stage are collected and analyzed, focusing mainly on the front face and side face of the infant. With these images as training data for a deep learning model, a first neural network classification model based on the Onet structure is constructed and trained. Each frame of second target image is detected with this model to obtain the infant face key point information, where the infant face key points include at least the nose, eye, ear and mouth key points. Inputting the infant face key point information into the Onet-based classification model classifies the infant face into states such as front face, side face and occluded face, ensuring that the electronic album later shows the infant in the state the user wants.
S324: judging the occlusion state of the infant's face and the infant's expression according to the infant face key point information, scoring the infant features in each second target image according to the judgment result, and outputting a second score;
Specifically, the infant face key point information is acquired, and the occlusion state of the infant's face and the infant's expression are judged from it. For example, a front face is preset to score 5, where a front face means that all key points of the nose, eyes, ears and mouth are detected by the classification model; an occluded face scores 4, where at least one of the nose, eye, ear and mouth key points is detected; a side face scores 3, where only the ear key points are detected; and no face scores 1, where no key points are detected. If the detected infant face key point information indicates a front face, the score is n1 = 5. At the same time, a large number of socially disclosed images of infants at each growth stage are collected and analyzed, focusing on images of smiling and crying infants, and with these images as training data a second neural network classification model based on the ResNet18 structure is constructed and trained to classify the infant expression in the second target image. If the infant is classified as smiling from the detected face key point information, the score is n2 = 5, and the second score is P2 = n1 + n2 = 10. By scoring the occlusion state of the infant's face and the infant's expression through the first and second classification models, the completeness and quality of the infant's facial features in the pictures later output to the electronic album are ensured, meeting the user's need to share good pictures.
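A sketch of the S324 scoring, following the face-state scores (front 5, occluded 4, side 3, none 1) and the smile score (5) given above; scoring 0 for expressions not listed is an assumption here (crying is handled separately in embodiment 2).

```python
FACE_STATE_SCORE = {"front": 5, "occluded": 4, "side": 3, "none": 1}
EXPRESSION_SCORE = {"smile": 5}  # other expressions score 0 here (assumption)

def second_score(face_state: str, expression: str) -> int:
    """Infant-feature score: face-occlusion score n1 plus expression score n2."""
    n1 = FACE_STATE_SCORE[face_state]
    n2 = EXPRESSION_SCORE.get(expression, 0)
    return n1 + n2

assert second_score("front", "smile") == 10  # the worked example: P2 = 5 + 5
```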
S325: scoring the sharpness and the light intensity of each second target image, and outputting a third score.
Specifically, in practical application scenes the user generally hopes that the image is sharper and the light stronger, so the higher the image sharpness, the higher the score, and conversely the lower; likewise, the larger the photosensitive value on the camera's photosensitive element, the higher the score. For example, the image quality is scored P3 = 8 from the image sharpness and the photosensitive value obtained from the camera's photosensitive element.
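The patent fixes no particular sharpness metric or scale for S325, so the following sketch uses variance of the Laplacian as a common stand-in for sharpness and treats the photosensitive value as a given input; the normalization constants are assumptions.

```python
import cv2

def third_score(image_bgr, light_value: float) -> float:
    """Image-quality score from sharpness and light intensity."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()  # higher variance = sharper edges
    sharp_part = min(sharpness / 100.0, 5.0)   # cap at 5 points (assumed scale)
    light_part = min(light_value / 20.0, 5.0)  # cap at 5 points (assumed scale)
    return sharp_part + light_part
```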
S33: extracting the frames whose score is higher than a preset threshold, marking them as high-quality images, and carrying out preset adjustment on the high-quality images to generate the electronic album.
In one embodiment, referring to fig. 6, the step S33 includes:
S331: acquiring preset scoring weights;
Specifically, the care equipment first assigns weights to the first, second and third scores based on big-data analysis of infant care. For example, if the analysis shows that users such as parents pay most attention to the infant in album photos, the weight A of the first score is 0.3, the weight B of the second score is 0.5, and the weight C of the third score is 0.2. With preset weights, the electronic albums recommended under different weightings better meet the users' different actual demands.
S332: weighting the first score, the second score and the third score according to the scoring weight, and summing to obtain a total score;
s333: extracting an image with the total score higher than a preset threshold value as a high-quality image according to the total score;
Specifically, a score threshold of 8 is preset. The first, second and third scores and their respective weights are obtained, and the total score is computed as P = A×P1 + B×P2 + C×P3 = 0.3×7 + 0.5×10 + 0.2×8 = 8.7. When the total score is greater than the preset threshold, the corresponding image is considered a high-quality image. Because the high-quality image integrates the pet element, the infant element and the image quality element, it better meets the user's need to record the good moments of infants and pets and makes sharing with others convenient.
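The S331-S333 computation, with the weights, scores and threshold from the example above:

```python
def total_score(p1: float, p2: float, p3: float,
                a: float = 0.3, b: float = 0.5, c: float = 0.2) -> float:
    """Weighted sum of the pet, infant and image-quality scores."""
    return a * p1 + b * p2 + c * p3

p = total_score(7, 10, 8)   # 0.3*7 + 0.5*10 + 0.2*8 = 8.7
is_high_quality = p > 8     # 8.7 > 8, so this frame is kept for the album
```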
S334: and carrying out preset adjustment on the high-quality image to generate the electronic album.
In one embodiment, referring to fig. 7, the step S334 includes:
S3341: acquiring multiple frames of replacement images input by the user;
Specifically, in actual use the user may replace some original images in the electronic album with images of their own choice, according to the application scene and usage needs; this indicates that the original images do not meet the user's needs at that moment and should be replaced in time.
S3342: comparing differences between the high quality image and the replacement image respectively in the pet characteristics, the infant characteristics and the image quality characteristics;
S3343: adjusting the scoring weights according to the differences, and generating new high-quality images with the adjusted weights;
Specifically, the differences between the high-quality images and the replacement images in pet features, infant features and image quality features are compared; the scoring weights are adjusted according to these differences, and new high-quality images are generated with the adjusted weights. The original images of the electronic album are thus dynamically adjusted and replaced in time, so the album can be updated and replaced at any moment to match the user's actual needs.
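The patent does not state an explicit update rule for S3343, so the following sketch assumes a simple proportional adjustment: each weight moves toward the feature dimensions on which the user's replacement images score higher than the auto-selected high-quality images, then the weights are renormalized.

```python
def adjust_weights(weights, quality_feats, replacement_feats, lr=0.1):
    """weights and the two feature triples are ordered (pet, infant, image quality)."""
    diffs = [r - q for q, r in zip(quality_feats, replacement_feats)]
    raw = [max(w + lr * d, 0.01) for w, d in zip(weights, diffs)]  # keep weights positive
    total = sum(raw)
    return [w / total for w in raw]  # renormalize so the weights sum to 1

# The user's replacements score higher on pet features, so weight A rises:
new_w = adjust_weights([0.3, 0.5, 0.2], [7, 10, 8], [9, 8, 8])  # -> [0.5, 0.3, 0.2]
```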
S3344: and outputting each new high-quality image to generate the electronic album.
Example 2
In another embodiment, the step S324 further includes:
s3241: acquiring the infant key point information, and judging the infant expression;
Specifically, recognizing the infant's expression requires comprehensively considering multiple pieces of feature information, including the shape of the mouth, the expression of the eyes, the position of the eyebrows and changes in face color. Combining these features with more comprehensive analysis methods, such as computer vision and deep learning techniques, allows the infant's expression to be identified more accurately.
S3242: when the infant expression is crying, optimizing the second score, and obtaining crying duration of the infant;
Specifically, suppose the second neural network classification model classifies the infant's expression in the second target image as crying from the detected face key point information, while the real longitudinal depth distance between infant and pet obtained from the first and second depth maps is 3 cm, smaller than the distance threshold. Then the two are too close and the infant is crying, and the infant may be in a dangerous situation, such as being scratched or bitten by the pet during the interaction. In that case the score is set, for example, to n2 = -5, so the optimized second score is P2 = n1 + n2 = 0, and the duration t of the infant's continuous crying is obtained. If the infant was only accidentally startled by the pet, t will not be long; in dangerous situations such as scratching or biting, however, the infant will keep crying from continued pain. Dangerous situations during infant-pet interaction can thus be identified accurately.
S3243: and when the crying time is longer than a preset time threshold, a dangerous alarm is sent to the user.
Specifically, different duration thresholds are set for different practical application scenes. For example, with a threshold of 3 min, if the infant's continuous crying duration exceeds the preset threshold, the infant is considered to be in danger and a danger alarm is sent to the user. This helps the user record the good moments of infant-pet interaction while preventing the infant from being injured.
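A sketch of the S3242-S3243 monitoring loop, assuming per-frame expression labels and wall-clock timing; the `alarm_callback` stands in for whatever notification channel the care equipment uses and is hypothetical.

```python
import time

CRY_THRESHOLD_S = 3 * 60  # the 3 min duration threshold from the example above

class CryMonitor:
    """Tracks continuous crying and fires an alarm callback once the
    crying duration exceeds the preset threshold."""
    def __init__(self, alarm_callback):
        self.alarm = alarm_callback  # e.g. a push notification to the user (assumption)
        self.cry_start = None

    def update(self, expression: str) -> None:
        if expression == "cry":
            if self.cry_start is None:
                self.cry_start = time.monotonic()  # crying just began
            elif time.monotonic() - self.cry_start > CRY_THRESHOLD_S:
                self.alarm()  # crying has persisted: possible danger
        else:
            self.cry_start = None  # crying stopped; reset the timer
```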
Example 3
Referring to fig. 8, embodiment 3 of the present invention further provides an apparatus for evaluating interactive contents between infants and pets, the apparatus comprising:
the image acquisition module is used for acquiring a real-time video stream in an infant care scene and decomposing the video stream into multi-frame images;
the target detection module is used for detecting each frame of image by using a preset target detection model, and extracting a plurality of frames of first target images including infants and pets according to detection results;
the interaction scoring module is used for scoring interaction conditions of infants and pets in the first target images according to preset interaction scoring rules, and extracting high-quality images of each frame according to scoring results to generate an electronic album.
Specifically, the infant and pet interaction content evaluation device of this embodiment comprises the image acquisition module, the target detection module and the interaction scoring module described above. By judging with the preset target detection model whether an infant and a pet appear in the real-time video, the device avoids the resource waste caused by falsely starting the equipment when no scoring target appears; meanwhile, by scoring the interaction according to the preset scoring rules, generating high-quality images and making them into an electronic album, it meets the user's need to record the good moments between infant and pet while they interact.
Example 4
In addition, the method for evaluating the interactive contents of the infant and the pet according to embodiment 1 of the present invention described in connection with fig. 1 may be implemented by an electronic device. Fig. 9 shows a schematic hardware structure of an electronic device provided in embodiment 4 of the present invention.
The electronic device may include a processor and memory storing computer program instructions.
In particular, the processor may comprise a Central Processing Unit (CPU), or an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or may be configured as one or more integrated circuits that implement embodiments of the present invention.
The memory may include mass storage for data or instructions. By way of example, and not limitation, the memory may comprise a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. The memory may include removable or non-removable (or fixed) media, where appropriate. The memory may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory is a non-volatile solid-state memory. In a particular embodiment, the memory includes read-only memory (ROM). The ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these, where appropriate.
The processor reads and executes the computer program instructions stored in the memory to implement any of the methods for evaluating the interactive content of infants and pets in the above embodiments.
In one example, the electronic device may also include a communication interface and a bus. The processor, the memory, and the communication interface are connected by a bus and complete communication with each other, as shown in fig. 9.
The communication interface is mainly used for realizing communication among the modules, the devices, the units and/or the equipment in the embodiment of the invention.
The bus includes hardware, software, or both that couple the components of the device to each other. By way of example, and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a low pin count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), another suitable bus, or a combination of two or more of these. The bus may include one or more buses, where appropriate. Although embodiments of the invention have been described and illustrated with respect to a particular bus, the invention contemplates any suitable bus or interconnect.
Example 5
In addition, in combination with the method for evaluating the interactive contents between infants and pets in the above embodiment 1, embodiment 5 of the present invention may also provide a computer readable storage medium. The computer readable storage medium has stored thereon computer program instructions; the computer program instructions, when executed by the processor, implement any of the infant and pet interactive content assessment methods of the embodiments described above.
In summary, the embodiments of the present invention provide a method, an apparatus, a device, and a storage medium for evaluating interactive contents between infants and pets.
It should be understood that the invention is not limited to the particular arrangements and instrumentality described above and shown in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and shown, and those skilled in the art can make various changes, modifications and additions, or change the order between steps, after appreciating the spirit of the present invention.
The functional blocks shown in the above-described structural block diagrams may be implemented in hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuitry, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio Frequency (RF) links, and the like. The code segments may be downloaded via computer networks such as the internet, intranets, etc.
It should also be noted that the exemplary embodiments mentioned in this disclosure describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, or may be performed in a different order from the order in the embodiments, or several steps may be performed simultaneously.
In the foregoing, only the specific embodiments of the present invention are described, and it will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein. It should be understood that the scope of the present invention is not limited thereto, and any equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the present invention, and they should be included in the scope of the present invention.

Claims (10)

1. A method for evaluating the interaction content between an infant and a pet, characterized in that the method comprises:
acquiring a plurality of frames of first target images including an infant and a pet in an infant care scene;
inputting each first target image into a preset monocular depth estimation model based on self-supervised learning, and extracting second target images in which the infant and the pet interact;
and scoring the pet features, infant features and image quality features in each second target image using a preset interaction scoring rule, and outputting an interaction scoring result.
2. The method for evaluating the interaction content between an infant and a pet according to claim 1, wherein inputting each first target image into a preset monocular depth estimation model based on self-supervised learning and extracting second target images in which the infant and the pet interact comprises:
detecting each first target image with a preset target position detection model, and identifying the infant position information and pet position information in each first target image;
obtaining, from the infant position information and the pet position information, first center point position information of the infant's minimum circumscribed rectangular frame and second center point position information of the pet's minimum circumscribed rectangular frame;
inputting the first center point position information and the second center point position information into a depth estimation model based on the Monodepth structure to obtain relative position information between the infant and the pet;
and if the distance between the first center point and the second center point is judged, from the relative position information, to be smaller than a preset threshold, considering that the infant and the pet interact, and extracting the second target images of the interaction.
3. The method for evaluating the interaction content between an infant and a pet according to claim 1, wherein scoring the pet features, infant features and image quality features in each second target image using a preset interaction scoring rule and outputting an interaction scoring result comprises:
inputting each second target image into a preset pet action detection model, scoring the pet features in each second target image according to the extracted pet limb key point information and the number of pet limb key points, and outputting a first score;
inputting each second target image into a preset face detection model, judging the occlusion state of the infant's face and the infant's expression, scoring the infant features in each second target image according to the judgment result, and outputting a second score;
scoring the sharpness and light intensity of each second target image, and outputting a third score;
and weighting and summing the first, second and third scores according to preset scoring weights to obtain a total score as the interaction scoring result.
4. The method for evaluating the interaction content between an infant and a pet according to claim 3, wherein inputting each second target image into a preset pet action detection model, scoring the pet features according to the extracted pet limb key point information and the number of pet limb key points, and outputting the first score comprises:
collecting, in advance, a large number of socially disclosed images of household pets at various growth stages;
extracting the pet action images highly favored by users according to user preferences;
and constructing and training a pet action detection model based on the YoloV6s structure, with the highly favored pet action images as the training basis of the deep learning model.
5. The method for evaluating the interaction content between an infant and a pet according to claim 3, wherein inputting each second target image into a preset face detection model, judging the occlusion state of the infant's face and the infant's expression, scoring the infant features in each second target image according to the judgment result, and outputting the second score comprises:
acquiring the infant key point information and judging the infant's expression;
when the infant's expression is crying, optimizing the second score and obtaining the duration of the infant's crying;
and when the crying duration is longer than a preset duration threshold, sending a danger alarm to the user.
6. The method for evaluating the interaction content between an infant and a pet according to claim 1, wherein acquiring a plurality of frames of first target images including an infant and a pet in an infant care scene comprises:
acquiring a real-time video stream in the infant care scene and decomposing the video stream into multiple frames of images;
and detecting each frame with a preset target detection model based on the YoloV6s structure, and extracting, according to the detection results, the plurality of frames of first target images including the infant and the pet.
7. The method for evaluating the interaction content between an infant and a pet according to claim 1, wherein, after scoring the pet features, infant features and image quality features in each second target image using the preset interaction scoring rule and outputting the interaction scoring result, the method further comprises:
extracting, according to the interaction scoring result, the images whose total score is higher than a preset threshold as high-quality images;
comparing the differences between each high-quality image and each preset replacement image in pet features, infant features and image quality features;
adjusting the scoring weights according to the differences, and generating new high-quality images with the adjusted weights;
and outputting each new high-quality image to generate the electronic album.
8. An infant and pet interactive content assessment device, the device comprising:
the image acquisition module is used for acquiring a plurality of frames of first target images including infants and pets in an infant care scene;
the interaction identification module is used for inputting each first target image into a preset monocular depth estimation model based on self-supervision learning, and extracting a second target image of interaction between the infant and the pet;
and the interaction scoring module is used for scoring the pet characteristics, the infant characteristics and the image quality characteristics in each second target image respectively by utilizing a preset interaction scoring rule and outputting an interaction scoring result.
9. An electronic device, comprising: at least one processor, at least one memory, and computer program instructions stored in the memory, which when executed by the processor, implement the method of any one of claims 1-7.
10. A storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1-7.
CN202410042841.2A 2023-05-24 2023-05-24 Infant and pet interaction content assessment method, device, equipment and storage medium Pending CN117877070A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410042841.2A CN117877070A (en) 2023-05-24 2023-05-24 Infant and pet interaction content assessment method, device, equipment and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310592158.1A CN116580427B (en) 2023-05-24 2023-05-24 Method, device and equipment for manufacturing electronic album containing interaction content of people and pets
CN202410042841.2A CN117877070A (en) 2023-05-24 2023-05-24 Infant and pet interaction content assessment method, device, equipment and storage medium

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202310592158.1A Division CN116580427B (en) 2023-05-24 2023-05-24 Method, device and equipment for manufacturing electronic album containing interaction content of people and pets

Publications (1)

Publication Number Publication Date
CN117877070A true CN117877070A (en) 2024-04-12

Family

ID=87537495

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202410042841.2A Pending CN117877070A (en) 2023-05-24 2023-05-24 Infant and pet interaction content assessment method, device, equipment and storage medium
CN202310592158.1A Active CN116580427B (en) 2023-05-24 2023-05-24 Method, device and equipment for manufacturing electronic album containing interaction content of people and pets

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202310592158.1A Active CN116580427B (en) 2023-05-24 2023-05-24 Method, device and equipment for manufacturing electronic album containing interaction content of people and pets

Country Status (1)

Country Link
CN (2) CN117877070A (en)

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016130853A1 (en) * 2015-02-11 2016-08-18 AVG Netherlands B.V. Systems and methods for identifying unwanted photos stored on a device
US10489688B2 (en) * 2017-07-24 2019-11-26 Adobe Inc. Personalized digital image aesthetics in a digital medium environment
CN108628985B (en) * 2018-04-27 2021-02-26 维沃移动通信有限公司 Photo album processing method and mobile terminal
US20210071255A1 (en) * 2019-09-06 2021-03-11 The Broad Institute, Inc. Methods for identification of genes and genetic variants for complex phenotypes using single cell atlases and uses of the genes and variants thereof
CN113129252A (en) * 2019-12-30 2021-07-16 Tcl集团股份有限公司 Image scoring method and electronic equipment
CN111464833B (en) * 2020-03-23 2023-08-04 腾讯科技(深圳)有限公司 Target image generation method, target image generation device, medium and electronic device
CN111914657B (en) * 2020-07-06 2023-04-07 浙江大华技术股份有限公司 Pet behavior detection method and device, electronic equipment and storage medium
KR102367399B1 (en) * 2021-05-03 2022-02-23 호서대학교 산학협력단 Device and method to inform the condition of infants and toddlers
CN113420708A (en) * 2021-07-06 2021-09-21 深圳市商汤科技有限公司 Pet nursing method and device, electronic equipment and storage medium
CN113779285A (en) * 2021-09-26 2021-12-10 努比亚技术有限公司 Dynamic processing method and device for picture library and computer readable storage medium
US20230139458A1 (en) * 2021-10-28 2023-05-04 Canon Medical Systems Corporation Image processing method and apparatus
CN114297428A (en) * 2021-12-31 2022-04-08 宁波星巡智能科技有限公司 Method, device, equipment and medium for optimizing classified electronic photo album of infant images
CN114821815B (en) * 2022-06-27 2022-11-15 杭州觅睿科技股份有限公司 Pet online interaction system operation method, device, equipment and medium
CN116110129A (en) * 2023-03-03 2023-05-12 宁波星巡智能科技有限公司 Intelligent evaluation method, device, equipment and storage medium for dining quality of infants

Also Published As

Publication number Publication date
CN116580427A (en) 2023-08-11
CN116580427B (en) 2023-11-21

Similar Documents

Publication Publication Date Title
TWI416068B (en) Object tracking method and apparatus for a non-overlapping-sensor network
CN101383000B (en) Information processing apparatus, and information processing method
US8593523B2 (en) Method and apparatus for capturing facial expressions
CN110889334A (en) Personnel intrusion identification method and device
CN110956118A (en) Target object detection method and device, storage medium and electronic device
US20230237699A1 Method and system for intelligently controlling children's usage of screen terminal
CN112686211A (en) Fall detection method and device based on attitude estimation
CN114581948A (en) Animal face identification method
CA3050456C (en) Facial modelling and matching systems and methods
CN112150498A (en) Method and device for determining posture information, storage medium and electronic device
CN108710820A (en) Infantile state recognition methods, device and server based on recognition of face
CN113038272B (en) Method, device and equipment for automatically editing baby video and storage medium
CN116580427B (en) Method, device and equipment for manufacturing electronic album containing interaction content of people and pets
CN110148234B (en) Campus face brushing receiving and sending interaction method, storage medium and system
CN116110129A (en) Intelligent evaluation method, device, equipment and storage medium for dining quality of infants
JP6998027B1 (en) Information processing method, information processing system, imaging device, server device and computer program
CN116682176A (en) Method, device, equipment and storage medium for intelligently generating infant video tag
JP6893812B2 (en) Object detector
CN110472482A (en) A kind of method and device of object identification and real time translation
Afrin et al. AI based facial expression recognition for autism children
JP6940106B1 (en) Information processing method, information processing system, imaging device, server device and computer program
JP7129531B2 (en) Information processing method and information processing system
KR20100137964A System and method for providing service of child's fortune telling using face recognition technology
JP7193663B1 (en) Information processing device, information processing program, and information processing method
CN111079692A (en) Campus behavior analysis method based on face recognition for K12 education stage

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination