CN115145402A - Intelligent toy system with network interaction function and control method - Google Patents

Intelligent toy system with network interaction function and control method

Info

Publication number
CN115145402A
Authority
CN
China
Prior art keywords
user
data
voice
module
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211063424.3A
Other languages
Chinese (zh)
Inventor
樊庆伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Fumi Health Technology Co ltd
Original Assignee
Shenzhen Fumi Health Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Fumi Health Technology Co ltd filed Critical Shenzhen Fumi Health Technology Co ltd
Priority to CN202211063424.3A priority Critical patent/CN115145402A/en
Publication of CN115145402A publication Critical patent/CN115145402A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63HTOYS, e.g. TOPS, DOLLS, HOOPS OR BUILDING BLOCKS
    • A63H33/00Other toys
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/56Extraction of image or video features relating to colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00Speech synthesis; Text to speech systems
    • G10L13/02Methods for producing synthetic speech; Speech synthesisers

Abstract

The invention discloses an intelligent toy system with a network interaction function and a control method. The intelligent toy system comprises a feature acquisition module, a learning module, a processing module, a tracking module and an identification module. The feature acquisition module acquires user features; the learning module trains a feature model inside the processing module according to the user feature data and classifies the feature data; after receiving the user feature data, the processing module selects interaction data matched with the current feature data according to the feature model and controls the toy to interact with the user based on the interaction data. The toy control system can continuously train the feature model from the user feature data while collecting user features, so the system can recognize multiple users and interact differently according to the interests and hobbies of different users, giving it wider applicability.

Description

Intelligent toy system with network interaction function and control method
Technical Field
The invention relates to the technical field of toy control systems, in particular to an intelligent toy system with a network interaction function and a control method.
Background
A toy can be a natural object, that is, a non-man-made thing such as sand, stones, mud or tree branches, and the term can be understood broadly: it is not limited to things sold on the street for people to play with, and anything that can be played with, watched, listened to or touched can be called a toy. Toys suit children and are equally suitable for young and middle-aged people; they are tools that open the skylight of wisdom and make people clever and smart;
with the development of the times, intelligent toys have appeared. They are a market segment of the toy category that integrates IT technology with traditional toys; as a new kind of toy different from traditional ones, they can interact with people, offer better interactivity, and are deeply loved by users.
The prior art has the following defects: existing intelligent toy control systems can only respond to fixed input control instructions when interacting with users. However, since users include children, young people and middle-aged or elderly people, such a control system cannot choose an interaction matched to the user according to the user's characteristics (for example, after sampling voice data the control system can only respond to the content of the voice data and cannot judge whether the current user is a child, a young person or a middle-aged/elderly person), so its applicability is poor.
Disclosure of Invention
The invention aims to provide an intelligent toy system with a network interaction function and a control method, so as to solve the defects in the background technology.
In order to achieve the above purpose, the invention provides the following technical scheme: the intelligent toy system with the network interaction function comprises a feature acquisition module, a learning module, a processing module, a tracking module and an identification module;
the feature acquisition module acquires user features; the learning module trains a feature model inside the processing module according to the user feature data and classifies the feature data; after receiving the user feature data, the processing module selects interaction data matched with the current feature data according to the feature model and controls the toy to interact with the user based on the interaction data.
Preferably, the feature acquisition module comprises a gesture acquisition unit, the gesture acquisition unit comprises skin color extraction, fingertip extraction and finger number identification, the skin color extraction is used for extracting the skin color of the hand, and the fingertip extraction and the finger number identification are used for identifying the edge of the hand and the number of fingers.
Preferably, the fingertip extraction and finger-number identification extract a binarized image of the palm region through a convex hull; the convex hull is the convex polygon formed by connecting the outer-layer points of the binarized image, and the palm-center coordinates of the convex hull are extracted through the following formula:

x_0 = (1/N) · Σ_{i=1..N} x_i ,    y_0 = (1/N) · Σ_{i=1..N} y_i

where (x_i, y_i) are the coordinate values of the i-th pixel point in the gesture area, N is the total number of pixel points in the gesture area, and (x_0, y_0) are the coordinates of the palm center.
Preferably, the feature acquisition module further includes a voice acquisition unit comprising text analysis, prosody processing and speech synthesis; the text analysis is used for processing the input text, the prosody processing is used for predicting the prosodic features of the synthesized speech, and the speech synthesis is used for processing the text features and prosody model parameters obtained through the text analysis and prosody processing.
Preferably, the voice recognition unit recognizes the voice including the steps of:
filtering out secondary information and environmental noise in an original voice signal;
analyzing a voice waveform and extracting a voice time sequence characteristic sequence;
and inputting the obtained voice characteristic parameters into an acoustic model for continuous training to obtain a model matched with a training output signal.
Preferably, the feature data includes user gesture data and user voice data acquired by the feature acquisition module.
Preferably, a tracking module is further included: while the feature acquisition module acquires the user features, the tracking module continuously tracks the user, so that the feature acquisition module can track and acquire the user features.
Preferably, the tracking module is a tracking camera, the characteristic acquisition module acquires user characteristics, the tracking camera divides a user activity area, the tracking module continuously tracks the user in the activity area, the characteristic acquisition module continuously acquires the characteristics, and the tracking camera stops tracking after the user moves out of the activity area.
Preferably, the processing module comprises a processor and a signal transceiver, the signal transceiver is electrically connected with the processor, the processor is used for processing the characteristic data, and the processor is wirelessly connected with the mobile phone terminal through the signal transceiver based on a WiFi network.
The invention also provides a control method of the intelligent toy with the network interaction function, which comprises the following steps:
s1: collecting user characteristic data;
s2: training a feature model according to the user feature data;
s3: classifying the feature data;
s4: and selecting interaction data matched with the current feature data according to the feature model, and controlling the toy to interact with the user based on the interaction data.
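A minimal sketch of this S1-S4 loop is given below; all module and method names are assumptions made purely for illustration, since the patent does not define a programming interface:

def control_loop(collector, learner, classifier, processor, toy):
    while True:
        features = collector.acquire()                    # S1: collect user feature data
        learner.update_model(features)                    # S2: train the feature model
        category = classifier.classify(features)          # S3: classify the feature data
        action = processor.select_interaction(category)   # S4: pick matching interaction data
        toy.perform(action)                               # control the toy to interact with the user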
In the technical scheme, the invention provides the following technical effects and advantages:
1. According to the invention, the feature acquisition module acquires user features, the learning module trains the feature model inside the processing module according to the user feature data and classifies the feature data, and after receiving the user feature data the processing module selects interaction data matched with the current feature data according to the feature model and controls the toy to interact with the user based on the interaction data, so that the system can recognize multiple users and interact differently according to the interests of different users.
2. The system extracts the binarized image of the palm region through a convex hull, the convex polygon formed by connecting the outermost points, which contains all points of the point set. OpenCV provides the convexHull function, used here to find each vertex of the convex polygon and determine the fingertip positions; with these steps the positions of the palm and fingertips can be identified accurately, and the number of fingers is obtained by counting the number of blue circles, which improves the precision of the system's feature-data acquisition.
3. The system automatically generates continuous speech from the input text data according to pronunciation rules. The advantage of the pronunciation-rule synthesis method is that, once accurate and fine-grained pronunciation rules have been established, sentences with an unlimited vocabulary can be synthesized, giving the system great plasticity and adaptability.
Drawings
In order to more clearly illustrate the embodiments of the present application or technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings can be obtained by those skilled in the art according to the drawings.
FIG. 1 is a block diagram of the system of the present invention.
FIG. 2 is a flow chart of the gesture capturing unit according to the present invention.
FIG. 3 is a flow chart of the speech recognition unit of the present invention.
Fig. 4 is a schematic diagram of an overall framework of the voice collecting unit according to the present invention.
FIG. 5 is a schematic diagram of the convolutional neural network of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
Referring to fig. 1, the intelligent toy system with network interaction function according to this embodiment includes a feature collecting module, a learning module, a processing module, a tracking module, and an identifying module;
wherein,
a characteristic acquisition module: the system is used for collecting user characteristics;
a learning module: training a feature model inside the processing module according to the user feature data, and classifying the feature data;
a processing module: used for receiving the user feature data, selecting interaction data matched with the current feature data according to the feature model, and controlling the toy to interact with the user based on the interaction data;
a tracking module: in the process of acquiring the user characteristics by the characteristic acquisition module, the tracking module continuously tracks the user characteristics, so that the characteristic acquisition module tracks and acquires the user characteristics;
an identification module: identifies the type of the feature data and assists the learning module in classifying it. Because the toy control system continuously trains the feature model from the user feature data while collecting user features, the system can recognize multiple users and interact differently according to the interests and hobbies of different users, giving it wider applicability.
The processing module establishes a feature model based on a function polyfit (), the method for establishing the model by the function polyfit () belongs to the prior art, and details are not repeated herein, and the feature data comprises user gesture data and user voice data acquired by the feature acquisition module.
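As a minimal illustration of this kind of polynomial model fitting — assuming numpy's polyfit() as the polyfit() referred to above, with invented data:

import numpy as np

x = np.array([0.1, 0.3, 0.5, 0.7, 0.9])   # collected user feature values (invented)
y = np.array([1.0, 1.8, 2.1, 2.6, 3.4])   # matching interaction scores (invented)

coeffs = np.polyfit(x, y, deg=2)           # least-squares polynomial fit
model = np.poly1d(coeffs)                  # callable feature model
print(model(0.4))                          # predicted score for a new feature value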
The processing module comprises a processor and a signal transceiver electrically connected with the processor; the processor is used for processing the feature data, and a user can connect a mobile phone terminal to the signal transceiver wirelessly over a WiFi network to realize network interaction and control the operation of the processor through a mobile phone APP.
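A minimal sketch of how such phone-to-toy network interaction might be exposed on the processor side is shown below; the port number and the plain-text command protocol are assumptions made only for illustration:

import socket

HOST, PORT = "0.0.0.0", 9000               # port chosen arbitrarily for the example

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
    srv.bind((HOST, PORT))
    srv.listen(1)
    conn, addr = srv.accept()              # the phone APP connects over the WiFi network
    with conn:
        command = conn.recv(1024).decode().strip()   # e.g. "PLAY_SOUND" (assumed command)
        print("command from phone:", command)
        conn.sendall(b"OK\n")              # acknowledge back to the APP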
The tracking module is a tracking camera. After the feature acquisition module acquires the user's features, the tracking camera delimits an activity area around the user; the tracking module continuously tracks the user within the activity area so that the feature acquisition module can continuously acquire features, and the tracking camera stops tracking once the user moves out of the activity area.
The feature acquisition module comprises a gesture acquisition unit;
wherein,
gesture collection unit: the gesture collection unit comprises skin color extraction, fingertip extraction and finger number identification, wherein the skin color extraction is used for extracting the skin color of a hand, and the fingertip extraction and the finger number identification are used for identifying the edge of the hand and the number of fingers;
(1) Skin color extraction: the gesture acquisition unit extracts the skin color of the hand based on the YCrCb color space. The YCrCb color space separates chroma from luminance, clusters skin color well and is little affected by changes in brightness, so skin-color regions can be distinguished reliably. The distribution range of human skin color in the YCrCb chroma space is roughly 77 ≤ Cb ≤ 127 and 133 ≤ Cr ≤ 173, and this range is chosen as the threshold for skin-color segmentation. The conversion between the RGB color space and the YCrCb color space is:
Y = 0.299·R + 0.587·G + 0.114·B
Cr = 0.5·R − 0.4187·G − 0.0813·B + 128
Cb = −0.1687·R − 0.3313·G + 0.5·B + 128

Considering that background noise may interfere with hand extraction during actual image acquisition, the influence of skin-color-like background noise has to be eliminated; OpenCV provides a connected-region search function that returns the label and the pixel points of each connected region.
To prevent the gesture acquisition unit from mistaking a skin-color-like background area for the gesture area when no hand is within the capture range, no region is marked unless the largest connected region contains at least 5000 pixel points; in this way the gesture area can be segmented correctly.
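A brief OpenCV sketch of this skin-color segmentation and the 5000-pixel rule follows; the function name and structure are illustrative assumptions (note that OpenCV stores the channels in (Y, Cr, Cb) order):

import cv2
import numpy as np

def extract_hand_mask(bgr_frame):
    ycrcb = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2YCrCb)
    mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))   # 133<=Cr<=173, 77<=Cb<=127
    num, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    if num <= 1:
        return None                                            # only background found
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))  # index 0 is the background
    if stats[largest, cv2.CC_STAT_AREA] < 5000:
        return None                                            # too small: assume no hand present
    return (labels == largest).astype(np.uint8) * 255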
(2) Fingertip extraction and finger-number identification: the binarized image of the palm region is represented in the computer as a set of points in the two-dimensional plane. Its convex hull is the convex polygon formed by connecting the outermost points and contains every point of the point set; OpenCV provides the convexHull function, which is used to find each vertex of the convex polygon in order to determine the fingertip positions.
In actual operation some fingertips are marked repeatedly and some irrelevant areas are also marked, which makes it difficult to identify the number of fingertips accurately. Points that do not meet the requirements therefore need to be removed: when the distance between two convex-hull vertices is less than 500 pixel points, only one of them is marked as a fingertip, and areas lying below the palm-center coordinate are not marked. The palm-center coordinate is extracted with the following formula:
x_0 = (1/N) · Σ_{i=1..N} x_i ,    y_0 = (1/N) · Σ_{i=1..N} y_i

where (x_i, y_i) are the coordinate values of the i-th pixel point in the gesture area, N is the total number of pixel points in the gesture area, and (x_0, y_0) are the coordinates of the palm center.
The positions of the palm and the fingertips can be accurately identified through the steps, and the number of the fingers can be obtained by counting the number of the blue circles.
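A compact OpenCV sketch of this convex-hull and palm-center computation follows; the function name and the use of the largest contour are illustrative assumptions:

import cv2
import numpy as np

def palm_and_hull(mask):
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None, None
    hand = max(contours, key=cv2.contourArea)       # largest contour = hand region
    hull = cv2.convexHull(hand)                     # vertices of the convex polygon
    ys, xs = np.nonzero(mask)                       # all pixel points in the gesture area
    palm_center = (int(xs.mean()), int(ys.mean()))  # (x0, y0): mean of the pixel coordinates
    # fingertip counting would then merge hull vertices closer than 500 px to each other
    # and discard vertices lying below the palm center, as described above
    return hull, palm_center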
Referring to fig. 2, the gesture information of the motion is recognized with a frame-difference method. The program sets two variables, Hand and Count: Hand marks whether a hand is captured in the frame, and Count records the number of frames since the hand entered. The following cases arise:
(1) (Count = 0, Hand = 0): no hand appears in the current image or in the previous frame, which means no hand has entered or left the camera capture area during this period, so no processing is performed.
(2) (Count = 0, Hand = 1): no hand appears in the current image but a hand appeared in the previous frame, which means the hand has just left the camera capture area; the image information is saved.
(3) (Count = 1, Hand = 1): a hand is present in the current image and it is the first such frame, which means the hand has just entered the image capture area; the image information is saved.
(4) (Count = K, Hand = 1): the hand has remained in the image capture area.
Combining the condition (2) and the condition (3), the four gesture movement directions of up, down, left and right can be recognized by judging the change condition of the gesture center coordinates of the first frame and the last frame, and the recognition process can be represented by the following formula:
Δx = x_L − x_F ,   Δy = y_L − y_F ,   θ = arctan(Δy / Δx)

where (x_F, y_F) and (x_L, y_L) are the gesture-center coordinates of the first frame and the last frame respectively, and θ is the angle between the two points; by judging θ the general direction of the gesture movement can be known. Selecting only the first and last frames to represent the whole series of image frames, from the moment the hand enters the capture area to the moment it leaves, also reduces programming complexity, and by calculating the tangent of the angle between the palm coordinates of the two frames at most eight movement directions can be recognized, so the recognition effect is good.
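A short sketch of this direction decision is shown below, quantized to the four directions up, down, left and right; the exact angle bins are illustrative assumptions:

import math

def gesture_direction(first_center, last_center):
    dx = last_center[0] - first_center[0]
    dy = last_center[1] - first_center[1]
    theta = math.degrees(math.atan2(-dy, dx))   # image y grows downwards, so negate dy
    if -45 <= theta < 45:
        return "right"
    if 45 <= theta < 135:
        return "up"
    if -135 <= theta < -45:
        return "down"
    return "left"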
For case (4), two kinds of dynamic gesture information — an open hand (palm) and a fist — are recognized with the inter-frame difference method. An open palm is defined as 5 fingers being detected for W consecutive frames with the finger count unchanged over the following W frames; a fist is defined as 5 fingers being detected for Y consecutive frames followed by the gesture area shrinking by T pixel points over the subsequent Y frames. The values of W, Y and T must be set according to the specific conditions of the system; here the thresholds are W = Y = 20 and T = 1000, and the counters are cleared and counting restarts immediately whenever any frame fails to meet the conditions, which completes the recognition of palm and fist.
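A rough sketch of this counting logic follows; the state-machine structure and method names are assumptions made purely for illustration:

W, Y, T = 20, 20, 1000

class StaticGestureDetector:
    def __init__(self):
        self.five_finger_frames = 0      # consecutive frames with 5 fingers detected
        self.reference_area = None       # gesture area recorded once the open palm is stable

    def reset(self):
        self.five_finger_frames = 0
        self.reference_area = None

    def update(self, finger_count, gesture_area):
        if finger_count == 5:
            self.five_finger_frames += 1
            if self.five_finger_frames == W:
                self.reference_area = gesture_area          # remember the open-palm area
            if self.five_finger_frames >= 2 * W:
                return "palm"                               # 5 fingers held over 2*W frames
        elif self.reference_area is not None and self.five_finger_frames >= Y:
            if self.reference_area - gesture_area >= T:     # area shrank by >= T pixels
                self.reset()
                return "fist"
        else:
            self.reset()                                    # a frame failed the condition
        return None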
Example 2
The feature acquisition module also comprises a voice acquisition unit;
the voice acquisition unit comprises text analysis, prosody processing and speech synthesis;
(1) Text analysis processes the input text: the computer must understand the text, know what each word should sound like and how it should be pronounced, and determine the words, phrases and sentences in the text. The text analysis module first normalizes the input text, checks for spelling errors and filters out non-standard or unpronounceable words; it then performs word segmentation according to the language and grammar rules, determines the word boundaries in the text and, with the help of a dictionary, determines the pronunciation of polyphonic characters and proper nouns; finally, from the textual features of the context and the punctuation marks appearing at different positions in the text, it determines the pronunciation tone and the changes of speaking mood at different moments.
(2) The prosody control module is mainly used for predicting prosodic features of synthesized voice, corresponding prosodic information (such as intonation, rhythm and accent) in the voice is expressed by the prosodic features (such as fundamental frequency, duration and frequency spectrum), firstly, the prosodic control module collects a large amount of voice and text information data to establish a database, then extracts specific prosodic parameters according to the prosodic features in the voice, and finally inputs the prosodic parameters into a prosodic model to train and continuously perfect model parameters.
(3) The speech synthesis is further processing after text characteristics and prosody model parameters are obtained through text analysis and prosody control, the speech synthesis module is realized through an acoustic model, and the model synthesizes the final speech meeting the requirements by using a parameter synthesizer.
The voice synthesis method comprises parameter synthesis, splicing synthesis and pronunciation rule-based synthesis.
The parametric synthesis method is also called as an analysis synthesis method, and an acoustic model is usually generated by simulating the vocal tract characteristics of the human mouth, and the process of synthesizing the speech is as follows:
(1) Recordings of all possible human pronunciations of a given language are made, the speech signal is analyzed by a certain method, and the acoustic parameters of the speech are extracted;
(2) And during synthesis, proper acoustic parameters are selected from the sound library according to the requirement of the synthesized sound, and are sent to a parameter synthesizer together with the prosodic parameters obtained from the prosodic model, and finally the synthesized voice is obtained.
The advantage of the parametric speech synthesis method is that the acoustic library stores the encoded acoustic parameters, so the required storage space is generally small, and the whole speech synthesis system can adapt to a very wide prosodic feature range.
Unlike the parametric method, which stores acoustic parameters of speech, the splicing (concatenative) synthesis method stores the natural speech waveforms of the synthesis units in the sound library; during synthesis, suitable splicing units are extracted from the library and formed into continuous synthesized speech by a splicing algorithm and prosody modification. Because the splicing units come from the sound library, the library capacity is large and careful design is needed to reduce complexity, but since the splicing units are taken from natural speech waveforms rather than encoded acoustic parameters, the synthesized speech is superior to the parametric method in timbre and naturalness.
Referring to fig. 3, the voice recognition unit recognizes the voice, including the following steps:
(1) Pre-processing: the secondary information, environmental noise and other interfering factors in the original speech signal are filtered out. This not only compresses the information and reduces the computation and memory requirements of the system, but also greatly reduces the system error rate. The pre-processing is generally divided into several stages, such as filtering and sampling, pre-emphasis, framing, windowing and endpoint detection.
(2) The voice feature extraction aims at analyzing voice waveforms and extracting voice time sequence feature sequences, the extraction of voice feature parameters is a core part of a voice recognition system and determines a final recognition effect, and the feature parameters have the following features:
(2.1) voice characteristics such as pronunciation characteristics and vocal tract characteristics can be well expressed;
(2.2) the dimensionality of the extracted feature vectors is lower as much as possible, and parameter vectors of each order have good mutual independence;
and (2.3) the characteristic parameters can be calculated by using an efficient algorithm, so that the system can realize the identification process in real time.
(3) Acoustic model and pattern matching: and inputting the obtained voice characteristic parameters into an acoustic model for continuous training to obtain an optimal model with the maximum probability of coincidence with a training output signal, and inputting the voice characteristics of unknown voice signals into the acoustic model for comparison and matching in the recognition process to obtain a final recognition result.
(4) Language model and language processing: the language model is a grammar network formed by recognizing voice commands or a statistical language model, and language processing can be used for analyzing grammar and semantics and determining the correctness of a recognition result.
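A small numpy sketch of the pre-emphasis, framing and windowing stages mentioned above is given below; the 16 kHz sample rate and the 25 ms / 10 ms frame parameters are common defaults assumed for illustration, not values taken from the patent:

import numpy as np

def preprocess(signal, sample_rate=16000, frame_ms=25, shift_ms=10, alpha=0.97):
    # assumes the signal is at least one frame long
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])   # pre-emphasis
    frame_len = int(sample_rate * frame_ms / 1000)
    frame_shift = int(sample_rate * shift_ms / 1000)
    num_frames = 1 + (len(emphasized) - frame_len) // frame_shift
    frames = np.stack([emphasized[i * frame_shift : i * frame_shift + frame_len]
                       for i in range(num_frames)])
    return frames * np.hamming(frame_len)    # windowed frames; feature extraction follows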
The pronunciation-rule synthesis method generates speech according to rules: the system stores the acoustic parameters of the smallest constituent units of speech, the composition rules between phonemes, syllables, words, phrases or sentences, and the various control rules for prosodic information such as intonation, stress and rhythm.
The voice acquisition unit is implemented around an FM1288 voice-processing chip. The voice-processing chip must handle noise and echo during conversation in order to provide a high-quality call; using the acoustic echo cancellation principle, the FM1288 removes acoustic echo and ambient noise, and it is compatible with a wide range of host processors, which makes system integration easy. Key features of the FM1288 chip:
an integrated Digital Signal Processor (DSP) including a hardware calculus accelerator, ROM, and RAM;
an integrated analog-to-digital converter (ADC) and a digital-to-analog converter (DAC);
providing an IIS/PCM multiplexed digital audio interface;
providing programmable gain amplification (PGA) and dynamic range control (DRC);
a user-selectable dual-microphone input is provided to support full-duplex, echo-free communication, and noise suppression of upstream and downstream voice signals.
Referring to fig. 4, the voice acquisition unit uses an X1000 main chip as a core, expands a peripheral I/O interface to connect with a key and an indicator light, and performs voice transmission and data transmission with a BCM43438 bluetooth chip through a PCM interface and a UART interface, respectively, and uses a special echo cancellation and noise suppression chip FM1288 as a voice processing unit, an external microphone, and an audio power amplifier to form a high-efficiency and stable audio input and output system.
Example 3
The learning models of the learning module fall into three types, namely supervised learning, unsupervised learning and semi-supervised learning, wherein,
supervised learning algorithm: supervised learning builds a prediction model by training on labeled data; the supervised learning models used here are classification methods, comprising a support vector machine (SVM) model and an artificial neural network (ANN) model;
the support vector machine (SVM) model first maps the original data samples into a higher-dimensional space through a kernel function and then constructs the separating hyperplane that maximizes the distance to the nearest data samples of each class, formulated as follows:
min over (w, b, ξ) of  (1/2)·‖w‖² + C·Σ_{i=1..n} ξ_i
subject to  y_i·( w·φ(x_i) + b ) ≥ 1 − ξ_i  and  ξ_i ≥ 0 ,  i = 1, …, n

In the formula, x_i denotes the i-th data sample and y_i its corresponding label; w is the normal vector of the hyperplane and determines its direction; C is a penalty factor; ξ_i measures the degree of constraint violation; φ(·) is the mapping associated with the kernel function; and b determines the offset of the hyperplane from the origin along the normal vector. The classification task is thereby turned into this convex quadratic programming problem.
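A minimal scikit-learn illustration of such a kernel SVM classifier follows; the two-dimensional toy data is invented for the example:

import numpy as np
from sklearn.svm import SVC

X = np.array([[0.2, 0.1], [0.4, 0.3], [0.9, 0.8], [1.1, 1.0]])   # invented feature vectors
y = np.array([0, 0, 1, 1])                                       # invented class labels

clf = SVC(kernel="rbf", C=1.0)      # C plays the role of the penalty factor above
clf.fit(X, y)
print(clf.predict([[0.3, 0.2], [1.0, 0.9]]))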
The artificial neural network (ANN) model comes in many variants depending on its specific structure, including the error back-propagation neural network, the extreme learning machine, the convolutional neural network and others; among conventional network models the multilayer perceptron is the simplest.
The supervised learning algorithm further comprises regression analysis, and the future change trend of the dependent variable is predicted by fitting the relation between the dependent variable and the independent variable.
Unsupervised learning algorithm: dividing the samples into different groups and subsets according to similar attributes of the samples, wherein the unsupervised learning method comprises density-based clustering, division-based clustering, hierarchy-based clustering, grid-based clustering and the like;
density-based clustering: it is first necessary to define two parameters,
namely the neighborhood radius ε of a sample point and the minimum number of points MinPts required to form a dense region.
First, for every sample point, the points within its ε-neighborhood are found and the core objects are determined: a point is regarded as a core object if its ε-neighborhood contains at least MinPts samples;
secondly, the connected components of the core objects are found, and all non-core object points are ignored;
finally, a core object is selected from the current set of core objects and, based on its ε-neighborhood, the non-core object points are assigned to the cluster closest to them; the remaining points are regarded as noise.
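An illustration with scikit-learn's DBSCAN implementation is sketched below; eps corresponds to the neighborhood radius ε and min_samples to MinPts, and the data points are invented:

import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1.0, 1.1], [1.2, 0.9], [0.9, 1.0],    # dense group 1
              [8.0, 8.1], [8.2, 7.9], [7.9, 8.0],    # dense group 2
              [4.5, 0.5]])                            # isolated point
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)
print(labels)    # points labeled -1 are treated as noise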
Clustering based on partitioning: the samples are divided into different categories according to the characteristics of the sample points and the similarity of the data, usually measured by the distance between points — sample points within the same category should be as close as possible, and sample points of different categories as far apart as possible. The specific formulas are:
c_i = argmin_j ‖ x_i − μ_j ‖²        (1)
μ_j = (1/|C_j|) · Σ_{x_i ∈ C_j} x_i        (2)

In the above formulas, c_i represents the category to which data point x_i belongs, x_i are the data points, and μ_j is the center of cluster C_j. Formula (2) recomputes the center of each category and is calculated repeatedly; the specific processing logic is as follows: first, k sample points are randomly selected as the initial cluster centers μ_1, …, μ_k, and then formula (1) and formula (2) are executed repeatedly until the termination condition is reached.
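A short numpy sketch of this alternation between formula (1) and formula (2) is given below; the random initialization and the stopping test are standard choices assumed for illustration:

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]          # random initial centers
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)                               # formula (1)
        new_centers = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                                else centers[j] for j in range(k)]) # formula (2)
        if np.allclose(new_centers, centers):                       # termination condition
            break
        centers = new_centers
    return assign, centers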
Semi-supervised learning algorithm: a large amount of unlabeled data is used to generate pseudo-labeled data, and a classifier is trained together with a small amount of genuinely labeled data. Taking an SVM as the model, after all unlabeled samples have been assigned labels, the hyperplane that maximizes the margin is sought. Semi-supervised learning algorithms include self-training, co-training, the semi-supervised SVM and others.
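A brief sketch of the self-training idea using scikit-learn's SelfTrainingClassifier follows; marking unlabeled samples with -1 is that library's convention, and the data is invented:

import numpy as np
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X = np.array([[0.10, 0.20], [0.20, 0.10], [0.90, 1.00], [1.00, 0.90],
              [0.15, 0.15], [0.95, 0.95]])
y = np.array([0, 0, 1, 1, -1, -1])         # -1 marks the unlabeled samples

model = SelfTrainingClassifier(SVC(probability=True, C=1.0))
model.fit(X, y)                            # pseudo-labels the unlabeled samples, then retrains
print(model.predict([[0.12, 0.18], [0.97, 0.93]]))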
In summary, supervised learning generally requires a long time of debugging, and parameters and model frameworks are selected repeatedly;
the theoretical basis of semi-supervised learning lies in the continuity and consistency of the distribution of the marked data and the unmarked data, so that the learning module can utilize the point to carry out effective structural learning and enhance the representation capability of the model;
therefore, in this embodiment an unsupervised learning algorithm is preferably used as the learning algorithm of the learning module, so that the feature data can be identified quickly.
Example 4
The recognition module recognizes the feature data based on a deep learning algorithm; deep learning is a learning network that stacks additional hidden layers on the basis of a neural network.
the processing logic of the deep learning algorithm is as follows:
a system L is provided having n layers (L1.... Ln), with I as input and O as output, and the process can be expressed as: i = > L1= > L2= >.. = > Ln = > O, if output O is equal to input I, i.e. there is no loss of information after input I has passed this systematic change, which means there is no loss of information after input I has passed each layer Li, i.e. at any layer Li, it is another representation of the original information (i.e. input I);
thus, a series of hierarchical features of the input I, namely L1 \8230Ln, can be automatically obtained, deep learning is to stack a plurality of layers, and the output of the layer is used as the input of the next layer to realize hierarchical expression of the input information.
The deep learning algorithm comprises a convolutional neural network, the convolutional neural network reduces the number of parameters to be learned by utilizing a spatial relationship so as to improve the training performance of a general forward BP algorithm, a small part (a local perception region) of an image is used as the input of the lowest layer of a hierarchical structure, information is sequentially transmitted to different layers, each layer obtains the most significant characteristics of observation data through a digital filter, and the method can obtain the significant characteristics of the observation data with unchanged translation, scaling and rotation.
The convolutional neural network is a multilayer artificial neural network; each layer is composed of several two-dimensional planes, and each plane is composed of several independent neurons. The specific processing logic is as follows:
as shown in fig. 5, the input image is convolved with three filters plus an appropriate bias, producing three feature maps in layer C1; each group of four adjacent pixels in these feature maps is then summed and averaged, weighted and biased, and passed through an activation function (the Sigmoid function) to obtain the three feature maps of layer S2. These maps are filtered again to obtain layer C3, which produces S4 in the same way as S2; finally the pixel values are rasterized and connected into a one-dimensional vector that is fed into a conventional neural network to obtain the output;
the convolutional neural network comprises local receptive fields, weight sharing and time and space sampling, wherein,
local receptive field: some local features of the sample data can be found through the perception of the local area;
weight sharing: each layer in the convolutional neural network is composed of a plurality of feature maps, each feature map comprises a plurality of neural units, all the neural units of the same feature map share the same convolutional kernel (namely weight), and one convolutional kernel usually represents one class of features of a sample;
spatial sampling: the main purpose of sub-sampling the sample is to blur the exact position of a feature, because once a feature of the sample has been found its exact position is no longer important; the system only cares about the position of that feature relative to other features.
In this embodiment, a convolutional neural network is used as a deep learning algorithm of the recognition model, so that:
(1) The input image can be well matched with the topological structure of the network;
(2) Feature extraction and pattern classification can be performed simultaneously and generated in network training;
(3) The weight sharing can reduce the training parameters of the network, so that the neural network structure becomes simpler and the adaptability is stronger.
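A compact PyTorch sketch of the C1 -> S2 -> C3 -> S4 -> fully-connected structure described above is shown below; the 32x32 single-channel input, the channel counts after C1/C3 and the eight output classes are assumptions chosen for illustration:

import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    def __init__(self, num_classes=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 3, kernel_size=5),   # C1: three feature maps
            nn.Sigmoid(),
            nn.AvgPool2d(2),                  # S2: average over 2x2 neighbourhoods
            nn.Conv2d(3, 6, kernel_size=5),   # C3
            nn.Sigmoid(),
            nn.AvgPool2d(2),                  # S4
        )
        self.classifier = nn.Linear(6 * 5 * 5, num_classes)   # rasterized vector -> output

    def forward(self, x):                     # x: (batch, 1, 32, 32)
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

out = SmallConvNet()(torch.randn(1, 1, 32, 32))   # -> tensor of shape (1, 8)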
The feature data (gesture images) are analyzed accurately through a deep learning algorithm: the single-frame images acquired by the feature acquisition module are transmitted to the trained deep learning model, which performs target detection on the content of each single-frame image with a Yolo algorithm and analyzes it, improving the detection accuracy and learning capability of the toy control system.
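A sketch of single-frame target detection with a YOLO model is given below; the use of the ultralytics package, the yolov8n.pt weights file and the file names are assumptions made for illustration — the patent only states that a Yolo algorithm performs target detection on single frames:

import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                  # pre-trained detector (assumed weights file)
frame = cv2.imread("gesture_frame.jpg")     # a single frame from the feature acquisition module
results = model(frame)                      # target detection on the frame content
for box in results[0].boxes:
    print(box.cls, box.conf, box.xyxy)      # class id, confidence, bounding box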
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. The procedures or functions according to the embodiments of the present application are wholly or partially generated when the computer instructions or the computer program are loaded or executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains one or more collections of available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
It should be understood that the term "and/or" herein is merely one type of association relationship that describes an associated object, meaning that three relationships may exist, e.g., a and/or B may mean: a exists singly, A and B exist simultaneously, and B exists singly, wherein A and B can be singular or plural. In addition, the "/" in this document generally indicates that the former and latter associated objects are in an "or" relationship, but may also indicate an "and/or" relationship, and may be understood with particular reference to the former and latter contexts.
In the present application, "at least one" means one or more, "a plurality" means two or more. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of the singular or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or multiple.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a variety of media that can store program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. An intelligent toy system with a network interaction function, characterized in that it comprises a feature acquisition module, a learning module and a processing module;
the feature acquisition module acquires user features, the learning module trains a feature model inside the processing module according to the user feature data and classifies the feature data, and after receiving the user feature data the processing module selects interaction data matched with the current feature data according to the feature model and controls the toy to interact with the user based on the interaction data;
the feature acquisition module comprises a gesture acquisition unit, the gesture acquisition unit comprises skin color extraction, fingertip extraction and finger number identification, the skin color extraction is used for extracting the skin color of a hand, and the fingertip extraction and the finger number identification are used for identifying the edge of the hand and the number of fingers;
the fingertip extraction and finger-number identification extract a binarized image of the palm region through a convex hull, the convex hull being the convex polygon formed by connecting the outer-layer points of the binarized image, and the palm-center coordinates of the convex hull are extracted through the following formula:
x_0 = (1/N) · Σ_{i=1..N} x_i ,    y_0 = (1/N) · Σ_{i=1..N} y_i

where (x_i, y_i) are the coordinate values of the i-th pixel point in the gesture area, N is the total number of pixel points in the gesture area, and (x_0, y_0) are the coordinates of the palm center.
2. The intelligent toy system with network interaction function of claim 1, wherein: the feature acquisition module further comprises a voice acquisition unit including text analysis, prosody processing and speech synthesis; the text analysis is used for processing the input text, the prosody processing is used for predicting the prosodic features of the synthesized speech, and the speech synthesis is used for processing the text features and prosody model parameters obtained through the text analysis and prosody processing.
3. The intelligent toy system with network interaction function of claim 2, wherein: the voice acquisition unit recognizes voice and comprises the following steps:
filtering out secondary information and environmental noise in an original voice signal;
analyzing a voice waveform and extracting a voice time sequence characteristic sequence;
and inputting the obtained voice characteristic parameters into an acoustic model for continuous training to obtain a model matched with a training output signal.
4. The intelligent toy system with network interaction function of claim 1, wherein: the feature data comprises user gesture data and user voice data acquired by a feature acquisition module.
5. The intelligent toy system with network interaction function of claim 1, wherein: the system further comprises a tracking module: while the feature acquisition module acquires the user features, the tracking module continuously tracks the user, so that the feature acquisition module can track and acquire the user features.
6. The intelligent toy system with network interaction function of claim 5, wherein: the tracking module is a tracking camera, the characteristic acquisition module acquires user characteristics, the tracking camera divides a user activity area, the tracking module continuously tracks the user in the activity area, the characteristic acquisition module continuously acquires the characteristics, and the tracking camera stops tracking after the user moves out of the activity area.
7. An intelligent toy system with network interaction function as claimed in any one of claims 1-6, wherein: the processing module comprises a processor and a signal transceiver, the signal transceiver is electrically connected with the processor, the processor is used for processing the characteristic data, and the processor is wirelessly connected with the mobile phone terminal through the signal transceiver based on a WiFi network.
8. A control method of an intelligent toy with a network interaction function is characterized in that: the method comprises the following steps:
s1: collecting user characteristic data;
s2: training a feature model according to the user feature data;
s3: classifying the feature data;
s4: and selecting interaction data matched with the current feature data according to the feature model, and controlling the toy to interact with the user based on the interaction data.
CN202211063424.3A 2022-09-01 2022-09-01 Intelligent toy system with network interaction function and control method Pending CN115145402A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211063424.3A CN115145402A (en) 2022-09-01 2022-09-01 Intelligent toy system with network interaction function and control method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211063424.3A CN115145402A (en) 2022-09-01 2022-09-01 Intelligent toy system with network interaction function and control method

Publications (1)

Publication Number Publication Date
CN115145402A (en) 2022-10-04

Family

ID=83416655

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211063424.3A Pending CN115145402A (en) 2022-09-01 2022-09-01 Intelligent toy system with network interaction function and control method

Country Status (1)

Country Link
CN (1) CN115145402A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117492568A (en) * 2023-11-15 2024-02-02 杭州稚爱教育科技有限公司 Toy interaction identification method and system based on convolutional neural network
CN117492568B (en) * 2023-11-15 2024-04-26 杭州稚爱教育科技有限公司 Toy interaction identification method and system based on convolutional neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160132125A1 (en) * 2014-11-07 2016-05-12 HONG FU JIN PRECISION INDUSTRY ShenZhen) CO., LTD. System and method for generating gestures
CN106778670A (en) * 2016-12-30 2017-05-31 上海集成电路研发中心有限公司 Gesture identifying device and recognition methods
CN109550233A (en) * 2018-11-15 2019-04-02 东南大学 Autism child attention training system based on augmented reality
CN112462940A (en) * 2020-11-25 2021-03-09 苏州科技大学 Intelligent home multi-mode man-machine natural interaction system and method thereof


Similar Documents

Publication Publication Date Title
CN108805087B (en) Time sequence semantic fusion association judgment subsystem based on multi-modal emotion recognition system
CN108805089B (en) Multi-modal-based emotion recognition method
CN108899050B (en) Voice signal analysis subsystem based on multi-modal emotion recognition system
CN108877801B (en) Multi-turn dialogue semantic understanding subsystem based on multi-modal emotion recognition system
CN109409296B (en) Video emotion recognition method integrating facial expression recognition and voice emotion recognition
Yu et al. A multimodal learning interface for grounding spoken language in sensory perceptions
CN110990543A (en) Intelligent conversation generation method and device, computer equipment and computer storage medium
CN108346427A (en) A kind of audio recognition method, device, equipment and storage medium
CN103996155A (en) Intelligent interaction and psychological comfort robot service system
CN101187990A (en) A session robotic system
JPH04329598A (en) Message recognition method and apparatus using consolidation type information of vocal and hand writing operation
WO2015171646A1 (en) Method and system for speech input
Liu et al. Re-synchronization using the hand preceding model for multi-modal fusion in automatic continuous cued speech recognition
Hao et al. A survey of research on lipreading technology
CN112784696A (en) Lip language identification method, device, equipment and storage medium based on image identification
CN111554279A (en) Multi-mode man-machine interaction system based on Kinect
Lim et al. Emotion Recognition by Facial Expression and Voice: Review and Analysis
CN111462762B (en) Speaker vector regularization method and device, electronic equipment and storage medium
Atkar et al. Speech Emotion Recognition using Dialogue Emotion Decoder and CNN Classifier
Ballard et al. A multimodal learning interface for word acquisition
Chinmayi et al. Emotion Classification Using Deep Learning
Akinpelu et al. Lightweight Deep Learning Framework for Speech Emotion Recognition
CN115455136A (en) Intelligent digital human marketing interaction method and device, computer equipment and storage medium
CN115145402A (en) Intelligent toy system with network interaction function and control method
US11681364B1 (en) Gaze prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20221004