CN118092668A - Multi-mode input man-machine interaction method and device and electronic equipment - Google Patents
Multi-mode input man-machine interaction method and device and electronic equipment
- Publication number
- CN118092668A (application number CN202410422853.8A)
- Authority
- CN
- China
- Prior art keywords
- state
- data
- user
- input
- mode
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/011 — Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G06F18/213 — Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F3/017 — Gesture based interaction, e.g. based on a set of recognized hand gestures
- G06F3/0488 — Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, using a touch-screen or digitiser, e.g. input of commands through traced gestures
Abstract
The application provides a human-computer interaction method and device for multimodal input, and electronic equipment, in the technical field of data processing. In the method, behavior data of a user are acquired, the behavior data comprising limb data and voice data; ambient light intensity data of the environment in which the user is located are acquired; the limb data and the voice data are recognized with a preset recognition model to generate a first recognition result indicating the behavior state of the user, the behavior state being a rest state or a motion state; the ambient light intensity data are recognized with the preset recognition model to generate a second recognition result indicating the environmental state of the environment in which the user is located, the environmental state being a daytime state or a night state; if the behavior state is determined to be the rest state and the environmental state the night state, a target modal input mode is set to voice input, the target modal input mode being the modal input mode of the user equipment corresponding to the user; and if the behavior state is determined to be the motion state and the environmental state the daytime state, the target modal input mode is set to gesture input and/or touch input. The technical solution of the application facilitates intelligent, adaptive switching of the modal input mode between the user and the user equipment.
Description
Technical Field
The application relates to the technical field of data processing, and in particular to a human-computer interaction method and device for multimodal input, and to electronic equipment.
Background
In the current digital age, interaction between users and user devices is becoming increasingly important. Modal input modes such as voice, gesture and touch are the key bridge for information exchange between the user and the device.
In practice, the input mode a user prefers changes with factors such as scene and need. In some situations the user may prefer voice input, while in others gestures or touch are more convenient. The related art, however, cannot recognize these changes intelligently and therefore cannot switch automatically to the most suitable input mode. This not only degrades the user experience but may also lead to input errors or operational inconvenience.
How to achieve intelligent, adaptive switching of the modal input mode between the user and the user equipment is therefore an urgent problem to be solved.
Disclosure of Invention
The application provides a human-computer interaction method and device for multimodal input, and electronic equipment, which facilitate intelligent, adaptive switching of the modal input mode between a user and the user's equipment.
In a first aspect of the present application, a human-computer interaction method for multimodal input is provided, the method comprising: acquiring behavior data of a user, the behavior data comprising limb data and voice data; acquiring ambient light intensity data of the environment in which the user is located; recognizing the limb data and the voice data with a preset recognition model and generating a first recognition result, the first recognition result indicating the behavior state of the user, the behavior state being a rest state or a motion state; recognizing the ambient light intensity data with the preset recognition model and generating a second recognition result, the second recognition result indicating the environmental state of the environment in which the user is located, the environmental state being a daytime state or a night state; if the behavior state is determined to be the rest state and the environmental state the night state, setting a target modal input mode to voice input, the target modal input mode being the modal input mode of the user equipment corresponding to the user; and if the behavior state is determined to be the motion state and the environmental state the daytime state, setting the target modal input mode to gesture input and/or touch input.
With this technical solution, the system can understand the user's current behavior state more accurately by acquiring and analyzing the user's behavior data, and this personalized recognition allows it to recommend a more suitable input mode, raising the degree of personalization of the user experience. By additionally recognizing the ambient light intensity data, the system can judge the user's current environmental state; this environment-sensing capability makes the switching of the modal input mode fit the actual usage scene, improving operating convenience in different environments. When the system determines that the user is resting and the environment is night, it recommends voice input, which reduces the difficulty of locating touch or gesture targets in the dark and improves operating efficiency. Conversely, when the user is moving and the environment is daytime, gesture or touch input is recommended, which better matches the user's operating habits in such a scene. By intelligently recognizing the user's behavior state and environmental state, the system reduces input errors caused by mis-operation or an unsuitable input mode, improving the accuracy of interaction between the user and the device. Intelligent, adaptive switching of the modal input mode between the user and the user equipment is thereby facilitated.
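To make the decision logic concrete, a minimal sketch of this mapping is given below in Python; the enum values and the select_input_mode function are illustrative names assumed for this example and are not part of the claimed method.

```python
from enum import Enum

class BehaviorState(Enum):
    REST = "rest"
    MOTION = "motion"

class EnvironmentState(Enum):
    DAY = "day"
    NIGHT = "night"

def select_input_mode(behavior, environment):
    """Map the two recognition results to the target modal input mode(s)."""
    if behavior is BehaviorState.REST and environment is EnvironmentState.NIGHT:
        return ["voice"]
    if behavior is BehaviorState.MOTION and environment is EnvironmentState.DAY:
        return ["gesture", "touch"]
    # Other combinations are not specified above; keep the device's current mode.
    return []

print(select_input_mode(BehaviorState.REST, EnvironmentState.NIGHT))  # ['voice']
```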
Optionally, the acquiring of the behavior data of the user specifically includes: receiving initial limb data and initial voice data sent by a wearable device worn by the user; analyzing the initial limb data and the initial voice data to obtain heart rate data, respiration data and body acceleration data; and preprocessing the heart rate data, the respiration data and the body acceleration data to obtain the behavior data, the preprocessing comprising data cleaning, data denoising, data filtering and data normalization.
With this technical solution, the system can acquire the user's limb data and voice data continuously and in real time through the wearable device the user wears. This form of data collection is convenient and provides rich, varied behavior information, giving subsequent analysis and judgment a solid foundation. By further analyzing the initial data, the system extracts key physiological parameters such as heart rate, respiration and body acceleration, which directly reflect the user's current physical state and behavior pattern and provide a strong basis for the later switching of the modal input mode. Preprocessing is the key step that guarantees data quality: data cleaning, denoising, filtering and normalization remove noise and interference from the raw data and improve its accuracy and reliability, while normalization brings data from different sources and scales onto a uniform dimension, easing subsequent analysis and comparison. With preprocessed data, the system copes better with complex environments and varying user behavior; even when noise is high or the data are unstable, recognition accuracy and stability are maintained, so the switching of the modal input mode stays accurate.
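The preprocessing chain can be illustrated roughly as follows; this is a sketch under assumed parameters (the valid heart-rate range, the smoothing window and the min-max normalization are choices made for the example, not values taken from the application).

```python
import numpy as np

def clean(samples, low, high):
    """Data cleaning: drop physically impossible readings."""
    return samples[(samples >= low) & (samples <= high)]

def moving_average(samples, window=5):
    """Denoising/filtering: simple smoothing to suppress sensor noise."""
    kernel = np.ones(window) / window
    return np.convolve(samples, kernel, mode="same")

def normalize(samples):
    """Normalization: rescale to a common 0..1 range."""
    span = samples.max() - samples.min()
    return (samples - samples.min()) / span if span > 0 else np.zeros_like(samples)

raw_heart_rate = np.array([72.0, 71, 250, 69, 68, 67, 66])  # 250 bpm is an outlier
behavior_feature = normalize(moving_average(clean(raw_heart_rate, 30, 220)))
```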
Optionally, the recognizing of the limb data and the voice data with the preset recognition model to generate the first recognition result specifically includes: extracting features from the limb data to obtain a first feature change curve, the first feature change curve being plotted from a plurality of heart rate features; extracting features from the limb data to obtain a second feature change curve, the second feature change curve being plotted from a plurality of limb action features; extracting features from the voice data to obtain a third feature change curve, the third feature change curve being plotted from a plurality of respiration features; and fusing the first feature change curve, the second feature change curve and the third feature change curve to obtain the first recognition result.
With this technical solution, feature extraction from the limb data and the voice data lets the system capture exactly the information that relates to the user's behavior state: heart rate features, limb action features and respiration features are all important indicators of the current state, and this accurate extraction gives later judgment and recognition a reliable data basis. Plotting the extracted features as change curves makes the evolution of the user's behavior state more intuitive and easier to analyze; by observing the trend and fluctuation of these curves, the system can judge more precisely whether the user is resting or moving. Fusing the heart rate, limb action and respiration curves lets the system weigh several kinds of information together and obtain a more complete and accurate recognition result; this multi-feature fusion compensates for the limitations of recognition from a single feature and improves accuracy and reliability. Because every user's physiology and habits differ, recognition that combines several features also identifies the personalized behavior states of different users more accurately, which helps recommend a modal input mode that better fits the user's personal needs and habits and raises the degree of personalization of the experience.
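A simple way to picture the feature change curves is as one feature value per fixed time window, as in the sketch below; the window length, sampling rate and the use of a windowed mean are assumptions made for illustration only.

```python
import numpy as np

def feature_curve(samples, window, feature=np.mean):
    """One feature value per consecutive fixed-length window of the stream."""
    n_windows = len(samples) // window
    trimmed = samples[: n_windows * window].reshape(n_windows, window)
    return feature(trimmed, axis=1)

rng = np.random.default_rng(0)
heart_rate  = rng.normal(70, 2, 600)      # ~10 minutes of 1 Hz heart-rate samples
limb_motion = rng.normal(0.1, 0.02, 600)  # body-acceleration magnitude
respiration = rng.normal(14, 1, 600)      # breaths per minute, derived from voice data

first_curve  = feature_curve(heart_rate, window=60)   # heart-rate features
second_curve = feature_curve(limb_motion, window=60)  # limb-action features
third_curve  = feature_curve(respiration, window=60)  # respiration features
```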
Optionally, before the behavior state is determined to be the rest state and the environmental state the night state, whether the behavior state is the rest state or the motion state is judged; this judgment specifically includes: within a preset first time period, if the first feature change curve indicates that the user's heart rate is in a decreasing trend and the second feature change curve indicates that the user's respiratory rate is in a decreasing trend, determining that the behavior state is the rest state; and within the preset first time period, if the first feature change curve indicates that the user's heart rate is in a non-decreasing trend and the second feature change curve indicates that the user's respiratory rate is in a non-decreasing trend, determining that the behavior state is the motion state.
With this technical solution, the system judges the user's current behavior state accurately by considering the trends of two key physiological indicators, heart rate and respiratory rate, together: decreasing heart rate and decreasing respiratory rate are typical of the rest state, while non-decreasing trends are usually associated with the motion state, and this multi-feature judgment improves the accuracy of state recognition. The preset first time period lets the system decide the behavior state quickly, so the switching of the modal input mode can respond to state changes in time and the experience stays responsive. Physiological changes at rest and during motion differ between users; by accumulating data and optimizing the model, the system gradually adapts to individual characteristics and recognizes states more accurately. Correctly judging the behavior state is the basis for the subsequent switching of the modal input mode: only after the state is recognized correctly can the system recommend a suitable input mode for the user's specific needs and scene, so this judgment step provides a reliable basis for the later intelligent adaptive switching. Judging state from physiological data and behavior patterns also reflects a higher degree of intelligence: the data-driven approach lets the system understand the user's needs and intent more autonomously and provide a smarter service.
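One possible reading of this trend test, using a least-squares slope over the preset first time period, is sketched below; the slope criterion and the tolerance parameter are assumed implementation details, not requirements of the method.

```python
import numpy as np

def is_decreasing(curve, tolerance=0.0):
    """Treat a negative least-squares slope over the window as a decreasing trend."""
    slope = np.polyfit(np.arange(len(curve)), curve, deg=1)[0]
    return slope < -tolerance

def classify_behavior(heart_rate_curve, respiration_curve):
    """Rest only if both curves trend downward within the preset first time period."""
    if is_decreasing(heart_rate_curve) and is_decreasing(respiration_curve):
        return "rest"
    return "motion"

print(classify_behavior(np.array([72, 70, 68, 66, 64]),
                        np.array([16, 15, 14, 13, 12])))  # rest
```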
Optionally, the recognizing of the ambient light intensity data with the preset recognition model to generate the second recognition result specifically includes: obtaining an ambient light intensity value of the environment in which the user is located from the ambient light intensity data; and judging the magnitude relation between the ambient light intensity value and a preset threshold to obtain the second recognition result. Before the behavior state is determined to be the rest state and the environmental state the night state, whether the environmental state is the daytime state or the night state is judged; this judgment specifically includes: within a preset second time period, if the ambient light intensity value is greater than or equal to the preset threshold, determining that the environmental state is the daytime state; and within the preset second time period, if the ambient light intensity value is smaller than the preset threshold, determining that the environmental state is the night state.
With this technical solution, the system can decide quickly whether the current environment is daytime or night by directly comparing the ambient light intensity value with the preset threshold; the simple comparison keeps the judgment fast, and the threshold keeps it accurate. The recognized environmental state gives the user equipment an important basis for adaptive switching of the modal input mode: during the day the user may prefer gestures or touch, while at night voice input may be more convenient and safer, so adjusting the input mode automatically according to the environment improves the consistency and convenience of the experience. Judging the light intensity over the preset second time period lets the system account for the stability of the ambient light and reduces misjudgment caused by short-lived changes; this time window improves the stability of environmental-state recognition. Analyzing and judging the ambient light data automatically also gives the system a higher degree of intelligence: the device's interaction mode adapts to environmental changes without the user setting or adjusting anything, improving convenience and the level of intelligence in use.
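The threshold comparison over the preset second time period might look like the following sketch; the 50-lux threshold and the "every reading in the window" rule are illustrative assumptions.

```python
def classify_environment(lux_readings, threshold=50.0):
    """Day if the light level stays at or above the threshold over the window, night if it stays below."""
    if all(value >= threshold for value in lux_readings):
        return "day"
    if all(value < threshold for value in lux_readings):
        return "night"
    return "undetermined"  # mixed readings in the window; keep the previous state

print(classify_environment([120.0, 115.0, 130.0]))  # day
print(classify_environment([3.0, 2.5, 4.0]))        # night
```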
Optionally, the method further comprises: responding to an input operation of the user, the input operation comprising a modal input mode setting instruction and a modal input time setting instruction; determining a modal input execution time period according to the modal input time setting instruction; determining a user-defined modal input mode according to the modal input mode setting instruction; and setting the target modal input mode to the user-defined modal input mode during the modal input execution time period.
With this technical solution, allowing the user to set the modal input mode lets the system meet personalized needs better: different users have different habits and preferences, some preferring voice input and others gestures or touch, and letting them customize the input mode greatly improves satisfaction and the overall experience. The user can set not only the modal input mode but also the execution time period in which it applies, so input modes can be tailored to the user's own schedule and usage scenes. The system responds to the user's input operation and adjusts the modal input mode automatically, reflecting its level of intelligence: it no longer simply executes preset commands but adapts to the user's settings and provides a smarter service. Setting the mode and its execution time period requires only a simple input operation, with no complicated procedure, so personalization is easy and overall efficiency improves.
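A user-defined override with an execution time period could be represented as below; the CustomModeRule class and its fields are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass
from datetime import time

@dataclass
class CustomModeRule:
    mode: str    # e.g. "voice", "gesture" or "touch"
    start: time  # start of the modal-input execution time period
    end: time    # end of the execution time period

    def applies_at(self, now: time) -> bool:
        """True while the user-defined execution time period is in effect."""
        if self.start <= self.end:
            return self.start <= now <= self.end
        return now >= self.start or now <= self.end  # period spans midnight

rule = CustomModeRule(mode="voice", start=time(22, 0), end=time(7, 0))
print(rule.applies_at(time(23, 30)))  # True: the custom mode overrides the automatic choice
```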
Optionally, the method further comprises: if the input operation is a voice input operation, performing voice recognition on the voice input operation to obtain text data; matching the text data against the preset recognition model to obtain a target instruction, the preset recognition model storing in advance a correspondence between text data and target instructions; and switching the target modal input mode according to the target instruction.
With this technical solution, letting the user interact with the system through a voice input operation greatly improves convenience: a simple voice command is enough to switch the modal input mode, with no complicated manual operation. Converting speech to text through voice recognition reduces the errors that manual text entry may introduce; the recognizer converts the user's spoken command into the corresponding text accurately, improving input accuracy and efficiency. Matching the text against the preset recognition model yields the target instruction, so the system processes and responds to voice input intelligently: because the correspondence between text data and target instructions is stored in the model in advance, the system quickly understands the user's intent and acts on it. Since this correspondence can be preset, custom voice commands are also supported: the user can define specific phrases to control the switching of the modal input mode according to personal preference and need, increasing flexibility and personalization. Through voice recognition and intelligent matching, the system understands and executes the user's voice commands more accurately, improving the experience and strengthening the user's trust in and satisfaction with the system.
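The matching of recognized text against the stored correspondence can be sketched as a simple lookup; the phrase table and instruction identifiers below are hypothetical examples, and the speech-to-text step is assumed to have already produced the text.

```python
from typing import Optional

# Hypothetical correspondence between recognized text and target instructions.
TEXT_TO_INSTRUCTION = {
    "switch to voice input": "SET_MODE_VOICE",
    "switch to touch input": "SET_MODE_TOUCH",
    "switch to gesture input": "SET_MODE_GESTURE",
}

def match_instruction(text: str) -> Optional[str]:
    """Normalize the recognized text and look up the corresponding target instruction."""
    return TEXT_TO_INSTRUCTION.get(text.strip().lower())

print(match_instruction("Switch to voice input"))  # SET_MODE_VOICE
```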
In a second aspect of the present application, a human-computer interaction device for multimodal input is provided. The device comprises an acquisition module and a processing module. The acquisition module is configured to acquire behavior data of a user, the behavior data comprising limb data and voice data, and to acquire ambient light intensity data of the environment in which the user is located. The processing module is configured to recognize the limb data and the voice data with a preset recognition model and generate a first recognition result, the first recognition result indicating the behavior state of the user, the behavior state being a rest state or a motion state; to recognize the ambient light intensity data with the preset recognition model and generate a second recognition result, the second recognition result indicating the environmental state of the environment in which the user is located, the environmental state being a daytime state or a night state; to set a target modal input mode to voice input if the behavior state is determined to be the rest state and the environmental state the night state, the target modal input mode being the modal input mode of the user equipment corresponding to the user; and to set the target modal input mode to gesture input and/or touch input if the behavior state is determined to be the motion state and the environmental state the daytime state.
In a third aspect of the application, an electronic device is provided, comprising a processor, a memory for storing instructions, and a user interface and a network interface both used for communicating with other devices, the processor being configured to execute the instructions stored in the memory so as to cause the electronic device to perform the method described above.
In a fourth aspect of the application, a computer-readable storage medium is provided, storing instructions which, when executed, perform the method described above.
In summary, one or more technical solutions provided in the embodiments of the present application at least have the following technical effects or advantages:
1. By acquiring and analyzing the user's behavior data, the system can understand the user's current behavior state more accurately, and this personalized recognition allows it to recommend a more suitable input mode, raising the degree of personalization of the user experience. By additionally recognizing the ambient light intensity data, the system can judge the user's current environmental state; this environment-sensing capability makes the switching of the modal input mode fit the actual usage scene, improving operating convenience in different environments. When the system determines that the user is resting and the environment is night, it recommends voice input, which reduces the difficulty of locating touch or gesture targets in the dark and improves operating efficiency. Conversely, when the user is moving and the environment is daytime, gesture or touch input is recommended, which better matches the user's operating habits in such a scene. By intelligently recognizing the user's behavior state and environmental state, the system reduces input errors caused by mis-operation or an unsuitable input mode, improving the accuracy of interaction between the user and the device, so that intelligent, adaptive switching of the modal input mode between the user and the user equipment is facilitated;
2. Feature extraction from the limb data and the voice data lets the system capture exactly the information related to the user's behavior state: heart rate features, limb action features and respiration features are all important indicators of the current state, and this accurate extraction gives later judgment and recognition a reliable data basis. Plotting the extracted features as change curves makes the evolution of the behavior state more intuitive and easier to analyze; by observing the trend and fluctuation of the curves, the system judges more precisely whether the user is resting or moving. Fusing the heart rate, limb action and respiration curves lets the system weigh several kinds of information together and obtain a more complete and accurate recognition result; this multi-feature fusion compensates for the limitations of single-feature recognition and improves accuracy and reliability. Because every user's physiology and habits differ, recognition that combines several features also identifies the personalized behavior states of different users more accurately, which helps recommend a modal input mode that better fits the user's personal needs and habits and raises the degree of personalization of the experience.
Drawings
Fig. 1 is a schematic flow chart of a human-computer interaction method for multimodal input according to an embodiment of the present application.
Fig. 2 is another schematic flow chart of a human-computer interaction method for multimodal input according to an embodiment of the present application.
Fig. 3 is a schematic block diagram of a human-computer interaction device for multimodal input according to an embodiment of the present application.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Reference numerals: 31. acquisition module; 32. processing module; 41. processor; 42. communication bus; 43. user interface; 44. network interface; 45. memory.
Detailed Description
So that those skilled in the art may better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification are described clearly and completely below with reference to the drawings in those embodiments. The described embodiments are plainly only some, and not all, of the embodiments of the present application.
In describing embodiments of the present application, words such as "such as" or "for example" are used to mean serving as an example, illustration, or description. Any embodiment or design described with "such as" or "for example" in the embodiments of the application should not be construed as preferred or more advantageous than other embodiments or designs. Rather, such words are intended to present related concepts in a concrete fashion.
In the description of embodiments of the application, the term "plurality" means two or more. For example, a plurality of systems means two or more systems, and a plurality of screen terminals means two or more screen terminals. Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating an indicated technical feature. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
The importance of human-computer interaction is increasingly prominent in today's information-based society. In particular, modal input technology, which covers voice, gestures, touch and the like, plays a vital role in improving the user experience as an important means for users to communicate with intelligent devices. A user's preference for input mode, however, varies with circumstances and requirements.
For example, in a sleeping environment, voice input is more efficient, so that the user can conveniently wake the user device the next day; in a waking environment where a large amount of information must be entered quickly, the user may prefer touch or gesture operation. The prior art is limited in its ability to recognize user needs intelligently and switch automatically to the most suitable input mode, which not only reduces operating convenience but can also make information input inaccurate. How to achieve intelligent, adaptive switching of the modal input mode between the user and the user equipment is therefore an urgent problem to be solved.
To solve the above technical problem, the present application provides a human-computer interaction method for multimodal input. Referring to fig. 1, fig. 1 is a schematic flow chart of a human-computer interaction method for multimodal input according to an embodiment of the present application. The method is applied to a server and comprises the following steps S110 to S160:
S110, acquiring behavior data of a user, the behavior data comprising limb data and voice data.
Specifically, the server collects data about the user's behavior from one or more sources, which may be the user's device, a sensor, or another data source. Such data are typically used to analyze the user's behavior patterns, habits and preferences, so as to provide more personalized services or to perform related data analysis. In the embodiment of the present application, the server may be understood as a server cluster managing a plurality of user devices, for example the central control platform in a smart home.
Limb data refers to data relating to the user's body movements or functions. Such data may come from various sensors, such as accelerometers, gyroscopes, depth cameras, etc. in a smart watch. For example, when a user uses a smart watch, sensors on the device may capture the user's motion state, such as posture, heart rate, and respiration. Speech data refers to information related to the user's voice, which is collected by a microphone or other sound capturing device. For example, in an environment where the user is sleeping in bed, the breathing sound of the user, snoring sound, and the like may be collected.
In one possible implementation manner, the method for obtaining the behavior data of the user specifically includes: receiving initial limb data and initial voice data sent by wearable equipment worn by a user; analyzing the initial limb data and the initial voice data to obtain heart rate data, respiratory data and body acceleration data; and preprocessing heart rate data, respiratory data and body acceleration data to obtain behavior data, wherein the preprocessing comprises data cleaning, data denoising, data filtering and data normalization.
In particular, wearable devices worn by a user, such as smart bracelets, watches, headphones, etc., constantly monitor the user's body movements and sounds and send these data as initial data to a server or data processing system. The initial limb data may include a user's motion trajectory, acceleration, angular velocity, etc.; the initial speech data is the waveform of the sound emitted by the user. The system may perform in-depth analysis of the received initial data to extract information related to the physiological state and behavior pattern of the user. For example, by analyzing acceleration changes in limb data, the user's state of motion and physical activity level can be inferred; by analyzing the frequency and amplitude variations in the speech data, the emotional state or speech pattern of the user can be deduced. Heart rate data and respiration data are obtained by analyzing limb data (such as pulse waves) and reflect the physiological state of the user. Preprocessing is an indispensable step in data analysis, aimed at improving the quality and consistency of data for subsequent analysis and modeling. The data cleaning is used for removing abnormal values, repeated values or invalid data, and ensures the accuracy of the data. Data denoising is used to eliminate data fluctuations caused by sensor noise, external interference, and the like. Data filtering is used to smooth data through a filter, reducing high frequency noise. Data normalization converts data into the same scale or range, facilitating subsequent data processing and model training.
For example, assume that a user wears a smart bracelet with heart rate monitoring and acceleration sensors, as well as a smart headset with voice recognition functionality. The wrist strap and the earphone continuously monitor the physiological state and the movement condition of the user, and send the data, such as heart rate data, acceleration data and voice waveforms, to a mobile phone or a cloud server of the user in real time. After the server receives the data, a series of analyses are performed. For example, by analyzing heart rate data on the wristband, it can be known whether the user is currently in tension or relaxation; by analyzing the acceleration data, whether the user is walking or stationary can be judged; by analyzing the voice data of the headset, voice instructions or sleep breathing variations of the user can be identified. After obtaining these raw data, the server will perform preprocessing. For example, for heart rate data, outliers due to loose donning of the wristband or motion disturbances may be removed; for acceleration data, a filter may be applied to smooth the data, reducing noise due to hand tremble or device errors; for voice data, noise reduction processing may be performed, so as to improve the accuracy of voice recognition. After these steps, the server will obtain a set of cleaned, denoised, filtered and normalized behavior data that can be used for further user behavior analysis, health monitoring or personalized service provision.
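As a rough illustration of the parsing step, the sketch below splits a wearable payload into the heart rate, respiration and acceleration streams used later; the JSON field names are assumptions, since real devices use vendor-specific formats.

```python
import json

def parse_wearable_payload(raw):
    """Split the wearable's initial payload into the three physiological streams."""
    payload = json.loads(raw)
    return {
        "heart_rate": payload.get("hr_bpm", []),            # beats per minute
        "respiration": payload.get("breaths_per_min", []),  # derived from sound / pulse wave
        "acceleration": payload.get("accel_g", []),         # body-acceleration samples
    }

sample = '{"hr_bpm": [72, 70, 69], "breaths_per_min": [15, 14], "accel_g": [0.02, 0.03]}'
streams = parse_wearable_payload(sample)
print(streams["heart_rate"])  # [72, 70, 69]
```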
S120, acquiring environment light intensity data of the environment where the user is located.
In particular, the server may actively acquire or passively receive ambient light intensity data regarding the environment in which the user is located. The ambient light intensity data refers to the intensity and brightness information of the light in the environment in which the user is located. This is typically measured by a photosensitive sensor (e.g., a photodiode) in the smart home device. The light intensity data may reflect the brightness level of the environment in which the user is located, such as indoor lighting, outdoor sunlight, dusk, or darkness at night, etc. For the intelligent home system, the environmental light intensity data can trigger the server to control the automatic switch or adjustment of the lighting system, so as to provide a proper lighting environment, which is not described herein.
For example, suppose a user installs a smart lighting system in his home that includes a plurality of smart light bulbs and a central controller (server or smart home gateway connected to the server). Each room in the user's home is fitted with one or more light sensitive sensors that are capable of detecting the light intensity in the room in real time and transmitting the data to a central controller. After the central controller receives the environmental light intensity data from each sensor, the data can be further sent to the cloud server for processing and storage.
S130, recognizing limb data and voice data by adopting a preset recognition model, and generating a first recognition result, wherein the first recognition result is used for indicating the behavior state of a user, and the behavior state is a rest state or a motion state.
Specifically, the "preset recognition model" referred to herein is a model that is built in advance, contains knowledge or rules learned from a large amount of data, and is capable of recognizing a specific pattern or feature. This model is built based on neural networks and has been trained and optimized to be able to accurately identify input limb data and speech data. The limb data includes acceleration, posture, position, etc. of the user, reflecting the user's body movements, posture changes, and sleep states. The voice data includes information such as voice waveform and frequency of the user, and reflects voice characteristics, emotion states, sleep quality and the like of the user. The identification process is to input the original data into a preset identification model, and the model analyzes and classifies the data according to the learned knowledge. After the recognition model is processed, a first recognition result is output. The first recognition result is a description or classification of the current behavior state of the user, which may be a tag, a probability distribution, or other form of data. Behavioral states are terms describing the state of activity or performance currently being performed by a user, such as resting state or movement state. In the embodiment of the application, the user equipment is mainly used for resting and waking up or the user equipment is used for waking up when the user moves, namely, the resting state can be a sleep state and the movement state is a waking state.
In one possible implementation manner, the method for identifying the limb data and the voice data by adopting the preset identification model, and generating the first identification result specifically includes: extracting features of limb data to obtain a first feature change curve, wherein the first feature change curve is drawn by a plurality of heart rate features; extracting features of the limb data to obtain a second feature change curve, wherein the second feature change curve is drawn by a plurality of limb action features; extracting the characteristics of the voice data to obtain a third characteristic change curve, wherein the third characteristic change curve is drawn by a plurality of breathing characteristics; and fusing the first characteristic change curve, the second characteristic change curve and the third characteristic change curve to obtain a first identification result.
Specifically, the limb data contains information about the heart rate of the user, and the heart rate characteristics can be extracted by processing and analyzing the data. These heart rate characteristics may include heart rate values, heart rate variability, etc., which reflect the cardiovascular activity of the user. The change of the heart rate characteristics along with time is drawn into a curve, namely a first characteristic change curve, so that the dynamic change of the heart rate of the user can be intuitively displayed, and whether the user is in a sleep rest state or not is reflected. In addition to heart rate characteristics, the limb data also contains limb action information of the user, such as acceleration, angular velocity and the like. By processing and analyzing the data, the limb movement characteristics such as movement amplitude, frequency and the like can be extracted. The change of the limb movement characteristics along with time is plotted into a curve, namely a second characteristic change curve, so that the change condition of the limb movement of the user can be displayed. For example, actions of playing a cell phone or tidying up a pillow before the user sleeps may be displayed. The voice data contains breathing information of the user, and the breathing characteristics can be extracted by analyzing the voice waveform. These breathing characteristics may include breathing frequency, depth of breathing, etc., which reflect the breathing state of the user. Plotting these changes in breathing characteristics over time, i.e. the third characteristic change curve, can demonstrate the dynamic changes in the user's breathing. Fusion is the process of combining multiple feature variation curves into one comprehensive feature representation. In the fusion process, the weights and the correlations of different characteristic curves can be considered so as to ensure that the fused characteristics can accurately reflect the behavior state of the user. The fused characteristic change curve or the integrated characteristic is used as a first recognition result to indicate the behavior state of the user.
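The fusion step described above, with weights applied to the individual curves, could be sketched as follows; the weights and the 0.5 cut-off are illustrative assumptions rather than values from the embodiment.

```python
import numpy as np

def fuse_curves(heart, limb, breath, weights=(0.4, 0.35, 0.25)):
    """Weighted combination of the three normalized curves into one composite curve."""
    stacked = np.vstack([heart, limb, breath])
    return np.average(stacked, axis=0, weights=weights)

def first_recognition_result(fused, threshold=0.5):
    """Read low composite activity over the window as the rest state."""
    return "rest" if fused.mean() < threshold else "motion"

fused = fuse_curves(np.array([0.2, 0.1]), np.array([0.1, 0.05]), np.array([0.3, 0.2]))
print(first_recognition_result(fused))  # rest
```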
S140, recognizing the ambient light intensity data with the preset recognition model and generating a second recognition result, the second recognition result indicating the environmental state of the environment in which the user is located, the environmental state being a daytime state or a night state.
Specifically, the model is designed according to the environmental light intensity data at the same time, and the environment states corresponding to different light intensity levels can be identified after training. The environmental light intensity data contains specific values of illumination intensity, change trend and other information. The recognition process is to input the environmental light intensity data into a preset recognition model, and the model analyzes and classifies the data according to the learned knowledge. After the recognition model is processed, a second recognition result is output. The environmental status refers to whether the user is currently in the daytime or the nighttime. And through the second recognition result, the server can know the current environment state and respond or adjust correspondingly.
For example, suppose a user installs an intelligent lighting system at home and connects to a sensor that can measure the intensity of ambient light. The sensor continuously monitors the light intensity of the environment and transmits the data to a preset identification model in real time. After the recognition model receives the light intensity data, the recognition model can analyze according to the learned rule. If the data shows that the current environment is very high in light intensity, and exceeds a certain threshold, the model can judge that the current environment is in a daytime state. Conversely, if the intensity is very low, even near zero, the model will determine that it is now in the night state. The model outputs a second recognition result, i.e., the environmental status is "daytime" or "night time".
S150, if the behavior state is determined to be the rest state and the environmental state the night state, setting the target modal input mode to voice input, the target modal input mode being the modal input mode of the user equipment corresponding to the user.
Specifically, first, the server recognizes the behavior state and the environmental state of the user through a preset recognition model. The behavioral state is determined to be a resting state, meaning that the user is currently in a resting or relaxed state; the ambient state is determined to be a night state, indicating that the current ambient light is weak, possibly at night or in a dim room. After determining that the behavior state of the user is a rest state and the environment state is a night state, the server automatically adjusts the mode input mode of the user equipment according to the conditions. The mode input mode refers to a mode of user interaction with the device, such as touch input, voice input, and the like. In this example, the server selects a voice input as the target modality input mode. This is because in the resting state, the user may be unwilling or inconvenient to perform complicated operations such as touching the screen; while in the night state, the touch screen may be inconvenient or difficult due to insufficient light. Therefore, voice input becomes a more suitable and convenient interaction mode. Once it is determined that the target modal input mode is voice input, the server will perform corresponding adjustment on the user equipment. This includes activating the voice recognition function of the device, adjusting microphone sensitivity, etc., to ensure that the user is able to interact with the device through voice.
Types of user equipment include, but are not limited to: Android system devices, iOS devices (the mobile operating system developed by Apple Inc.), personal computers (PCs), Web devices, virtual reality (VR) devices, augmented reality (AR) devices and other devices. In the embodiment of the application, the user equipment may be a tablet.
S160, if the behavior state is determined to be the motion state and the environmental state the daytime state, setting the target modal input mode to gesture input and/or touch input.
Specifically, the server accurately judges the behavior state and the environment state of the user through a preset identification model. The behavior state is determined as a motion state, meaning that the user is currently in an active or motion state; the ambient condition is determined to be a daytime condition, indicating that the current ambient light is sufficient, possibly daytime or room lighting is good. According to the identification results, the server decides to adjust the mode input mode of the user equipment to adapt to the current state and environmental conditions of the user. In this case, the server selects the gesture input and/or the touch input as the target modality input mode. This is because the user may be more inclined to interact with the device using a non-contact gesture operation or a direct touch operation in a moving state. Once the target mode input mode is determined, the server carries out corresponding configuration on the user equipment. This includes activating a touch screen of the device, optimizing gesture recognition algorithms, adjusting sensor sensitivity, etc., to ensure that a user can conveniently and accurately interact with the device through gestures or touches.
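Applying the chosen target modal input mode to the device can be pictured as enabling the matching input channels, as in the sketch below; the DeviceConfig class is hypothetical, and a real deployment would go through the platform's own settings interfaces.

```python
from dataclasses import dataclass, field

@dataclass
class DeviceConfig:
    enabled_inputs: set = field(default_factory=set)

    def apply_target_mode(self, modes):
        """Activate only the requested input channels (voice, gesture and/or touch)."""
        self.enabled_inputs = set(modes)

config = DeviceConfig()
config.apply_target_mode(["gesture", "touch"])  # the motion + daytime case
print(config.enabled_inputs)                    # {'gesture', 'touch'}
```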
In one possible implementation, before determining that the behavior state is the rest state and the environmental state is the night state, determining that the behavior state is the rest state or the movement state; judging the behavior state as a rest state or a motion state, specifically including: in a preset first time period, if the first characteristic change curve indicates that the heart rate of the user is in a descending trend, and the second characteristic change curve indicates that the respiratory rate of the user is in a descending trend, determining that the behavior state is a rest state; and in a preset first time period, if the first characteristic change curve indicates that the heart rate of the user is in a non-descending trend, and the second characteristic change curve indicates that the respiratory rate of the user is in a non-descending trend, determining that the behavior state is a motion state.
Specifically, before determining the environmental state of the user, the server first needs to determine whether the current behavior state of the user is a resting state or a movement state. This determination is made based on a change in a physiological characteristic of the user over a preset first period of time. The server judges the heart rate variation trend of the user by analyzing the first characteristic variation curve. During a preset first period of time, if it is determined that the heart rate is in a decreasing trend, this generally means that the heart rate of the user gradually slows down, possibly in a resting or relaxed state. Conversely, if the heart rate is not decreasing, it may indicate that the user is in an active or exercise state. At the same time, the server also analyzes the second characteristic change curve to determine a change in the breathing rate of the user. A decrease in respiratory rate is typically associated with a relaxed or resting state, while a hold or increase in respiratory rate may be indicative of a user being in motion or active state. The server can judge the behavior state of the user by integrating the change trend of the heart rate and the breathing frequency. If the heart rate and the respiratory rate are both in a descending trend, the system determines that the user is currently in a resting state; if both are not descending, the system determines that the user is currently in motion. The preset time period is set by user definition and can be 10 minutes or 15 minutes.
For example, assume that the user comes home at night and lies down on a sofa to rest. At this point, the server begins to monitor the user's heart rate and respiratory rate. The heart rate may still be relatively high when the user first lies down, but it gradually decreases over time as the user relaxes. At the same time, the user's breathing gradually slows from a faster rhythm to a smooth, slower one, which is also characteristic of a rest state. Over a preset period of 15 minutes, the server observes that both the heart rate and the respiratory rate are in a descending trend, and on this basis it determines that the user's current behavior state is a rest state.
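The trend check described above can be expressed as a short Python sketch. Treating each characteristic change curve as equally spaced samples over the preset first time period and using a least-squares slope as the test for a descending trend are assumptions made here for clarity; the application only requires that a descending or non-descending trend be detected.

```python
# Sketch of the rest/motion judgment from heart rate and respiratory rate trends.
import numpy as np


def is_descending(samples, tolerance=0.0):
    """Treat a negative least-squares slope as a descending trend (assumed criterion)."""
    t = np.arange(len(samples))
    slope = np.polyfit(t, samples, 1)[0]
    return slope < -tolerance


def classify_behavior_state(heart_rate_curve, respiration_curve):
    hr_down = is_descending(heart_rate_curve)
    rr_down = is_descending(respiration_curve)
    if hr_down and rr_down:
        return "rest"
    if not hr_down and not rr_down:
        return "motion"
    return "undetermined"  # mixed trends are not covered by the description


# Heart rate and respiratory rate sampled over a 15-minute window, both slowing down.
print(classify_behavior_state([82, 78, 74, 70, 66], [18, 17, 15, 14, 13]))  # -> rest
```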
In one possible implementation, identifying the environmental light intensity data with the preset recognition model and generating a second recognition result specifically includes: obtaining an ambient light intensity value of the environment where the user is located from the environmental light intensity data; and comparing the ambient light intensity value with a preset threshold to obtain the second recognition result. Before it is determined that the behavior state is a rest state and the environmental state is a night state, the environmental state is judged to be either a daytime state or a night state. Judging the environmental state specifically includes: within a preset second time period, if the ambient light intensity value is greater than or equal to the preset threshold, determining that the environmental state is a daytime state; and within the preset second time period, if the ambient light intensity value is smaller than the preset threshold, determining that the environmental state is a night state.
Specifically, the server first receives the environmental light intensity data, which is typically provided by a photosensitive sensor and reflects the light level of the environment where the user is located. From this data, the server calculates or extracts a specific light intensity value for the current environment; this value is the quantitative index used in the subsequent determination of the environmental state. The server is preconfigured with one or more thresholds for distinguishing environmental states under different lighting conditions. These thresholds are typically determined from empirical or experimental data and reflect the typical difference in light intensity between day and night. By comparing the current ambient light intensity value with these thresholds, the server can make an initial judgment of the environmental state and generate the second recognition result, that is, determine whether the environmental state is a daytime state or a night state. Before concluding that the behavior state is a rest state and the environmental state is a night state, the system first determines the environmental state, so that the user's current context can be evaluated comprehensively for an appropriate subsequent response. To avoid interference from momentary fluctuations in light intensity, the server typically monitors and analyzes the ambient light intensity value over a preset second time period, whose length depends on the specific application scenario and accuracy requirements. If the ambient light intensity value remains greater than or equal to the preset threshold throughout this period, the server considers the ambient light sufficient and judges the environmental state to be a daytime state; if it remains below the preset threshold, the server considers the ambient light weak and judges the environmental state to be a night state.
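A minimal sketch of the daytime/night judgment follows. The 200-lux threshold and the requirement that every sample in the preset second time period fall on one side of it are illustrative assumptions; the application only specifies a comparison against a preset threshold over that period.

```python
# Sketch of classifying the environmental state from ambient light samples.
def classify_environment_state(light_samples, threshold_lux=200.0):
    if all(v >= threshold_lux for v in light_samples):
        return "daytime"
    if all(v < threshold_lux for v in light_samples):
        return "night"
    return "undetermined"  # fluctuating readings; keep monitoring


print(classify_environment_state([12.0, 9.5, 8.2]))       # -> night
print(classify_environment_state([540.0, 610.0, 580.0]))  # -> daytime
```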
In a possible implementation, referring to fig. 2, fig. 2 is another flow chart of the man-machine interaction method of multi-modal input provided in an embodiment of the present application. The method comprises steps S210 to S240, as follows: S210, responding to an input operation of the user, wherein the input operation comprises a modal input mode setting instruction and a modal input time setting instruction; S220, determining a modal input execution time period according to the modal input time setting instruction; S230, determining a custom modal input mode according to the modal input mode setting instruction; S240, setting the target modal input mode to the custom modal input mode during the modal input execution time period.
Specifically, the user sends an input operation to the server through some means, such as a touch screen or keys. These input operations contain specific instructions for adjusting the system's modal input mode and input time. The instructions are mainly of two types: a modal input mode setting instruction and a modal input time setting instruction. The modal input mode setting instruction expresses which specific input mode the user wants the user equipment to use to receive commands or information, such as gesture input, touch input or voice input. The modal input time setting instruction designates the period during which the user equipment should adopt that input mode, that is, the execution time period of the modal input. From the modal input time setting instruction, the server can determine in which time periods the user-specified input mode should be enabled. This may be a fixed period, such as 9 a.m. to 5 p.m. each day, or a dynamic period, such as the time during which the user is in a meeting. From the modal input mode setting instruction, the server can determine the custom input mode the user wants to adopt, which may be a single input mode or a combination of several. After determining the modal input execution time period and the custom modal input mode, the server sets the target modal input mode to the user-defined mode within the specified period, meaning that during this period the user equipment will only recognize and respond to input operations sent in the custom manner. For example, voice input may be used between 9 p.m. and 8 a.m., and gesture input and/or touch input between 6 p.m. and 9 p.m.
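The scheduling behavior described in this paragraph can be sketched as follows. The ModalitySchedule structure, the handling of windows that cross midnight and the default modality set are assumptions introduced for illustration; they are not structures defined by the application.

```python
# Sketch of applying a user-defined modality during its configured execution period.
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class ModalitySchedule:
    start: time
    end: time
    modalities: set  # e.g. {"voice"} or {"gesture", "touch"}


def active_modalities(schedules, now, default):
    t = now.time()
    for s in schedules:
        if s.start <= s.end:
            in_window = s.start <= t < s.end
        else:  # window crosses midnight, e.g. 21:00 to 08:00
            in_window = t >= s.start or t < s.end
        if in_window:
            return s.modalities
    return default


schedules = [
    ModalitySchedule(time(21, 0), time(8, 0), {"voice"}),
    ModalitySchedule(time(18, 0), time(21, 0), {"gesture", "touch"}),
]
print(active_modalities(schedules, datetime(2024, 4, 9, 22, 30), {"touch"}))  # -> {'voice'}
```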
In one possible implementation, if the input operation is a voice input operation, voice recognition is performed on the voice input operation to obtain text data; the text data is matched against the preset recognition model to obtain a target instruction, the preset recognition model storing in advance the correspondence between text data and target instructions; and the target modal input mode is switched according to the target instruction.
Specifically, when the user chooses voice input as the mode of operation, the server receives and processes the user's voice input. The server converts the voice input into text data using voice recognition technology. This conversion typically involves sound-signal collection, preprocessing, feature extraction and pattern matching, and ultimately turns the speech signal into readable text. The converted text data is then matched against the preset recognition model, which stores in advance the correspondence between text data and target instructions. These correspondences can be preset on the server or learned from the user's input history and habits; the preset recognition model is built and trained with a neural network, which is not described further here. Through this matching process, the server finds the target instruction corresponding to the text data. This instruction represents the operation the user wants to perform or the parameter the user wants to set. According to the obtained target instruction, the server performs the corresponding operation, which may include switching the target modal input mode. That is, the server may adjust or change the current input mode of the user equipment according to the user's voice command, for example switching from touch input to gesture input, or from gesture input to voice input.
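A minimal sketch of matching recognized text to a target instruction is given below. The dictionary lookup stands in for the preset recognition model, whose learned text-to-instruction correspondence is not reproduced here; the command phrases and instruction names are assumptions.

```python
# Sketch of mapping recognized text to a target instruction and switching modality.
from typing import Optional

COMMAND_TABLE = {
    "switch to gesture input": "SET_MODALITY_GESTURE",
    "switch to touch input": "SET_MODALITY_TOUCH",
    "switch to voice input": "SET_MODALITY_VOICE",
}


def match_instruction(text: str) -> Optional[str]:
    """Look up the target instruction for the recognized text, if any."""
    return COMMAND_TABLE.get(text.strip().lower())


def handle_voice_command(recognized_text: str) -> None:
    instruction = match_instruction(recognized_text)
    if instruction is None:
        print("No matching target instruction; input ignored.")
    else:
        print("Switching target modal input mode:", instruction)


handle_voice_command("Switch to gesture input")  # -> SET_MODALITY_GESTURE
```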
The application further provides a man-machine interaction device for multi-modal input, and referring to fig. 3, fig. 3 is a schematic diagram of a module of the man-machine interaction device for multi-modal input provided by the embodiment of the application. The human-computer interaction device is a server, and the server comprises an acquisition module 31 and a processing module 32, wherein the acquisition module 31 acquires behavior data of a user, and the behavior data comprises limb data and voice data; the acquisition module 31 acquires environmental light intensity data of an environment in which a user is located; the processing module 32 adopts a preset recognition model to recognize the limb data and the voice data, and generates a first recognition result, wherein the first recognition result is used for indicating the behavior state of the user, and the behavior state is a rest state or a motion state; the processing module 32 adopts a preset recognition model to recognize the environmental light intensity data, and generates a second recognition result, wherein the second recognition result is used for indicating the environmental state of the environment where the user is located, and the environmental state is a daytime state or a night state; if the processing module 32 determines that the behavior state is the rest state and the environment state is the night state, the target mode input mode is set as voice input, and the target mode input mode is the mode input mode of the user equipment corresponding to the user; if the processing module 32 determines that the behavior state is a motion state and the environment state is a daytime state, the target mode input mode is set as gesture input and/or touch input.
In one possible implementation, the obtaining module 31 obtains behavior data of the user, specifically includes: the acquisition module 31 receives initial limb data and initial voice data sent by a wearable device worn by a user; the processing module 32 analyzes the initial limb data and the initial voice data to obtain heart rate data, respiration data, and body acceleration data; the processing module 32 pre-processes the heart rate data, the respiration data and the body acceleration data to obtain behavior data, wherein the pre-processing comprises data cleaning, data denoising, data filtering and data normalization.
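The preprocessing chain performed by the processing module can be sketched as follows. Dropping missing samples for data cleaning, a three-point moving average for denoising and filtering, and min-max scaling for normalization are illustrative choices only; the application does not prescribe particular algorithms.

```python
# Sketch of preprocessing a raw sensor channel (heart rate, respiration or acceleration).
import numpy as np


def preprocess(samples):
    # Data cleaning: drop missing readings.
    cleaned = np.array([s for s in samples if s is not None], dtype=float)
    if cleaned.size < 3:
        return cleaned
    # Denoising / filtering: three-point moving average.
    smoothed = np.convolve(cleaned, np.ones(3) / 3.0, mode="valid")
    # Normalization: min-max scaling to [0, 1].
    lo, hi = smoothed.min(), smoothed.max()
    return (smoothed - lo) / (hi - lo) if hi > lo else np.zeros_like(smoothed)


print(preprocess([72, None, 70, 69, 68, 67, 66]))
```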
In one possible implementation, the processing module 32 uses a preset recognition model to recognize the limb data and the voice data, and generates a first recognition result, which specifically includes: the processing module 32 performs feature extraction on the limb data to obtain a first feature change curve, wherein the first feature change curve is drawn by a plurality of heart rate features; the processing module 32 performs feature extraction on the limb data to obtain a second feature change curve, wherein the second feature change curve is drawn by a plurality of limb action features; the processing module 32 performs feature extraction on the voice data to obtain a third feature change curve, wherein the third feature change curve is drawn by a plurality of breathing features; the processing module 32 fuses the first characteristic change curve, the second characteristic change curve and the third characteristic change curve to obtain a first recognition result.
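One way to picture the fusion step is the weighted combination below. The slope-based summary of each curve, the weights and the sign test are assumptions made for illustration; the application leaves the fusion method itself unspecified.

```python
# Sketch of fusing the three characteristic change curves into a first recognition result.
import numpy as np


def fuse_curves(heart_rate_curve, limb_action_curve, respiration_curve,
                weights=(0.4, 0.3, 0.3)):
    curves = [np.asarray(c, dtype=float)
              for c in (heart_rate_curve, limb_action_curve, respiration_curve)]
    # Summarize each curve by its overall slope (falling values suggest rest).
    slopes = [np.polyfit(np.arange(len(c)), c, 1)[0] for c in curves]
    score = sum(w * s for w, s in zip(weights, slopes))
    return "motion" if score >= 0 else "rest"


print(fuse_curves([80, 76, 72], [0.9, 0.6, 0.4], [18, 16, 15]))  # -> rest
```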
In one possible implementation, the processing module 32 determines that the behavioral state is a resting state or a movement state before determining that the behavioral state is a resting state and the environmental state is a night state; judging the behavior state as a rest state or a motion state, specifically including: the processing module 32 determines the behavior state to be a rest state if it is determined that the first characteristic change curve indicates that the heart rate of the user is in a decreasing trend and the second characteristic change curve indicates that the respiratory rate of the user is in a decreasing trend within a preset first period; the processing module 32 determines the behavior state as the motion state if it is determined that the first characteristic change curve indicates that the heart rate of the user is in a non-decreasing trend and the second characteristic change curve indicates that the breathing rate of the user is in a non-decreasing trend within a preset first period.
In one possible implementation, the processing module 32 uses a preset recognition model to recognize the environmental light intensity data, and generates a second recognition result, which specifically includes: the processing module 32 obtains an environmental light intensity value of the environment in which the user is located according to the environmental light intensity data; the processing module 32 judges the magnitude relation between the environment light intensity value and the preset threshold value to obtain a second identification result; the processing module 32 determines that the environmental state is a daytime state or a nighttime state before determining that the behavioral state is a rest state and the environmental state is a nighttime state; judging whether the environment state is a daytime state or a night state, specifically comprising: the processing module 32 determines the ambient condition as a daytime condition if the ambient light intensity value is greater than or equal to the preset threshold value within the preset second period of time; the processing module 32 determines that the ambient state is a night state if the ambient light intensity value is less than the predetermined threshold value within the predetermined second period of time.
In one possible implementation, the obtaining module 31 responds to an input operation of a user, where the input operation includes a mode input mode setting instruction and a mode input time setting instruction; the processing module 32 determines a modal input execution time period according to the modal input time setting instruction; the processing module 32 determines a mode input custom mode according to the mode input mode setting instruction; the processing module 32 sets the target mode input mode to the mode input custom mode according to the mode input execution time period.
In one possible implementation, if the input operation is a voice input operation, the processing module 32 performs voice recognition on the voice input operation to obtain text data; the processing module 32 matches the text data with a preset recognition model to obtain a target instruction, wherein the preset recognition model is stored with a corresponding relation between the text data and the target instruction in advance; the processing module 32 switches the target mode input mode according to the target instruction.
It should be noted that, in the device provided in the above embodiment, the division into the above functional modules is merely an example. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to implement all or part of the functions described above. In addition, the device embodiment and the method embodiments provided above belong to the same concept; for the specific implementation process, refer to the method embodiments, which is not repeated here.
The application further provides an electronic device, and referring to fig. 4, fig. 4 is a schematic structural diagram of the electronic device according to the embodiment of the application. The electronic device may include: at least one processor 41, at least one network interface 44, a user interface 43, a memory 45, at least one communication bus 42.
Wherein a communication bus 42 is used to enable connected communication between these components.
The user interface 43 may include a Display screen (Display) and a Camera (Camera), and the optional user interface 43 may further include a standard wired interface and a standard wireless interface.
The network interface 44 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
Wherein the processor 41 may comprise one or more processing cores. The processor 41 connects various parts of the entire server using various interfaces and lines, and performs the various functions of the server and processes data by running or executing the instructions, programs, code sets or instruction sets stored in the memory 45 and invoking the data stored in the memory 45. Optionally, the processor 41 may be implemented in at least one hardware form among Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA) and Programmable Logic Array (PLA). The processor 41 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem and the like. The CPU mainly handles the operating system, the user interface, application programs and so on; the GPU is responsible for rendering and drawing the content to be displayed on the display screen; the modem is used to handle wireless communication. It will be appreciated that the modem may also not be integrated into the processor 41 and may instead be implemented by a separate chip.
The memory 45 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 45 includes a non-transitory computer-readable storage medium. The memory 45 may be used to store instructions, programs, code, code sets or instruction sets. The memory 45 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the above method embodiments, and so on; the data storage area may store the data involved in the above method embodiments. Optionally, the memory 45 may also be at least one storage device located remotely from the aforementioned processor 41. As shown in fig. 4, the memory 45, as a computer storage medium, may include an operating system, a network communication module, a user interface module and an application program of the man-machine interaction method of multi-modal input.
In the electronic device shown in fig. 4, the user interface 43 is mainly used to provide an input interface for the user and to acquire the data input by the user, and the processor 41 may be used to invoke the application program of the multi-modal input man-machine interaction method stored in the memory 45, which, when executed by one or more processors, causes the electronic device to perform the method of one or more of the embodiments described above.
It should be noted that, for simplicity of description, the foregoing method embodiments are described as a series of acts, but those skilled in the art should understand that the present application is not limited by the order of the acts described, since some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules involved are not necessarily required by the present application.
The application also provides a computer-readable storage medium storing instructions which, when executed by one or more processors, cause an electronic device to perform the method described in one or more of the embodiments above.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division of logical functions, and there may be other divisions in actual implementation, for example multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through service interfaces, devices or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated units are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the various embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a magnetic disk or an optical disk.
The foregoing are merely exemplary embodiments of the present disclosure and are not intended to limit its scope; equivalent changes and modifications made in accordance with the teachings of this disclosure fall within the scope of the present disclosure. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the claims.
Claims (10)
1. A human-machine interaction method for multi-modal input, the method comprising:
Acquiring behavior data of a user, wherein the behavior data comprise limb data and voice data;
acquiring environment light intensity data of an environment where the user is located;
The limb data and the voice data are identified by adopting a preset identification model, and a first identification result is generated, wherein the first identification result is used for indicating the behavior state of the user, and the behavior state is a rest state or a motion state;
The preset recognition model is adopted to recognize the environment light intensity data, and a second recognition result is generated, wherein the second recognition result is used for indicating the environment state of the environment where the user is located, and the environment state is a daytime state or a night state;
If the behavior state is the rest state and the environment state is the night state, setting a target mode input mode as voice input, wherein the target mode input mode is a mode input mode of user equipment corresponding to the user;
and if the behavior state is determined to be the motion state and the environment state is determined to be the daytime state, setting the target mode input mode as gesture input and/or touch input.
2. The method for human-computer interaction with multimodal input according to claim 1, wherein the obtaining behavior data of the user specifically comprises:
receiving initial limb data and initial voice data sent by wearable equipment worn by a user;
analyzing the initial limb data and the initial voice data to obtain heart rate data, respiratory data and body acceleration data;
And preprocessing the heart rate data, the breathing data and the body acceleration data to obtain the behavior data, wherein the preprocessing comprises data cleaning, data denoising, data filtering and data normalization.
3. The method for human-computer interaction with multimodal input according to claim 1, wherein the step of recognizing the limb data and the voice data by using a preset recognition model to generate a first recognition result specifically comprises:
Extracting features of the limb data to obtain a first feature change curve, wherein the first feature change curve is drawn by a plurality of heart rate features;
Extracting features of the limb data to obtain a second feature change curve, wherein the second feature change curve is drawn by a plurality of limb action features;
Extracting features of the voice data to obtain a third feature change curve, wherein the third feature change curve is drawn by a plurality of breathing features;
And fusing the first characteristic change curve, the second characteristic change curve and the third characteristic change curve to obtain the first identification result.
4. The multi-modal input man-machine interaction method of claim 3, wherein before the behavior state is determined to be the rest state and the environmental state is determined to be the night state, the behavior state is judged to be the rest state or the motion state; the judging that the behavior state is the rest state or the motion state specifically includes:
In a preset first time period, if the first characteristic change curve indicates that the heart rate of the user is in a descending trend, and the second characteristic change curve indicates that the respiratory rate of the user is in a descending trend, determining that the behavior state is a rest state;
And in the preset first time period, if the first characteristic change curve indicates that the heart rate of the user is in a non-descending trend, and the second characteristic change curve indicates that the respiratory rate of the user is in a non-descending trend, determining that the behavior state is a motion state.
5. The method for man-machine interaction of multi-modal input according to claim 1, wherein the identifying the environmental light intensity data by using the preset identification model, and generating a second identification result, specifically includes:
Obtaining an environment light intensity value of the environment where the user is located according to the environment light intensity data;
Judging the magnitude relation between the environment light intensity value and a preset threshold value to obtain the second identification result;
judging whether the environmental state is the daytime state or the night state before it is determined that the behavior state is the rest state and the environmental state is the night state; the judging that the environmental state is the daytime state or the night state specifically includes:
In a preset second time period, if the ambient light intensity value is greater than or equal to the preset threshold value, determining that the ambient state is the daytime state;
and in the preset second time period, if the environment light intensity value is smaller than the preset threshold value, determining that the environment state is the night state.
6. The method of multimodal input human-machine interaction of claim 1, further comprising:
Responding to the input operation of the user, wherein the input operation comprises a mode input mode setting instruction and a mode input time setting instruction;
Determining a modal input execution time period according to the modal input time setting instruction;
Determining a mode input custom mode according to the mode input mode setting instruction;
and setting the target mode input mode as the mode input self-defining mode according to the mode input execution time period.
7. The method of multimodal input human-machine interaction of claim 6, further comprising:
if the input operation is a voice input operation, performing voice recognition on the voice input operation to obtain text data;
matching the text data with the preset recognition model to obtain a target instruction, wherein the preset recognition model is pre-stored with a corresponding relation between the text data and the target instruction;
And switching the target mode input mode according to the target instruction.
8. A man-machine interaction device with multi-modal input is characterized in that the man-machine interaction device comprises an acquisition module (31) and a processing module (32), wherein,
The acquisition module (31) is used for acquiring behavior data of a user, wherein the behavior data comprises limb data and voice data;
The acquisition module (31) is further used for acquiring environment light intensity data of the environment where the user is located;
The processing module (32) is used for identifying the limb data and the voice data by adopting a preset identification model to generate a first identification result, wherein the first identification result is used for indicating the behavior state of the user, and the behavior state is a rest state or a motion state;
The processing module (32) is further configured to identify the environmental light intensity data by using the preset identification model, and generate a second identification result, where the second identification result is used to indicate an environmental state of an environment where the user is located, and the environmental state is a daytime state or a night state;
the processing module (32) is further configured to set a target mode input mode as a voice input if the behavior state is determined to be the rest state and the environment state is determined to be the night state, where the target mode input mode is a mode input mode of a user device corresponding to the user;
The processing module (32) is further configured to set the target modal input mode as gesture input and/or touch input if the behavior state is determined to be the motion state and the environment state is determined to be the daytime state.
9. An electronic device, characterized in that the electronic device comprises a processor (41), a memory (45), a user interface (43) and a network interface (44), wherein the memory (45) is configured to store instructions, the user interface (43) and the network interface (44) are configured to communicate with other devices, and the processor (41) is configured to execute the instructions stored in the memory (45) to cause the electronic device to perform the method according to any one of claims 1 to 7.
10. A computer readable storage medium storing instructions which, when executed, perform the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410422853.8A CN118092668A (en) | 2024-04-09 | 2024-04-09 | Multi-mode input man-machine interaction method and device and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410422853.8A CN118092668A (en) | 2024-04-09 | 2024-04-09 | Multi-mode input man-machine interaction method and device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118092668A true CN118092668A (en) | 2024-05-28 |
Family
ID=91165293
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410422853.8A Pending CN118092668A (en) | 2024-04-09 | 2024-04-09 | Multi-mode input man-machine interaction method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118092668A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118226967A (en) * | 2024-05-24 | 2024-06-21 | 长沙硕博电子科技股份有限公司 | Multi-mode interaction intelligent control system |
CN118226967B (en) * | 2024-05-24 | 2024-09-06 | 长沙硕博电子科技股份有限公司 | Multi-mode interaction intelligent control system |
CN118585071A (en) * | 2024-08-07 | 2024-09-03 | 杭州李未可科技有限公司 | Active interaction system of multi-mode large model based on AR (augmented reality) glasses |
CN118585071B (en) * | 2024-08-07 | 2024-10-22 | 杭州李未可科技有限公司 | Active interaction system of multi-mode large model based on AR (augmented reality) glasses |
Similar Documents
Publication | Title |
---|---|
JP6365939B2 (en) | Sleep assist system | |
CN118092668A (en) | Multi-mode input man-machine interaction method and device and electronic equipment | |
CN113520340A (en) | Sleep report generation method, device, terminal and storage medium | |
JP2020521204A (en) | Intelligent sensing device and sensing system | |
CN109982124A (en) | User's scene intelligent analysis method, device and storage medium | |
KR102361458B1 (en) | Method for responding user speech and electronic device supporting the same | |
CN104991464B (en) | A kind of information processing method and control system | |
KR20150099678A (en) | Controlling Method of Electronic Device corresponding to the wearing state and Electronic Device supporting the same | |
US20160299483A1 (en) | Method for controlling terminal device, and wearable electronic device | |
CN113854969A (en) | Intelligent terminal and sleep monitoring method | |
CN104808780B (en) | Judge the device and method of head-wearing type intelligent equipment operation validity | |
CN106295158B (en) | A kind of automatic aided management system of infant, management method and equipment | |
KR102163996B1 (en) | Apparatus and Method for improving performance of non-contact type recognition function in a user device | |
US11596764B2 (en) | Electronic device and method for providing information for stress relief by same | |
CN111798811A (en) | Screen backlight brightness adjusting method and device, storage medium and electronic equipment | |
CN113671846A (en) | Intelligent device control method and device, wearable device and storage medium | |
CN109036410A (en) | Audio recognition method, device, storage medium and terminal | |
WO2022037555A1 (en) | Physiological data acquisition method and apparatus, and wearable device | |
WO2022012060A1 (en) | Method for collecting operation mode, and terminal device, massage device and storage medium | |
CN111163219A (en) | Alarm clock processing method and device, storage medium and terminal | |
WO2023027578A1 (en) | Nose-operated head-mounted device | |
WO2015109907A1 (en) | Device and method for detecting continuous attachment of head-mounted intelligent device | |
CN112272247A (en) | Method and device for controlling deformation, storage medium and electronic equipment | |
CN109549638A (en) | It is a kind of intelligently to nurse method, apparatus, system and storage medium | |
KR20240152683A (en) | System for controlling smartphone using wearable divice |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||