CN117711369A - Intelligent communication system for human and animal situational language - Google Patents

Intelligent communication system for human and animal situational language

Info

Publication number
CN117711369A
CN117711369A
Authority
CN
China
Prior art keywords
component
animal
language
model
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311737461.2A
Other languages
Chinese (zh)
Inventor
廖翊允
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202311737461.2A
Publication of CN117711369A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Machine Translation (AREA)

Abstract

The invention belongs to the field of language communication and provides an intelligent communication system for human and animal situational language. The system comprises a database in which animal information and animal model information are stored, and a speech and text recognition module comprising a speech signal preprocessing component, a feature extraction component, an acoustic model component and a language model component. The speech signal preprocessing component performs noise elimination and speech enhancement on an input speech signal; the feature extraction component extracts spectral features from the speech signal; the acoustic model component performs speech recognition on the spectral features; and the language model component further corrects and optimizes the recognition result. By combining speech recognition, image processing, animal model construction, speech synthesis and related technologies, the invention achieves intelligent situational-language communication between humans and animals, can give the user lifelike animal feedback through animal language and actions, and supports personalized interaction with more accurate feedback.

Description

Intelligent communication system for human and animal situational language
Technical Field
The invention belongs to the field of language communication, and in particular relates to an intelligent communication system for human and animal situational language.
Background
In daily life there is often a need to interact and communicate with animals; when visiting a zoo, for example, it is desirable to better understand the animals' habits, emotions and needs. However, owing to the enormous language barrier between humans and animals, people usually cannot genuinely communicate with animals or understand their intent and emotion.
In the prior art, approaches to the problem of language communication between humans and animals rely mainly on the expertise of animal specialists, and usually achieve only one-way language translation or simulation: a specialist translates an animal's sounds or behavior into human language, or human language is rendered as simulated animal sounds and played to the animal. Such methods are limited and inconvenient in communication, and cannot fully realize two-way communication between humans and animals. There is therefore a need to develop and apply new techniques and methods to achieve more accurate, natural and bidirectional language communication.
Disclosure of Invention
In order to solve the above technical problems, the invention provides an intelligent communication system for human and animal situational language, which addresses the deficiency that the prior art relies mainly on the expertise of animal specialists and can realize only one-way language translation or simulation.
An intelligent communication system for human and animal situational language comprises:
the database is used for storing animal information and animal model information;
the speech and text recognition module comprises a speech signal preprocessing component, a feature extraction component, an acoustic model component, a language model component and a text recognition component; the speech signal preprocessing component is used for performing noise elimination and speech enhancement on an input speech signal; the feature extraction component is used for extracting spectral features from the speech signal; the acoustic model component is used for performing speech recognition on the spectral features; the language model component is used for further correcting and optimizing the recognition result; the text recognition component is used for recognizing text content;
the shooting module comprises an image capturing component, an image processing component and an animal expression and action recognition component; the image capturing component is used for acquiring animal images; the image processing component is used for preprocessing and enhancing the captured images; the animal expression and action recognition component is used for analyzing the processed images and recognizing and extracting the animal's expression features from them;
the animal image display module is used for comparing the animal image shot by the shooting module with the animal information in the database, acquiring the corresponding animal model information from the database, and generating a corresponding animal model; the animal image display module comprises a model making component, a skeletal animation component, an expression transformation component and an action generating component; the model making component is used for creating the animal model; the skeletal animation component is used for adding a skeleton system to the animal model so that it can perform animated performances according to the animal's behavior and expression; the expression transformation component is used for adjusting the expression of the animal model; the action generating component is used for generating actions of the animal model;
a speech synthesis module comprising a phoneme conversion component, an acoustic parameter generation component, and a speech synthesis model component; the phoneme conversion component is used for converting the processed text information into a corresponding phoneme sequence; the acoustic parameter generation component is used for generating acoustic parameters according to the phoneme sequence; the speech synthesis model component is for converting acoustic parameters into a speech signal.
Preferably, the system further comprises a computer vision module, wherein the computer vision module comprises an image preprocessing component, a target detection component and a key point identification component; the image preprocessing component is used for preprocessing the captured animal images; the object detection component is used for identifying an animal object in an animal image and framing the position of the animal object; the keypoint identification component is for identifying keypoints of an animal in an image.
Preferably, the system further comprises a context understanding module, wherein the context understanding module comprises a dialogue management component and a dialogue history tracking component; the dialogue management component is used for managing dialogue flows; the conversation history tracking component is configured to track and analyze previous conversation histories.
Preferably, the dialogue flow includes tracking and transition of dialogue states.
Preferably, the database also stores user interaction information.
Preferably, the system further comprises a multi-language support module, wherein the multi-language support module comprises a language detection component, a translation engine interface component and a speech synthesis engine interface component; the language detection component is used for detecting the language type of the input text; the translation engine interface component is used for interacting with an external translation engine to realize translation of the text; the speech synthesis engine interface component is used for interacting with an external speech synthesis engine to realize multi-language speech synthesis.
Preferably, language translation information is also stored in the database.
Compared with the prior art, the invention has the following beneficial effects:
1. By combining speech recognition, image processing, animal model construction, speech synthesis and related technologies, the invention realizes intelligent situational-language communication between humans and animals. It can give the user lifelike feedback in the form of animal language and actions, support personalized interaction and provide more accurate feedback, thereby improving the intelligence of the system and user satisfaction, providing a new theoretical basis and reference for research on animal behavior and language, and promoting study and exploration in related fields.
Drawings
FIG. 1 is a schematic diagram of the overall structure of the present invention;
FIG. 2 is a schematic diagram of a computer vision module structure according to the present invention;
FIG. 3 is a schematic diagram of a context understanding module structure of the present invention;
FIG. 4 is a schematic diagram of a multi-language support module according to the present invention;
FIG. 5 is a schematic structural diagram of a second embodiment of the present invention.
In the figure:
1. a database; 11. animal information; 12. language translation information; 13. user interaction information; 14. animal model information;
2. a speech and text recognition module; 21. a speech signal preprocessing component; 22. a feature extraction component; 23. an acoustic model component; 24. a language model component; 25. a text recognition component;
3. a shooting module; 31. an image capturing assembly; 32. an image processing component; 33. an animal expression and motion recognition component;
4. an animal image display module; 41. a model making component; 42. a skeletal animation component; 43. an expression transformation component; 44. an action generating component;
5. a speech synthesis module; 51. a phoneme conversion component; 52. an acoustic parameter generation component; 53. a speech synthesis model component;
6. a computer vision module; 61. an image preprocessing component; 62. a target detection component; 63. a key point identification component;
7. a context understanding module; 71. a dialog management component; 72. a conversation history tracking component;
8. a multilingual support module; 81. a language detection component; 82. a translation engine interface component; 83. a speech synthesis engine interface component.
Detailed Description
Embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples are illustrative of the invention but are not intended to limit the scope of the invention.
Embodiment 1: as shown in FIGS. 1 to 4, the invention provides an intelligent communication system for human and animal situational language, which comprises a database 1, wherein animal information 11, animal model information 14, user interaction information 13 and language translation information 12 are stored in the database 1;
animal information 11: stores the characteristics, expressions, sounds, behaviors and other information of various animals, so that the system can translate and simulate correctly according to the animal type;
animal model information 14: stores information about the body structure of various animals, such as geometry, body shape, weight, bone structure and organ size;
language translation information 12: stores the mapping between human language and animal expressions, actions and sounds, so that the system can translate human language into the correct animal expressions and sounds;
user interaction information 13: stores user input, historical interaction records and other information, so that the system can interact in a personalized way and provide more accurate feedback; a data-structure sketch of these four record types is given below.
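As a concrete illustration of these four record types, the following is a minimal Python sketch of how the database entries might be structured; all class and field names are assumptions introduced here for clarity, not definitions from the patent.

```python
from dataclasses import dataclass

@dataclass
class AnimalInfo:                       # animal information 11
    species: str
    typical_sounds: list[str]
    typical_expressions: list[str]
    typical_behaviors: list[str]

@dataclass
class AnimalModelInfo:                  # animal model information 14
    species: str
    geometry_file: str                  # mesh / body-shape asset
    weight_kg: float
    bone_structure: dict[str, str]      # bone name -> parent bone

@dataclass
class TranslationEntry:                 # language translation information 12
    human_phrase: str
    animal_expression: str              # expression/action to display
    animal_sound: str                   # sound clip to play

@dataclass
class InteractionRecord:                # user interaction information 13
    user_id: str
    utterance: str
    system_response: str
```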
a speech and text recognition module 2, the speech and text recognition module 2 including a speech signal preprocessing component 21, a feature extraction component 22, an acoustic model component 23, a language model component 24 and a text recognition component 25; the speech signal preprocessing component 21 is used for performing noise elimination and speech enhancement on an input speech signal; the feature extraction component 22 is configured to extract spectral features from the speech signal; the acoustic model component 23 is used for performing speech recognition on the spectral features; the language model component 24 is used for further correcting and optimizing the recognition result; the text recognition component 25 is used for recognizing text content;
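The recognition front end can be illustrated with a short, hedged sketch: spectral-subtraction noise suppression for the preprocessing component 21 followed by MFCC extraction for the feature extraction component 22. It assumes the librosa library; the patent does not specify a feature type or toolkit, and the acoustic and language model components 23 and 24 would be trained models supplied separately.

```python
import numpy as np
import librosa

def preprocess(signal: np.ndarray, sr: int) -> np.ndarray:
    """Crude spectral-subtraction denoising: estimate the noise floor from
    the first 0.25 s of the recording (assumed to contain no speech)."""
    stft = librosa.stft(signal)                       # default hop length 512
    mag, phase = np.abs(stft), np.angle(stft)
    noise_frames = max(1, int(0.25 * sr / 512))
    noise_floor = mag[:, :noise_frames].mean(axis=1, keepdims=True)
    cleaned = np.maximum(mag - noise_floor, 0.0)      # speech enhancement
    return librosa.istft(cleaned * np.exp(1j * phase))

def extract_features(signal: np.ndarray, sr: int) -> np.ndarray:
    # Spectral features for the acoustic model component 23; MFCCs are one
    # common choice, not necessarily the one the patent intends.
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13).T
```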
a shooting module 3, the shooting module 3 including an image capturing component 31, an image processing component 32, and an animal expression and action recognition component 33; the image capturing component 31 is for capturing images of animals, either as still images or as a continuous image stream; the image processing component 32 is used for preprocessing and enhancing the captured images, including noise removal, image enhancement, image segmentation and the like; the animal expression and action recognition component 33 is configured to analyze the processed images and to recognize and extract the animal's expression features from them, and may employ computer vision and machine learning techniques such as face detection, keypoint positioning and feature extraction to recognize and quantify the animal's expression state;
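One plausible way to quantify the expression state from located keypoints, as component 33 requires, is to turn landmark distances into scale-invariant ratios. The sketch below assumes an upstream pose-estimation network has already produced named landmarks; the landmark names and ratios are illustrative only.

```python
import numpy as np

def expression_features(kp: dict[str, np.ndarray]) -> dict[str, float]:
    """kp maps landmark names to (x, y) pixel coordinates."""
    face_w = np.linalg.norm(kp["left_ear"] - kp["right_ear"]) + 1e-6
    return {  # normalised by face width so the features are scale-invariant
        "eye_openness":   float(np.linalg.norm(kp["eye_top"] - kp["eye_bottom"])) / face_w,
        "mouth_openness": float(np.linalg.norm(kp["mouth_top"] - kp["mouth_bottom"])) / face_w,
        "ear_drop":       float(kp["left_ear"][1] - kp["eye_top"][1]) / face_w,
    }
```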
an animal image display module 4, the animal image display module 4 being used for comparing the animal image shot by the shooting module 3 with the animal information 11 in the database 1, acquiring the animal model information 14 corresponding to that animal information 11 from the database 1, and generating a corresponding animal model; the animal image display module 4 comprises a model making component 41, a skeletal animation component 42, an expression transformation component 43 and an action generating component 44; the model making component 41 is used to create the animal model; the skeletal animation component 42 is used to add a skeleton system to the animal model, enabling it to perform animated performances based on the animal's behavior and expression; the expression transformation component 43 is used for adjusting the expression of the animal model, such as changing the shape of the eyes and mouth; the action generating component 44 is used to generate actions of the animal model, such as walking and jumping;
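The interplay of components 41 to 44 can be pictured with a toy model object: a skeleton rig, blend-shape-style expression weights, and a current action clip. This is a bare illustration assuming a Python front end to some 3D engine; all names here are invented.

```python
from dataclasses import dataclass, field

@dataclass
class RiggedAnimalModel:
    species: str
    bones: dict[str, str]                  # bone -> parent bone (skeletal animation, 42)
    expression_weights: dict[str, float] = field(default_factory=dict)
    current_action: str = "idle"

    def set_expression(self, name: str, weight: float) -> None:
        # expression transformation component 43: clamp blend weights to [0, 1]
        self.expression_weights[name] = max(0.0, min(1.0, weight))

    def play_action(self, action: str) -> None:
        # action generating component 44: e.g. "walk", "jump"
        self.current_action = action

panda = RiggedAnimalModel("giant panda", {"spine": "root", "head": "spine"})
panda.set_expression("eyes_wide", 0.8)
panda.play_action("walk")
```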
a speech synthesis module 5, the speech synthesis module 5 comprising a phoneme conversion component 51, an acoustic parameter generation component 52 and a speech synthesis model component 53; the phoneme conversion component 51 is configured to convert the processed text information into a corresponding phoneme sequence; the acoustic parameter generation component 52 is configured to generate acoustic parameters from the phoneme sequence; the speech synthesis model component 53 is used to convert acoustic parameters into speech signals.
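The three synthesis stages can be shown with a deliberately toy sketch: a hand-written phoneme table for the phoneme conversion component 51, per-phoneme pitch/duration parameters for component 52, and a sine-wave "vocoder" for component 53. A real system would use a trained text-to-speech model; everything below is an assumption made for illustration.

```python
import numpy as np

PHONEME_TABLE = {"hello": ["HH", "AH", "L", "OW"]}        # phoneme conversion 51
PITCH_HZ = {"HH": 0.0, "AH": 220.0, "L": 200.0, "OW": 180.0}

def text_to_phonemes(text: str) -> list[str]:
    return [p for w in text.lower().split() for p in PHONEME_TABLE.get(w, [])]

def phonemes_to_params(phonemes: list[str]) -> list[tuple[float, float]]:
    # acoustic parameter generation 52: (pitch in Hz, duration in seconds)
    return [(PITCH_HZ.get(p, 0.0), 0.12) for p in phonemes]

def params_to_waveform(params, sr: int = 16000) -> np.ndarray:
    # speech synthesis model 53: one sine burst per voiced phoneme
    chunks = []
    for pitch, dur in params:
        t = np.arange(int(sr * dur)) / sr
        chunks.append(np.sin(2 * np.pi * pitch * t) if pitch > 0 else np.zeros_like(t))
    return np.concatenate(chunks) if chunks else np.zeros(0)

wave = params_to_waveform(phonemes_to_params(text_to_phonemes("hello")))
```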
By combining speech recognition, image processing, animal model construction, speech synthesis and related technologies, the system realizes intelligent situational-language communication between humans and animals. It can give the user lifelike feedback in the form of animal language and actions, support personalized interaction and provide more accurate feedback, thereby improving the intelligence of the system and user satisfaction, providing a new theoretical basis and reference for research on animal behavior and language, and promoting study and exploration in related fields.
As shown in FIG. 2, the system further comprises a computer vision module 6, wherein the computer vision module 6 comprises an image preprocessing component 61, a target detection component 62 and a keypoint identification component 63; the image preprocessing component 61 is used for performing preprocessing operations such as denoising, uniform scaling or cropping on the captured animal images, providing better input for subsequent target detection and keypoint identification; the target detection component 62 is used to identify the animal target in an animal image and frame its position; the keypoint identification component 63 is used to identify keypoints of the animal in the image.
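For the preprocessing component 61 a hedged OpenCV sketch follows; detection (62) and keypoint recognition (63) would normally be neural networks, so only the drawing of a detector's output box is shown. The choice of OpenCV and the 640-pixel target size are assumptions.

```python
import cv2
import numpy as np

def preprocess_image(img: np.ndarray, size: int = 640) -> np.ndarray:
    """Denoise and uniformly scale an 8-bit BGR animal image (component 61)."""
    img = cv2.fastNlMeansDenoisingColored(img)
    h, w = img.shape[:2]
    scale = size / max(h, w)
    return cv2.resize(img, (int(w * scale), int(h * scale)))

def frame_target(img: np.ndarray, box: tuple[int, int, int, int]) -> np.ndarray:
    # target detection component 62: `box` = (x, y, w, h) from some detector
    x, y, w, h = box
    return cv2.rectangle(img.copy(), (x, y), (x + w, y + h), (0, 255, 0), 2)
```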
By providing the computer vision module 6, the system gains the ability to recognize and analyze animal images, raising its overall level of intelligence; through techniques such as accurate target positioning and keypoint recognition, it can understand and respond to an animal's actions and expressions more accurately, achieving more natural and lifelike intelligent situational-language communication.
As shown in FIG. 3, the system further comprises a context understanding module 7, wherein the context understanding module 7 comprises a dialogue management component 71 and a dialogue history tracking component 72; the dialogue management component 71 is used for managing the dialogue flow, including tracking and transition of dialogue states; the dialogue history tracking component 72 is used to track and analyze previous dialogue history.
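Tracking and transition of dialogue states (component 71) together with a running history (component 72) amounts to a small state machine; the sketch below uses invented state and event names purely to show the shape of such a component.

```python
from dataclasses import dataclass, field

@dataclass
class DialogueManager:
    state: str = "idle"                                   # tracked dialogue state
    history: list[tuple[str, str]] = field(default_factory=list)

    TRANSITIONS = {
        ("idle", "user_spoke"): "interpreting",
        ("interpreting", "translated"): "responding",
        ("responding", "done"): "idle",
    }

    def handle(self, event: str, utterance: str = "") -> str:
        self.history.append((self.state, utterance))      # history tracking 72
        self.state = self.TRANSITIONS.get((self.state, event), self.state)
        return self.state
```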
By tracking the dialogue flow and history, the system's interactions can be optimized, its response speed increased and user satisfaction improved.
As shown in FIG. 4, the system further comprises a multi-language support module 8, wherein the multi-language support module 8 comprises a language detection component 81, a translation engine interface component 82 and a speech synthesis engine interface component 83; the language detection component 81 is used for detecting the language of the input text; the translation engine interface component 82 interacts with an external translation engine to translate text, so that the system can accurately understand the user's input and translate it into different languages as needed, achieving accurate multi-language communication; the speech synthesis engine interface component 83 interacts with an external speech synthesis engine to synthesize speech in multiple languages, so that the system can generate speech feedback in the language selected by the user, providing an interactive experience closer to the user's needs.
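The module's three components reduce to one detector plus two thin interfaces to external engines. The sketch below is an assumption-laden outline: the `translator` and `tts_engine` objects stand in for whatever external services are wired in, and the script-range language detector is a deliberately naive placeholder.

```python
class MultiLanguageSupport:
    def __init__(self, translator, tts_engine):
        self.translator = translator        # translation engine interface 82
        self.tts_engine = tts_engine        # speech synthesis engine interface 83

    @staticmethod
    def detect_language(text: str) -> str:
        # language detection component 81: crude check for CJK codepoints
        return "zh" if any("\u4e00" <= ch <= "\u9fff" for ch in text) else "en"

    def respond(self, text: str, target_lang: str) -> bytes:
        src = self.detect_language(text)
        translated = self.translator.translate(text, src, target_lang)
        return self.tts_engine.synthesize(translated, target_lang)
```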
Embodiment 2: as shown in FIG. 5, an input system is provided in a panda house. A visitor expresses his or her meaning through text or speech; this information is collected, stored in the database, and translated into panda sounds and expressive actions. A digital panda or panda doll then displays the corresponding expressions to the pandas in the house according to the translation result, so that the pandas can understand the visitor's meaning.
Conversely, the pandas' expressions and sounds are captured by a snapshot camera and a microphone in the panda house, matched and interpreted against the database, and converted into human language. The interpreted result is displayed as text on a screen beside the live pandas, together with corresponding dubbing, so that visitors can watch the pandas' expressions and actions while understanding their meaning.
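Both directions of this embodiment can be compressed into a toy lookup sketch; the two tables stand in for the database-matching step and contain invented entries purely for illustration.

```python
HUMAN_TO_PANDA = {"hello": ("friendly head tilt", "bleat.wav")}
PANDA_TO_HUMAN = {("ears up", "bleat"): "I am curious about you."}

def visitor_to_panda(text: str):
    # translate visitor input into a panda expression/action plus sound clip
    return HUMAN_TO_PANDA.get(text.lower())

def panda_to_visitor(expression: str, sound: str) -> str:
    # caption a captured expression + sound for the screen beside the enclosure
    return PANDA_TO_HUMAN.get((expression, sound), "(no interpretation found)")
```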
Embodiment 3: a household is provided with an input system. Family members express their meaning through text or speech; this information is collected, stored in a home database, and converted into pet sounds and actions. A digital pet or pet doll then displays the corresponding actions to the pet in the home according to the conversion result, so that the pet can understand the family members' intention.
Conversely, the pet's actions and sounds are captured by a snapshot camera and a microphone in the home, matched and interpreted against the database, and converted into human language. The interpreted result is displayed as text on a home screen, together with corresponding dubbing, so that family members can watch the pet's behavior and expressions while understanding its meaning.
While embodiments of the present invention have been shown and described above for purposes of illustration and description, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention; variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the invention.

Claims (7)

1. An intelligent communication system for human and animal situational language, characterized by comprising:
a database (1), wherein animal information (11) and animal model information (14) are stored in the database (1);
a speech and text recognition module (2), the speech and text recognition module (2) comprising a speech signal preprocessing component (21), a feature extraction component (22), an acoustic model component (23), a language model component (24) and a text recognition component (25); the speech signal preprocessing component (21) is used for performing noise elimination and speech enhancement on an input speech signal; the feature extraction component (22) is for extracting spectral features from the speech signal; the acoustic model component (23) is for performing speech recognition on the spectral features; the language model component (24) is used for further correcting and optimizing the recognition result; the text recognition component (25) is used for recognizing text content;
a shooting module (3), wherein the shooting module (3) comprises an image capturing component (31), an image processing component (32) and an animal expression and action recognition component (33); the image capturing component (31) is for capturing images of animals; the image processing component (32) is for preprocessing and enhancing the captured images; the animal expression and action recognition component (33) is used for analyzing the processed images and recognizing and extracting the animal's expression features;
an animal image display module (4), the animal image display module (4) being used for comparing the animal image shot by the shooting module (3) with the animal information (11) in the database (1), acquiring the animal model information (14) corresponding to the animal information (11) from the database (1), and generating a corresponding animal model; the animal image display module (4) comprises a model making component (41), a skeletal animation component (42), an expression transformation component (43) and an action generating component (44); the model making component (41) is for creating the animal model; the skeletal animation component (42) is configured to add a skeleton system to the animal model so that the animal model is capable of performing animated performances based on the animal's behavior and expression; the expression transformation component (43) is used for adjusting the expression of the animal model; the action generating component (44) is for generating actions of the animal model;
a speech synthesis module (5), the speech synthesis module (5) comprising a phoneme conversion component (51), an acoustic parameter generation component (52) and a speech synthesis model component (53); the phoneme conversion component (51) is configured to convert the processed text information into a corresponding phoneme sequence; the acoustic parameter generation component (52) is configured to generate acoustic parameters from a sequence of phonemes; the speech synthesis model component (53) is for converting acoustic parameters into a speech signal.
2. The human and animal situational language intelligent communication system of claim 1, wherein: the system also comprises a computer vision module (6), wherein the computer vision module (6) comprises an image preprocessing component (61), a target detection component (62) and a key point identification component (63); the image preprocessing component (61) is used for preprocessing the captured animal images; the object detection component (62) is used for identifying an animal object in an animal image and framing the position of the animal object; the keypoint identification component (63) is for identifying keypoints of an animal in an image.
3. The human and animal situational language intelligent communication system of claim 1, wherein: the system also comprises a context understanding module (7), the context understanding module (7) comprising a dialogue management component (71) and a dialogue history tracking component (72); the dialogue management component (71) is used for managing the dialogue flow; the dialogue history tracking component (72) is for tracking and analyzing previous dialogue history.
4. The human and animal situational language intelligent communication system of claim 3, wherein: the dialog flow includes tracking and transition of dialog states.
5. The human and animal situational language intelligent communication system of claim 3, wherein: user interaction information (13) is also stored in the database (1).
6. The human and animal situational language intelligent communication system of claim 1, wherein: the system also comprises a multi-language support module (8), wherein the multi-language support module (8) comprises a language detection component (81), a translation engine interface component (82) and a speech synthesis engine interface component (83); the language detection component (81) is used for detecting the language type of the input text; the translation engine interface component (82) is used for interacting with an external translation engine to realize translation of the text; the speech synthesis engine interface component (83) is configured to interact with an external speech synthesis engine to implement multilingual speech synthesis.
7. The human and animal situational language intelligent communication system of claim 6, wherein: language translation information (12) is also stored in the database (1).
CN202311737461.2A 2023-12-16 2023-12-16 Intelligent communication system for human and animal situational language Pending CN117711369A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311737461.2A CN117711369A (en) 2023-12-16 2023-12-16 Intelligent communication system for human and animal situational language

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311737461.2A CN117711369A (en) 2023-12-16 2023-12-16 Intelligent communication system for human and animal situational language

Publications (1)

Publication Number Publication Date
CN117711369A 2024-03-15

Family

ID=90149431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311737461.2A Pending CN117711369A (en) 2023-12-16 2023-12-16 Intelligent communication system for human and animal situational language

Country Status (1)

Country Link
CN (1) CN117711369A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination