CN110970037A - Pet language identification method and device, electronic equipment and readable storage medium - Google Patents

Pet language identification method and device, electronic equipment and readable storage medium

Info

Publication number
CN110970037A
Authority
CN
China
Prior art keywords
pet
audio signal
intelligent terminal
characteristic parameters
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911195058.5A
Other languages
Chinese (zh)
Inventor
朱玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Goertek Inc
Original Assignee
Goertek Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Goertek Inc filed Critical Goertek Inc
Priority to CN201911195058.5A (Critical)
Publication of CN110970037A
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L17/04 Training, enrolment or model building
    • G10L17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/15 Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being formant information
    • G10L25/24 Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G10L25/48 Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques specially adapted for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Toys (AREA)

Abstract

The invention discloses a pet language identification method, device and system, an electronic device and a computer readable storage medium. An intelligent terminal worn on the pet collects the pet's audio signal in real time, extracts the characteristic parameters of the audio signal and sends them to an intelligent terminal worn by the user. A model for feature recognition is pre-stored on the user's terminal: once the characteristic parameters of the audio signal are input, the model outputs text content expressing the pet's mood, which is converted from digital to analog and broadcast to the user by voice through the terminal. The user can thus grasp in real time what the pet wants to express. The pet language identification method is simple to operate and has a high recognition rate, and it can help humans better understand animals, so as to interact with them better and improve the closeness between humans and animals.

Description

Pet language identification method and device, electronic equipment and readable storage medium
Technical Field
The invention relates to the technical field of electronic information, in particular to a pet language identification method and device, electronic equipment and a readable storage medium.
Background
The earliest work in language recognition was speaker recognition, which at first focused on simple identification of speakers by ear. True language recognition research began with linear predictive coding and dynamic time warping techniques for speech signals, and isolated-word recognition mainly adopted template matching techniques. In China, research on recognition of Mandarin Chinese only began in 1987 and was later extended to dialects.
With the improvement of living standards, keeping pets has become a common phenomenon. Pets are companion animals of human beings and a source of happiness and healthy living, and for many elderly people they have become an inseparable part of life. However, although these animals live alongside humans, we cannot understand their language or grasp their physiological state and emotional needs in time. Applying language recognition technology to the recognition of non-human (animal) vocalizations can therefore help humans better understand animals, so as to interact with them better and improve the closeness between humans and animals.
Disclosure of Invention
The invention aims to provide a pet language identification method, a pet language identification device, a pet language identification system, an electronic device and a computer readable storage medium, which are used for helping a human to better know an animal so as to better interact with the animal and improve the closeness between the human and the animal.
According to a first aspect of the invention, a pet language identification method is provided, which is executed on a first intelligent terminal, and comprises the following steps:
receiving characteristic parameters of an audio signal;
identifying the characteristic parameters according to a pre-stored model to obtain an identification result;
and carrying out voice broadcast on the recognition result.
Optionally, the characteristic parameters include: pitch period, formant parameters, and mel-frequency cepstrum coefficients of the audio signal.
Optionally, the method further includes a step of generating the pre-stored model, including:
collecting audio signals of different pet languages;
extracting characteristic parameters of the audio signal;
acquiring training parameters, wherein the training parameters comprise a target value, a learning rate and a training round;
setting a neural network model based on the training parameters;
and training the neural network model according to the characteristic parameters to obtain the prestored model.
Optionally, after the acquiring the audio signals in the different pet languages, the method further includes:
pre-processing the audio signal, the pre-processing comprising: pre-emphasis, framing, and windowing the audio signal.
Optionally, the extracting the feature parameters of the audio signal includes:
and obtaining the pitch period and the formant parameters by a cepstrum method.
Optionally, the extracting the feature parameters of the audio signal further includes:
performing a fast Fourier transform on the audio signal to obtain the spectrum of each frequency, and taking the modulus squared of the spectrum to obtain the power spectrum of the audio signal;
and passing the power spectrum through a mel filter bank, calculating the logarithmic energy output by each filter, and performing normalization to obtain N-dimensional mel-frequency cepstrum coefficients, wherein N is a positive integer.
According to a second aspect of the invention, a pet language identification method is provided, which is executed in a second intelligent terminal, and comprises the following steps:
collecting audio signals of the pet language;
extracting characteristic parameters of the audio signal;
and sending the characteristic parameters to the first intelligent terminal.
According to a third aspect of the present invention, there is provided a first smart terminal comprising:
the receiving module is used for receiving the characteristic parameters of the audio signal;
the identification module is used for identifying the characteristic parameters according to a pre-stored model so as to obtain an identification result;
and the broadcasting module is used for broadcasting the recognition result by voice.
Optionally, the terminal further includes:
and the generating module is used for generating the pre-stored model.
Optionally, the generating module further includes:
the acquisition unit is used for acquiring audio signals of different pet languages;
an extraction unit for extracting a feature parameter of the audio signal;
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring training parameters, and the training parameters comprise a target value, a learning rate and a training round;
the configuration unit is used for configuring the neural network model based on the training parameters;
and the training unit is used for training the neural network model according to the characteristic parameters to obtain the pre-stored model.
Optionally, the generating module further includes:
a pre-processing unit for pre-processing the audio signal, the pre-processing comprising: pre-emphasis, framing, and windowing the audio signal.
According to a fourth aspect of the present invention, there is provided a second smart terminal, comprising:
the acquisition module is used for acquiring the audio signal of the pet language;
the extraction module is used for extracting the characteristic parameters of the audio signal;
and the sending module is used for sending the characteristic parameters to the first intelligent terminal.
According to a fifth aspect of the present invention, there is provided an electronic apparatus comprising:
a first intelligent terminal provided according to the third aspect of the present invention; or
a processor and a memory, the memory being configured to store executable instructions for controlling the processor to perform the pet language identification method provided by the first aspect of the invention; or
a second intelligent terminal provided according to the fourth aspect of the present invention; or
a processor and a memory for storing executable instructions for controlling the processor to perform the pet language identification method provided by the second aspect of the invention.
According to a sixth aspect of the present invention, there is provided a pet language identification system, the system comprising:
a first intelligent terminal provided according to a third aspect of the present invention; and
a second intelligent terminal provided according to a fourth aspect of the present invention;
and the first intelligent terminal is in communication connection with the second intelligent terminal.
According to a seventh aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the pet language identification method provided according to the first and second aspects of the present invention.
According to embodiments of the invention, a pet language identification method, device and system, an electronic device and a computer readable storage medium are provided. A second intelligent terminal worn on the pet collects the pet's audio signal in real time, extracts the characteristic parameters of the audio signal and sends them to a first intelligent terminal worn by the user. A trained model for feature recognition is pre-stored on the first intelligent terminal: once the characteristic parameters of the audio signal are input, the model outputs, according to its training, text content expressing the pet's mood. The text is then converted from digital to analog and broadcast to the user through the first intelligent terminal, so the user can grasp in real time what the pet is expressing. The pet language identification method is simple to operate, has a high recognition rate, and can help humans better understand animals, so as to interact with them better and improve the closeness between humans and animals.
Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a system hardware configuration diagram of a pet language identification method that can be used to implement an embodiment of the present invention.
Fig. 2 is a block diagram of a hardware configuration of an electronic device according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating steps of a pet language identification method according to a first embodiment of the present invention.
FIG. 4 is a flowchart illustrating the steps of a pre-stored model generation method according to the present invention.
FIG. 5 is a flowchart illustrating a pet language identification method according to a second embodiment of the present invention.
Fig. 6 is a block diagram illustrating a first intelligent terminal according to the present invention.
Fig. 7 is a diagram illustrating a hardware configuration of the first intelligent terminal according to the present invention.
Fig. 8 is a block diagram showing a structure of a second intelligent terminal according to the present invention.
Fig. 9 is a diagram showing a hardware configuration of a second intelligent terminal according to the present invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
As shown in fig. 1, the pet language identification system 1000 according to the present embodiment includes a server 1100, a first smart terminal 1200, a second smart terminal 1300, and a network 1400, where the server 1100, the first smart terminal 1200, and the second smart terminal 1300 communicate with each other via the network.
The server 1100 may be, for example, a blade server, a rack server, or the like, and the server 1100 may also be a server cluster deployed in a cloud, which is not limited herein. The server may be a server providing an online transaction platform service party, or a server of the above administrative function department, which is not limited herein.
As shown in FIG. 1, server 1100 may include a processor 1110, a memory 1120, an interface device 1130, a communication device 1140, a display device 1150, and an input device 1160. Processor 1110 is configured to execute computer programs. The computer program may be written in an instruction set of an architecture such as x86, Arm, RISC, MIPS, SSE, etc. The memory 1120 includes, for example, a ROM (read only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 1130 includes, for example, a USB interface, a serial interface, and the like. The communication device 1140 is capable of wired or wireless communication, for example. The display device 1150 is, for example, a liquid crystal display panel. Input devices 1160 may include, for example, a touch screen, a keyboard, and the like.
In this embodiment, server 1100 may be used to participate in implementing a pet language identification method in accordance with any of the embodiments of the present invention.
In any embodiment of the present invention, the memory 1120 of the server 1100 is configured to store instructions for controlling the processor 1110 to operate so as to support the implementation of the pet language identification method according to any embodiment of the present invention. The skilled person can design the instructions according to the disclosed solution. How the instructions control the operation of the processor is well known in the art and will not be described in detail herein.
Those skilled in the art will appreciate that although a number of devices are shown in FIG. 1 for server 1100, server 1100 of embodiments of the present invention may refer to only some of the devices therein, such as only processor 1110 and memory 1120.
As shown in fig. 1, the first smart terminal 1200 may include a processor 1210, a memory 1220, an interface device 1230, a communication device 1240, a display device 1250, an input device 1260, an audio output device 1270, an audio pickup device 1280, and the like. The processor 1210 may be a central processing unit CPU, a microprocessor MCU, or the like, and the processor 1210 is configured to execute a computer program. The computer program may be written in an instruction set of an architecture such as x86, Arm, RISC, MIPS, SSE, etc. The memory 1220 includes, for example, a ROM (read only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 1230 includes, for example, a USB interface, a headphone interface, and the like. The communication device 1240 can perform wired or wireless communication, for example. The display device 1250 is, for example, a liquid crystal display, a touch display, or the like. The input device 1260 may include, for example, a touch screen, a keyboard, and the like. The first smart terminal 1200 may output audio information through the audio output device 1270, the audio output device 1270 including a speaker, for example. The first smart terminal 1200 may pick up voice information input by the user through the audio pickup device 1280, and the audio pickup device 1280 includes, for example, a microphone.
The first intelligent terminal 1200 may be any device that supports running the corresponding application and can be worn by the user, such as a smart bracelet, smart glasses, a smart watch or a smart necklace.
In this embodiment, the first intelligent terminal 1200 may be configured to, during the pet language identification, obtain the characteristic parameters of the pet language audio signal sent by the second intelligent terminal 1300, and identify the mood expressed by the characteristic parameters.
In an embodiment of the present invention, the memory 1220 of the first smart terminal 1200 is configured to store instructions for controlling the processor 1210 to operate so as to support implementation of the pet language identification method according to any embodiment of the present invention. The skilled person can design the instructions according to the disclosed solution. How the instructions control the operation of the processor is well known in the art and will not be described in detail herein.
It should be understood by those skilled in the art that although a plurality of devices of the first smart terminal 1200 are illustrated in fig. 1, the first smart terminal 1200 of the embodiment of the present invention may refer to only some of the devices, for example, the processor 1210, the memory 1220, the display device 1250, the input device 1260, etc.
As shown in fig. 1, the second smart terminal 1300 may include a processor 1310, a memory 1320, an interface device 1330, a communication device 1340, a display device 1350, an input device 1360, an audio output device 1370, an audio pickup device 1380, and so on. The processor 1310 may be a central processing unit (CPU), a microprocessor (MCU), or the like, and the processor 1310 is configured to execute a computer program. The computer program may be written in an instruction set of an architecture such as x86, Arm, RISC, MIPS, SSE, etc. The memory 1320 includes, for example, a ROM (read-only memory), a RAM (random access memory), and a nonvolatile memory such as a hard disk. The interface device 1330 includes, for example, a USB interface, a headphone interface, and the like. The communication device 1340 is capable of wired or wireless communication, for example. The display device 1350 is, for example, a liquid crystal display panel or a touch panel. The input device 1360 may include, for example, a touch screen, a keyboard, and the like. The second smart terminal 1300 may output audio information through the audio output device 1370, which includes, for example, a speaker. The second smart terminal 1300 may pick up voice information through the audio pickup device 1380, which includes, for example, a microphone.
The second intelligent terminal 1300 may be any device that supports running the corresponding application and can be worn by the pet, such as a smart collar, a smart foot chain, smart earmuffs or a smart headgear.
In this embodiment, the second intelligent terminal 1300 may be configured to collect pet language audio signals during pet language identification, extract the characteristic parameters of the audio signals and send them to the first intelligent terminal 1200, which then identifies the mood they express.
In an embodiment of the present invention, the memory 1320 of the second smart terminal 1300 is used for storing instructions for controlling the processor 1310 to operate so as to support the implementation of the pet language identification method according to any embodiment of the present invention. The skilled person can design the instructions according to the disclosed solution. How the instructions control the operation of the processor is well known in the art and will not be described in detail herein.
It should be understood by those skilled in the art that although a plurality of devices of the second intelligent terminal 1300 are illustrated in fig. 1, the second intelligent terminal 1300 of the embodiment of the present invention may refer to only some of the devices, for example, the processor 1310, the memory 1320, the display device 1350, the input device 1360, etc.
The communication network 1400 may be a wireless network or a wired network, and may be a local area network or a wide area network. The first intelligent terminal 1200 and the second intelligent terminal 1300 can communicate with the server 1100 through the communication network 1400, and the first intelligent terminal 1200 and the second intelligent terminal 1300 can also communicate with each other through the communication network 1400.
The system 1000 shown in FIG. 1 is illustrative only and is not intended to limit the invention, its application, or uses in any way. For example, although fig. 1 shows only one server 1100 and two terminal devices, it is not meant to limit the respective numbers, and multiple servers 1100 and/or multiple terminal devices may be included in the system 1000.
Fig. 2 is a block diagram showing a configuration of hardware of an electronic apparatus 2000, which can implement an embodiment of the present invention.
In one aspect, the electronic device 2000 may be a smart wearable device, a mobile phone, a tablet computer, or the like.
On the other hand, as shown in fig. 2, the electronic device 2000 may include the aforementioned first smart terminal 1200, for implementing the pet language identification method provided by the first embodiment of the present invention.
Alternatively, as shown in fig. 2, the electronic device 2000 may include the aforementioned second smart terminal 1300, which is used to implement the pet language identification method according to the second embodiment of the present invention.
As shown in fig. 2, the electronic device 2000 may include a processor 2100, a memory 2200, an interface device 2300, a communication device 2400, a display device 2500, an input device 2600, a speaker 2700, a microphone 2800, and the like. The processor 2100 may be a central processing unit CPU, a microprocessor MCU, or the like. The memory 2200 includes, for example, a ROM (read only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 2300 includes, for example, a USB interface, a headphone interface, and the like. The communication device 2400 is capable of wired or wireless communication, for example, and may specifically include WiFi communication, bluetooth communication, 2G/3G/4G/5G communication, and the like. The display device 2500 is, for example, a liquid crystal display panel, a touch panel, or the like. The input device 2600 may include, for example, a touch screen, a keyboard, a somatosensory input, and the like. A user can input/output voice information through the speaker 2700 and the microphone 2800.
The electronic device shown in fig. 2 is merely illustrative and is in no way meant to limit the invention, its application, or uses. In an embodiment of the present invention, the memory 2200 of the electronic device 2000 is configured to store instructions for controlling the processor 2100 to perform any pet language identification method provided in the embodiments of the present invention. It will be appreciated by those skilled in the art that although a plurality of devices are shown for the electronic device 2000 in fig. 2, the present invention may relate to only some of them, for example only the processor 2100 and the memory 2200. The skilled person can design the instructions according to the disclosed solution. How the instructions control the operation of the processor is well known in the art and will not be described in detail herein.
In one embodiment of the invention, a pet language identification method is provided.
Referring to fig. 3, which is a flowchart illustrating steps of a pet language identification method according to an embodiment of the present invention, the pet language identification method may be implemented by an electronic device, for example, the electronic device 2000 shown in fig. 2, the first smart terminal 1200 shown in fig. 1, or the first smart terminal 600 shown in fig. 6.
As shown in fig. 3, the pet language identification method according to the embodiment of the present invention includes the following steps:
step 302, receiving characteristic parameters of an audio signal;
step 304, identifying the characteristic parameters according to a pre-stored model to obtain an identification result;
and step 306, performing voice broadcast on the recognition result.
In step 302, when the characteristic parameters of a new audio signal arrive, the first intelligent terminal 1200 or the first intelligent terminal 600 enters an awake state from a sleep state, receives the characteristic parameters sent by the second intelligent terminal 1300 or the second intelligent terminal 800, and stores them in an available memory for subsequent identification. The memory may be a memory in the first intelligent terminal 1200 or the first intelligent terminal 600, or a memory in the electronic device 2000.
In step 304, the characteristic parameters received in step 302 are input into the pre-stored model, which has been trained by a neural network in advance, and the model produces an output. Illustratively, to identify what a pet cat wants to express, the second intelligent terminal 1300 or the second intelligent terminal 800 collects the cat's cries over a period of time. Because the frequency of a cat's voice is below 4 kHz, the second intelligent terminal sets a sampling frequency and sampling rate for this period during collection, then extracts the characteristic parameters of the sampled cries and inputs them into the pre-stored model on the first intelligent terminal 1200 or the first intelligent terminal 600 for identification. During training, the model collected preset audio signals, for example the short meows of different kinds of pet cats, and the mood corresponding to a short meow was identified as a happy mood. So long as the input characteristic parameters match those of a short meow, the first intelligent terminal 1200 or the first intelligent terminal 600 recognizes that the mood the cat wants to express is happiness. The words representing this mood are broadcast through the audio output device or loudspeaker after digital-to-analog conversion, so that the user can know the pet's mood in time, act appropriately toward the pet, and improve the closeness between them.
In step 306, the recognition result is broadcast by voice. The recognition result includes words representing a mood, such as happy, angry, afraid or excited, and may also include words representing a need, such as hungry, thirsty or sick. To make it easy for the user to know in time what the pet's cries mean, these mood or need words are broadcast by voice after digital-to-analog conversion, so the user can learn the pet's current mood without looking at the worn device, which is convenient to use.
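Putting steps 302 to 306 together, a minimal Python sketch of the recognition-and-broadcast loop on the first terminal might look like the following. The mood vocabulary, the model call and the speak helper are hypothetical names used only for illustration; the patent does not prescribe a particular interface.

```python
import numpy as np

# Hypothetical mood/need vocabulary (illustrative only)
MOODS = ["happy", "angry", "afraid", "excited", "hungry", "thirsty", "sick"]

def recognize_and_broadcast(feature_vector, model, speak):
    """Run the pre-stored model on received features and voice-broadcast the result.

    `model` is assumed to return one score per mood word; `speak` stands in for
    the terminal's digital-to-analog conversion and loudspeaker output.
    """
    scores = model(np.asarray(feature_vector))   # step 304: identify
    text = MOODS[int(np.argmax(scores))]         # map the result to a mood/need word
    speak(f"Your pet seems to be {text}.")       # step 306: voice broadcast
    return text
```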
Optionally, the characteristic parameters include: pitch period, formant parameters, and mel-frequency cepstral coefficients of the audio data.
The pitch period records the duration of one pitch cycle. Airflow through the glottis sets the vocal cords into relaxation oscillation, producing a periodic pulse airflow; this airflow excites the vocal tract and produces voiced sound, which carries most of the energy in the voice. The frequency of vocal cord vibration is called the fundamental frequency, and the corresponding period is called the pitch period. Accurate extraction of the pitch period affects the recognition rate of speech and the accuracy of speech compression coding.
Formants refer to regions in the frequency spectrum of sound where energy is relatively concentrated, and are not only determinants of sound quality, but also reflect physical characteristics of vocal tract. The formant parameters include formant frequency, frequency bandwidth, and amplitude, and formant information is included in an envelope of the audio spectrum.
Mel-frequency cepstrum coefficients, which are coefficients constituting a mel-frequency cepstrum. Derived from the cepstrum of the audio piece, the band division of the mel-frequency cepstrum is equally divided on the mel scale, which more closely approximates the human auditory system than the linearly spaced bands used in the normal log cepstrum. This frequency warping may provide a better representation of sound.
Optionally, as shown in fig. 4, the step of generating the pre-stored model according to the embodiment of the present invention includes:
step 401, collecting audio signals of different pet languages;
step 402, extracting characteristic parameters of the audio signal;
step 403, acquiring training parameters, wherein the training parameters comprise a target value, a learning rate and a training round;
step 404, setting a neural network model based on the training parameters;
and 405, training the neural network model according to the characteristic parameters to obtain a prestored model.
In step 401, the model is trained according to the expected inputs and outputs, so specific pet language audio signals are collected as needed. For example, if the model is required to output a word representing joy, the short meows of different pet cats over a certain period need to be collected for training; after training, whenever an input cry matches a short meow with at least 70% probability, the recognition result outputs the word representing joy. It should be noted that the pre-stored model in the present application is obtained by training a neural network model, and the training process of a neural network model is common knowledge to those skilled in the art, so it is not described in detail here.
In step 402, after the required audio signal is acquired through step 401, feature parameters of the audio signal are extracted, where the feature parameters include a pitch period, a formant parameter, and a mel-frequency cepstrum coefficient.
Pitch period estimation methods include waveform estimation, correlation processing and transform methods. Waveform estimation directly uses the waveform of the audio signal and analyses the periodic peaks on the waveform, while correlation processing is widely used in speech signal processing. In the present application the pitch period of the pet's voice is obtained by the cepstrum method: the pet's audio signal is transformed to the frequency domain or cepstrum domain to estimate the pitch period, homomorphic analysis removes the influence of the vocal tract and yields the information belonging to the excitation part, from which the pitch period is obtained.
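As a minimal sketch of cepstrum-based pitch estimation as outlined above, the following Python function works on one windowed frame. The fundamental-frequency search range of 50-500 Hz is an assumption chosen for illustration, not a value stated in the patent.

```python
import numpy as np

def pitch_period_cepstrum(frame, sample_rate, f_min=50.0, f_max=500.0):
    """Estimate the pitch period of one windowed frame via the real cepstrum."""
    # Real cepstrum: inverse FFT of the log magnitude spectrum
    spectrum = np.abs(np.fft.rfft(frame)) + 1e-10
    cepstrum = np.fft.irfft(np.log(spectrum))

    # The excitation (pitch) appears as a peak at higher quefrencies,
    # separated from the low-quefrency vocal-tract (envelope) part.
    q_min = int(sample_rate / f_max)
    q_max = int(sample_rate / f_min)
    peak = q_min + np.argmax(cepstrum[q_min:q_max])
    return peak / sample_rate          # pitch period in seconds
```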
Formant extraction methods include linear-prediction-based calculation and the cepstrum method. The present application extracts the formants of the pet language audio signal with the cepstrum method, which is based on the inverse Fourier transform of the logarithmic power spectrum: it separates the spectral envelope from the fine structure, so formant information can be obtained very accurately.
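The formant extraction described here can be sketched the same way: keep only the low-quefrency part of the cepstrum to recover the smoothed spectral envelope and pick its peaks. The liftering cutoff of 30 samples and the use of SciPy's peak finder are assumptions for illustration only.

```python
import numpy as np
from scipy.signal import find_peaks

def formants_cepstrum(frame, sample_rate, n_low=30):
    """Estimate formant frequencies from the cepstrally smoothed envelope."""
    spectrum = np.abs(np.fft.rfft(frame)) + 1e-10
    cepstrum = np.fft.irfft(np.log(spectrum))

    # Liftering: keep only low quefrencies -> smooth spectral envelope
    lifter = np.zeros_like(cepstrum)
    lifter[:n_low] = 1.0
    lifter[-n_low + 1:] = 1.0              # keep the symmetric part as well
    envelope = np.real(np.fft.rfft(cepstrum * lifter))

    # Formants are the peaks of the envelope (energy-concentrated regions)
    peaks, _ = find_peaks(envelope)
    freqs = peaks * sample_rate / len(frame)
    return freqs[:3]                       # first few formant frequencies (Hz)
```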
The mel-frequency cepstrum coefficients are extracted by pre-processing the audio signal and then performing a fast Fourier transform to obtain the spectrum of each frequency; the modulus of the spectrum is squared to obtain the power spectrum of the audio signal, the power spectrum is passed through a mel filter bank, the logarithmic energy output by each filter is calculated, and normalization is performed to obtain N-dimensional mel-frequency cepstrum coefficients, where N is a positive integer.
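A compact sketch of that MFCC pipeline (FFT, power spectrum, mel filter bank, log energy, normalization) is given below. The 26 filters, 13 coefficients and 512-point FFT are common assumptions rather than values fixed by the patent, and the DCT is the standard cepstral step even though the text above mentions only a normalization.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, sample_rate, n_filters=26, n_coeffs=13, n_fft=512):
    """Compute MFCCs for one pre-processed frame (illustrative parameter values)."""
    # FFT and power spectrum (modulus squared of the spectrum)
    spectrum = np.fft.rfft(frame, n_fft)
    power = (np.abs(spectrum) ** 2) / n_fft

    # Triangular mel filter bank, equally spaced on the mel scale
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fbank[i - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)

    # Log energy of each filter output, DCT, then a simple normalization
    log_energy = np.log(power @ fbank.T + 1e-10)
    coeffs = dct(log_energy, type=2, norm='ortho')[:n_coeffs]
    return (coeffs - coeffs.mean()) / (coeffs.std() + 1e-10)
```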
In step 403, after the feature parameters used for training have been extracted, the training parameters of the neural network model need to be set. A neural network can learn and store a large number of input-output pattern mappings without an explicit mathematical equation describing them in advance; its learning rule uses the steepest descent method, continuously adjusting the weights and thresholds of the network through back propagation to minimize the sum of squared errors. The training parameters include a target value, a learning rate and a training round. The target value is the kind of recognition result expected as output, for example words representing states such as happy, sad, lost, uncomfortable, hungry, thirsty or sleepy.
The learning rate is an important parameter in supervised learning and deep learning, and determines whether and when the objective function can converge to a local minimum. A suitable learning rate enables the objective function to converge to a local minimum over time.
The training round is the number of training passes over a target value within a period of time. Setting more rounds is not necessarily better: once 80% of the inputs of the same category produce the expected output, learning can be stopped to avoid wasting resources.
In step 404, the training parameters are set in advance; before training starts, the neural network model only needs to be configured according to these preset training parameters, and training can then begin.
In step 405, after the training parameters have been set in the neural network model, the extracted characteristic parameters can be input into the model for training. After about 50 training passes within half an hour, the neural network model is trained into the expected model, which is stored for recognition. It should be noted that the training process may be completed in the server and the model then transplanted from the server into the electronic device, the first intelligent terminal 1200 or the first intelligent terminal 600, or it may be completed directly in the electronic device or the first intelligent terminal 1200 or the first intelligent terminal 600, with the trained model stored for later recognition.
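A minimal training sketch corresponding to steps 401-405 is shown below, assuming the feature vectors have already been extracted into an array of shape (n_samples, n_features) and the mood labels (target values) are one-hot encoded. The single hidden layer, its size, the 0.01 learning rate and the 80% early-stop threshold are illustrative assumptions; the patent does not prescribe a particular network architecture, only gradient-descent training with back propagation and a stopping criterion.

```python
import numpy as np

def train_model(features, labels, hidden=32, learning_rate=0.01,
                rounds=50, stop_accuracy=0.80):
    """Train a one-hidden-layer network with plain gradient descent (backprop)."""
    rng = np.random.default_rng(0)
    n_in, n_out = features.shape[1], labels.shape[1]
    w1 = rng.normal(0, 0.1, (n_in, hidden))
    w2 = rng.normal(0, 0.1, (hidden, n_out))

    for _ in range(rounds):                          # training rounds
        h = np.tanh(features @ w1)                   # hidden layer
        out = h @ w2                                 # linear output layer
        err = out - labels                           # gradient of squared error

        # Back-propagation with steepest descent on the sum of squared errors
        grad_w2 = h.T @ err
        grad_w1 = features.T @ ((err @ w2.T) * (1 - h ** 2))
        w2 -= learning_rate * grad_w2
        w1 -= learning_rate * grad_w1

        # Stop early once 80% of the training inputs map to the expected mood
        accuracy = np.mean(out.argmax(1) == labels.argmax(1))
        if accuracy >= stop_accuracy:
            break
    return w1, w2
```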
Optionally, after acquiring the audio signals in the different pet languages, the method further includes: pre-processing an audio signal, the pre-processing comprising: pre-emphasis, framing, and windowing the audio signal.
Pre-emphasis boosts the high-frequency part to flatten the spectrum of the signal, keeping the spectrum across the whole band from low to high frequency so that it can be obtained with the same signal-to-noise ratio. It also counteracts the effects of the vocal cords and lips in the production process, compensating the high-frequency part of the audio signal that is suppressed by the articulation system and highlighting the high-frequency formants.
Since the audio signal is stationary only over a short period (typically 10-30 ms), the speech signal is divided into short segments, i.e. frames. To avoid losing the dynamic information of the audio signal, adjacent frames need an overlapping region of 1/2 or 1/3 of the frame length.
Windowing multiplies each frame by a Hamming window to increase the continuity between the left and right ends of each frame.
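As a minimal illustration of this pre-processing chain (pre-emphasis, framing with overlap, Hamming windowing), the following sketch uses only NumPy. The 0.97 pre-emphasis coefficient, the 25 ms frame length and the 50% overlap are common assumptions, not values fixed by the patent, and the sketch assumes the signal is at least one frame long.

```python
import numpy as np

def preprocess(signal, sample_rate, frame_ms=25, overlap=0.5, alpha=0.97):
    """Pre-emphasize, frame and window an audio signal (illustrative values)."""
    # Pre-emphasis: boost high frequencies, y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])

    # Framing: 10-30 ms frames with a 1/2 (or 1/3) overlap between adjacent frames
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(frame_len * (1.0 - overlap))
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])

    # Windowing: multiply every frame by a Hamming window to smooth frame edges
    return frames * np.hamming(frame_len)
```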
According to this embodiment of the present invention, a pet language identification method executed by the first intelligent terminal 1200 or the first intelligent terminal 600 is provided. The second intelligent terminal 1300 or the second intelligent terminal 800 collects the pet's audio signal in real time, extracts the characteristic parameters of the audio signal and sends them to the first intelligent terminal 1200 or the first intelligent terminal 600 configured for the user, on which a model for feature recognition is pre-stored. As long as the characteristic parameters of the audio signal are input to the model, it outputs text expressing the pet's mood; after digital-to-analog conversion the text is broadcast to the user through the first intelligent terminal 1200 or the first intelligent terminal 600, so the user can grasp in real time the mood the pet is expressing. The pet language identification method provided by the present application is simple to operate, has a high recognition rate, and can help humans better understand animals, so as to interact with them better and improve the closeness between humans and animals.
In another embodiment of the invention, a pet language identification method is provided.
Referring to fig. 5, which is a flowchart illustrating steps of a pet language identification method according to an embodiment of the present invention, the pet language identification method may be implemented by an electronic device, for example, the electronic device 2000 shown in fig. 2, the second smart terminal 1300 shown in fig. 1, or the second smart terminal 800 shown in fig. 8.
As shown in fig. 5, the pet language identification method according to the embodiment of the present invention includes the following steps:
step 501, collecting audio signals of pet languages;
step 502, extracting characteristic parameters of the audio signal;
and step 503, sending the characteristic parameters to the first intelligent terminal.
In step 501, the electronic device 2000, the second intelligent terminal 1300 or the second intelligent terminal 800 is placed where the pet's audio signal can be collected. When the pet makes a sound, the device switches from sleep mode to working mode, collects the audio signal in real time within a preset time period, and stores it for feature extraction in the next step.
In step 502, since the electronic device 2000, the first smart terminal 1200 or the first smart terminal 600 identifies the audio signal from its characteristic parameters, those parameters must be extracted after the audio signal is acquired; the characteristic parameters are the pitch period, the formant parameters and the mel-frequency cepstrum coefficients of the audio signal. The extraction method has been described above and is not repeated here.
In step 503, after the characteristic parameters of the audio signal are extracted, the characteristic parameters are sent to the electronic device 2000, the first smart terminal 1200, or the first smart terminal 600 through the network for identification.
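A sketch of the second terminal's side (steps 501-503) is given below: take the pre-processed frames, extract the feature parameters and send them to the first terminal. It reuses the earlier feature-extraction sketches (`pitch_period_cepstrum`, `formants_cepstrum`, `mfcc`); the TCP socket transport, port number and JSON encoding are illustrative assumptions, since the patent only requires a communication connection between the two terminals.

```python
import json
import socket

def send_features(frames, sample_rate, host, port=9000):
    """Extract per-frame features and send them to the first terminal (step 503)."""
    features = []
    for frame in frames:
        features.append({
            "pitch_period": float(pitch_period_cepstrum(frame, sample_rate)),
            "formants": formants_cepstrum(frame, sample_rate).tolist(),
            "mfcc": mfcc(frame, sample_rate).tolist(),
        })
    payload = json.dumps(features).encode("utf-8")
    # Assumed transport: the patent only requires a communication connection
    with socket.create_connection((host, port)) as conn:
        conn.sendall(payload)
```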
According to this embodiment of the present invention, a pet language identification method executed by the second intelligent terminal 1300 or the second intelligent terminal 800 is provided. The second intelligent terminal 1300 or the second intelligent terminal 800 collects the pet's audio signal in real time, extracts the characteristic parameters of the audio signal and sends them to the first intelligent terminal 1200 or the first intelligent terminal 600 configured for the user, on which a model for feature recognition is pre-stored. As long as the characteristic parameters of the audio signal are input to the model, it outputs text expressing the pet's mood; after digital-to-analog conversion the text is broadcast to the user through the first intelligent terminal 1200 or the first intelligent terminal 600, so the user can grasp in real time the mood the pet is expressing. The pet language identification method provided by the present application is simple to operate, has a high recognition rate, and can help humans better understand animals, so as to interact with them better and improve the closeness between humans and animals.
In yet another embodiment of the present invention, a first intelligent terminal 600 is provided, please refer to fig. 6, which is a block diagram illustrating a structure of the first intelligent terminal 600 according to an embodiment of the present invention. The first intelligent terminal 600 includes a receiving module 601, an identifying module 602, and a broadcasting module 603.
The receiving module 601 is used for receiving the characteristic parameters of the audio signal.
The identification module 602 is configured to identify the characteristic parameters according to a pre-stored model to obtain an identification result.
The broadcast module 603 is configured to perform voice broadcast on the recognition result.
The first intelligent terminal 600 further comprises a generating module 604 for generating a pre-stored model.
The generation module further includes an acquisition unit 6041, an extraction unit 6042, an acquisition unit 6043, a configuration unit 6044, and a training unit 6045.
The acquisition unit 6041 is used to acquire audio signals in different pet languages.
The extraction unit 6042 is configured to extract a feature parameter of the audio signal.
The acquisition unit 6043 is configured to acquire training parameters including a target value, a learning rate, and a training round.
The configuration unit 6044 is configured to configure the neural network model based on the training parameters.
The training unit 6045 is configured to train the neural network model according to the characteristic parameters to obtain the pre-stored model.
The generation module further comprises a preprocessing unit 6046, the preprocessing unit 6046 is configured to preprocess the audio signal, the preprocessing includes: the audio signal is pre-emphasized, framed and windowed.
Referring to fig. 7, in another embodiment, the first intelligent terminal 600 may further include a processor 606 and a memory 608, where the memory 608 is used for storing executable instructions for controlling the processor 606 to execute the pet language identification method according to the first embodiment of the present invention.
The various modules of the first intelligent terminal 600 in the above embodiments may be implemented by the processor 606.
According to this embodiment of the invention, a first intelligent terminal 600 is provided. The second intelligent terminal 800 collects the pet's audio signal in real time, extracts the characteristic parameters of the audio signal and sends them to the first intelligent terminal 600 configured for the user, on which a model for feature recognition is pre-stored. As long as the characteristic parameters of the audio signal are input to the model, it outputs text expressing the pet's mood; after digital-to-analog conversion the text is broadcast to the user through the first intelligent terminal 600, so the user can grasp in real time the mood the pet is expressing.
In yet another embodiment of the present invention, a second intelligent terminal 800 is provided, please refer to fig. 8, which is a block diagram illustrating a structure of the second intelligent terminal 800 according to an embodiment of the present invention. The second intelligent terminal 800 includes an acquisition module 801, an extraction module 802, and a sending module 803.
The collection module 801 is used for collecting audio signals of pet languages.
The extraction module 802 is configured to extract feature parameters of the audio signal.
The sending module 803 is configured to send the feature parameters to the first intelligent terminal 600.
Referring to fig. 9, in another embodiment, the second smart terminal 800 may further include a processor 806 and a memory 808, where the memory 808 is used to store executable instructions for controlling the processor 806 to execute the pet language identification method according to the second embodiment of the present invention.
The various modules of the second intelligent terminal 800 in the above embodiments may be implemented by the processor 806.
According to this embodiment of the invention, the second intelligent terminal 800 collects the pet's audio signal in real time, extracts the characteristic parameters of the audio signal and sends them to the first intelligent terminal 600 configured for the user, on which a model for feature recognition is pre-stored. As long as the characteristic parameters of the audio signal are input to the model, it outputs text expressing the pet's mood; after digital-to-analog conversion the text is broadcast to the user through the first intelligent terminal 600, so the user can grasp in real time the mood the pet is expressing.
Finally, according to yet another embodiment of the present invention, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the pet language identification method according to any of the embodiments of the present invention.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present invention may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing an electronic circuit, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), with state information of computer-readable program instructions, which can execute the computer-readable program instructions.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are equivalent.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and is not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (15)

1. A pet language identification method executed on a first intelligent terminal, characterized by comprising the following steps:
receiving characteristic parameters of an audio signal;
identifying the characteristic parameters according to a pre-stored model to obtain a recognition result;
and broadcasting the recognition result by voice.
2. The method of claim 1, wherein the characteristic parameters comprise: pitch period, formant parameters, and mel-frequency cepstrum coefficients of the audio signal.
3. The method of claim 1, wherein the method further comprises a step of generating the pre-stored model, the step comprising:
collecting audio signals of different pet languages;
extracting characteristic parameters of the audio signal;
acquiring training parameters, wherein the training parameters comprise a target value, a learning rate and a training round;
setting a neural network model based on the training parameters;
and training the neural network model according to the characteristic parameters to obtain the pre-stored model.
4. The method of claim 3, wherein after said collecting audio signals of different pet languages, the method further comprises:
pre-processing the audio signal, the pre-processing comprising: pre-emphasis, framing, and windowing the audio signal.
5. The method of claim 3, wherein extracting the characteristic parameters of the audio signal comprises:
obtaining the pitch period and the formant parameters by a cepstrum method.
6. The method of claim 3, wherein extracting the characteristic parameters of the audio signal further comprises:
performing a fast Fourier transform on the audio signal to obtain its frequency spectrum, and taking the squared modulus of the spectrum to obtain the power spectrum of the audio signal;
passing the power spectrum through a Mel filter bank, calculating the logarithmic energy output by each filter, and performing normalization to obtain N-dimensional Mel frequency cepstrum coefficients, wherein N is a positive integer.
7. A pet language identification method executed on a second intelligent terminal, characterized by comprising the following steps:
collecting audio signals of the pet language;
extracting characteristic parameters of the audio signal;
and sending the characteristic parameters to the first intelligent terminal.
8. A first intelligent terminal, comprising:
a receiving module for receiving characteristic parameters of an audio signal;
an identification module for identifying the characteristic parameters according to a pre-stored model to obtain a recognition result;
and a broadcasting module for broadcasting the recognition result by voice.
9. The terminal of claim 8, wherein the terminal further comprises:
a generating module for generating the pre-stored model.
10. The terminal of claim 9, wherein the generating module further comprises:
a collection unit for collecting audio signals of different pet languages;
an extraction unit for extracting characteristic parameters of the audio signals;
an acquisition unit for acquiring training parameters, wherein the training parameters comprise a target value, a learning rate, and a training round;
a configuration unit for setting a neural network model based on the training parameters;
and a training unit for training the neural network model according to the characteristic parameters to obtain the pre-stored model.
11. The terminal of claim 10, wherein the generating module further comprises:
a pre-processing unit for pre-processing the audio signal, the pre-processing comprising: pre-emphasis, framing, and windowing the audio signal.
12. A second intelligent terminal, comprising:
a collection module for collecting an audio signal of the pet language;
an extraction module for extracting characteristic parameters of the audio signal;
and a sending module for sending the characteristic parameters to the first intelligent terminal.
13. An electronic device, comprising:
the first intelligent terminal according to any one of claims 8 to 11; or,
a processor and a memory for storing executable instructions for controlling the processor to perform the pet language identification method of any one of claims 1 to 6; or,
the second intelligent terminal of claim 12; or,
a processor and a memory for storing executable instructions for controlling the processor to perform the pet language identification method of claim 7.
14. A pet language identification system, the system comprising:
the first intelligent terminal according to any one of claims 8 to 11; and
the second intelligent terminal of claim 12;
wherein the first intelligent terminal is in communication connection with the second intelligent terminal.
15. A computer-readable storage medium, characterized in that a computer program is stored thereon, which when executed by a processor implements the pet language identification method according to any one of claims 1 to 7.
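A minimal Python/NumPy sketch of the pre-processing recited in claim 4 (pre-emphasis, framing and windowing) is given below. The frame length, frame shift, pre-emphasis coefficient and Hamming window are assumed, commonly used values and are not specified by the patent.

```python
import numpy as np

def preprocess(signal: np.ndarray, frame_len: int = 400, frame_shift: int = 160,
               alpha: float = 0.97) -> np.ndarray:
    """Return windowed frames, shape (num_frames, frame_len). Parameter values are assumed."""
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1], boosting high frequencies
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Framing: split the signal into overlapping frames
    num_frames = 1 + max(0, (len(emphasized) - frame_len) // frame_shift)
    frames = np.stack([emphasized[i * frame_shift: i * frame_shift + frame_len]
                       for i in range(num_frames)])
    # Windowing: apply a Hamming window to each frame
    return frames * np.hamming(frame_len)

# Example: 1 s of 16 kHz audio yields frames of shape (98, 400)
frames = preprocess(np.random.randn(16000))
```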
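Claim 5 obtains the pitch period and formant parameters by a cepstrum method. The sketch below shows one common cepstral approach under assumed settings (16 kHz sampling rate, a 60-500 Hz pitch search range, low-quefrency liftering for the spectral envelope); the patent does not fix these details.

```python
import numpy as np

def pitch_and_formants(frame: np.ndarray, fs: int = 16000, n_formants: int = 3):
    """Estimate pitch period (seconds) and rough formant frequencies (Hz) of one frame."""
    n_fft = 1024
    spectrum = np.fft.rfft(frame, n_fft)
    log_mag = np.log(np.abs(spectrum) + 1e-10)
    cepstrum = np.fft.irfft(log_mag, n_fft)          # real cepstrum of the frame

    # Pitch period: strongest cepstral peak in an assumed 60-500 Hz search range
    q_min, q_max = int(fs / 500), int(fs / 60)
    pitch_period = (q_min + np.argmax(cepstrum[q_min:q_max])) / fs

    # Formants: keep the low-quefrency part (spectral envelope), go back to the
    # frequency domain and pick the highest envelope peaks
    lifter = np.zeros(n_fft)
    lifter[:30] = 1
    lifter[-29:] = 1
    envelope = np.real(np.fft.rfft(cepstrum * lifter, n_fft))
    peaks = [k for k in range(1, len(envelope) - 1)
             if envelope[k] > envelope[k - 1] and envelope[k] > envelope[k + 1]]
    top_bins = sorted(peaks, key=lambda k: envelope[k], reverse=True)[:n_formants]
    formants = sorted(k * fs / n_fft for k in top_bins)
    return pitch_period, formants
```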
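For the Mel frequency cepstrum coefficients of claim 6, the following sketch runs the FFT, power spectrum, Mel filter bank, logarithmic energy and normalization chain. The discrete cosine transform and the per-utterance mean/variance normalization are conventional choices assumed here, and the filter count and N = 13 are illustrative; the claim itself names only the log-energy and normalization steps.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):  return 2595.0 * np.log10(1.0 + f / 700.0)
def mel_to_hz(m):  return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=512, fs=16000):
    """Triangular Mel filters mapped onto FFT bins (assumed filter count)."""
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fbank[i - 1, k] = (k - l) / max(c - l, 1)   # rising edge
        for k in range(c, r):
            fbank[i - 1, k] = (r - k) / max(r - c, 1)   # falling edge
    return fbank

def mfcc(frames: np.ndarray, fs: int = 16000, n_fft: int = 512, n_coeffs: int = 13):
    """Windowed frames (num_frames, frame_len) -> normalized MFCCs (num_frames, n_coeffs)."""
    spectrum = np.fft.rfft(frames, n_fft, axis=1)            # FFT of each frame
    power = (np.abs(spectrum) ** 2) / n_fft                   # squared modulus -> power spectrum
    energies = power @ mel_filterbank(n_fft=n_fft, fs=fs).T   # output of each Mel filter
    log_e = np.log(energies + 1e-10)                           # logarithmic energy per filter
    coeffs = dct(log_e, type=2, axis=1, norm='ortho')[:, :n_coeffs]
    # Normalization: per-coefficient mean/variance over the utterance (assumed scheme)
    return (coeffs - coeffs.mean(axis=0)) / (coeffs.std(axis=0) + 1e-10)
```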
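Claim 3 builds the pre-stored model from the extracted characteristic parameters using training parameters comprising a target value, a learning rate and a training round. The sketch below maps these onto scikit-learn's MLPClassifier (tol, learning_rate_init, max_iter) as a stand-in for the unspecified neural network; the hidden-layer sizes, the example labels "hungry" and "alert", and the random placeholder features are invented for illustration only.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Training parameters of claim 3: target value, learning rate, training rounds (assumed values)
TARGET_ERROR, LEARNING_RATE, TRAINING_ROUNDS = 1e-4, 1e-3, 500

def build_prestored_model(features: np.ndarray, labels: list) -> MLPClassifier:
    """features: (num_samples, num_features), one feature vector per recorded pet sound."""
    model = MLPClassifier(hidden_layer_sizes=(64, 32),
                          learning_rate_init=LEARNING_RATE,  # learning rate
                          max_iter=TRAINING_ROUNDS,          # training rounds
                          tol=TARGET_ERROR)                  # target value (stopping tolerance)
    model.fit(features, labels)
    return model

# Example with random placeholder features for two invented pet states
X = np.random.randn(40, 13)
y = ["hungry"] * 20 + ["alert"] * 20
prestored_model = build_prestored_model(X, y)
print(prestored_model.predict(X[:3]))
```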
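Claims 1, 7 and 14 split the work between two communicatively connected terminals: the second intelligent terminal collects the pet sound, extracts the characteristic parameters and sends them, while the first intelligent terminal recognizes them with the pre-stored model and broadcasts the result by voice. The sketch below assumes JSON over a TCP socket as the transport and a console print as a stand-in for the voice broadcast; neither is specified by the patent.

```python
import json
import socket
import numpy as np

def second_terminal_send(features: np.ndarray, host: str = "127.0.0.1", port: int = 9000):
    """Second intelligent terminal: serialize the feature vector and send it on."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall(json.dumps(features.tolist()).encode("utf-8"))

def first_terminal_serve(model, port: int = 9000):
    """First intelligent terminal: receive characteristic parameters, recognize, announce."""
    with socket.create_server(("0.0.0.0", port)) as server:
        conn, _ = server.accept()
        with conn:
            payload = b""
            while chunk := conn.recv(4096):
                payload += chunk
            features = np.array(json.loads(payload)).reshape(1, -1)
            result = model.predict(features)[0]            # recognition result
            print(f"Recognized pet state: {result}")        # stand-in for the voice broadcast
```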
CN201911195058.5A 2019-11-28 2019-11-28 Pet language identification method and device, electronic equipment and readable storage medium Pending CN110970037A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911195058.5A CN110970037A (en) 2019-11-28 2019-11-28 Pet language identification method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911195058.5A CN110970037A (en) 2019-11-28 2019-11-28 Pet language identification method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN110970037A true CN110970037A (en) 2020-04-07

Family

ID=70032045

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911195058.5A Pending CN110970037A (en) 2019-11-28 2019-11-28 Pet language identification method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN110970037A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110082574A1 (en) * 2009-10-07 2011-04-07 Sony Corporation Animal-machine audio interaction system
CN104700829A (en) * 2015-03-30 2015-06-10 中南民族大学 System and method for recognizing voice emotion of animal
KR20150114310A (en) * 2014-04-01 2015-10-12 백승연 Interpretation system for interpretion movement of animal
CN106297808A (en) * 2015-06-07 2017-01-04 高博文 A kind of Canis familiaris L. volume discriminating conduct comprehensively analyzed with Semantic mapping based on PNN
CN106340309A (en) * 2016-08-23 2017-01-18 南京大空翼信息技术有限公司 Dog bark emotion recognition method and device based on deep learning
CN106531173A (en) * 2016-11-11 2017-03-22 努比亚技术有限公司 Terminal-based animal data processing method and terminal
CN109272986A (en) * 2018-08-29 2019-01-25 昆明理工大学 A kind of dog sound sensibility classification method based on artificial neural network


Similar Documents

Publication Publication Date Title
US11475897B2 (en) Method and apparatus for response using voice matching user category
CN109741732B (en) Named entity recognition method, named entity recognition device, equipment and medium
US20120116756A1 (en) Method for tone/intonation recognition using auditory attention cues
CN110970036B (en) Voiceprint recognition method and device, computer storage medium and electronic equipment
Chaki Pattern analysis based acoustic signal processing: a survey of the state-of-art
CN110136726A (en) A kind of estimation method, device, system and the storage medium of voice gender
CN111179910A (en) Speed of speech recognition method and apparatus, server, computer readable storage medium
CN110931023A (en) Gender identification method, system, mobile terminal and storage medium
Usman et al. Heart rate detection and classification from speech spectral features using machine learning
CN108962226B (en) Method and apparatus for detecting end point of voice
CN110930975A (en) Method and apparatus for outputting information
CN113539243A (en) Training method of voice classification model, voice classification method and related device
US20230317092A1 (en) Systems and methods for audio signal generation
Nirjon et al. sMFCC: exploiting sparseness in speech for fast acoustic feature extraction on mobile devices--a feasibility study
Laghari et al. Robust speech emotion recognition for sindhi language based on deep convolutional neural network
Chi et al. Spectro-temporal modulation energy based mask for robust speaker identification
Chen et al. CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application
CN110970037A (en) Pet language identification method and device, electronic equipment and readable storage medium
CN114724589A (en) Voice quality inspection method and device, electronic equipment and storage medium
CN114302301A (en) Frequency response correction method and related product
CN113129926A (en) Voice emotion recognition model training method, voice emotion recognition method and device
Gutkin Eidos: an open-source auditory periphery modeling toolkit and evaluation of cross-lingual phonemic contrasts
CN111899718A (en) Method, apparatus, device and medium for recognizing synthesized speech
Krishnaveni et al. An Optimal Speech Recognition Module for Patient's Voice Monitoring System in Smart Healthcare Applications
Sharma et al. Emotion Recognition based on audio signal using GFCC Extraction and BPNN Classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200407)