CN106791010B - Information processing method and device and mobile terminal


Info

Publication number
CN106791010B
CN106791010B (application number CN201611069130.6A)
Authority
CN
China
Prior art keywords
voice data
voice
information
predefined
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611069130.6A
Other languages
Chinese (zh)
Other versions
CN106791010A (en)
Inventor
Tang Yong (汤涌)
Current Assignee
Beijing Anyun Century Technology Co Ltd
Original Assignee
Beijing Anyun Century Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Anyun Century Technology Co Ltd filed Critical Beijing Anyun Century Technology Co Ltd
Priority to CN201611069130.6A priority Critical patent/CN106791010B/en
Publication of CN106791010A publication Critical patent/CN106791010A/en
Application granted granted Critical
Publication of CN106791010B publication Critical patent/CN106791010B/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/72418 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality for supporting emergency services
    • H04M1/72421 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality for supporting emergency services with automatic activation of emergency service functions, e.g. upon sensing an alarm
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2250/00 Details of telephonic subscriber devices
    • H04M2250/74 Details of telephonic subscriber devices with voice recognition means

Abstract

The invention discloses an information processing method, an information processing apparatus, and a mobile terminal. The method includes: receiving voice data while a voice monitoring mode is active; and matching voice features extracted from the voice data content against a predefined voice data model, where the model is generated from voice data pre-recorded by the user, and, if a match is found, sending specified information to a preset contact. When an emergency occurs, the user can therefore urgently call for help without touching the mobile phone: the specified information is sent to the preset contact, who can quickly take the necessary rescue actions upon receiving it. This simplifies the steps needed to ask for help, shortens the time until the user is rescued, raises the user's margin of safety in an emergency, and keeps the call for help from being noticed by criminals.

Description

Information processing method and device and mobile terminal
Technical Field
The present invention relates to the field of speech recognition technology, and more particularly, to a method, an apparatus, and a mobile terminal for processing information.
Background
With the rapid development of communication technology, mobile terminals have become an essential part of daily life. Mobile communication terminals have matured to the point where they support a wide range of applications, and a user in danger can use one to send a call for help.
In the prior art, a user who wants to call for help must physically operate the mobile terminal to send an alarm message, place a call, or manually start an on-the-spot recording. Help cannot be sent automatically in an emergency or in specific situations, and the act of reaching for the phone can itself be noticed and stopped by an attacker. Disappearances commonly occur when children wander out of a parent's sight, when someone meets a stranger, or when a woman walks alone or takes a car at night; even a woman who, to avoid the danger of a night ride, deliberately chooses a licensed taxi cannot rule out encountering a dangerous driver. Moreover, a victim in danger while travelling may be unable to send a distress message to family or friends in time, or even to call the police; and even when a distress message is sent, the victim may be in an unfamiliar place and unable to tell family or friends where they are, delaying timely rescue.
Disclosure of Invention
In view of the above problems, the present invention provides an information processing method, an apparatus and a mobile terminal that overcome, or at least partially solve, the above problems, raising the user's margin of safety in an emergency while keeping the call for help hard for criminals to detect.
In order to solve the technical problem, the embodiment of the invention discloses the following technical scheme:
in a first aspect, an embodiment of the present invention provides an information processing method, including:
receiving voice data when a voice monitoring mode is started;
acquiring voice characteristics in the voice data content information;
matching the voice features with a predefined voice data model, wherein the predefined voice data model is generated according to voice data pre-recorded by a user;
and if a match is found, sending specified information to a preset contact.
With reference to the first aspect, in a first implementation manner of the first aspect, the voice features include pitch, tone strength, duration, and timbre of the voice.
With reference to the first aspect, in a second implementation manner of the first aspect, after the receiving the voice data, the method further includes,
splitting the voice data into individual elements and comparing them, element by element, against the individual elements of a predefined voice data model for recognition, and discarding the voice data if the recognition error is not less than a preset value;
the recognition error value is preset at 35%.
With reference to the first aspect, in a third implementation manner of the first aspect, the predefined speech data model is generated according to speech data pre-entered by a user, and includes,
pre-inputting user voice to generate first voice data;
extracting voice characteristics according to the first voice data content information;
converting the voice characteristics of the first voice data content information into digital information, and splitting the digital information into a plurality of component elements;
recombining the plurality of constituent elements into second voice data;
the first speech data and the second speech data are formed into a predefined speech data model.
With reference to the second implementation manner of the first aspect, in a fourth implementation manner of the first aspect, the predefined speech data model is generated according to speech data pre-entered by a user, and includes,
pre-inputting user voice to generate first voice data;
extracting voice characteristics according to the first voice data content information;
converting the voice characteristics of the first voice data content information into digital information, and splitting the digital information into a plurality of component elements;
recombining the plurality of constituent elements into second voice data;
the first speech data and the second speech data are formed into a predefined speech data model.
With reference to the first aspect, the present invention provides, in a fifth implementation manner of the first aspect, the matching the speech feature with a predefined speech data model includes,
judging whether the fuzzy-match degree between the voice features and the predefined voice data model is greater than a preset value, and if so, confirming a match;
the fuzzy-match threshold is set to 65%.
With reference to the first aspect, the present invention provides, in a sixth implementation manner of the first aspect, the matching of the speech features with a predefined speech data model, including,
extracting voice features from the content information of the received voice data;
judging whether the fuzzy-match degree between the voice features and the predefined voice data model is greater than the preset value, and if not, confirming no match;
and if no match is confirmed, discarding the voice data.
With reference to the first aspect, in a seventh implementation manner of the first aspect, the sending, if matching, specific information to a predetermined contact includes,
if a match is found, triggering and waking a preset application program to send the specified information to the preset contact;
the trigger-and-wake passes a trigger instruction to the preset application program by way of an intent, waking the application program at the same time.
With reference to the first aspect, in an eighth implementation manner of the first aspect, the sending the specific information to the predetermined contact if the matching is performed further includes,
recording an environment sound;
calling a GPS to position the current position;
adding the GPS positioning information and the environmental record into the designated information;
and triggering and awakening a preset application program to send the specified information to a preset contact person.
With reference to the seventh implementation manner of the first aspect, in a ninth implementation manner of the first aspect, the sending the specific information to the predetermined contact if the matching is performed further includes,
recording an environment sound;
calling a GPS to position the current position;
adding the GPS positioning information and the environmental record into the designated information;
and triggering and awakening the preset application program.
In a second aspect, the present invention provides an apparatus for information processing, comprising,
the monitoring module is used for monitoring voice to obtain voice data;
the receiving module is used for receiving the voice data;
the acquisition module is used for acquiring the voice characteristics in the voice data content information;
the matching module is used for matching the voice characteristics with a predefined voice data model, wherein the predefined voice data model is generated according to voice data which is pre-recorded by a user;
the setting module is used for presetting at least one contact to receive the specified information; and the processing module is used for sending the specified information to the preset contact when a match is determined.
With reference to the second aspect, in a first implementation manner of the second aspect, the voice features include pitch, tone strength, duration, and timbre of the voice.
With reference to the second aspect, in a second implementation manner of the second aspect, after the receiving the voice data, the method further includes,
splitting the voice data into individual elements and comparing them, element by element, against the individual elements of a predefined voice data model for recognition, and discarding the voice data if the recognition error is not less than a preset value;
the recognition error value is preset at 35%.
With reference to the second aspect, in a third implementation manner of the second aspect, the present invention further includes a storage module, where the storage module is configured to store a predefined voice data model, where the predefined voice data model is generated according to voice data pre-entered by a user, and includes,
pre-inputting user voice to generate first voice data;
extracting voice characteristics according to the content information of the first voice data;
converting the voice characteristics of the first voice data content information into digital information, and splitting the digital information into a plurality of component elements;
recombining the plurality of constituent elements into second voice data;
the first speech data and the second speech data are formed into a predefined speech data model.
With reference to the second implementation manner of the second aspect, in a fourth implementation manner of the second aspect, the present invention further includes a storage module, where the storage module is configured to store a predefined voice data model, and the predefined voice data model is generated according to voice data pre-entered by a user, and includes,
pre-inputting user voice to generate first voice data;
extracting voice characteristics according to the content information of the first voice data;
converting the voice characteristics of the first voice data content information into digital information, and splitting the digital information into a plurality of component elements;
recombining the plurality of constituent elements into second voice data;
the first speech data and the second speech data are formed into a predefined speech data model.
With reference to the second aspect, in a fifth implementation form of the second aspect,
the matching module matches the speech features to a predefined speech data model, including,
judging whether the fuzzy-match degree between the voice features and the predefined voice data model is greater than a preset value, and if so, confirming a match;
the preset fuzzy-match value is 65%.
With reference to the second aspect, the present invention provides in a sixth implementation manner of the second aspect, wherein the matching module matches the speech feature with a predefined speech data model, including,
judging whether the fuzzy-match degree between the voice features and the predefined voice data model is greater than the preset value, and if not, confirming no match;
and if no match is confirmed, discarding the voice data.
With reference to the second aspect, in a seventh implementation manner of the second aspect, the processing module, upon a match, sends the specified information to a predetermined contact, including,
if a match is found, triggering and waking a preset application program to send the specified information to the preset contact;
the trigger-and-wake passes a trigger instruction to the preset application program by way of an intent, waking the application program at the same time.
With reference to the second aspect, the present invention, in an eighth implementation manner of the second aspect, further includes,
the recording module is used for recording the environmental sound in real time;
the calling module is used for calling the GPS to position the current position;
the information module is used for adding the environmental recording and the GPS positioning information into the specified information;
and the triggering module is used for triggering and awakening a preset application program to send the specified information to a preset contact person.
With reference to the seventh implementation manner of the second aspect, in a ninth implementation manner of the second aspect, the present invention further includes,
the recording module is used for recording the environmental sound in real time;
the calling module is used for calling the GPS to position the current position;
the information module is used for adding the environmental recording and the GPS positioning information into the specified information;
and the triggering module is used for triggering and awakening a preset application program to send the specified information to a preset contact person.
In a third aspect, the present invention provides a mobile terminal comprising,
a preset function key, a processor, a memory and a microphone;
the function keys are used for a user to trigger and generate an operation instruction in a screen locking state;
the microphone is used for receiving a control instruction and starting or ending recording;
the memory is used for storing a program that executes, with support for the screen-locked state, the information processing method of any implementation of the first aspect;
the processor is configured to execute programs stored in the memory.
Compared with the prior art, the technical scheme provided by the invention at least has the following advantages:
when an emergency occurs, the user can urgently call for help without touching the mobile phone: the specified information is sent to the preset contact, who can quickly take the necessary rescue actions upon receiving it. This simplifies the help-seeking steps, shortens the time until the user is rescued, raises the user's margin of safety in an emergency, and keeps the call for help from being noticed by criminals.
These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 shows a flow chart of a method of information processing of the present invention.
FIG. 2 shows the basic structure of a speech model HMM of a method of information processing in one embodiment of the present invention.
Fig. 3 shows a flow diagram of a Sphinx continuous speech recognition system of a method of information processing in an embodiment of the invention.
Fig. 4 shows a block diagram of an information processing apparatus of the present invention.
Fig. 5 shows a block diagram of a mobile terminal according to the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.
In some of the flows described in the present specification and claims and in the above figures, a number of operations are included that occur in a particular order, but it should be clearly understood that these operations may be performed out of order or in parallel as they occur herein, with the order of the operations being indicated as 101, 102, etc. merely to distinguish between the various operations, and the order of the operations by themselves does not represent any order of performance. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first", "second", etc. in this document are used for distinguishing different messages, devices, modules, etc., and do not represent a sequential order, nor limit the types of "first" and "second" to be different.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
To explain the implementation of the information processing method, apparatus and mobile terminal of the present invention, please refer to fig. 1; the method includes the following steps:
s101: receiving voice data when a voice monitoring mode is started;
once the user enables the voice monitoring mode on the mobile terminal, the terminal monitors surrounding voice data in real time;
receiving voice data once the voice data is monitored;
preferably, the voice data is split into individual elements and compared, element by element, against the individual elements of the predefined voice data model for recognition; if the recognition error is not less than the preset value, the voice data is discarded;
the recognition error value is preset at 35%.
S102: acquiring voice characteristics in the voice data content information;
the voice features comprise the pitch, the tone intensity, the duration and the tone color of the voice.
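The description names four features but not how to measure them; as an illustration only, the sketch below estimates pitch from the autocorrelation peak, tone strength as RMS energy, and duration from the sample count on a synthetic tone (the sample rate and search band are assumptions; timbre, which would come from the spectral envelope, is omitted).

```python
# Illustrative extraction of pitch, intensity and duration on a synthetic
# 200 Hz tone; not the patent's implementation.
import numpy as np

SR = 8000  # assumed sample rate

def extract_features(signal, sr=SR):
    duration = len(signal) / sr                       # seconds
    intensity = float(np.sqrt(np.mean(signal ** 2)))  # RMS "tone strength"
    # crude pitch: autocorrelation peak within a plausible voice range
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]  # ac[k] = lag k
    lo, hi = sr // 400, sr // 60                      # 60-400 Hz search band
    lag = lo + int(np.argmax(ac[lo:hi]))
    return {"pitch_hz": sr / lag, "intensity": intensity, "duration_s": duration}

t = np.arange(SR) / SR                                # 1 s of a 200 Hz tone
feats = extract_features(np.sin(2 * np.pi * 200 * t))
print(feats)
```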
S103: and matching the voice characteristics with a predefined voice data model, wherein the predefined voice data model is generated according to voice data pre-recorded by a user.
Preferably, at least one predefined voice data model is constructed, for example by importing voice data models preset by the manufacturer, such as "SOS", "lifesaving" and "help", together with equivalent distress phrases in other languages and/or dialects;
the user can also set up a voice data model according to his or her own needs, which specifically includes:
pre-recording the user's voice to generate first voice data; the user records the voice under normal conditions, for example a distress phrase that an attacker would not recognise as a call for help, such as "Mom is calling me home for dinner"; if recording is unsuccessful or several samples are needed, the phrase can be recorded repeatedly;
extracting voice characteristics according to the content information of the first voice data;
converting the voice characteristics of the first voice data content information into digital information, and splitting the digital information into a plurality of component elements;
recombining said plurality of constituent elements into second speech data;
the first speech data and the second speech data are formed into a predefined speech data model.
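The five steps above can be sketched loosely as follows. The concrete encodings are assumptions, since the patent does not define them: "digital information" is modelled as a list of integer codes, a "component element" as a single code, and the "second voice data" as the recombined code sequence.

```python
# Loose sketch of the model-construction pipeline described above.

def build_model(first_voice_data):
    # extract features and convert them to digital information (assumed:
    # integer codes standing in for digitised voice features)
    digital = [ord(ch) for ch in first_voice_data]
    # split the digital information into individual component elements
    elements = [[code] for code in digital]
    # recombine the component elements into second voice data
    second_voice_data = [code for elem in elements for code in elem]
    # the predefined model holds both representations
    return {"first": first_voice_data, "second": second_voice_data}

model = build_model("help")
print(model["second"])  # [104, 101, 108, 112]
```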
Specifically, HTK (the HMM Toolkit) and CMU Sphinx are used for illustration.
HTK (HMM Toolkit) was developed by the Speech Vision and Robotics Group of the Cambridge University Engineering Department in England. It is a toolkit built specifically for constructing and manipulating HMMs, applied mainly in the field of speech recognition, and also used to test and analyse speech models. The specific training steps are as follows:
(1) data preparation
Collect a corpus of standard Mandarin Chinese, label the speech in the corpus, and create a list file of the speech recognition modules or unit elements.
(2) Feature extraction
The system uses MFCCs to extract the speech feature parameters; during training, each speech file is converted to MFCC format with the HTK tool HCopy.
(3) HMM definition
When training an HMM, an initial model framework is given; the HMMs in the present system all share the same structure, shown in fig. 2. Each model contains four active states (S2, S3, S4, S5) and non-emitting start and end states (here S1 and S6). The observation function b_i is a Gaussian distribution with a diagonal covariance matrix, and the possible transitions between states are denoted a_ij.
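The six-state left-to-right topology described above can be written down as a transition matrix; the probability values below are illustrative placeholders, not figures from the patent.

```python
# Sketch of a 6-state left-to-right HMM topology: S1 and S6 are
# non-emitting entry/exit states, S2-S5 are the active states, and
# a_ij permits only self-loops and forward moves (values illustrative).
import numpy as np

A = np.array([
    # S1   S2   S3   S4   S5   S6
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],  # S1: entry, jump to first active state
    [0.0, 0.6, 0.4, 0.0, 0.0, 0.0],  # S2: self-loop or advance
    [0.0, 0.0, 0.6, 0.4, 0.0, 0.0],  # S3
    [0.0, 0.0, 0.0, 0.6, 0.4, 0.0],  # S4
    [0.0, 0.0, 0.0, 0.0, 0.6, 0.4],  # S5: may exit
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # S6: exit state
])

# each non-exit row is a probability distribution
assert np.allclose(A[:5].sum(axis=1), 1.0)
# left-to-right: no backward transitions below the diagonal
assert np.allclose(np.tril(A, -1), 0.0)
print("left-to-right transition matrix OK")
```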
(4) HMM training
Before the training process starts, the HMM model parameters must be correctly initialised from the training data so that the training algorithm converges quickly and accurately. HTK provides two different initialisation tools, HInit and HCompV: the HMMs are first initialised with HInit, and then flat-initialised with HCompV, in which every state of the HMM is given the same mean and variance vectors, computed globally over the entire training set. Finally, repeated re-estimation iterations with HRest find the optimal values of the HMM parameters, and the individually trained HMMs are combined into a single model-definition file (hmmdefs).
Sphinx is a large-vocabulary, speaker-independent, continuous speech recognition system developed at Carnegie Mellon University in the United States. A continuous speech recognition system can be roughly divided into four parts: feature extraction, acoustic model training, language model training, and the decoder, as shown in fig. 3:
(1) a preprocessing module:
the method comprises the steps of processing an input original voice signal, filtering unimportant information and background noise, and carrying out end point detection (finding out the beginning and the end of the voice signal), voice framing (approximately considering that the voice signal is short-time and stable within 10-30ms, dividing the voice signal into a section for analysis), pre-emphasis (improving a high-frequency part) and the like on the voice signal.
(2) Feature extraction:
redundant information which is useless for voice recognition in the voice signal is removed, information which can reflect the voice essential characteristics is reserved and is represented in a certain form. That is, extracting key feature parameters reflecting the features of the speech signal to form a feature vector sequence for subsequent processing.
(3) Training an acoustic model:
and training the acoustic model parameters according to the characteristic parameters of the training voice library. During recognition, the characteristic parameters of the speech to be recognized can be matched with the acoustic model to obtain a recognition result.
(4) Training a language model:
The language model is a probabilistic model that computes the probability of a sentence. It is mainly used to decide which word sequence is more likely, or to predict the next word given the words seen so far. Put another way, the language model constrains the word search: it defines which words may follow an already recognised word (matching is a sequential process), and so lets the matcher exclude impossible words.
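The role just described can be shown with a toy bigram model: it scores which word may follow an already-recognised word, and a zero probability excludes that continuation from the search. The corpus and counts below are made up for the demonstration.

```python
# Toy bigram language model illustrating how word search is constrained.
from collections import Counter

corpus = "call for help please call for help now call home".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def p_next(word, nxt):
    # P(next | word) by maximum likelihood; 0.0 excludes the continuation
    return bigrams[(word, nxt)] / unigrams[word] if unigrams[word] else 0.0

print(p_next("for", "help"))  # "help" always follows "for" here -> 1.0
print(p_next("for", "home"))  # never observed -> 0.0, pruned from the search
```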
Further, matching the speech features to a predefined speech data model, including,
and judging whether the matching ambiguity of the voice features and the predefined voice data model is greater than a preset value, if so, confirming the matching, wherein the preset ambiguity value is 65%.
Further, the matching of the speech features with the predefined speech data model includes:
judging whether the fuzzy-match degree between the voice features and the predefined voice data model is greater than the preset value, and if not, confirming no match;
and if no match is confirmed, discarding the voice data.
For example, if the user says "Mom is yelling for you to come home and eat", that utterance can be fuzzily matched against the recorded distress phrase "Mom is calling me home for dinner";
a match is confirmed when the fuzzy-match degree is not less than 65%.
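The patent does not specify how the fuzzy-match degree is computed; as one plausible stand-in, the sketch below uses difflib's similarity ratio with the 65% threshold from the description.

```python
# Assumed fuzzy matcher: difflib similarity ratio vs. the 65% preset value.
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.65  # preset fuzzy-match value from the description

def is_match(heard, model_phrase):
    return SequenceMatcher(None, heard, model_phrase).ratio() >= MATCH_THRESHOLD

recorded = "mom yelled me to eat at home"
for heard in ("mom yells you to eat home", "nice weather today"):
    r = SequenceMatcher(None, heard, recorded).ratio()
    print(heard, "->", round(r, 2), "match" if r >= MATCH_THRESHOLD else "no match")
```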
S104: if a match is found, sending specified information to a preset contact;
preferably, if a match is found, a preset application program is triggered and woken to send the specified information to the preset contact;
the trigger-and-wake passes a trigger instruction to the preset application program by way of an intent, waking the application program at the same time.
Further, once the match is confirmed, the environment sound is recorded, the GPS is called to locate the current position, the GPS positioning information and the environment recording are added to the specified information, and the specified information is sent to the preset contact;
the GPS positioning information includes latitude information and longitude information of the current location.
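Assembling the specified information once a match is confirmed might look like the following; the field names, message text and recording path are hypothetical, chosen only to show the latitude/longitude and environment-recording payload described above.

```python
# Hypothetical assembly of the specified information sent to the contact.

def build_sos_message(latitude, longitude, recording_path,
                      text="Emergency - please help"):
    return {
        "text": text,
        "latitude": latitude,         # from the GPS fix of the current position
        "longitude": longitude,
        "recording": recording_path,  # environment sound captured on match
    }

msg = build_sos_message(39.9042, 116.4074, "env_recording_001.amr")
print(msg["latitude"], msg["longitude"])
```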
Further, if no match is confirmed, the voice data is discarded.
Example two
Referring to fig. 4, which illustrates the detailed module composition of the client, the present embodiment includes at least the following modules:
the system comprises a monitoring module 401, a receiving module 402, an obtaining module 403, an identifying module 404, a matching module 405, a processing module 406, a recording module 407, a calling module 408, an information module 409, a triggering module 410, a storage module 411 and a setting module 412.
The monitoring module 401 is configured to monitor voice to obtain voice data; it monitors surrounding voice data in real time once the user enables the function on the mobile device;
the receiving module 402 is configured to receive the voice data;
the obtaining module 403 is configured to obtain a voice feature in the voice data content information;
further, the voice features include pitch, tone strength, duration and tone of the voice.
Preferably, a recognition module 404 is further included, for splitting the voice data into individual elements and comparing them, element by element, against the individual elements of the predefined voice data model for recognition.
Further, if the recognition error of the recognition module for recognizing the single voice data is not less than a preset value, discarding the voice data;
the recognition error value is preset at 35%.
The matching module 405 is configured to match the voice feature with a predefined voice data model, where the predefined voice data model is generated according to voice data pre-entered by a user;
preferably, at least one predefined voice data model is constructed, for example by importing voice data models preset by the manufacturer, such as "SOS", "lifesaving" and "help", together with equivalent distress phrases in other languages and/or dialects;
the voice data model can also be set according to the user's own needs, which specifically includes:
the storage module 411 is configured to store the constructed at least one predefined voice data model 4111;
constructing the at least one predefined voice data model 4111:
the user enters voice data, and can do so under normal conditions, for example by entering a help-seeking phrase that is not easily recognized by an assailant, such as "mom is calling me home for dinner"; if the system prompts that the entry was unsuccessful or requires multiple entries, the phrase can be entered repeatedly;
extracting voice characteristics according to the content information of the user voice data;
converting the voice features of the first voice data content information into digital information, and splitting the digital information into a plurality of constituent elements;
recombining the plurality of constituent elements into second voice data;
the first speech data and the second speech data are formed into a predefined speech data model.
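The four construction steps above can be sketched minimally, assuming the "digital information" is a numeric sequence and that "recombining" reorders the split elements; both assumptions are illustrative, since the patent does not define either operation.

```python
# Hypothetical sketch of model construction from pre-entered voice data.
def build_model(first_voice_data):
    # Step 1: extract features from the content information
    # (hypothetical: numeric codes of the recorded phrase).
    features = [ord(ch) for ch in first_voice_data]
    # Step 2: split the digital information into constituent elements.
    elements = [features[i:i + 2] for i in range(0, len(features), 2)]
    # Step 3: recombine the elements into second voice data
    # (hypothetical recombination: reversed element order).
    second = [value for chunk in reversed(elements) for value in chunk]
    # Step 4: the model is formed from both first and second voice data.
    return {"first": features, "second": second}
```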
Further, an ambiguity threshold is set according to the predefined voice data model; the threshold is set to 65%.
Further, matching the speech features to a predefined speech data model includes:
judging whether the matching ambiguity between the voice features and the predefined voice data model is greater than a preset value, and if not, confirming a mismatch;
and if a mismatch is confirmed, discarding the voice data.
For example, if the user says "mom is calling you home for dinner", it can be fuzzily matched against the pre-entered phrase "mom is calling me home for dinner";
when the matching ambiguity is not less than 65%, the match is confirmed; otherwise, a mismatch is determined.
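The 65% rule above might be sketched with a generic string-similarity measure; `difflib.SequenceMatcher` here is a stand-in, since the patent does not specify how the matching ambiguity is computed.

```python
from difflib import SequenceMatcher

AMBIGUITY_THRESHOLD = 0.65  # preset value from the text

def is_match(heard_phrase, model_phrase):
    # Stand-in similarity measure for the matching ambiguity.
    similarity = SequenceMatcher(None, heard_phrase, model_phrase).ratio()
    return similarity >= AMBIGUITY_THRESHOLD
```

On the example above, "mom is calling you home for dinner" scores well above the threshold against "mom is calling me home for dinner", while an unrelated phrase falls below it.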
The processing module 406 is configured to send the specified information to the predetermined contact when the match is confirmed.
The recording module 407 is configured to record an environmental sound in real time;
the calling module 408 is configured to call a GPS to locate the current position; the positioning information comprises latitude information and longitude information of the current position;
the information module 409 is used for adding the environmental recording and the GPS positioning information into the designated information;
preferably, the triggering module 410 is configured to trigger and wake up a preset application program to send the specified information to a predetermined contact person when the matching is confirmed;
further, when the match is confirmed, the environmental sound is recorded, the GPS is called to locate the current position, the GPS positioning information and the environmental recording are added to the specified information, and the preset application program is triggered and woken to send the specified information to the predetermined contact.
The setting module 412 is configured to preset at least one contact 4121 to receive the specified information.
Further, the processing module 406 continues to listen when it determines a mismatch.
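Taken together, modules 407–410 assemble and dispatch the specified information. A hypothetical sketch of that flow follows; all names and the message format are illustrative, not from the patent.

```python
# Assemble the "specified information": environmental recording plus
# GPS latitude/longitude, then deliver it to each preset contact.
def build_sos_message(recording_path, latitude, longitude):
    return {
        "recording": recording_path,
        "location": {"lat": latitude, "lon": longitude},
        "text": f"SOS: help needed at {latitude:.4f}, {longitude:.4f}",
    }

def dispatch(message, contacts, send):
    # `send` stands in for waking the preset application and sending
    # the specified information to one predetermined contact.
    for contact in contacts:
        send(contact, message)
```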
EXAMPLE III
The following describes how the information processing method and apparatus according to the present invention are implemented in a mobile terminal.
the mobile terminal comprises a preset function key, a processor, a memory and a microphone;
the function keys are used for a user to trigger and generate an operation instruction in a screen locking state;
the microphone is used for receiving a control instruction and starting or ending recording;
the memory is used for storing a program for executing the information processing method with the recording device in the screen-locked state;
the processor is configured to execute programs stored in the memory.
The detailed description is as follows:
the system of the embodiment comprises at least one device and a preset contact for receiving the help-seeking information sent by the device.
The mobile terminal may be any mobile terminal such as a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), etc.; the following description takes a mobile phone as an example:
as shown in fig. 5, the mobile terminal handset includes: radio Frequency (RF) circuitry 510, memory 520, input module or unit 530, display module or unit 540, sensor 550, audio circuitry 560, wireless fidelity (WiFi) module 570, processor 580, and power supply 590. Those skilled in the art will appreciate that the handset configuration shown in fig. 5 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
The following describes each component of the mobile phone in detail with reference to fig. 5:
the RF circuit 510 may be used for receiving and transmitting signals during a message transmission or call, and in particular, for receiving downlink information of a base station and processing the received downlink information, and for transmitting data designed for uplink to the base station, the RF circuit 510 may include, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (L w noise amplifier, &lttttransmission = L "&ttt/t &gttna), a duplexer, etc. furthermore, the RF circuit 510 may communicate with a network and other devices through wireless communication, which may use any communication standard or protocol, including, but not limited to, a global system for Mobile communication (GSM), a General Packet radio Service (General Packet radio Service, GPRS), a Code Division Multiple Access (Code Division Multiple Access, Wideband CDMA), a Code Division Multiple Access (CDMA), a Short Service Access (SMS Service, L), a long Term Evolution (SMS) message, L, a Service, a Short Service (Service), a WCDMA, a Mobile communication system, a Mobile communication, a wireless.
The memory 520 may be used to store software programs and modules, and the processor 580 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 520. The memory 520 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the mobile phone, and the like. Further, the memory 520 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The input module or unit 530 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the cellular phone. Specifically, the input module or unit 530 may include a touch panel 531 and other input devices 532. The touch panel 531, also called a touch screen, can collect touch operations of a user on or near the touch panel 531 (for example, operations of the user on or near the touch panel 531 by using any suitable object or accessory such as a finger or a stylus pen), and drive the corresponding connection device according to a preset program. Alternatively, the touch panel 531 may include two parts, a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, and sends the touch point coordinates to the processor 580, and can receive and execute commands sent by the processor 580. In addition, the touch panel 531 may be implemented by various types such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. In addition to the touch panel 531, the input module or unit 530 may include other input devices 532. In particular, other input devices 532 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display module or unit 540 may include a display panel 541; optionally, the display panel 541 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED) display, and the like. Further, the touch panel 531 may cover the display panel 541; when the touch panel 531 detects a touch operation on or near it, the touch operation is transmitted to the processor 580 to determine the type of the touch event, and the processor 580 then provides a corresponding visual output on the display panel 541 according to the type of the touch event.
The handset may also include at least one sensor 550, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display panel 541 according to the brightness of ambient light, and the proximity sensor may turn off the display panel 541 and/or the backlight when the mobile phone is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications of recognizing the posture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the mobile phone, further description is omitted here.
Audio circuitry 560, speaker 561, and microphone 562 may provide an audio interface between a user and a cell phone. The audio circuit 560 may transmit the electrical signal converted from the received audio data to the speaker 561, and convert the electrical signal into a sound signal by the speaker 561 for output; on the other hand, the microphone 562 converts the collected sound signals into electrical signals, which are received by the audio circuit 560 and converted into audio data, which are then processed by the audio data output processor 580, and then passed through the RF circuit 510 to be sent to, for example, another cellular phone, or output to the memory 520 for further processing.
WiFi belongs to short distance wireless transmission technology, and the mobile phone can help the user to send and receive e-mail, browse web pages, access streaming media, etc. through the WiFi module 570, which provides wireless broadband internet access for the user. Although fig. 5 shows the WiFi module 570, it is understood that it does not belong to the essential constitution of the handset, and can be omitted entirely as needed within the scope not changing the essence of the invention.
The processor 580 is a control center of the mobile phone, connects various parts of the entire mobile phone by using various interfaces and lines, and performs various functions of the mobile phone and processes data by operating or executing software programs and/or modules stored in the memory 520 and calling data stored in the memory 520, thereby performing overall monitoring of the mobile phone. Alternatively, processor 580 may include one or more processing modules or units; preferably, the processor 580 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 580.
The handset also includes a power supply 590 (e.g., a battery) for powering the various components, which may preferably be logically coupled to the processor 580 via a power management system, such that the power management system may be used to manage charging, discharging, and power consumption.
Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which are not described herein.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the method, the apparatus, and the mobile terminal described above may refer to corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus and mobile terminal may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions in actual implementation, for example, a plurality of modules or units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules or units, and may be in an electrical, mechanical or other form.
The modules or units described as separate parts may or may not be physically separate, and parts displayed as modules or units may or may not be physical modules or units, may be located in one place, or may be distributed on a plurality of network modules or units. Some or all of the modules or units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, each functional module or unit in each embodiment of the present invention may be integrated into one processing module or unit, each module or unit may exist alone physically, or two or more modules or units may be integrated into one module or unit. The integrated modules or units may be implemented in the form of hardware, or may be implemented in the form of software functional modules or units.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic or optical disk, or the like.
It will be understood by those skilled in the art that all or part of the steps in the method for implementing the above embodiments may be implemented by hardware that is instructed to implement by a program, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
While the method, the apparatus and the mobile terminal for processing information provided by the present invention have been described in detail above, for those skilled in the art, according to the idea of the embodiment of the present invention, there may be variations in the specific implementation and application scope, and in summary, the content of the present description should not be construed as limiting the present invention.
In summary, the technical scheme provided by the invention is summarized as follows:
a1, a method for information processing, comprising:
receiving voice data when a voice monitoring mode is started;
acquiring voice characteristics in the voice data content information;
matching the voice features with a predefined voice data model, wherein the predefined voice data model is generated according to voice data pre-recorded by a user;
and if matched, sending specified information to a predetermined contact.
A2, the method as in A1, the voice characteristics including pitch, intensity, duration, and timbre of the voice.
A3, the method according to A1, further comprising,
splitting the voice data into single elements and comparing them element by element against the single elements of a predefined voice data model for recognition, and discarding the voice data if the recognition error is not less than a preset value;
the recognition error value is preset at 35%.
A4, the method according to A1 or A3, wherein the predefined speech data model is generated according to speech data pre-entered by a user, comprising,
pre-inputting user voice to generate first voice data;
extracting voice characteristics according to the content information of the first voice data;
converting the voice characteristics of the first voice data content information into digital information, and splitting the digital information into a plurality of component elements;
recombining the plurality of constituent elements into second voice data;
the first speech data and the second speech data are formed into a predefined speech data model.
A5, the method of A1, the matching the speech features with predefined speech data models, comprising,
judging whether the matching ambiguity of the voice features and a predefined voice data model is greater than a preset value, and if so, confirming the matching;
the preset value of the degree of blur is 65%.
A6, the method of A1, the matching the speech features with predefined speech data models, comprising,
judging whether the matching ambiguity between the voice features and the predefined voice data model is greater than a preset value, and if not, confirming a mismatch;
and if a mismatch is confirmed, discarding the voice data.
A7, method according to A1,
and if matched, sending the specified information to the predetermined contact, including,
if matched, triggering and waking up the preset application program to send the specified information to the predetermined contact;
the triggering and waking up means transmitting a trigger instruction to the preset application program by way of an intent, while simultaneously waking up the preset application program.
A8, the method according to A1 or A7, wherein, if matched, sending the specified information to the predetermined contact further comprises:
recording an environment sound;
calling a GPS to position the current position;
adding the GPS positioning information and the environmental record into the designated information;
and triggering and awakening a preset application program to send the specified information to a preset contact person.
B9, an information processing apparatus comprising:
the monitoring module is used for monitoring voice to obtain voice data;
the receiving module is used for receiving the voice data;
the acquisition module is used for acquiring the voice characteristics in the voice data content information;
the matching module is used for matching the voice characteristics with a predefined voice data model, wherein the predefined voice data model is generated according to voice data which is pre-recorded by a user;
and the processing module is used for sending the specified information to the preset contact person when the matching is determined.
B10, the device as B9, the voice characteristics include pitch, tone intensity, duration and tone of voice.
B11, the apparatus as in B9, after receiving the voice data, further comprising,
splitting the voice data into single elements and comparing them element by element against the single elements of a predefined voice data model for recognition, and discarding the voice data if the recognition error is not less than a preset value;
the recognition error value is preset at 35%.
B12, device as described in B9 or B11,
the device also comprises a storage module used for storing a predefined voice data model, wherein the predefined voice data model is generated according to voice data pre-recorded by a user and comprises,
pre-inputting user voice to generate first voice data;
extracting voice characteristics according to the content information of the first voice data;
converting the voice characteristics of the first voice data content information into digital information, and splitting the digital information into a plurality of component elements;
recombining the plurality of constituent elements into second voice data;
the first speech data and the second speech data are formed into a predefined speech data model.
B13, the apparatus of B9, the matching module matching the speech features to predefined speech data models, including,
judging whether the matching ambiguity of the voice features and the predefined voice data model is greater than a preset value, if so, confirming the matching,
the preset value of the degree of blur is 65%.
B14, the apparatus of B9, the matching module matching the speech features to predefined speech data models, including,
judging whether the matching ambiguity between the voice features and the predefined voice data model is greater than a preset value, and if not, confirming a mismatch;
and if a mismatch is confirmed, discarding the voice data.
B15, the apparatus of B9, wherein the processing module sends the specified information to the predetermined contact if matched, including,
if matched, triggering and waking up the preset application program to send the specified information to the predetermined contact;
the triggering and waking up means transmitting a trigger instruction to the preset application program by way of an intent, while simultaneously waking up the preset application program.
B16, the device of B9, further comprising,
the recording module is used for recording the environmental sound in real time;
the calling module is used for calling the GPS to position the current position;
the information module is used for adding the environmental recording and the GPS positioning information into the specified information;
the setting module is used for presetting at least one contact to receive the specified information;
and the triggering module is used for triggering and awakening a preset application program to send the specified information to a preset contact person.
C17, a mobile terminal, comprising:
presetting a function key, a processor, a memory and a microphone;
the function keys are used for a user to trigger and generate an operation instruction in a screen locking state;
the microphone is used for receiving a control instruction and starting or ending recording;
the memory is used for storing a program that supports the recording device, in the screen-locked state, in executing the information processing method of any one of items A1 to A8;
the processor is configured to execute programs stored in the memory.

Claims (13)

1. A method of information processing, comprising,
when the terminal is started, entering a voice monitoring mode, and receiving voice data when the voice monitoring mode is started;
splitting the voice data into single elements and comparing them element by element against the single elements of a predefined voice data model for recognition, and discarding the voice data if the recognition error is not less than a preset value; the preset recognition error threshold is 35%;
if the recognition error is smaller than the preset value, acquiring the voice characteristics in the voice data content information;
matching the voice characteristics with a predefined voice data model, wherein the predefined voice data model is generated according to first voice data and second voice data which are pre-recorded by a user; the second voice data is obtained by splitting and recombining the characteristics of the first voice data;
if matched, recording the environmental sound; calling a GPS to locate the current position; adding the GPS positioning information and the environmental recording to the specified information; and triggering and waking up a preset application program to send the specified information to a predetermined contact.
2. The method of claim 1, wherein the speech features include the pitch, intensity, duration and timbre of the speech.
3. The method of claim 1, wherein the predefined speech data model is generated from first speech data and second speech data previously entered by a user, including,
pre-inputting user voice to generate first voice data;
extracting voice characteristics according to the content information of the first voice data;
converting the voice characteristics of the first voice data content information into digital information, and splitting the digital information into a plurality of component elements;
recombining the plurality of constituent elements into second voice data;
the first speech data and the second speech data are formed into a predefined speech data model.
4. The method of claim 1, wherein said matching the speech features to a predefined speech data model comprises,
judging whether the matching ambiguity of the voice features and a predefined voice data model is greater than a preset value, and if so, confirming the matching;
the preset value of the degree of blur is 65%.
5. The method of claim 1, wherein said matching the speech features to a predefined speech data model comprises,
judging whether the matching ambiguity between the voice features and the predefined voice data model is greater than a preset value, and if not, confirming a mismatch;
and if a mismatch is confirmed, discarding the voice data.
6. The method of claim 1, further comprising,
if matched, triggering and waking up the preset application program to send the specified information to the predetermined contact;
the triggering and waking up means transmitting a trigger instruction to the preset application program by way of an intent, while simultaneously waking up the preset application program.
7. An information processing apparatus, comprising,
the monitoring module is used for entering a voice monitoring mode when the terminal is started, and monitoring voice to acquire voice data;
the receiving module is used for receiving the voice data;
the recognition module is used for splitting the voice data into single elements and comparing them element by element against the single elements of a predefined voice data model for recognition, and discarding the voice data if the recognition error is not less than a preset value; the preset recognition error threshold is 35%;
the acquisition module is used for acquiring the voice characteristics in the voice data content information if the recognition error is smaller than the preset value;
the matching module is used for matching the voice characteristics with a predefined voice data model, wherein the predefined voice data model is generated according to first voice data and second voice data which are input by a user in advance; the second voice data is obtained by splitting and recombining the characteristics of the first voice data;
the setting module is used for presetting at least one contact to receive the specified information;
the processing module is used for recording environment sounds when the matching is determined; calling a GPS to position the current position; adding the GPS positioning information and the environmental record into the designated information; and triggering and awakening a preset application program to send the specified information to a preset contact.
8. The apparatus of claim 7, wherein the voice characteristics include pitch, intensity, duration, and timbre of the voice.
9. The apparatus of claim 7, further comprising a storage module for storing a predefined voice data model generated from first voice data and second voice data pre-entered by a user, including,
pre-inputting user voice to generate first voice data;
extracting voice characteristics according to the content information of the first voice data;
converting the voice characteristics of the first voice data content information into digital information, and splitting the digital information into a plurality of component elements;
recombining the plurality of constituent elements into second voice data;
the first speech data and the second speech data are formed into a predefined speech data model.
10. The apparatus of claim 7, wherein the matching module matches the speech features to a predefined speech data model, comprising,
judging whether the matching ambiguity of the voice features and the predefined voice data model is greater than a preset value, if so, confirming the matching,
the preset value of the degree of blur is 65%.
11. The apparatus of claim 7, wherein the matching module matches the speech features to a predefined speech data model, comprising,
judging whether the matching ambiguity between the voice features and the predefined voice data model is greater than a preset value, and if not, confirming a mismatch;
and if a mismatch is confirmed, discarding the voice data.
12. The apparatus of claim 7, wherein the processing module is further to,
if matched, triggering and waking up the preset application program to send the specified information to the predetermined contact;
the triggering and waking up means transmitting a trigger instruction to the preset application program by way of an intent, while simultaneously waking up the preset application program.
13. A mobile terminal is characterized by comprising preset function keys, a processor, a memory and a microphone;
the function keys are used for a user to trigger and generate an operation instruction in a screen locking state;
the microphone is used for receiving a control instruction and starting or ending recording;
the memory is used for storing a program that supports the recording device, in the screen-locked state, in executing the information processing method of any one of claims 1 to 6;
the processor is configured to execute programs stored in the memory.
CN201611069130.6A 2016-11-28 2016-11-28 Information processing method and device and mobile terminal Active CN106791010B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611069130.6A CN106791010B (en) 2016-11-28 2016-11-28 Information processing method and device and mobile terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611069130.6A CN106791010B (en) 2016-11-28 2016-11-28 Information processing method and device and mobile terminal

Publications (2)

Publication Number Publication Date
CN106791010A CN106791010A (en) 2017-05-31
CN106791010B true CN106791010B (en) 2020-07-10

Family

ID=58904160

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611069130.6A Active CN106791010B (en) 2016-11-28 2016-11-28 Information processing method and device and mobile terminal

Country Status (1)

Country Link
CN (1) CN106791010B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107483069A (en) * 2017-07-05 2017-12-15 广东小天才科技有限公司 Voice help-asking method, apparatus, equipment and the storage medium of Wearable
CN107371085B (en) * 2017-09-01 2020-01-21 深圳市沃特沃德股份有限公司 Safety protection method and device and intelligent sound box
CN107645607A (en) * 2017-10-12 2018-01-30 上海展扬通信技术有限公司 A kind of computer-readable recording medium and mobile terminal for voice-control alarm

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102739863A (en) * 2012-06-14 2012-10-17 中兴通讯股份有限公司 Emergency call method and device
CN103366737A (en) * 2012-03-30 2013-10-23 株式会社东芝 An apparatus and a method for using tone characteristics in automatic voice recognition

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103077714B (en) * 2013-01-29 2015-07-08 华为终端有限公司 Information identification method and apparatus
CN104575496A (en) * 2013-10-14 2015-04-29 中兴通讯股份有限公司 Method and device for automatically sending multimedia documents and mobile terminal
CN105321514A (en) * 2014-05-28 2016-02-10 西安中兴新软件有限责任公司 Alarm method and terminal
CN105991820A (en) * 2015-02-02 2016-10-05 西安酷派软件科技有限公司 Terminal control method and device
CN105744064A * 2016-01-29 2016-07-06 宇龙计算机通信科技(深圳)有限公司 Automatic communication help-calling method, apparatus and terminal
CN105721700A * 2016-02-22 2016-06-29 上海理工大学 Voice help-seeking system based on an Android phone

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103366737A (en) * 2012-03-30 2013-10-23 株式会社东芝 An apparatus and a method for using tone characteristics in automatic voice recognition
CN102739863A (en) * 2012-06-14 2012-10-17 中兴通讯股份有限公司 Emergency call method and device

Also Published As

Publication number Publication date
CN106791010A (en) 2017-05-31

Similar Documents

Publication Publication Date Title
US11670302B2 (en) Voice processing method and electronic device supporting the same
US10909982B2 (en) Electronic apparatus for processing user utterance and controlling method thereof
EP3392877B1 (en) Device for performing task corresponding to user utterance
CN108021572B (en) Reply information recommendation method and device
CN110890093B (en) Intelligent equipment awakening method and device based on artificial intelligence
US20190267001A1 (en) System for processing user utterance and controlling method thereof
CN108806669B (en) Electronic device for providing voice recognition service and method thereof
CN110570840B (en) Intelligent device awakening method and device based on artificial intelligence
CN108694944B (en) Method and apparatus for generating natural language expressions by using a framework
US11537360B2 (en) System for processing user utterance and control method of same
CN108984535B (en) Statement translation method, translation model training method, device and storage medium
US20190019509A1 (en) Voice data processing method and electronic device for supporting the same
CN108877780B (en) Voice question searching method and family education equipment
CN106791010B (en) Information processing method and device and mobile terminal
CN111723855A (en) Learning knowledge point display method, terminal equipment and storage medium
KR20180081922A (en) Method for response to input voice of electronic device and electronic device thereof
CN110717026A (en) Text information identification method, man-machine conversation method and related device
CN111597804B (en) Method and related device for training entity recognition model
CN111738100A (en) Mouth shape-based voice recognition method and terminal equipment
CN114283793A (en) Voice wake-up method, device, electronic equipment, medium and program product
KR20180108321A (en) Electronic device for performing an operation for an user input after parital landing
CN112017670B (en) Target account audio identification method, device, equipment and medium
CN111522592A (en) Intelligent terminal awakening method and device based on artificial intelligence
CN112086094A (en) Method for correcting pronunciation, terminal equipment and computer readable storage medium
CN111638788A (en) Learning data output method and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20170720

Address after: 100102, 18 floor, building 2, Wangjing street, Beijing, Chaoyang District, 1801

Applicant after: BEIJING ANYUN SHIJI SCIENCE AND TECHNOLOGY CO., LTD.

Address before: 100088 Beijing city Xicheng District xinjiekouwai Street 28, block D room 112 (Desheng Park)

Applicant before: Beijing Qihu Technology Co., Ltd.

GR01 Patent grant