CN111081275A - Terminal processing method and device based on sound analysis, storage medium and terminal - Google Patents


Info

Publication number
CN111081275A
CN111081275A (application CN201911325074.1A)
Authority
CN
China
Prior art keywords
sound
type
current scene
terminal
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911325074.1A
Other languages
Chinese (zh)
Other versions
CN111081275B (en)
Inventor
Li Yan (李岩)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huizhou TCL Mobile Communication Co Ltd
Original Assignee
Huizhou TCL Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huizhou TCL Mobile Communication Co Ltd filed Critical Huizhou TCL Mobile Communication Co Ltd
Priority claimed from application CN201911325074.1A
Publication of CN111081275A
Application granted
Publication of CN111081275B
Status: Active

Classifications

    • G10L25/27: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the analysis technique
    • G10L25/30: Speech or voice analysis techniques characterised by the analysis technique using neural networks
    • G10L25/51: Speech or voice analysis techniques specially adapted for particular use, for comparison or discrimination
    • H04M1/72448: User interfaces specially adapted for cordless or mobile telephones, with means for adapting the functionality of the device according to specific conditions
    • H04M1/72454: User interfaces adapting the functionality of the device according to context-related or environment-related conditions
    • Y02D30/70: Reducing energy consumption in wireless communication networks

Abstract

An embodiment of the present application discloses a terminal processing method and device based on sound analysis, a storage medium, and a terminal. The method comprises the following steps: collecting sound information in the current scene; analyzing the sound information to obtain an analysis result; determining the type to which the current scene belongs based on the analysis result; and executing the operation instruction corresponding to that type. By intelligently identifying the current scene state and executing the operation instruction of the corresponding service, the scheme improves the intelligence of the terminal and brings convenience to the user.

Description

Terminal processing method and device based on sound analysis, storage medium and terminal
Technical Field
The present application relates to the field of terminal technologies, and in particular, to a terminal processing method and apparatus based on sound analysis, a storage medium, and a terminal.
Background
Artificial intelligence technology is increasingly being brought to intelligent terminal devices, giving consumers new intelligent experiences. Its core is enabling a machine to build intelligent decision-making capability by learning from large amounts of data, so that it can be applied in users' daily lives and bring them convenience.
Disclosure of Invention
The embodiment of the application provides a terminal processing method and device based on sound analysis, a storage medium and a terminal, and can improve the intelligence of the terminal.
In a first aspect, an embodiment of the present application provides a terminal processing method based on sound analysis, including:
collecting sound information under a current scene;
analyzing the sound information to obtain an analysis result;
determining the type of the current scene based on the analysis result;
and executing the operation instruction corresponding to the type of the current scene.
In some embodiments, the determining the type of the current scene based on the analysis result includes:
determining the type of the current scene from a plurality of sample scene types according to the analysis result, wherein the plurality of sample scene types at least comprise: noisy scenes, dangerous scenes, and quiet scenes.
In some embodiments, the executing the operation instruction corresponding to the type of the current scene includes:
if the type of the current scene is a noisy scene, acquiring a current geographic position;
determining whether the geographic location is within a preset area;
if so, adjusting the current information reminding mode to vibration reminding.
In some embodiments, after determining that the geographic location is within the preset area, the method further comprises:
and when detecting that the terminal plays the first audio signal, carrying out noise reduction processing on the first audio signal.
In some embodiments, the executing the operation instruction corresponding to the type of the current scene includes:
if the type of the current scene is a dangerous scene, acquiring current environment image information;
determining the position coverage range of the dangerous area based on the current geographic position and the environment image information;
generating route guidance information according to the position coverage range;
and performing information prompt on the user according to the route guidance information.
In some embodiments, the executing the operation instruction corresponding to the type of the current scene includes:
if the type of the current scene is a quiet scene, when a second audio signal to be played is detected, adjusting a volume parameter value of the second audio signal to be lower than the current volume parameter value.
In some embodiments, the analyzing the sound information to obtain an analysis result includes:
carrying out feature extraction on the sound information to obtain sound features;
identifying the sound information according to the sound characteristics;
determining the sound type, the number of sound types, the number of sound sources and the sound characteristic parameter contained in the sound information based on the identification result;
and generating the analysis result at least according to the sound type, the sound type quantity, the sound source quantity and the sound characteristic parameter.
In a second aspect, an embodiment of the present application provides a terminal processing apparatus based on sound analysis, including:
the acquisition unit is used for acquiring sound information under the current scene;
the analysis unit is used for analyzing the sound information to obtain an analysis result;
a determining unit, configured to determine a type to which a current scene belongs based on the analysis result;
and the processing unit is used for executing the operation instruction corresponding to the type of the current scene.
In some embodiments, the determining unit is to:
determining the type of the current scene from a plurality of sample scene types according to the analysis result, wherein the plurality of sample scene types at least comprise: noisy scenes, dangerous scenes, and quiet scenes.
In some embodiments, the processing unit is to:
if the type of the current scene is a noisy scene, acquiring a current geographic position;
determining whether the geographic location is within a preset area;
if so, adjusting the current information reminding mode to vibration reminding.
In some embodiments, the processing unit is further to:
after the geographic position is determined to be in the preset area, when a first audio signal played by the terminal is detected, carrying out noise reduction processing on the first audio signal.
In some embodiments, the processing unit is to:
if the type of the current scene is a dangerous scene, acquiring current environment image information;
determining the position coverage range of the dangerous area based on the current geographic position and the environment image information;
generating route guidance information according to the position coverage range;
and performing information prompt on the user according to the route guidance information.
In some embodiments, the processing unit is to:
if the type of the current scene is a quiet scene, when a second audio signal to be played is detected, adjusting a volume parameter value of the second audio signal to be lower than the current volume parameter value.
In some embodiments, the analysis unit comprises:
the extraction subunit is used for carrying out feature extraction on the sound information to obtain sound features;
the identification subunit is used for identifying the sound information according to the sound characteristics;
a determining subunit configured to determine, based on the recognition result, a sound type, a number of sound types, a number of sound sources, and a sound characteristic parameter included in the sound information;
and the generating subunit is used for generating the analysis result at least according to the sound type, the sound type quantity, the sound source quantity and the sound characteristic parameter.
In a third aspect, the present application further provides a computer-readable storage medium, where a plurality of instructions are stored, where the instructions are adapted to be loaded by a processor to execute the above terminal processing method based on sound analysis.
In a fourth aspect, an embodiment of the present application further provides a terminal, including a processor and a memory, where the processor is electrically connected to the memory, the memory is used to store instructions and data, and the processor is used to execute the terminal processing method based on sound analysis.
In the implementation of the application, sound information in the current scene is collected and analyzed to obtain an analysis result. The type to which the current scene belongs is then determined based on the analysis result, and the operation instruction corresponding to that type is executed. By intelligently identifying the current scene state and executing the operation instruction of the corresponding service, the scheme improves the intelligence of the terminal and brings convenience to the user.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of a terminal processing method based on sound analysis according to an embodiment of the present application.
Fig. 2 is a scene schematic diagram of a terminal processing method based on sound analysis according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of a terminal processing device based on sound analysis according to an embodiment of the present application.
Fig. 4 is another schematic structural diagram of a terminal processing device based on sound analysis according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of a terminal according to an embodiment of the present application.
Fig. 6 is another schematic structural diagram of a terminal according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides a terminal processing method and device based on sound analysis, a storage medium and a terminal. The details will be described below separately.
In an embodiment, a terminal processing method based on sound analysis is provided, applied to terminal devices such as smart phones, tablet computers, and notebook computers. Referring to fig. 1, the specific flow of the method may be as follows:
101. and collecting sound information under the current scene.
Sound is a wave produced by a vibrating object; it propagates through a medium (air, liquid, or solid) and can be perceived by the auditory organs of humans and animals. The vibrating object that produces the sound is called the sound source. The number of times an object vibrates in one second is called the frequency, measured in hertz (Hz). The human ear can hear sounds from roughly 20 Hz to 20,000 Hz and is most sensitive to sounds between 1,000 Hz and 3,000 Hz.
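The frequency notion above can be illustrated with a short sketch: the snippet below estimates the frequency of a generated 440 Hz tone by counting zero crossings. This is a simple stand-in for real spectral analysis; the function name, tone, and sample rate are chosen purely for this example.

```python
import math

def dominant_frequency(samples, sample_rate):
    """Estimate the frequency of a roughly periodic signal by
    counting zero crossings: each full cycle crosses zero twice."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration = len(samples) / sample_rate
    return crossings / (2 * duration)

# One second of a 440 Hz tone sampled at 8 kHz.
rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
freq = dominant_frequency(tone, rate)  # close to 440 Hz
```

A real analyzer would use an FFT or a learned model rather than zero crossings, but the sketch shows how frequency relates to vibrations per second.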
In the embodiment of the application, a microphone is arranged in the terminal device, and the terminal can specifically collect sound information in the external environment through the microphone.
In some embodiments, the microphone may also be an external device, which may establish a wireless link with a terminal device, and send the collected sound information to the terminal device through the wireless link, so that the terminal obtains the sound information in the current scene.
102. And analyzing the sound information to obtain an analysis result.
In this embodiment, artificial intelligence (AI) techniques are applied in advance to label and learn from common environmental sound data, so that the terminal device can recognize the sounds in its current environment.
Specifically, common sound-source data is first fed into an AI model for training: common sound classes are labelled, and the different sound sources producing each sound are identified. After extensive learning, the terminal device gains the ability to recognize a wide variety of sounds.
In practical applications, the sound information may be analyzed and processed by methods such as neural networks, hidden Markov models, VQ clustering, or polynomial classifiers. For example, in some embodiments, the step "analyzing the sound information to obtain the analysis result" may include the following processes:
(11) carrying out feature extraction on the sound information to obtain sound features;
(12) recognizing the sound information according to the sound characteristics;
(13) determining the sound type, the number of sound types, the number of sound sources and the sound characteristic parameter contained in the sound information based on the identification result;
(14) and generating an analysis result at least according to the sound type, the sound type quantity, the sound source quantity and the sound characteristic parameter.
Specifically, the collected sound information may be input into a trained AI model to extract features such as spectrum, rate, timbre, pitch, volume, and loudness. The sounds are then classified and recognized according to the extracted features, so that every sound in the current sound information is identified, yielding a recognition result. From this result, the sound types, the number of sound types, the number of sound sources, and the sound characteristic parameters contained in the sound information are determined, and the analysis result is generated from this information.
Sound types may include sounds produced by different sources, such as human speech, animal calls, wind, car horns, and ringtones. Sound characteristic parameters may include information such as timbre, volume, and loudness.
In specific implementation, before feature extraction is performed on the sound information, denoising processing may be performed on the sound information in advance to remove some atypical (i.e., non-classifiable) sounds, so as to improve the accuracy of the subsequent sound analysis result.
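The feature-extraction step (11) above can be sketched minimally. This toy extractor computes only RMS energy and a decibel-style level from raw samples; a real implementation would also extract the spectral, timbre, and pitch features mentioned in the text. The function name and the dB floor are assumptions of this sketch.

```python
import math

def extract_features(samples):
    """Toy feature extractor over raw samples in [-1, 1]: returns RMS
    energy and a decibel-style level for the later scene decision."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Relative level; the 1e-12 floor avoids log10(0) on pure silence.
    level_db = 20 * math.log10(max(rms, 1e-12))
    return {"rms": rms, "level_db": level_db}

loud = extract_features([0.9, -0.9] * 100)
silence = extract_features([0.0] * 200)
```

Denoising, as noted above, would run before this step so that atypical sounds do not distort the extracted features.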
103. And determining the type of the current scene based on the analysis result.
Specifically, in some embodiments, the step "determining the type of the current scene based on the analysis result" may include the following steps:
determining the type of the current scene from a plurality of sample scene types according to the analysis result, wherein the plurality of sample scene types at least comprise: noisy scenes, dangerous scenes, and quiet scenes.
In this embodiment, a plurality of sample scenes must be preset; the types of sound they can contain differ, so the sound components of each sample scene need to be prepared in advance. The type to which the current scene belongs is then determined from these constructed sample scenes based on the analysis result.
In this embodiment, several typical scenes may be constructed in advance: a noisy scene (containing many different sound types and a certain number of sound sources, such as the sounds of a downtown district or a vegetable market), a quiet scene (low sound level in decibels), and a dangerous scene (containing special sounds from special sources, such as explosions).
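The scene determination in this step can be sketched as a rule-based classifier over the analysis result. The thresholds, field names, and danger-sound list below are invented for illustration; the patent itself leaves the exact matching criteria open (a trained model could replace the rules).

```python
def classify_scene(analysis):
    """Illustrative rules for the three sample scene types described
    in the text: dangerous, noisy, and quiet."""
    DANGER_SOUNDS = {"explosion", "alarm", "scream"}  # assumed list
    if DANGER_SOUNDS & set(analysis["sound_types"]):
        return "dangerous"
    if analysis["num_sources"] >= 4 and analysis["level_db"] > 70:
        return "noisy"
    if analysis["level_db"] < 40:
        return "quiet"
    return "unclassified"

street = classify_scene(
    {"sound_types": ["speech", "horn", "engine", "music"],
     "num_sources": 8, "level_db": 82})
library = classify_scene(
    {"sound_types": ["speech"], "num_sources": 1, "level_db": 30})
blast = classify_scene(
    {"sound_types": ["explosion"], "num_sources": 1, "level_db": 110})
```

Checking the danger condition first mirrors the text's priority: a special sound should override an otherwise noisy or quiet reading.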
104. And executing the operation instruction corresponding to the type of the current scene.
Specifically, with reference to fig. 1 and fig. 2, the operation instruction to be executed is different according to different scene types in the actual application.
In some embodiments, if the type to which the current scene belongs is a noisy scene, the step "execute an operation instruction corresponding to the type to which the current scene belongs" may include the following steps:
(21) acquiring a current geographic position;
(22) determining whether the geographic location is within a preset area;
(23) if so, adjusting the current information reminding mode to vibration reminding.
Specifically, a positioning device may be provided in the terminal, and the current geographic location can be obtained using positioning technologies such as GPS (Global Positioning System), Wi-Fi, or Bluetooth positioning. Combining the current geographic location with the current scene type, the terminal determines whether it is in a preset, typically noisy area such as a downtown district, a restaurant, or a vegetable market. If so, audible prompts such as the ringtone and message alerts are hard to perceive over the noise of the current scene, so the terminal's alert mode can be switched to vibration, making the terminal's state easier for the user to notice.
Of course, if the current alert mode is already vibration, no adjustment is needed.
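The noisy-scene handling above (locate, check the preset area, switch to vibration) can be sketched as follows. The circular-area representation, the haversine distance check, and the mode names are assumptions of this sketch, not details from the patent.

```python
import math

def within_preset_area(lat, lon, center, radius_m):
    """True if the current fix lies inside a preset circular area,
    using the haversine great-circle distance."""
    earth_r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat), math.radians(center[0])
    dp = math.radians(center[0] - lat)
    dl = math.radians(center[1] - lon)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * earth_r * math.asin(math.sqrt(a)) <= radius_m

def reminder_mode(current_mode, in_noisy_area):
    # Switch to vibration only when not already vibrating.
    return ("vibrate" if in_noisy_area and current_mode != "vibrate"
            else current_mode)

market = (22.0, 114.0)  # hypothetical preset noisy-area centre
inside = within_preset_area(22.0005, 114.0, market, 200.0)
mode = reminder_mode("ring", inside)
```

In practice the preset areas might come from a map service rather than hard-coded circles, but the decision flow is the same.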
In some embodiments, after the geographic location is determined to be within the preset area, if it is detected that the terminal plays the first audio signal, noise reduction processing may be further performed on the first audio signal, so that the user can hear specific content of the audio more clearly.
The first audio signal may specifically be an audio signal generated when the terminal performs a voice call or a video call with another terminal, or an audio signal carried in a voice message sent by a terminal of the other party and received by the terminal.
In some embodiments, if the type to which the current scene belongs is a dangerous scene, the step "execute an operation instruction corresponding to the type to which the current scene belongs" may include the following steps:
(31) acquiring current environment image information;
(32) determining the position coverage range of the dangerous area based on the current geographic position and the environmental image information;
(33) generating route guidance information according to the position coverage range;
(34) and performing information prompt on the user according to the route guidance information.
Specifically, a camera can be arranged in the terminal, and the terminal can acquire the current external environment image information by opening the camera. In addition, the current geographical position is determined through the positioning function of the terminal, and the position coverage of the dangerous area is determined by combining the acquired external environment image, the current geographical position and the analysis result (mainly sound loudness, sound type and the like).
After the coverage area of the dangerous area is determined, route guidance information can be generated by combining road condition information, building information and the like under the current environment so as to prompt the user to be guided to transfer to a safe area, and the safety of the user is guaranteed.
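The danger-area coverage estimate can be illustrated with a crude acoustic calculation: under a free-field, inverse-square assumption, the distance to a loud source can be inferred from how much its level has decayed relative to a reference. The reference level and the formula's applicability are assumptions for illustration only; the patent combines loudness with image and position data rather than relying on this alone.

```python
def estimated_source_distance_m(measured_db, source_db_at_1m):
    """Free-field estimate: sound pressure level falls by 20*log10(d)
    dB relative to a 1 m reference (inverse-square law)."""
    return 10 ** ((source_db_at_1m - measured_db) / 20)

# A source assumed to be 140 dB at 1 m, measured at 100 dB here:
d = estimated_source_distance_m(measured_db=100.0, source_db_at_1m=140.0)
```

Such an estimate could seed the radius of the danger-area coverage, which route guidance then steers around.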
In some embodiments, if the current scene is a quiet scene, the corresponding operation instruction is executed as follows: when a second audio signal to be played is detected, its volume parameter value is adjusted below the current volume parameter value. This reduces the playback volume, avoids producing excessive sound in a quiet environment, and spares the user from adjusting the volume manually.
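The quiet-scene volume adjustment can be sketched in one small function; the 50% reduction factor and the lower floor are illustrative values, not specified in the text.

```python
def quiet_scene_volume(current_volume, factor=0.5, floor=0.05):
    """Return a playback volume below the current setting for a quiet
    scene, but never fully muted (the floor keeps audio audible)."""
    return max(current_volume * factor, floor)

lowered = quiet_scene_volume(0.8)  # pending audio plays at 0.4
```

A refinement could scale the factor by how far the measured level sits below the quiet-scene threshold.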
As can be seen from the above, the terminal processing method based on sound analysis provided in this embodiment collects sound information in the current scene and analyzes it to obtain an analysis result. The type to which the current scene belongs is then determined based on the analysis result, and the operation instruction corresponding to that type is executed. By intelligently identifying the current scene state and executing the operation instruction of the corresponding service, the scheme improves the intelligence of the terminal and brings convenience to the user.
In another embodiment of the present application, a terminal processing apparatus based on sound analysis is further provided. The apparatus may be integrated in a terminal in the form of software or hardware, and the terminal may specifically be a mobile phone, tablet computer, notebook computer, or the like. As shown in fig. 3, the terminal processing device 300 based on sound analysis may include: an acquisition unit 301, an analysis unit 302, a determination unit 303 and a processing unit 304, wherein:
the acquisition unit 301 is configured to acquire sound information in a current scene;
an analysis unit 302, configured to analyze the sound information to obtain an analysis result;
a determining unit 303, configured to determine a type to which the current scene belongs based on the analysis result;
and the processing unit 304 is used for executing the operation instruction corresponding to the type of the current scene.
In some embodiments, the determining unit 303 may be configured to:
determining the type of the current scene from a plurality of sample scene types according to the analysis result, wherein the plurality of sample scene types at least comprise: noisy scenes, dangerous scenes, and quiet scenes.
In some embodiments, the processing unit 304 may be configured to:
if the type of the current scene is a noisy scene, acquiring a current geographic position;
determining whether the geographic location is within a preset area;
if so, adjusting the current information reminding mode to vibration reminding.
In some embodiments, the processing unit 304 may be further configured to:
after the geographic position is determined to be in the preset area, when a first audio signal played by the terminal is detected, noise reduction processing is carried out on the first audio signal.
In some embodiments, the processing unit 304 may be configured to:
if the type of the current scene is a dangerous scene, acquiring current environment image information;
determining the position coverage range of the dangerous area based on the current geographic position and the environment image information;
generating route guidance information according to the position coverage range;
and performing information prompt on the user according to the route guidance information.
In some embodiments, the processing unit 304 may be configured to:
if the type of the current scene is a quiet scene, when a second audio signal to be played is detected, adjusting a volume parameter value of the second audio signal to be lower than the current volume parameter value.
Referring to fig. 4, in some embodiments, the analysis unit 302 may include:
an extracting subunit 3021, configured to perform feature extraction on the sound information to obtain sound features;
an identifying subunit 3022, configured to identify the sound information according to the sound feature;
a determining subunit 3023 configured to determine the sound type, the number of sound types, the number of sound sources, and the sound characteristic parameter included in the sound information based on the recognition result;
a generating subunit 3024, configured to generate the analysis result according to at least the sound type, the number of sound types, the number of sound sources, and a sound characteristic parameter.
As can be seen from the above, the terminal processing device based on sound analysis provided in the embodiment of the present application collects sound information in the current scene and analyzes it to obtain an analysis result. The type to which the current scene belongs is then determined based on the analysis result, and the operation instruction corresponding to that type is executed. By intelligently identifying the current scene state and executing the operation instruction of the corresponding service, the scheme improves the intelligence of the terminal and brings convenience to the user.
In another embodiment of the present application, a terminal is further provided, where the terminal may be a terminal device such as a smart phone and a tablet computer. As shown in fig. 5, the terminal 400 includes a processor 401 and a memory 402. The processor 401 is electrically connected to the memory 402.
The processor 401 is a control center of the terminal 400, connects various parts of the entire terminal using various interfaces and lines, and performs various functions of the terminal and processes data by running or loading an application stored in the memory 402 and calling data stored in the memory 402, thereby performing overall monitoring of the terminal.
In this embodiment, the processor 401 in the terminal 400 loads instructions corresponding to the processes of one or more applications into the memory 402 according to the following steps, and runs the applications stored in the memory 402, thereby implementing various functions:
collecting sound information under a current scene;
analyzing the sound information to obtain an analysis result;
determining the type of the current scene based on the analysis result;
and executing the operation instruction corresponding to the type of the current scene.
In some embodiments, in determining the type of the current scene based on the analysis result, processor 401 further performs the steps of:
determining the type of the current scene from a plurality of sample scene types according to the analysis result, wherein the plurality of sample scene types at least comprise: noisy scenes, dangerous scenes, and quiet scenes.
In some embodiments, when executing the operation instruction corresponding to the type to which the current scene belongs, the processor 401 further performs the following steps:
if the type to which the current scene belongs is a noisy scene, acquiring the current geographic position;
determining whether the geographic position is within a preset area;
if so, switching the current information reminding mode to vibration reminding.
In some embodiments, after determining that the geographic location is within the preset area, processor 401 further performs the steps of:
when detecting that the terminal is playing a first audio signal, performing noise reduction processing on the first audio signal.
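A minimal sketch of this noisy-scene branch, assuming a rectangular geofence and terminal state held in a plain dict; the function names and state keys are invented for illustration and are not the patent's API.

```python
def in_preset_area(pos, area):
    """pos = (lat, lon); area = (lat_min, lat_max, lon_min, lon_max)."""
    lat, lon = pos
    lat_min, lat_max, lon_min, lon_max = area
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max

def handle_noisy(pos, area, terminal):
    """Inside the preset area: switch alerts to vibration and enable
    noise reduction on any audio the terminal is currently playing."""
    if in_preset_area(pos, area):
        terminal["alert_mode"] = "vibrate"
        if terminal.get("playing_audio"):
            terminal["noise_reduction"] = True
    return terminal
```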
In some embodiments, if the type of the current scene is a dangerous scene, the processor 401 further performs the following steps:
acquiring current environment image information;
determining the position coverage of the dangerous area based on the current geographic position and the environment image information;
generating route guidance information according to the position coverage;
prompting the user according to the route guidance information.
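To make the route-guidance idea concrete, here is a toy sketch that models the dangerous area as a circle in a local flat-ground coordinate system (metres) and tells the user which heading leads out of it. The circular model, the bearing convention, and all names are assumptions for illustration, not the patent's method.

```python
import math

def guidance_away(user, center, radius_m):
    """user/center are (x, y) in metres, x = east, y = north.
    Return a human-readable route prompt leading out of the circle."""
    dx, dy = user[0] - center[0], user[1] - center[1]
    dist = math.hypot(dx, dy)
    if dist >= radius_m:
        return "You are outside the danger area."
    # Head directly away from the centre; bearing clockwise from north.
    bearing = math.degrees(math.atan2(dx, dy)) % 360
    return f"Move heading {bearing:.0f} deg for at least {radius_m - dist:.0f} m."
```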
In some embodiments, if the type of the current scene is a quiet scene, when the second audio signal to be played is detected, the processor 401 further performs the following steps:
adjusting the volume parameter value of the second audio signal to be lower than the current volume parameter value.
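The quiet-scene adjustment amounts to scaling the pending audio's volume below the current setting. A one-function sketch; the name, the 0.5 factor, and the floor are illustrative choices, not values from the patent.

```python
def lower_volume(current_volume: int, factor: float = 0.5, floor: int = 1) -> int:
    """Return a volume for the second audio signal below the current value;
    the floor keeps playback from being muted entirely."""
    return max(floor, int(current_volume * factor))
```

For example, with the default factor, a current volume of 80 would be reduced to 40 before the second audio signal is played.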
In some embodiments, when analyzing the sound information to obtain an analysis result, the processor 401 further performs the following steps:
performing feature extraction on the sound information to obtain sound features;
identifying the sound information according to the sound features;
determining, based on the identification result, the sound types contained in the sound information, the number of sound types, the number of sound sources, and the sound characteristic parameters;
generating the analysis result according to at least the sound types, the number of sound types, the number of sound sources, and the sound characteristic parameters.
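As an illustration of the feature-extraction step, the sketch below computes two classic short-time audio features, RMS energy and zero-crossing rate, and labels the sound with a toy rule. The 0.3 threshold, the labels, and every name here are assumptions; the patent does not specify which features or classifier are used.

```python
import math

def extract_features(samples):
    """samples: a list of PCM amplitudes in [-1.0, 1.0]."""
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)            # loudness proxy
    zcr = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0) / (n - 1)
    return {"rms": rms, "zcr": zcr}

def analyze(samples):
    """Produce a result in the shape the embodiment describes:
    sound types, their count, a source count, and the raw features."""
    f = extract_features(samples)
    sound_type = "noise-like" if f["zcr"] > 0.3 else "tonal"    # toy labelling
    return {"sound_types": [sound_type], "type_count": 1,
            "source_count": 1, "features": f}
```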
The memory 402 may be used to store applications and data. The applications stored in the memory 402 contain instructions executable by the processor and may constitute various functional modules. By running the applications stored in the memory 402, the processor 401 executes various functional applications and performs the terminal processing based on sound analysis.
In some embodiments, as shown in fig. 6, the terminal 400 further includes: a display 403, a control circuit 404, a radio frequency circuit 405, a microphone 406, a camera 407, a sensor 408, and a power supply 409. The processor 401 is electrically connected to the display 403, the control circuit 404, the radio frequency circuit 405, the microphone 406, the camera 407, the sensor 408, and the power supply 409.
The display screen 403 may be used to display information input by or provided to the user as well as various graphical user interfaces of the terminal, which may be constituted by images, text, icons, video, and any combination thereof.
The control circuit 404 is electrically connected to the display 403, and is configured to control the display 403 to display information.
The radio frequency circuit 405 is used for transmitting and receiving radio frequency signals so as to establish wireless communication with a network or other terminals, and for exchanging signals with a server or other terminals.
The microphone 406 is an energy conversion device used to detect sound signals in the external environment and convert them into electrical signals. According to the transduction principle, microphones can be classified into two types: electrodynamic (moving-coil) microphones and condenser microphones.
The camera 407 may be used to collect image information. The camera may be a single camera with one lens, or may have two or more lenses.
The sensor 408 is used to collect external environmental information. The sensors 408 may include ambient light sensors, acceleration sensors, light sensors, motion sensors, and other sensors.
The power supply 409 is used to power the various components of the terminal 400. In some embodiments, the power source 409 may be logically connected to the processor 401 through a power management system, so that functions of managing charging, discharging, and power consumption are implemented through the power management system.
Although not shown in fig. 6, the terminal 400 may further include a speaker, a bluetooth module, and the like, which will not be described in detail herein.
Therefore, the terminal provided in the embodiments of the present application collects sound information in the current scene and analyzes it to obtain an analysis result. The type to which the current scene belongs is then determined based on the analysis result, and an operation instruction corresponding to that type is executed. By intelligently identifying the state of the current scene and executing the operation instruction of the corresponding service, the scheme improves the intelligence of the terminal and brings convenience to the user.
In some embodiments, a computer-readable storage medium is further provided, in which a plurality of instructions are stored, the instructions being adapted to be loaded by a processor to perform any of the above terminal processing methods based on sound analysis.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the associated hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The terminal processing method and device based on sound analysis, the storage medium, and the terminal provided in the embodiments of the present application are described in detail above. Specific examples are used herein to explain the principles and implementation of the present application, and the description of the above embodiments is only intended to help understand the method and core ideas of the present application. Meanwhile, those skilled in the art may make changes to the specific embodiments and the application scope according to the ideas of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. A terminal processing method based on sound analysis is characterized by comprising the following steps:
collecting sound information in a current scene;
analyzing the sound information to obtain an analysis result;
determining the type to which the current scene belongs based on the analysis result;
executing an operation instruction corresponding to the type to which the current scene belongs.
2. The method according to claim 1, wherein the determining the type of the current scene based on the analysis result comprises:
determining the type of the current scene from a plurality of sample scene types according to the analysis result, wherein the plurality of sample scene types at least comprise: noisy scenes, dangerous scenes, and quiet scenes.
3. The terminal processing method based on sound analysis according to claim 1, wherein the executing the operation instruction corresponding to the type of the current scene comprises:
if the type to which the current scene belongs is a noisy scene, acquiring a current geographic position;
determining whether the geographic position is within a preset area;
if so, switching a current information reminding mode to vibration reminding.
4. The terminal processing method based on sound analysis according to claim 3, further comprising, after determining that the geographical location is within a preset area:
when detecting that the terminal is playing a first audio signal, performing noise reduction processing on the first audio signal.
5. The terminal processing method based on sound analysis according to claim 1, wherein the executing the operation instruction corresponding to the type of the current scene comprises:
if the type to which the current scene belongs is a dangerous scene, acquiring current environment image information;
determining the position coverage of the dangerous area based on a current geographic position and the environment image information;
generating route guidance information according to the position coverage;
prompting the user according to the route guidance information.
6. The terminal processing method based on sound analysis according to claim 1, wherein the executing the operation instruction corresponding to the type of the current scene comprises:
if the type of the current scene is a quiet scene, when a second audio signal to be played is detected, adjusting a volume parameter value of the second audio signal to be lower than the current volume parameter value.
7. The terminal processing method based on sound analysis according to any one of claims 1 to 6, wherein the analyzing the sound information to obtain an analysis result comprises:
performing feature extraction on the sound information to obtain sound features;
identifying the sound information according to the sound features;
determining, based on the identification result, the sound types contained in the sound information, the number of sound types, the number of sound sources, and the sound characteristic parameters;
generating the analysis result according to at least the sound types, the number of sound types, the number of sound sources, and the sound characteristic parameters.
8. A terminal processing apparatus based on sound analysis, comprising:
the acquisition unit is used for acquiring sound information under the current scene;
the analysis unit is used for analyzing the sound information to obtain an analysis result;
a determining unit, configured to determine a type to which a current scene belongs based on the analysis result;
and the processing unit is used for executing the operation instruction corresponding to the type of the current scene.
9. A computer-readable storage medium, characterized in that a plurality of instructions are stored in the storage medium, said instructions being adapted to be loaded by a processor to perform the method for terminal processing based on sound analysis according to any of claims 1-7.
10. A terminal is characterized by comprising a processor and a memory, wherein the processor is electrically connected with the memory, and the memory is used for storing instructions and data; the processor is used for executing the terminal processing method based on the sound analysis of any one of claims 1-7.
CN201911325074.1A 2019-12-20 2019-12-20 Terminal processing method and device based on sound analysis, storage medium and terminal Active CN111081275B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911325074.1A CN111081275B (en) 2019-12-20 2019-12-20 Terminal processing method and device based on sound analysis, storage medium and terminal


Publications (2)

Publication Number Publication Date
CN111081275A true CN111081275A (en) 2020-04-28
CN111081275B CN111081275B (en) 2023-05-26

Family

ID=70316248

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911325074.1A Active CN111081275B (en) 2019-12-20 2019-12-20 Terminal processing method and device based on sound analysis, storage medium and terminal

Country Status (1)

Country Link
CN (1) CN111081275B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111601000A (en) * 2020-05-14 2020-08-28 支付宝(杭州)信息技术有限公司 Communication network fraud identification method and device and electronic equipment
CN112866480A (en) * 2021-01-05 2021-05-28 北京小米移动软件有限公司 Information processing method, information processing device, electronic equipment and storage medium
CN116320144A (en) * 2022-09-23 2023-06-23 荣耀终端有限公司 Audio playing method and electronic equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103945062A (en) * 2014-04-16 2014-07-23 华为技术有限公司 User terminal volume adjusting method, device and terminal
CN107277260A (en) * 2017-07-07 2017-10-20 珠海格力电器股份有限公司 A kind of contextual model method of adjustment, device and mobile terminal
CN108924348A (en) * 2018-06-26 2018-11-30 努比亚技术有限公司 Terminal scene mode conversion method, device and computer readable storage medium
CN109040473A (en) * 2018-10-23 2018-12-18 珠海格力电器股份有限公司 terminal volume adjusting method, system and mobile phone
CN109151719A (en) * 2018-09-28 2019-01-04 北京小米移动软件有限公司 Safety guide method, device and storage medium
CN109995799A (en) * 2017-12-29 2019-07-09 广东欧珀移动通信有限公司 Information-pushing method, device, terminal and storage medium
CN110019931A (en) * 2017-12-05 2019-07-16 腾讯科技(深圳)有限公司 Audio frequency classification method, device, smart machine and storage medium
US20190227767A1 (en) * 2016-09-27 2019-07-25 Huawei Technologies Co., Ltd. Volume Adjustment Method and Terminal
CN110473566A (en) * 2019-07-25 2019-11-19 深圳壹账通智能科技有限公司 Audio separation method, device, electronic equipment and computer readable storage medium
CN110580914A (en) * 2019-07-24 2019-12-17 安克创新科技股份有限公司 Audio processing method and equipment and device with storage function

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103945062A (en) * 2014-04-16 2014-07-23 华为技术有限公司 User terminal volume adjusting method, device and terminal
US20170034362A1 (en) * 2014-04-16 2017-02-02 Huawei Technologies Co., Ltd. Method and Apparatus for Adjusting Volume of User Terminal, and Terminal
US20190227767A1 (en) * 2016-09-27 2019-07-25 Huawei Technologies Co., Ltd. Volume Adjustment Method and Terminal
CN107277260A (en) * 2017-07-07 2017-10-20 珠海格力电器股份有限公司 A kind of contextual model method of adjustment, device and mobile terminal
CN110019931A (en) * 2017-12-05 2019-07-16 腾讯科技(深圳)有限公司 Audio frequency classification method, device, smart machine and storage medium
CN109995799A (en) * 2017-12-29 2019-07-09 广东欧珀移动通信有限公司 Information-pushing method, device, terminal and storage medium
CN108924348A (en) * 2018-06-26 2018-11-30 努比亚技术有限公司 Terminal scene mode conversion method, device and computer readable storage medium
CN109151719A (en) * 2018-09-28 2019-01-04 北京小米移动软件有限公司 Safety guide method, device and storage medium
CN109040473A (en) * 2018-10-23 2018-12-18 珠海格力电器股份有限公司 terminal volume adjusting method, system and mobile phone
CN110580914A (en) * 2019-07-24 2019-12-17 安克创新科技股份有限公司 Audio processing method and equipment and device with storage function
CN110473566A (en) * 2019-07-25 2019-11-19 深圳壹账通智能科技有限公司 Audio separation method, device, electronic equipment and computer readable storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111601000A (en) * 2020-05-14 2020-08-28 支付宝(杭州)信息技术有限公司 Communication network fraud identification method and device and electronic equipment
CN112866480A (en) * 2021-01-05 2021-05-28 北京小米移动软件有限公司 Information processing method, information processing device, electronic equipment and storage medium
CN116320144A (en) * 2022-09-23 2023-06-23 荣耀终端有限公司 Audio playing method and electronic equipment
CN116320144B (en) * 2022-09-23 2023-11-14 荣耀终端有限公司 Audio playing method, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN111081275B (en) 2023-05-26

Similar Documents

Publication Publication Date Title
CN109166593B (en) Audio data processing method, device and storage medium
CN111081275B (en) Terminal processing method and device based on sound analysis, storage medium and terminal
JP2021516786A (en) Methods, devices, and computer programs to separate the voices of multiple people
CN103945062A (en) User terminal volume adjusting method, device and terminal
CN110995933A (en) Volume adjusting method and device of mobile terminal, mobile terminal and storage medium
CN110364156A (en) Voice interactive method, system, terminal and readable storage medium storing program for executing
EP4191579A1 (en) Electronic device and speech recognition method therefor, and medium
KR101965313B1 (en) System for Providing Noise Map Based on Big Data Using Sound Collection Device Looked Like Earphone
CN110931000B (en) Method and device for speech recognition
CN111696532A (en) Speech recognition method, speech recognition device, electronic device and storage medium
CN111105788B (en) Sensitive word score detection method and device, electronic equipment and storage medium
CN107863110A (en) Safety prompt function method, intelligent earphone and storage medium based on intelligent earphone
CN111739517A (en) Speech recognition method, speech recognition device, computer equipment and medium
CN110992963A (en) Network communication method, device, computer equipment and storage medium
CN114299933A (en) Speech recognition model training method, device, equipment, storage medium and product
CN110910876A (en) Article sound searching device and control method, and voice control setting method and system
CN111613213A (en) Method, device, equipment and storage medium for audio classification
CN108600559B (en) Control method and device of mute mode, storage medium and electronic equipment
CN110944056A (en) Interaction method, mobile terminal and readable storage medium
CN114333774A (en) Speech recognition method, speech recognition device, computer equipment and storage medium
CN112735388B (en) Network model training method, voice recognition processing method and related equipment
CN113220590A (en) Automatic testing method, device, equipment and medium for voice interaction application
CN114360546A (en) Electronic equipment and awakening method thereof
CN112614507A (en) Method and apparatus for detecting noise
CN111081102B (en) Dictation result detection method and learning equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant