CN112803828A - Motor control method, control system and control chip - Google Patents

Motor control method, control system and control chip

Info

Publication number: CN112803828A
Application number: CN202011622932.1A
Authority: CN (China)
Prior art keywords: data, group, frame, average value, specific voice
Legal status: Granted; currently active
Other languages: Chinese (zh)
Other versions: CN112803828B (granted publication)
Inventors: 缪丽林, 李慧
Current and original assignee: Shanghai Awinic Technology Co Ltd
Application filed by Shanghai Awinic Technology Co Ltd; priority to CN202011622932.1A


Classifications

    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02N ELECTRIC MACHINES NOT OTHERWISE PROVIDED FOR
    • H02N2/00 Electric machines in general using piezoelectric effect, electrostriction or magnetostriction
    • H02N2/02 Electric machines in general using piezoelectric effect, electrostriction or magnetostriction producing linear motion, e.g. actuators; linear positioners; linear motors
    • H02N2/06 Drive circuits; Control arrangements or methods

Abstract

The invention provides a motor control method, a control system and a control chip. The method comprises: acquiring stereo audio data of a currently running game; judging whether the stereo audio data contains a specific voice; if so, judging whether the specific voice is a first specific voice, wherein the specific voice comprises a gunshot and the first specific voice comprises the gunshot of the player currently running the game; and, if it is the first specific voice, controlling a motor to vibrate while the first specific voice is played. In this way the motor is driven only when the player himself fires, the spurious vibration caused by other players' gunshots is masked, human-machine interaction is enhanced, and the enjoyment of gunshot vibration in shooting games is improved.

Description

Motor control method, control system and control chip
Technical Field
The present invention relates to the field of motor control technologies, and in particular, to a motor control method, a motor control system, and a motor control chip.
Background
As a symbol of modern technological progress, the mobile phone has become a tool that almost everyone carries, and mobile games have gradually become part of people's daily leisure and entertainment. In a shooting game on a mobile phone, a player controls a character's movement and shooting to compete with other players; in the process, the player obtains feedback from the game sound played through the loudspeaker, and can obtain feedback of an additional dimension when the phone vibrates along with the gunshots.
However, the prior art has the drawback that, during a match, the phone vibrates not only for the player's own gunshots but also for the gunshots of other players, so it cannot meet the player's need for vibration triggered only by his own shots.
Disclosure of Invention
In view of the above, the present invention provides a motor control method, a control system and a control chip, so that the motor is controlled to generate a corresponding vibration only when the player himself fires.
To achieve this object, the present invention provides the following technical solutions:
a motor control method comprising:
acquiring stereo audio data of a current running game;
judging whether the stereo audio data contains a specific voice;
if it contains the specific voice, judging whether the specific voice is a first specific voice, wherein the specific voice comprises a gunshot and the first specific voice comprises the player's own gunshot in the currently running game;
if it is the first specific voice, controlling a motor to vibrate while the first specific voice is played.
Optionally, before determining whether the stereo audio data includes the specific voice, the method further includes:
dividing the stereo audio data into a first group of data, a second group of data and a third group of data, wherein the first group of data is formed by taking, for each sample, whichever of the left channel data and the right channel data of the stereo audio data has the smaller absolute value, the second group of data is the left channel data of the stereo audio data, and the third group of data is the right channel data of the stereo audio data;
judging whether the stereo audio data contains the specific voice comprises: judging, according to the first group of data, whether the stereo audio data contains the specific voice;
judging whether the specific voice is the first specific voice comprises: judging, according to the second group of data and the third group of data, whether the specific voice is the first specific voice.
Optionally, before determining whether the stereo audio data includes a specific voice according to the first group of data, the method further includes:
dividing the first group of data, the second group of data and the third group of data into multiple frames of data, wherein every N data form a frame and N is a natural number greater than 1;
performing low-pass filtering on each frame of the first group of data and band-pass filtering on each frame of the second group of data and the third group of data, so that the data of the frequency band in which the specific voice lies is retained in each frame;
and taking the absolute values of the N data of each frame in the filtered first, second and third groups of data, summing them, and calculating the average value of the N data of each frame in each group.
Optionally, the determining whether the stereo audio data includes a specific voice according to the first group of data includes:
if the point of the ith frame data in the first group of data is a fast peak point, with i ≥ 0, the ith frame data contains the specific voice;
if the point of the ith frame data in the first group of data is not a fast peak point but the point of the (i-1)th frame data in the first group of data is a peak point, with i ≥ 1, the (i-1)th frame data contains the specific voice;
if the ith frame data or the (i-1)th frame data contains the specific voice, the stereo audio data contains the specific voice.
Optionally, the determining that the point of the ith frame data in the first group of data is a fast peak point includes:
judging whether two conditions are met simultaneously: the average value of the ith frame data in the first group of data is greater than or equal to a first preset value, and the average values of the data between the ith frame data and the adjacent preceding valley point in the first group of data are all less than or equal to the first preset value, where i ≥ 0;
if so, the point of the ith frame data in the first group of data is the fast peak point.
Optionally, determining whether the point of the (i-1)th frame data in the first group of data is a valley point comprises:
judging whether two conditions are met simultaneously: the average value of the (i-2)th frame data in the first group of data is greater than the average value of the (i-1)th frame data, and the average value of the (i-1)th frame data is less than or equal to the average value of the ith frame data, where i ≥ 2;
if so, the point of the (i-1)th frame data in the first group of data is judged to be a valley point.
Optionally, determining whether the point of the (i-1)th frame data in the first group of data is a peak point comprises:
judging whether two conditions are met simultaneously: the average value of the (i-2)th frame data in the first group of data is less than the average value of the (i-1)th frame data, and the average value of the (i-1)th frame data is greater than or equal to the average value of the ith frame data, where i ≥ 2;
if so, the point of the (i-1)th frame data in the first group of data is judged to be a peak point.
Optionally, the determining whether the specific voice is the first specific voice according to the second set of data and the third set of data includes:
judging whether the specific voice parameter of the jth frame data is less than a second preset value; if so, judging that the specific voice in the jth frame data is the first specific voice;
if the average value of the jth frame data in the second group of data is greater than or equal to the average value of the jth frame data in the third group of data, the specific voice parameter is equal to the ratio of the average value of the jth frame data in the second group of data to the average value of the jth frame data in the third group of data; if the average value of the jth frame data in the second group of data is smaller than the average value of the jth frame data in the third group of data, the specific voice parameter is equal to the ratio of the average value of the jth frame data in the third group of data to the average value of the jth frame data in the second group of data.
Optionally, after the specific voice is determined to be the first specific voice, the method further includes:
determining the vibration sense of the motor according to the average value of the jth frame data in the first group of data, so as to control the motor to vibrate with the corresponding vibration sense.
A motor control system comprising:
the audio acquisition module is used for acquiring the stereo audio data of the current running game;
a voice recognition module, configured to determine whether the stereo audio data includes a specific voice and whether the specific voice is a first specific voice, and output a control instruction, where the specific voice includes a gunshot, and the first specific voice includes a gunshot of a player currently running a game;
the motor driving chip is used for receiving the control instruction and controlling the motor to vibrate;
and the audio power amplifier module is used for receiving the stereo audio data acquired by the audio acquisition module and controlling a loudspeaker to play the stereo audio data.
Optionally, the voice recognition module is further configured to divide the stereo audio data into a first group of data, a second group of data and a third group of data, where the first group of data is formed by taking, for each sample, whichever of the left channel data and the right channel data of the stereo audio data has the smaller absolute value, the second group of data is the left channel data of the stereo audio data, and the third group of data is the right channel data of the stereo audio data, and to determine whether the stereo audio data contains a specific voice according to the first group of data and whether the specific voice is a first specific voice according to the second group of data and the third group of data.
Optionally, before judging whether the stereo audio data contains the specific voice according to the first group of data, the voice recognition module is further configured to divide the first, second and third groups of data into multiple frames of data, to low-pass filter each frame of the first group of data and band-pass filter each frame of the second and third groups of data, retaining in each frame the data of the frequency band in which the specific voice lies, and to take the absolute values of the N data of each frame in the filtered first, second and third groups of data, sum them, and calculate the average value of the N data of each frame in each group;
where every N data form a frame, and N is a natural number greater than 1.
Optionally, the voice recognition module is configured to determine that the stereo audio data contains the specific voice when the point of the ith frame data in the first group of data is a fast peak point, with i ≥ 0, or when the point of the ith frame data in the first group of data is not a fast peak point but the point of the (i-1)th frame data in the first group of data is a peak point, with i ≥ 1.
Optionally, the voice recognition module is configured to determine that the point of the ith frame data in the first group of data is a fast peak point when the average value of the ith frame data in the first group of data is greater than or equal to a first preset value and the average values of the data between the ith frame data and the adjacent preceding valley point are all less than or equal to the first preset value; i ≥ 0.
Optionally, the voice recognition module is configured to determine that the point of the (i-1)th frame data in the first group of data is a valley point when the average value of the (i-2)th frame data in the first group of data is greater than the average value of the (i-1)th frame data and the average value of the (i-1)th frame data is less than or equal to the average value of the ith frame data; i ≥ 2.
Optionally, the voice recognition module is configured to determine that the point of the (i-1)th frame data in the first group of data is a peak point when the average value of the (i-2)th frame data in the first group of data is less than the average value of the (i-1)th frame data and the average value of the (i-1)th frame data is greater than or equal to the average value of the ith frame data; i ≥ 2.
Optionally, the voice recognition module is configured to determine that the specific voice in the jth frame data is the first specific voice when the specific voice parameter of the jth frame data is less than a second preset value;
if the average value of the jth frame data in the second group of data is greater than or equal to the average value of the jth frame data in the third group of data, the specific voice parameter is equal to the ratio of the average value of the jth frame data in the second group of data to the average value of the jth frame data in the third group of data; if the average value of the jth frame data in the second group of data is smaller than the average value of the jth frame data in the third group of data, the specific voice parameter is equal to the ratio of the average value of the jth frame data in the third group of data to the average value of the jth frame data in the second group of data.
Optionally, after the specific voice is determined to be the first specific voice, the voice recognition module is further configured to determine a vibration sense of the motor according to an average value of jth frame data in the first group of data, so as to control the motor to vibrate according to the vibration sense through the control instruction.
A motor control chip comprises a processor and a memory;
the memory is used for storing computer-executable instructions;
the processor is configured to perform the motor control method as described in any one of the above.
Compared with the prior art, the technical solution provided by the invention has the following advantages:
The motor control method, control system and control chip provided by the invention acquire the audio data of the currently running game, judge whether the audio data contains a specific voice, judge, if it does, whether the specific voice is the first specific voice, and control the motor to vibrate while the first specific voice is played. The motor is therefore driven only when the player himself fires; the spurious vibration caused by other players' gunshots is masked, human-machine interaction is enhanced, and the enjoyment of gunshot vibration in shooting games is improved.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only embodiments of the present invention, and a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of a motor control method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a motor control system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings, so that the above objects, features and advantages of the present invention can be understood more clearly. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
An embodiment of the present invention provides a motor control method, as shown in FIG. 1, including:
S101: acquiring stereo audio data of the currently running game;
Taking a mobile phone running the game as an example: while the game runs, it generates and plays different voices in response to the player's operations; for example, when the player shoots, the game generates and plays a gunshot. Based on this, in the embodiment of the present invention, the audio data of the currently running game, i.e., the stereo audio signal, can be obtained from the mobile phone system side. The audio data may be the audio data stream output after audio files of various formats are decoded, and this stream is processed into 16-bit signed integers. Here, taking a battle-royale shooting game as an example, a stereo audio data stream with a sampling rate of 48 kHz and a sampling depth of 16 bits is obtained from the mobile phone system side. Of course, in other embodiments, the sampling rate and sampling depth may be set according to the actual situation, which is not described again here.
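As a concrete, hypothetical illustration of this acquisition step (Python with NumPy is used here and in the later sketches; the interleaved-PCM layout is our assumption, not something the patent specifies), the decoded stream could be split into 16-bit signed left and right channels as follows:

```python
import numpy as np

def pcm_to_stereo_int16(raw: bytes):
    """Split a decoded PCM byte stream into left/right 16-bit signed channels.
    Assumes interleaved little-endian int16 at 48 kHz, as in the example
    above; the byte source itself (the decoder output) is not shown."""
    samples = np.frombuffer(raw, dtype="<i2")      # 16-bit signed integers
    samples = samples[: (len(samples) // 2) * 2]   # drop any trailing odd sample
    pairs = samples.reshape(-1, 2)                 # interleaved L/R sample pairs
    return pairs[:, 0].copy(), pairs[:, 1].copy()  # left channel, right channel
```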
S102: judging whether the audio data stream contains a specific voice; if so, proceeding to S103;
S103: judging whether the specific voice is a first specific voice, where the specific voice comprises a gunshot and the first specific voice comprises the player's own gunshot in the currently running game; if it is the first specific voice, proceeding to S104;
S104: controlling the motor to vibrate while the first specific voice is played.
After the stereo audio data of the currently running game is acquired, the voice to be played that the game generates can be obtained from it, and it can be recognized whether this voice contains the required specific voice, such as a gunshot. It is then judged whether the specific voice is the first specific voice, i.e., whether the gunshot is the player's own. If it is, the motor is controlled to vibrate while the first specific voice, the player's own gunshot, is played; if it is not the player's own gunshot but another player's, the motor is not controlled to vibrate. In the embodiment of the present invention the specific voice is simply taken to be a gunshot and the first specific voice the player's own gunshot, but the present invention is not limited thereto.
When the game plays a voice, it generates stereo sound whose left and right channel parameters differ according to the distance and the angle between the position of the sound source and the player. For example, for a voice emitted by the player himself, the volume difference between the left and right channels over the full band is small, whereas for a voice emitted by another player on the player's left, the volume of the high-frequency band in the left channel is large and that in the right channel is small. The player can therefore judge the position of the sound source from the stereo sound.
Based on this, in the embodiment of the present invention, whether the current voice contains the specific voice may be determined from the stereo audio data, and whether the specific voice is the first specific voice may be determined from the left channel data and the right channel data. In addition, to reduce the amount of data processed when determining whether the current voice contains the specific voice, and to improve computational efficiency, in some embodiments of the present invention the left channel data and the right channel data are reduced to a single smaller-magnitude sequence, so that the determination is made over less data.
That is, in some embodiments of the present invention, before determining whether the stereo audio data contains the specific voice, the method further includes: dividing the stereo audio data into a first group of data, a second group of data and a third group of data, where the first group of data is formed by taking, for each sample, whichever of the left channel data and the right channel data has the smaller absolute value, the second group of data is the left channel data, and the third group of data is the right channel data.
Accordingly, judging whether the stereo audio data contains the specific voice includes: judging, according to the first group of data, whether the audio data contains the specific voice; and judging whether the specific voice is the first specific voice includes: judging, according to the second group of data and the third group of data, whether the specific voice is the first specific voice.
The minimum absolute value of the left channel data and the right channel data is calculated as follows:
if abs(Audio_L(n)) ≤ abs(Audio_R(n)), then Audio(n) = Audio_L(n);
if abs(Audio_L(n)) > abs(Audio_R(n)), then Audio(n) = Audio_R(n);
where n = 0, 1, 2, …, Audio(n) denotes the first group of data, Audio_L(n) the left channel data, Audio_R(n) the right channel data, and abs() the absolute value.
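These formulas translate directly into a short vectorized sketch (illustrative, not part of the patent):

```python
import numpy as np

def min_abs_channel(audio_l: np.ndarray, audio_r: np.ndarray) -> np.ndarray:
    """First group of data Audio(n): for every sample n, keep whichever of
    Audio_L(n) and Audio_R(n) has the smaller absolute value, per the
    formulas above (ties go to the left channel, matching the ≤)."""
    return np.where(np.abs(audio_l) <= np.abs(audio_r), audio_l, audio_r)
```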
On this basis, before judging whether the stereo audio data contains the specific voice according to the first group of data, the method further includes the following steps:
dividing the first group of data, the second group of data and the third group of data into multiple frames of data, where every N data form a frame, a tail of fewer than N data is zero-padded, and N is a natural number greater than 1; for convenience of calculation, N may be 1024; the first, second and third groups of data are divided into the same number of frames;
low-pass filtering each frame of the first group of data and band-pass filtering each frame of the second and third groups of data, retaining in each frame the data of the frequency band in which the specific voice lies and filtering out the data of the other bands, the specific voice including the first specific voice. Because the dominant frequencies of the gunshots in the game lie in the range 60 Hz-200 Hz, in some embodiments of the invention the cut-off frequency of the low-pass filter is 225 Hz, which filters out most of the human voices and background sounds in the game. Because the stereo gunshots fired by other players in the game are attenuated within the band 2800 Hz-3600 Hz, in some embodiments of the invention the band-pass range is 2800 Hz-3600 Hz, which filters out some of the other players' gunshots and so improves the efficiency of recognizing the player's own gunshots.
Then, the absolute values of the N data of each frame in the filtered first, second and third groups of data are taken and summed, and the average value of the N data of each frame in each group is calculated.
Specifically, the absolute values of the N data of each frame in the low-pass-filtered first group of data are summed to obtain SUM(i) and the average value AVE(i) of the N data of each frame is calculated; the absolute values of the N data of each frame in the band-pass-filtered second group of data are summed to obtain SUM_L(i) and the average value AVE_L(i) is calculated; and the absolute values of the N data of each frame in the band-pass-filtered third group of data are summed to obtain SUM_R(i) and the average value AVE_R(i) is calculated. Here i = 0, 1, 2, …, AVE(i) = SUM(i)/N, AVE_L(i) = SUM_L(i)/N, and AVE_R(i) = SUM_R(i)/N.
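A minimal sketch of the framing, filtering and averaging just described, under the 48 kHz sampling rate and N = 1024 from the text; the Butterworth filter family and its order are our assumptions, since the patent fixes only the cut-off frequencies:

```python
import numpy as np
from scipy.signal import butter, lfilter

N = 1024      # frame length used in the text
FS = 48000    # sampling rate of the example audio stream

def frame_averages(data: np.ndarray, band: str) -> np.ndarray:
    """Filter one group of data, frame it (zero-padding the tail), and
    return AVE(i) = SUM(i)/N for every frame i.

    band='low'  : low-pass, 225 Hz cut-off (first group of data)
    band='pass' : band-pass, 2800-3600 Hz (second and third groups)"""
    if band == "low":
        b, a = butter(4, 225, btype="low", fs=FS)
    else:
        b, a = butter(4, [2800, 3600], btype="bandpass", fs=FS)
    y = lfilter(b, a, data.astype(np.float64))
    y = np.pad(y, (0, (-len(y)) % N))        # zero-pad the tail to a full frame
    frames = y.reshape(-1, N)
    return np.abs(frames).sum(axis=1) / N    # AVE(i) = SUM(i) / N
```

Applied three times, AVE = frame_averages(audio, 'low'), AVE_L = frame_averages(audio_l, 'pass') and AVE_R = frame_averages(audio_r, 'pass') yield the three per-frame average sequences used below.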
After the average value AVE(i) of the N data of each frame in the first group of data has been obtained, whether the stereo audio data contains the specific voice can be determined from AVE(i). In some embodiments of the present invention, determining whether the stereo audio data contains the specific voice according to the first group of data includes:
if the point of the ith frame data in the first group of data is a fast peak point, with i ≥ 0, the ith frame data contains the specific voice;
if the point of the ith frame data in the first group of data is not a fast peak point but the point of the (i-1)th frame data in the first group of data is a peak point, with i ≥ 1, the (i-1)th frame data contains the specific voice;
if the ith frame data or the (i-1)th frame data contains the specific voice, the stereo audio data contains the specific voice.
That is, it is first judged whether the point of the ith frame data in the first group of data is a fast peak point, with i ≥ 0; if so, the ith frame data contains the specific voice. If not, it is judged whether the point of the (i-1)th frame data in the first group of data is a peak point, with i ≥ 1; if so, the (i-1)th frame data contains the specific voice. If the ith frame data or the (i-1)th frame data contains the specific voice, the stereo audio data contains the specific voice.
In some embodiments of the present invention, determining that the point of the ith frame data in the first group of data is a fast peak point includes:
judging whether two conditions are met simultaneously: the average value of the ith frame data in the first group of data is greater than or equal to a first preset value, and the average values of the data between the ith frame data and the adjacent preceding valley point are all less than or equal to the first preset value, where i ≥ 0; if so, the point of the ith frame data in the first group of data is the fast peak point.
When the specific voice is a gunshot, the first preset value is the trigger threshold at which gunshot vibration is judged to have the strongest vibration sense. Optionally, the first preset value is 3500, but it is not limited thereto and may be set according to the actual situation in different application scenarios.
In some embodiments of the present invention, determining whether the point of the (i-1)th frame data in the first group of data is a peak point includes:
judging whether two conditions are met simultaneously: the average value of the (i-2)th frame data in the first group of data is less than the average value of the (i-1)th frame data, and the average value of the (i-1)th frame data is greater than or equal to the average value of the ith frame data, where i ≥ 2; if so, the point of the (i-1)th frame data in the first group of data is judged to be the peak point.
In some embodiments of the present invention, determining whether the point of the (i-1)th frame data in the first group of data is a valley point includes:
judging whether two conditions are met simultaneously: the average value of the (i-2)th frame data in the first group of data is greater than the average value of the (i-1)th frame data, and the average value of the (i-1)th frame data is less than or equal to the average value of the ith frame data, where i ≥ 2; if so, the point of the (i-1)th frame data in the first group of data is judged to be the valley point.
It should be noted that when data contains a specific voice such as a gunshot, its average value is large and the point where it lies is generally a peak point, while a point between adjacent specific voices, such as adjacent gunshots, is generally a valley point. However, if detection were based on judging whether the point of the ith frame data is a peak point, the judgement could only be made after the mobile phone system side had output the (i+1)th frame data, which lengthens the decision time.
Based on this, in some embodiments of the present invention, whether the point of the ith frame data is a fast peak point is determined by judging whether the average value of the ith frame data is greater than or equal to the first preset value. A fast peak point lies where the signal is about to reach a peak point, so the existence of a peak point, and hence of the specific voice, can be inferred from the existence of a fast peak point without waiting for the mobile phone system side to output the (i+1)th frame data. Whether the specific voice exists in the data can therefore be determined quickly.
It should also be noted that after the average value of the ith frame data in the first group of data is judged to be greater than or equal to the first preset value, it is further checked that the average values of all data between the ith frame data and the adjacent preceding valley point are less than or equal to the first preset value before the point of the ith frame data is declared a fast peak point. This is because not only the points on the left side of a peak point but also those on its right side can have averages greater than or equal to the first preset value. If some average between the ith frame data and the adjacent preceding valley point is greater than the first preset value, the fast peak point on the left of the peak has already been identified and the data already judged to contain the specific voice, so the judgement need not be repeated. If, however, all those averages are less than or equal to the first preset value, the peak has not yet been judged to contain the specific voice, and the subsequent judgement needs to be carried out.
The data considered is limited to that between the ith frame data and the adjacent preceding valley point because the audio data may contain several specific voices, e.g., several gunshots, and the point between adjacent gunshots is generally a valley point; limiting the judged data in this way avoids re-judging a specific voice that has already been detected.
In some embodiments, determining whether the stereo audio data contains the specific voice according to the first group of data includes:
judging whether two conditions are met simultaneously: the average value AVE(i-2) of the (i-2)th frame data in the first group of data is less than the average value AVE(i-1) of the (i-1)th frame data, and AVE(i-1) is greater than or equal to the average value AVE(i) of the ith frame data, where i ≥ 2;
if so, the point of the (i-1)th frame data in the first group of data is judged to be a peak point, and the peak flag Peak_flag is set to 1 and the valley flag Valley_flag to 0;
if not, judging whether two conditions are met simultaneously: AVE(i-2) is greater than AVE(i-1), and AVE(i-1) is less than or equal to AVE(i);
if so, the point of the (i-1)th frame data in the first group of data is judged to be a valley point, and the fast peak flag FastPeak_flag is set to 0, Peak_flag to 0 and Valley_flag to 1;
if not, judging whether two conditions are met simultaneously: the average value AVE(i) of the ith frame data in the first group of data is greater than the first preset value, and FastPeak_flag is 0, where i ≥ 0;
if so, the point of the ith frame data in the first group of data is judged to be a fast peak point, and FastPeak_flag is set to 1 and Valley_flag to 0;
finally, judging whether FastPeak_flag is 1; if so, the ith frame data contains the specific voice; if not, it does not.
In some embodiments of the present invention, after the ith frame data is judged not to contain the specific voice, the method further includes:
judging whether two conditions, Peak_flag = 1 and FastPeak_flag = 0, are met simultaneously; if so, the (i-1)th frame data contains the specific voice; if not, it does not.
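The flag procedure above can be condensed into the following sketch; the flag handling follows the text, while reporting a detection only when a flag is newly raised is our reading, since the text does not spell out how repeats are suppressed once a flag is set:

```python
FIRST_PRESET = 3500   # example trigger threshold given in the text

def detect_specific_voice(ave, first_preset=FIRST_PRESET):
    """Flag-based peak/valley/fast-peak procedure (sketch).
    ave[i] is AVE(i) of the first group of data; yields each frame index
    judged to contain the specific voice."""
    fastpeak_flag = peak_flag = valley_flag = 0
    for i in range(len(ave)):
        if i >= 2 and ave[i - 2] < ave[i - 1] and ave[i - 1] >= ave[i]:
            peak_flag, valley_flag = 1, 0            # peak at frame i-1
            if fastpeak_flag == 0:
                yield i - 1                          # peak with no prior fast peak
        elif i >= 2 and ave[i - 2] > ave[i - 1] and ave[i - 1] <= ave[i]:
            fastpeak_flag, peak_flag, valley_flag = 0, 0, 1  # valley resets flags
        elif ave[i] > first_preset and fastpeak_flag == 0:
            fastpeak_flag, valley_flag = 1, 0        # fast peak at frame i
            yield i
```

For example, with per-frame averages [0, 100, 4000, 3000, 500] and the threshold 3500, the sketch yields frame 2, the fast peak of a single gunshot, exactly once.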
Similarly, if the ith frame data or the (i-1)th frame data contains the specific voice, it is determined that the stereo audio data contains the specific voice.
After the ith frame data is judged to contain the specific voice, the method further includes: judging whether the specific voice contained in the ith frame data is the first specific voice; likewise, after the (i-1)th frame data is judged to contain the specific voice, the method further includes: judging whether the specific voice contained in the (i-1)th frame data is the first specific voice.
In some embodiments of the present invention, determining whether the specific voice is the first specific voice according to the second set of data and the third set of data includes:
judging whether the specific voice parameter of the jth frame data is less than a second preset value; if so, judging that the specific voice in the jth frame data is the first specific voice;
if the average value of the jth frame data in the second group of data is greater than or equal to the average value of the jth frame data in the third group of data, the specific voice parameter equals the ratio of the average value of the jth frame data in the second group of data to that in the third group of data; if the average value of the jth frame data in the second group of data is less than that in the third group of data, the specific voice parameter equals the ratio of the average value of the jth frame data in the third group of data to that in the second group of data.
In some embodiments, determining whether the specific voice is the first specific voice according to the second group of data and the third group of data includes:
judging whether the average value AVE_L(j) of the jth frame data in the second group of data is greater than or equal to the average value AVE_R(j) of the jth frame data in the third group of data, where j ≥ 0;
if so, i.e., when AVE_L(j) ≥ AVE_R(j), setting the specific voice parameter Self_shot_flag equal to the ratio of AVE_L(j) to AVE_R(j), i.e., Self_shot_flag = AVE_L(j)/AVE_R(j);
if not, i.e., when AVE_L(j) < AVE_R(j), setting Self_shot_flag equal to the ratio of AVE_R(j) to AVE_L(j), i.e., Self_shot_flag = AVE_R(j)/AVE_L(j);
and judging whether Self_shot_flag is less than the second preset value; if so, the jth frame data contains the first specific voice. When the specific voice is a gunshot, the second preset value may be chosen as 1.08.
That is, if the point of the ith frame data is judged to be a fast peak point, the ith frame data can be judged to contain the specific voice; it is then judged whether the average value AVE_L(i) of the ith frame data in the second group of data is greater than or equal to the average value AVE_R(i) of the ith frame data in the third group of data, where i ≥ 0. If so, Self_shot_flag = AVE_L(i)/AVE_R(i); if not, Self_shot_flag = AVE_R(i)/AVE_L(i). It is then judged whether Self_shot_flag is less than the second preset value; if so, the ith frame data contains the first specific voice.
When the point of the (i-1)th frame data is judged to be a peak point and not a fast peak point, the (i-1)th frame data can be judged to contain the specific voice; it is then judged whether the average value AVE_L(i-1) of the (i-1)th frame data in the second group of data is greater than or equal to the average value AVE_R(i-1) of the (i-1)th frame data in the third group of data, where i-1 ≥ 0. If so, Self_shot_flag = AVE_L(i-1)/AVE_R(i-1); if not, Self_shot_flag = AVE_R(i-1)/AVE_L(i-1). It is then judged whether Self_shot_flag is less than the second preset value; if so, the (i-1)th frame data contains the first specific voice.
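As a sketch, the two branches of the ratio test collapse into a single larger-over-smaller comparison (the guard against silent frames is our addition, not from the text):

```python
def is_first_specific_voice(ave_l: float, ave_r: float,
                            second_preset: float = 1.08) -> bool:
    """Left/right balance test: Self_shot_flag is the ratio of the larger
    frame average to the smaller; a ratio below the second preset value
    (1.08 for gunshots in the text) marks the player's own gunshot."""
    lo, hi = sorted((ave_l, ave_r))
    if lo == 0:                      # silent frame guard (our addition)
        return False
    self_shot_flag = hi / lo
    return self_shot_flag < second_preset
```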
In some embodiments of the present invention, after the jth frame data is judged to contain the first specific voice, the method further includes: determining the vibration sense of the motor according to the average value of the jth frame data in the first group of data, so as to control the motor to vibrate with that vibration sense.
Specifically, when the point of the jth frame data is a fast peak point and the jth frame data is judged to contain the first specific voice, the motor is controlled to vibrate with the corresponding vibration sense in the jth frame; and when the point of the (j-1)th frame data is a peak point but not a fast peak point, and the (j-1)th frame data is judged to contain the first specific voice, the motor is likewise controlled to vibrate with the corresponding vibration sense in the jth frame.
Because different types of gun emit different sounds, in some embodiments of the invention the motor is controlled to vibrate with a different vibration sense for different gunshots, which improves the game experience. When the jth frame data contains the first specific voice, the motor vibration-sense decision value PeakData is set equal to the average value AVE(j) of the jth frame data in the first group of data, i.e., PeakData = AVE(j). If 1000 ≤ PeakData < 2100, the vibration sense of the motor is a first value; if 2100 ≤ PeakData < 2500, a second value; if 2500 ≤ PeakData < 3500, a third value; if PeakData ≥ 3500, a fourth value; for any other value of PeakData the motor does not vibrate or stops vibrating.
Optionally, the first to fourth values increase in turn, so the larger the average value AVE(j), the stronger the vibration. The invention is not limited thereto: in other embodiments the vibration intensity is the same for different averages AVE(j), but the vibration lasts longer as AVE(j) grows, so that the vibration sense still differs between averages. It should be noted that in the embodiment of the present invention the stereo audio data of the currently running game is acquired continuously and is continuously grouped, framed and filtered; each frame of data is judged in time order, and whenever a frame is judged to contain the first specific voice the motor is controlled to vibrate accordingly. Whatever the current judgement, the subsequently output stereo audio data continues to be judged until the currently running game is stopped or closed.
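The threshold table above, as a sketch; the integer levels merely stand in for the first to fourth vibration-sense values:

```python
def vibration_level(peak_data: float) -> int:
    """Map PeakData = AVE(j) of a frame containing the first specific voice
    to a vibration sense: 1..4 stand for the first to fourth values and
    0 means no vibration, following the thresholds in the text."""
    if 1000 <= peak_data < 2100:
        return 1
    if 2100 <= peak_data < 2500:
        return 2
    if 2500 <= peak_data < 3500:
        return 3
    if peak_data >= 3500:
        return 4
    return 0
```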
An embodiment of the present invention further provides a motor control system, as shown in FIG. 2, including:
the audio acquisition module 20 is configured to acquire stereo audio data of a currently running game;
the voice recognition module 21 is configured to determine whether the stereo audio data includes a specific voice and whether the specific voice is a first specific voice, and output a control instruction, where the specific voice includes a gunshot, and the first specific voice includes a gunshot of a player currently running a game;
the motor driving chip 23 is used for receiving the control instruction output by the voice recognition module 21 and controlling the motor 25 to vibrate;
and the audio power amplifier module 22 is configured to receive the stereo audio data acquired by the audio acquisition module 20, and control the speaker 24 to play the stereo audio data.
It should be noted that after the audio obtaining module 20 obtains the stereo audio data of the currently running game, the stereo audio data is transmitted to the audio power amplifier module 22, and the audio power amplifier module 22 controls the speaker 24 to play the stereo audio data. The audio acquisition module 20 transmits the stereo audio data to the audio power amplifier module 22, and simultaneously transmits the stereo audio data to the voice recognition module 21, the voice recognition module 21 determines whether the stereo audio data contains specific voice and whether the specific voice is the first specific voice, if the stereo audio data contains the specific voice and the specific voice is the first specific voice, the control instruction is output to the motor driving chip 23, and the motor driving chip 23 controls the motor 25 to vibrate after receiving the control instruction. Since the recognition speed of the voice recognition module 21 is fast, the motor 25 may be controlled to vibrate while the speaker 24 plays the stereo audio data.
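To make the wiring of FIG. 2 concrete, a minimal sketch of the dataflow follows; every interface here is a hypothetical stand-in for the audio acquisition module, power amplifier, voice recognition module and motor driver, none of which the patent defines as code:

```python
def run_motor_control_system(stereo_chunks, speaker, motor,
                             contains_specific_voice,
                             is_first_specific_voice, vibration_level):
    """Dataflow sketch: every chunk of stereo audio goes to the audio power
    amplifier path (speaker) and, in parallel, through the voice recognition
    path; a control instruction reaches the motor driver chip only for the
    player's own gunshot."""
    for chunk in stereo_chunks:
        speaker.play(chunk)                        # audio power amplifier module
        if contains_specific_voice(chunk) and is_first_specific_voice(chunk):
            motor.vibrate(vibration_level(chunk))  # motor driving chip
```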
Taking a mobile phone running the game as an example: while the game runs, it generates and plays different voices in response to the player's operations; for example, when the player shoots, the game generates and plays a gunshot. Based on this, the audio acquisition module 20 obtains the audio data of the currently running game, i.e., the stereo audio signal, from the mobile phone system side; this audio data may be the audio data stream output after audio files of various formats are decoded.
In some embodiments of the present invention, the voice recognition module 21 is further configured to divide the stereo audio data into a first group of data, a second group of data and a third group of data, where the first group of data is formed by taking, for each sample, whichever of the left channel data and the right channel data has the smaller absolute value, the second group of data is the left channel data, and the third group of data is the right channel data, and to determine whether the stereo audio data contains a specific voice according to the first group of data and whether the specific voice is a first specific voice according to the second group of data and the third group of data.
On this basis, before judging whether the stereo audio data contains the specific voice according to the first group of data, the voice recognition module 21 is further configured to divide the first, second and third groups of data into multiple frames of data, to low-pass filter each frame of the first group of data and band-pass filter each frame of the second and third groups of data, retaining in each frame the data of the frequency band in which the specific voice lies, and to take the absolute values of the N data of each frame in the filtered first, second and third groups of data, sum them, and calculate the average value of the N data of each frame in each group;
here every N data form a frame, and N is a natural number greater than 1.
That is, the voice recognition module 21 is further configured to divide the first, second and third groups of data into multiple frames of data, where every N data form a frame, a tail of fewer than N data is zero-padded, and N is a natural number greater than 1; to low-pass filter each frame of the first group of data and band-pass filter each frame of the second and third groups of data, retaining in each frame the data of the frequency band in which the specific voice lies and filtering out the other bands, the specific voice including the first specific voice; and to sum the absolute values of the N data of each frame in the low-pass-filtered first group of data to obtain SUM(i) and calculate the average value AVE(i), to sum the absolute values of the N data of each frame in the band-pass-filtered second group of data to obtain SUM_L(i) and calculate the average value AVE_L(i), and to sum the absolute values of the N data of each frame in the band-pass-filtered third group of data to obtain SUM_R(i) and calculate the average value AVE_R(i). Here i = 0, 1, 2, …, AVE(i) = SUM(i)/N, AVE_L(i) = SUM_L(i)/N, and AVE_R(i) = SUM_R(i)/N.
In some embodiments of the present invention, the voice recognition module 21 is configured to determine that the stereo audio data contains the specific voice when the point of the ith frame data in the first group of data is a fast peak point, with i ≥ 0, or when the point of the ith frame data in the first group of data is not a fast peak point but the point of the (i-1)th frame data in the first group of data is a peak point, with i ≥ 1.
In some embodiments of the present invention, the voice recognition module 21 is configured to determine that the point of the ith frame data in the first group of data is a fast peak point when the average value of the ith frame data in the first group of data is greater than or equal to a first preset value and the average values of the data between the ith frame data and the adjacent preceding valley point are all less than or equal to the first preset value; i ≥ 0.
In some embodiments of the present invention, the voice recognition module 21 is configured to determine that the point of the (i-1)th frame data in the first group of data is a valley point when the average value of the (i-2)th frame data in the first group of data is greater than the average value of the (i-1)th frame data and the average value of the (i-1)th frame data is less than or equal to the average value of the ith frame data; i ≥ 2.
In some embodiments of the present invention, the voice recognition module 21 is configured to determine that the point of the (i-1)th frame data in the first group of data is a peak point when the average value of the (i-2)th frame data in the first group of data is less than the average value of the (i-1)th frame data and the average value of the (i-1)th frame data is greater than or equal to the average value of the ith frame data; i ≥ 2.
In some embodiments, the voice recognition module 21 is configured to judge whether two conditions are met simultaneously: the average value AVE(i-2) of the (i-2)th frame data in the first group of data is less than the average value AVE(i-1) of the (i-1)th frame data, and AVE(i-1) is greater than or equal to the average value AVE(i) of the ith frame data, where i ≥ 2;
if so, the point of the (i-1)th frame data in the first group of data is judged to be a peak point, and the peak flag Peak_flag is set to 1 and the valley flag Valley_flag to 0;
if not, to judge whether two conditions are met simultaneously: AVE(i-2) is greater than AVE(i-1), and AVE(i-1) is less than or equal to AVE(i);
if so, the point of the (i-1)th frame data in the first group of data is judged to be a valley point, and the fast peak flag FastPeak_flag is set to 0, Peak_flag to 0 and Valley_flag to 1;
if not, to judge whether two conditions are met simultaneously: the average value AVE(i) of the ith frame data in the first group of data is greater than the first preset value, and FastPeak_flag is 0, where i ≥ 0;
if so, the point of the ith frame data in the first group of data is judged to be a fast peak point, and FastPeak_flag is set to 1 and Valley_flag to 0;
and finally to judge whether FastPeak_flag is 1; if so, the ith frame data contains the specific voice; if not, it does not.
On this basis, after the ith frame data is judged not to contain the specific voice, the voice recognition module 21 is further configured to judge whether two conditions, Peak_flag = 1 and FastPeak_flag = 0, are met simultaneously; if so, the (i-1)th frame data contains the specific voice; if not, it does not. If the ith frame data or the (i-1)th frame data is determined to contain the specific voice, the voice recognition module 21 determines that the stereo audio data contains the specific voice.
On this basis, in some embodiments of the present invention, the voice recognition module 21 is further configured to determine that the specific voice in the jth frame data is the first specific voice when the specific voice parameter of the jth frame data is less than the second preset value;
if the average value of the jth frame data in the second group of data is greater than or equal to the average value of the jth frame data in the third group of data, the specific voice parameter equals the ratio of the average value of the jth frame data in the second group of data to that in the third group of data; if it is less, the specific voice parameter equals the ratio of the average value of the jth frame data in the third group of data to that in the second group of data.
In some embodiments, the voice recognition module 21 is configured to judge whether the average value AVE_L(j) of the jth frame data in the second group of data is greater than or equal to the average value AVE_R(j) of the jth frame data in the third group of data, where j ≥ 0; if so, the specific voice parameter Self_shot_flag = AVE_L(j)/AVE_R(j); if not, Self_shot_flag = AVE_R(j)/AVE_L(j). It then judges whether Self_shot_flag is less than the second preset value; if so, the jth frame data contains the first specific voice. When the specific voice is a gunshot, the second preset value may be chosen as 1.08.
That is, after determining that the ith frame data contains the specific voice, the voice recognition module 21 is configured to judge whether the average value AVE_L(i) of the ith frame data in the second group of data is greater than or equal to the average value AVE_R(i) of the ith frame data in the third group of data, where i ≥ 0;
if yes, let Self_shot_flag = AVE_L(i)/AVE_R(i);
if not, let Self_shot_flag = AVE_R(i)/AVE_L(i);
and judging whether the specific voice parameter Self_shot_flag is smaller than the second preset value; if yes, judging that the ith frame data contains the first specific voice.
After determining that the (i-1)th frame data contains the specific voice, the voice recognition module 21 is configured to judge whether the average value AVE_L(i-1) of the (i-1)th frame data in the second group of data is greater than or equal to the average value AVE_R(i-1) of the (i-1)th frame data in the third group of data, where i-1 ≥ 0;
if yes, let Self_shot_flag = AVE_L(i-1)/AVE_R(i-1);
if not, let Self_shot_flag = AVE_R(i-1)/AVE_L(i-1);
and judging whether the specific voice parameter Self_shot_flag is smaller than the second preset value; if yes, judging that the (i-1)th frame data contains the first specific voice.
In some embodiments of the present invention, after determining that the specific voice is the first specific voice, the voice recognition module 21 is further configured to determine a vibration sense of the motor according to the average value of the jth frame data in the first group of data, so as to control the motor, through the control instruction, to vibrate with that vibration sense.
In some embodiments, after the voice recognition module 21 determines that the jth frame data contains the first specific voice, the motor vibration sense determination data PeakData is set equal to the average value AVE(j) of the jth frame data in the first group of data, that is, PeakData = AVE(j). Then, if PeakData is greater than or equal to 1000 and less than 2100, the motor vibration sense is set to the first value; if PeakData is greater than or equal to 2100 and less than 2500, the motor vibration sense is set to the second value; if PeakData is greater than or equal to 2500 and less than 3500, the motor vibration sense is set to the third value; if PeakData is greater than or equal to 3500, the motor vibration sense is set to the fourth value; otherwise, that is, if PeakData is less than 1000, the motor does not vibrate or stops vibrating.
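The interval mapping above amounts to a simple threshold ladder; in the sketch below the four return codes merely stand in for the first to fourth vibration senses, whose actual drive waveforms are left to the motor driving chip.

```python
def vibration_level(peak_data):
    # peak_data is PeakData = AVE(j) of the jth frame in the first group.
    if 1000 <= peak_data < 2100:
        return 1  # first vibration sense
    if 2100 <= peak_data < 2500:
        return 2  # second vibration sense
    if 2500 <= peak_data < 3500:
        return 3  # third vibration sense
    if peak_data >= 3500:
        return 4  # fourth vibration sense
    return 0      # below 1000: do not vibrate, or stop vibrating
```

Louder own-gunshot frames thus map to stronger vibration senses, so the haptic strength tracks the loudness of the player's gunshot.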
The embodiment of the invention further provides a motor control chip, which includes a processor and a memory;
the memory is configured to store computer-executable instructions;
and when the processor executes the computer-executable instructions, the processor performs the motor control method provided by any one of the above embodiments.
The embodiments in this description are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts among the embodiments, reference may be made to one another. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and reference may be made to the method description for the relevant details.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (19)

1. A motor control method, comprising:
acquiring stereo audio data of a current running game;
judging whether the stereo audio data contains specific voice or not;
if the specific voice is contained, judging whether the specific voice is a first specific voice, wherein the specific voice comprises a gunshot, and the first specific voice comprises the player's own gunshot in the currently running game;
if the specific voice is the first specific voice, controlling a motor to vibrate when the first specific voice is played.
2. The method of claim 1, wherein before judging whether the stereo audio data contains the specific voice, the method further comprises:
dividing the stereo audio data into a first group of data, a second group of data and a third group of data, wherein the first group of data is the data with the smaller absolute value between the left channel data and the right channel data in the stereo audio data, the second group of data is the left channel data in the stereo audio data, and the third group of data is the right channel data in the stereo audio data;
judging whether the stereo audio data contains the specific voice comprises the following steps: judging whether the stereo audio data contains specific voice according to the first group of data;
judging whether the specific voice is a first specific voice comprises the following steps: and judging whether the specific voice is the first specific voice according to the second group of data and the third group of data.
3. The method of claim 2, wherein before judging whether the stereo audio data contains the specific voice according to the first group of data, the method further comprises:
dividing the first group of data, the second group of data and the third group of data into multiple frames of data, wherein every N data form one frame, and N is a natural number greater than 1;
performing low-pass filtering on each frame of data in the first group of data, performing band-pass filtering on each frame of data in the second group of data and the third group of data, and retaining, in each frame of data, the data of the frequency band in which the specific voice is located;
and taking the absolute values of the N data in each frame of data in the first group of data, the second group of data and the third group of data after the filtering, summing the absolute values, and calculating the average value of the N data in each frame of data in the first group of data, the second group of data and the third group of data.
4. The method of claim 3, wherein judging whether the stereo audio data contains the specific voice according to the first group of data comprises:
when the ith frame data in the first group of data is judged to be a fast peak point, wherein i is greater than or equal to 0, the ith frame data contains the specific voice;
when the ith frame data in the first group of data is judged not to be a fast peak point but the (i-1)th frame data in the first group of data is judged to be a peak point, wherein i is greater than or equal to 1, the (i-1)th frame data contains the specific voice;
and if the ith frame data or the (i-1)th frame data contains the specific voice, the stereo audio data contains the specific voice.
5. The method of claim 4, wherein judging that the ith frame data in the first group of data is a fast peak point comprises:
judging whether two conditions are met simultaneously, namely that the average value of the ith frame data in the first group of data is greater than or equal to a first preset value, and that the average values of the data between the ith frame data and the adjacent valley point preceding the ith frame data in the first group of data are less than or equal to the first preset value, wherein i is greater than or equal to 0;
if yes, the ith frame data in the first group of data is a fast peak point.
6. The method of claim 5, wherein determining whether a point is a valley point comprises:
judging whether two conditions are met simultaneously, namely that the average value of the (i-2)th frame data in the first group of data is greater than the average value of the (i-1)th frame data, and that the average value of the (i-1)th frame data is less than or equal to the average value of the ith frame data, wherein i is greater than or equal to 2;
if yes, the (i-1)th frame data in the first group of data is judged to be a valley point.
7. The method of claim 4, wherein determining whether the (i-1)th frame data in the first group of data is a peak point comprises:
judging whether two conditions are met simultaneously, namely that the average value of the (i-2)th frame data in the first group of data is smaller than the average value of the (i-1)th frame data, and that the average value of the (i-1)th frame data is greater than or equal to the average value of the ith frame data, wherein i is greater than or equal to 2;
if yes, the (i-1)th frame data in the first group of data is judged to be a peak point.
8. The method of claim 4, wherein judging whether the specific voice is the first specific voice according to the second group of data and the third group of data comprises:
judging whether the specific voice parameter of the jth frame data is smaller than a second preset value; if so, judging that the specific voice in the jth frame data is the first specific voice;
if the average value of the jth frame data in the second group of data is greater than or equal to the average value of the jth frame data in the third group of data, the specific voice parameter is equal to the ratio of the average value of the jth frame data in the second group of data to the average value of the jth frame data in the third group of data; if the average value of the jth frame data in the second group of data is smaller than the average value of the jth frame data in the third group of data, the specific voice parameter is equal to the ratio of the average value of the jth frame data in the third group of data to the average value of the jth frame data in the second group of data.
9. The method of claim 1, wherein after determining that the specific voice is the first specific voice, the method further comprises:
determining the vibration sense of the motor according to the average value of the jth frame data in the first group of data, so as to control the motor to vibrate with the corresponding vibration sense.
10. A motor control system, comprising:
an audio acquisition module, configured to acquire stereo audio data of a currently running game;
a voice recognition module, configured to determine whether the stereo audio data contains a specific voice and whether the specific voice is a first specific voice, and to output a control instruction, wherein the specific voice comprises a gunshot, and the first specific voice comprises the player's own gunshot in the currently running game;
a motor driving chip, configured to receive the control instruction and control a motor to vibrate;
and an audio power amplifier module, configured to receive the stereo audio data acquired by the audio acquisition module and control a loudspeaker to play the stereo audio data.
11. The system of claim 10, wherein the voice recognition module is further configured to: divide the stereo audio data into a first group of data, a second group of data and a third group of data, wherein the first group of data is the data with the smaller absolute value between the left channel data and the right channel data in the stereo audio data, the second group of data is the left channel data in the stereo audio data, and the third group of data is the right channel data in the stereo audio data; determine whether the stereo audio data contains the specific voice according to the first group of data; and determine whether the specific voice is the first specific voice according to the second group of data and the third group of data.
12. The system according to claim 11, wherein before determining whether the stereo audio data contains the specific voice according to the first group of data, the voice recognition module is further configured to: divide the first group of data, the second group of data and the third group of data into multiple frames of data; perform low-pass filtering on each frame of data in the first group of data and band-pass filtering on each frame of data in the second group of data and the third group of data, retaining, in each frame of data, the data of the frequency band in which the specific voice is located; and take the absolute values of the N data in each frame of data in the first group of data, the second group of data and the third group of data after the filtering, sum the absolute values, and calculate the average value of the N data in each frame of data in the first group of data, the second group of data and the third group of data;
wherein every N data form one frame, and N is a natural number greater than 1.
13. The system of claim 12, wherein the voice recognition module is configured to determine that the stereo audio data contains the specific voice when the ith frame data in the first group of data is a fast peak point, wherein i is greater than or equal to 0, or when the ith frame data in the first group of data is not a fast peak point but the (i-1)th frame data in the first group of data is a peak point, wherein i is greater than or equal to 1.
14. The system of claim 13, wherein the voice recognition module is configured to determine that the ith frame data in the first group of data is a fast peak point when the average value of the ith frame data in the first group of data is greater than or equal to a first preset value and the average values of the data between the ith frame data and the adjacent preceding valley point in the first group of data are less than or equal to the first preset value, wherein i is greater than or equal to 0.
15. The system of claim 14, wherein the voice recognition module is configured to determine that the (i-1)th frame data in the first group of data is a valley point when the average value of the (i-2)th frame data in the first group of data is greater than the average value of the (i-1)th frame data and the average value of the (i-1)th frame data is less than or equal to the average value of the ith frame data, wherein i is greater than or equal to 2.
16. The system of claim 13, wherein the voice recognition module is configured to determine that the (i-1)th frame data in the first group of data is a peak point when the average value of the (i-2)th frame data in the first group of data is smaller than the average value of the (i-1)th frame data and the average value of the (i-1)th frame data is greater than or equal to the average value of the ith frame data, wherein i is greater than or equal to 2.
17. The system according to claim 13, wherein the voice recognition module is configured to determine that the specific voice in the jth frame data is the first specific voice when the specific voice parameter of the jth frame data is smaller than a second preset value;
if the average value of the jth frame data in the second group of data is greater than or equal to the average value of the jth frame data in the third group of data, the specific voice parameter is equal to the ratio of the average value of the jth frame data in the second group of data to the average value of the jth frame data in the third group of data; if the average value of the jth frame data in the second group of data is smaller than the average value of the jth frame data in the third group of data, the specific voice parameter is equal to the ratio of the average value of the jth frame data in the third group of data to the average value of the jth frame data in the second group of data.
18. The system according to claim 12, wherein after determining that the specific voice is the first specific voice, the voice recognition module is further configured to determine a vibration sense of the motor according to the average value of the jth frame data in the first group of data, so as to control the motor, through the control instruction, to vibrate with the vibration sense.
19. A motor control chip, comprising a processor and a memory;
wherein the memory is configured to store computer-executable instructions;
and the processor is configured to execute the motor control method according to any one of claims 1 to 9.
CN202011622932.1A 2020-12-31 2020-12-31 Motor control method, control system and control chip Active CN112803828B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011622932.1A CN112803828B (en) 2020-12-31 2020-12-31 Motor control method, control system and control chip

Publications (2)

Publication Number Publication Date
CN112803828A 2021-05-14
CN112803828B CN112803828B (en) 2023-09-01

Family

ID=75807517

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011622932.1A Active CN112803828B (en) 2020-12-31 2020-12-31 Motor control method, control system and control chip

Country Status (1)

Country Link
CN (1) CN112803828B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112657179A (en) * 2020-12-31 2021-04-16 上海艾为电子技术股份有限公司 Motor control method, control system and control chip

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1467703A (en) * 2002-07-11 2004-01-14 Samsung Electronics Co Ltd Audio decoding method and apparatus which recover high frequency component with small computation
JP2007184818A (en) * 2006-01-10 2007-07-19 Sony Corp Audio apparatus, sound reproducing method, and sound reproducing program
CN1994245A (en) * 2006-12-27 2007-07-11 中国科学院电工研究所 A speech guide device for blind person
CN101465122A (en) * 2007-12-20 2009-06-24 株式会社东芝 Method and system for detecting phonetic frequency spectrum wave crest and phonetic identification
US20100131278A1 (en) * 2008-11-21 2010-05-27 Polycom, Inc. Stereo to Mono Conversion for Voice Conferencing
JP2014203493A (en) * 2013-04-08 2014-10-27 パイオニア株式会社 Electronic apparatus, electronic system, acoustic apparatus, control method of electronic apparatus, and program
CN104147781A (en) * 2014-07-29 2014-11-19 京东方科技集团股份有限公司 Electronic device, electronic system and electronic device control method
CN107113498A (en) * 2014-12-26 2017-08-29 爱信精机株式会社 Sound processing apparatus
CN105224279A (en) * 2015-09-21 2016-01-06 福州瑞芯微电子股份有限公司 A kind of audio output control method, control system and electronic equipment
CN107231586A (en) * 2016-03-24 2017-10-03 徐超 Sound is listened to distinguish the method and device of position
CN109701266A (en) * 2019-01-23 2019-05-03 努比亚技术有限公司 Game vibrating method, device, mobile terminal and computer readable storage medium
CN109771945A (en) * 2019-01-30 2019-05-21 上海艾为电子技术股份有限公司 The control method and device of terminal device
CN109885162A (en) * 2019-01-31 2019-06-14 维沃移动通信有限公司 Method for oscillating and mobile terminal
CN110580914A (en) * 2019-07-24 2019-12-17 安克创新科技股份有限公司 Audio processing method and equipment and device with storage function
CN111326159A (en) * 2020-03-10 2020-06-23 苏宁云计算有限公司 Voice recognition method, device and system
CN111603776A (en) * 2020-05-21 2020-09-01 上海艾为电子技术股份有限公司 Method for recognizing gunshot in audio data, method for driving motor and related device
CN112053669A (en) * 2020-08-27 2020-12-08 海信视像科技股份有限公司 Method, device, equipment and medium for eliminating human voice
CN112657179A (en) * 2020-12-31 2021-04-16 上海艾为电子技术股份有限公司 Motor control method, control system and control chip

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
秦溥; 杨承磊; 李慧宇; 卞玉龙; 王秋晨; 刘娟; 王玉超; 孟祥旭: "Virtual reality shooting recognition device and system using MEMS sensor perception" (采用MEMS传感器感知的虚拟现实射击识别设备与系统), Journal of Computer-Aided Design & Computer Graphics (计算机辅助设计与图形学学报), no. 11 *

Also Published As

Publication number Publication date
CN112803828B (en) 2023-09-01

Similar Documents

Publication Publication Date Title
CN112657179A (en) Motor control method, control system and control chip
CN111603776B (en) Method for identifying gunshot in audio data, motor driving method and related device
CN1151704C (en) Apparatus and method for localizing sound image
CN106878533B (en) Communication method and device of mobile terminal
CN109821246B (en) Motor vibration method and device, storage medium and electronic device
US8428272B2 (en) Masking sound generating apparatus, masking system, masking sound generating method, and program
CN104991755B (en) A kind of information processing method and electronic equipment
CN112803828B (en) Motor control method, control system and control chip
CN110580914A (en) Audio processing method and equipment and device with storage function
CN107527626A (en) Audio identification system
CN110718238B (en) Crosstalk data detection method, client and electronic equipment
CN109771945B (en) Control method and device of terminal equipment
JP2000210471A (en) Sound device and information recording medium for game machine
JP2005266797A (en) Method and apparatus for separating sound-source signal and method and device for detecting pitch
CN110491412B (en) Sound separation method and device and electronic equipment
US10178491B2 (en) Apparatus and a method for manipulating an input audio signal
CN103812462A (en) Loudness control method and device
CN103928037B (en) A kind of audio switching method and terminal device
CN113259801B (en) Horn noise reduction method and related device of intelligent earphone
CN112866472B (en) Motor control method, control system, control device and chip
CN114067827A (en) Audio processing method and device and storage medium
CN113975809A (en) Audio scene recognition, motor driving method and system, and electronic device
CN112053669A (en) Method, device, equipment and medium for eliminating human voice
CN104918051B (en) A kind of method for processing video frequency and device
US11140506B2 (en) Sound signal processor and sound signal processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant