CN110136718A - Method and apparatus for voice control - Google Patents
Method and apparatus for voice control
- Publication number
- CN110136718A (application number CN201910473077.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- control
- interface
- target
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0484—Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Abstract
Embodiments of the present invention provide a method and an apparatus for voice control, relating to the field of voice control technology. The method includes: obtaining an interface image corresponding to a current display interface; identifying, according to a received voice instruction, the control images contained in the interface image, and determining a target control image corresponding to the voice instruction; determining, according to the position of the target control image in the interface image, a target position, in the current display interface, of a triggerable target control corresponding to the target control image; and moving the cursor to the target position. Embodiments of the present invention use image recognition technology to realize voice-controlled cursor movement, thereby improving the interactivity between speech recognition technology and the operation interface.
Description
Technical field
The present invention relates to the field of voice control technology, and in particular to a method of voice control and an apparatus of voice control.
Background art
With the continuous development of communication technology, the control modes of terminals have become increasingly rich and intelligent. In human-computer interaction applications, speech recognition technology has gradually entered daily life; for example, existing smartphones, tablet computers, and smart televisions are all equipped with a speech recognition function. Although existing speech recognition technology can perform relatively simple single-instruction tasks within a specified range, such as the operation "open the camera", it is unrelated to the content displayed on the current operation interface, its interactivity is limited, and it can neither control the cursor to move within the operation interface nor directly trigger the corresponding triggerable controls on the current operation interface.
Summary of the invention
In view of the above problems, embodiments of the present invention are proposed in order to provide a method of voice control and a corresponding apparatus of voice control that overcome, or at least partly solve, the above problems.
To solve the above problems, an embodiment of the present invention discloses a method of voice control, including:
obtaining an interface image corresponding to a current display interface;
identifying, according to a received voice instruction, the control images contained in the interface image, and determining a target control image corresponding to the voice instruction;
determining, according to the position of the target control image in the interface image, a target position, in the current display interface, of a triggerable target control corresponding to the target control image; and
moving the cursor to the target position.
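For illustration only, the matching and cursor-movement steps above can be sketched as a single pipeline. The screenshot and image-recognition stages are assumed to have already produced a list of recognized control images, and every name in this sketch is a hypothetical placeholder rather than part of the disclosed method:

```python
from dataclasses import dataclass

@dataclass
class ControlImage:
    voice_identifier: str   # e.g. the text recognized inside the control
    center: tuple           # (x, y) position in the interface image

def voice_control_step(controls, control_voice, move_cursor):
    """Match the spoken control voice against the voice identifiers of the
    recognized control images, then move the cursor to the target position.
    Because the interface image maps one-to-one onto the display interface,
    a control image's position is also the on-screen target position."""
    for control in controls:
        if control.voice_identifier == control_voice:
            move_cursor(control.center)   # final step: move the cursor
            return control.center
    return None                           # no matching triggerable control

# Assumed output of the screenshot + image-recognition stages:
controls = [ControlImage("science", (840, 520)),
            ControlImage("technology", (840, 300))]
moved = []
print(voice_control_step(controls, "technology", moved.append))  # (840, 300)
```

When the instruction matches no control, the sketch returns `None`, leaving room for the disambiguation and visual-cue behavior described in the embodiments below.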
In a preferred embodiment, the step of identifying, according to the received voice instruction, the control images contained in the interface image and determining the target control image corresponding to the voice instruction includes:
recognizing the interface image, and matching a corresponding voice identifier to each control image contained in the interface image, wherein the control images correspond one-to-one with the triggerable controls in the current display interface; and
determining the control image corresponding to the voice identifier matched by the voice instruction as the target control image.
In a preferred embodiment, before the step of identifying, according to the received voice instruction, the control images contained in the interface image and determining the target control image corresponding to the voice instruction, the method includes:
dividing the interface image into several regions; and
displaying a corresponding region identifier in each region.
In a preferred embodiment, the voice instruction includes a region voice and a control voice, and the step of identifying, according to the received voice instruction, the control images contained in the interface image and determining the target control image corresponding to the voice instruction includes:
determining a corresponding target region by parsing the region voice in the voice instruction;
recognizing the target region in the interface image, and matching a corresponding voice identifier to each control image contained in the target region, wherein the control images correspond one-to-one with the triggerable controls of the corresponding region in the current display interface; and
determining the control image corresponding to the voice identifier matched by the control voice in the voice instruction as the target control image.
In a preferred embodiment, the step of identifying, according to the received voice instruction, the control images contained in the interface image and determining the target control image corresponding to the voice instruction further includes:
when two or more control images correspond to the voice instruction, numbering the two or more control images in the interface image;
receiving a voice selection instruction containing a number; and
determining the control image bearing the number in the voice selection instruction as the target control image.
In a preferred embodiment, the step of obtaining the interface image corresponding to the current display interface includes:
starting a voice control mode; and
taking a screenshot of the current display interface to obtain the interface image corresponding to the current display interface.
In a preferred embodiment, before the step of determining, according to the position of the target control image in the interface image, the target position, in the current display interface, of the triggerable target control corresponding to the target control image, the method includes:
obtaining the display resolution of the current display interface;
establishing an image coordinate system corresponding to the interface image according to the display resolution; and
determining the coordinates of the target control image according to the image coordinate system.
In a preferred embodiment, after the step of moving the cursor to the target position, the method further includes:
receiving a directional voice instruction; and
moving the cursor according to the directional voice instruction.
In a preferred embodiment, after the step of moving the cursor to the target position, the method includes:
triggering the triggerable target control at the target position.
To solve the above problems, an embodiment of the present invention discloses an apparatus for voice-controlled cursor movement, including:
an interface image obtaining module, configured to obtain an interface image corresponding to a current display interface;
an image recognition module, configured to identify, according to a received voice instruction, the control images contained in the interface image, and to determine a target control image corresponding to the voice instruction;
a target position determining module, configured to determine, according to the position of the target control image in the interface image, a target position, in the current display interface, of a triggerable target control corresponding to the target control image; and
a cursor control module, configured to move the cursor to the target position.
Compared with the prior art, embodiments of the present invention have the following advantages:
In embodiments of the present invention, an interface image corresponding to the current display interface is obtained; the control images contained in the interface image are then identified according to a received voice instruction, and a target control image corresponding to the voice instruction is determined; the target position, in the current display interface, of the triggerable target control corresponding to the target control image is then determined according to the position of the target control image in the interface image; and the cursor is finally moved to the target position. By combining image recognition technology with speech recognition technology, embodiments of the present invention realize voice-controlled cursor movement, thereby improving the interactivity between speech recognition technology and the operation interface.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of a method of voice control according to Embodiment One of the present invention;
Fig. 2 is a flow chart of the steps of a method of voice control according to Embodiment Two of the present invention;
Fig. 3 is a flow chart of the steps of one example in Embodiment Two of the present invention;
Figs. 4a-4b are schematic diagrams of interface images of the method of voice control corresponding to Fig. 3;
Fig. 5 is a flow chart of the steps of another example in Embodiment Two of the present invention;
Figs. 6a-6b are schematic diagrams of interface images of the method of voice control corresponding to Fig. 5;
Fig. 7 is a schematic diagram of an interface image of the method of voice control according to an embodiment of the present invention;
Fig. 8 is a structural block diagram of an apparatus of voice control according to Embodiment Three of the present invention;
Fig. 9 is a structural block diagram of an apparatus of voice control according to Embodiment Four of the present invention;
Fig. 10 is a structural block diagram of one example in Embodiment Four of the present invention;
Fig. 11 is a structural block diagram of another example in Embodiment Four of the present invention.
Detailed description of the embodiments
In order to make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
One of the core concepts of the embodiments of the present invention is: obtaining an interface image corresponding to the current display interface; then identifying, according to a received voice instruction, the control images contained in the interface image, and determining a target control image corresponding to the voice instruction; then determining, according to the position of the target control image in the interface image, the target position, in the current display interface, of the triggerable target control corresponding to the target control image; and finally moving the cursor to the target position. Voice-controlled cursor movement is thereby realized, improving the interactivity between speech recognition technology and the operation interface.
In the following, the solution of the present invention is described in detail through the following specific embodiments.
Embodiment one:
Referring to Fig. 1, a flow chart of the steps of Embodiment One of a method of voice control of the present invention is shown; the method may specifically include the following steps:
Step 101: obtain an interface image corresponding to the current display interface.
In embodiments of the present invention, the display of the terminal may support a mouse mode and/or a touch control mode. The cursor, also called a caret, is used to show the position of the cursor-controlling input device in the operation interface; cursors are generally divided into explicit cursors and implicit cursors. In mouse mode the cursor is generally an explicit cursor, i.e. it is displayed on the upper layer of the current display interface and can be moved to any position on the display. In touch control mode the display is a touch display, including a touch screen, a handwriting screen, and the like; the cursor is generally an implicit cursor, which can be moved to any position on the display and becomes visible when certain conditions are met. The current display interface contains at least one triggerable control; a triggerable control is used to interact with the user so as to realize a corresponding function. Specifically, a triggerable control may be a picture or text with a link, or a tool icon, and so on. The interface image is an image whose size corresponds one-to-one with the current display interface; it may be obtained by taking a screenshot of the current display interface, or by duplicating the current display interface, and it is presented semi-transparently on the top layer of the current display interface.
Step 102: identify, according to the received voice instruction, the control images contained in the interface image, and determine a target control image corresponding to the voice instruction.
In embodiments of the present invention, the terminal can implement a voice input function so as to receive voice instructions. According to the received voice instruction, the whole interface image, or a part of it, is processed by image recognition technology to identify the control images it contains; the control images correspond one-to-one with the triggerable controls of the corresponding part of the current display interface. When a control image contains text, the text is directly acquired by image recognition technology as the voice identifier of that control image and stored in a preset voice library; when a control image contains no text, a voice identifier is preset for that control image and stored in the preset voice library. The voice instruction contains a control voice, which is matched against the voice identifiers in the preset voice library. If the match succeeds and the matched voice identifier corresponds to a single control image, that control image is taken as the target control image. If the matched voice identifier corresponds to two or more control images, a first visual cue is applied in the interface image to those control images; the first visual cue may be enlarging, highlighting, or numbering the control images. The terminal then continues to receive a voice selection instruction until a single control image is determined as the target control image.
Step 103: determine, according to the position of the target control image in the interface image, the target position, in the current display interface, of the triggerable target control corresponding to the target control image.
After the target control image is determined, its position in the interface image is determined; this position may be calculated by establishing an image coordinate system, a camera coordinate system, a world coordinate system, or the like. Since the interface image corresponds one-to-one in size with the current display interface, the target control image is unique, and the triggerable target control corresponding to it is also unique; therefore, once the position of the target control image in the interface image is determined, the target position of the triggerable target control in the current display interface can be determined.
Step 104: move the cursor to the target position.
In this embodiment, the target position is any position of the triggerable target control in the current display interface; that is, the cursor is moved to any position of the triggerable target control. When the cursor is an implicit cursor, reaching the target position may also be preset as the visibility condition of the implicit cursor; in this case, when the cursor reaches the target position, a second visual cue is applied to the triggerable target control in the current display interface. The second visual cue is the visible form of the implicit cursor, and may be enlarging, highlighting, or tinting the triggerable target control. It should be noted that the first visual cue and the second visual cue need to be distinguished; for example, when the first visual cue uses numbering, the second visual cue may use a display mode other than numbering, such as enlarging, highlighting, or tinting.
In embodiments of the present invention, an interface image corresponding to the current display interface is obtained; the control images contained in the interface image are then identified according to the received voice instruction, and a target control image corresponding to the voice instruction is determined; the target position, in the current display interface, of the triggerable target control corresponding to the target control image is then determined according to the position of the target control image in the interface image; and the cursor is finally moved to the target position. By combining image recognition technology with speech recognition technology, embodiments of the present invention realize voice-controlled cursor movement, thereby improving the interactivity between speech recognition technology and the operation interface.
Embodiment two:
Referring to Fig. 2, a flow chart of the steps of Embodiment Two of a method of voice control of the present invention is shown; the method may specifically include the following steps:
Step 201: start a voice control mode.
After the terminal starts the voice control mode, the user can input voice instructions to the terminal to realize the voice control function. The user may start the voice control mode by triggering a corresponding key of the terminal, or by issuing a voice-control-mode start command to the voice input module of the terminal. The voice-control-mode start command may be preset in advance by the terminal, or may be customized by the user and then stored in the terminal.
Step 202: take a screenshot of the current display interface to obtain the interface image corresponding to the current display interface.
After the terminal starts the voice control mode, it automatically starts a full-screen screenshot program and takes a screenshot of the current display interface; the resulting screenshot is the interface image corresponding to the current display interface. As an example, the interface image exactly covers the current display interface, and the interface image is semi-transparent. For example, taking a transparency of 100 as the standard for a fully transparent state, semi-transparent can be understood as a transparency between 30 and 70. In this embodiment, the interface image is obtained by a full-screen screenshot of the current display interface and completely covers the current display interface in a semi-transparent state, so that the user can operate the current display interface according to the information in the interface image; the interface image thus plays a role of prompting and guidance.
Step 203: identify, according to the received voice instruction, the control images contained in the interface image, and determine a target control image corresponding to the voice instruction.
Referring to Fig. 3, as an example, the step of identifying, according to the received voice instruction, the control images contained in the interface image and determining the target control image corresponding to the voice instruction includes the following sub-steps:
Step 2031: recognize the interface image, and match a corresponding voice identifier to each control image contained in the interface image, wherein the control images correspond one-to-one with the triggerable controls in the current display interface.
After the terminal receives the voice instruction, it performs image recognition on the interface image. Image recognition refers to the technology of using a computer to process, analyze, and understand images so as to identify targets and objects of various modes. The control images in the interface image are obtained by image recognition technology; the control images correspond one-to-one with the triggerable controls in the current display interface. It can be understood that a control image in the interface image is the image of a triggerable control in the current display interface. When a control image contains text, the text is directly acquired by image recognition technology as the voice identifier of that control image and stored in the preset voice library; when a control image contains no text, a voice identifier is preset for that control image and stored in the preset voice library.
As shown in Fig. 4a, interface image 1 contains a first control image 41, a second control image 42, and a third control image 43, wherein the first control image 41 and the second control image 42 contain the same text, "science", and the third control image 43 contains the text "technology". By image recognition technology, the voice identifier matched to the first control image 41 and the second control image 42 is "science", and the voice identifier matched to the third control image 43 is "technology"; the voice identifiers "science" and "technology" are stored in the preset voice library.
Step 2032: determine the control image corresponding to the voice identifier matched by the voice instruction as the target control image.
The preset voice library contains the voice identifiers corresponding to all control images in the interface image. The control voice in the voice instruction is parsed and matched against the voice identifiers in the preset voice library; if the match succeeds, the control image corresponding to the matched voice identifier is determined as the target control image.
Specifically, when a single control image corresponds to the voice identifier, that control image is determined as the target control image. In Fig. 4a, the voice identifier "technology" corresponds to only one control image, namely the third control image 43, so the third control image 43 is determined as the target control image.
When two or more control images correspond to the voice identifier, the two or more control images are numbered in the interface image; a voice selection instruction containing a number is received; and the control image bearing the number in the voice selection instruction is determined as the target control image. In Fig. 4a, the voice identifier "science" corresponds to two control images, namely the first control image 41 and the second control image 42; in this case the two control images are numbered in interface image 1, and as shown in Fig. 4b, the numbers are displayed at positions corresponding to the respective control images. The corresponding position may be beside a control image, or may cover it. A voice selection instruction containing a number is then received; for example, if the voice selection instruction contains the number 1, then according to the number 1 in the voice selection instruction, the control image numbered 1, namely the first control image 41, is determined as the target control image.
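This number-then-select flow can be sketched as follows, under the simplifying assumption that controls are referred to by id and that numbering follows the order in which matching controls were found; all names are illustrative:

```python
def select_by_voice(library, control_voice, select_number=None):
    """Resolve a control voice to a single target control id.
    If the voice identifier matches one control, return it directly;
    if it matches two or more, the terminal would display numbered cues
    and await a voice selection instruction, resolved here via
    `select_number` (numbers start at 1, as in Fig. 4b)."""
    candidates = library.get(control_voice, [])
    if len(candidates) == 1:
        return candidates[0]
    if select_number is not None and 1 <= select_number <= len(candidates):
        return candidates[select_number - 1]
    return None   # no match, or still ambiguous: await a selection

library = {"science": [41, 42], "technology": [43]}
print(select_by_voice(library, "technology"))                # 43
print(select_by_voice(library, "science"))                   # None (ambiguous)
print(select_by_voice(library, "science", select_number=1))  # 41
```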
Referring to Fig. 5, as another example, before the step of identifying, according to the received voice instruction, the control images contained in the interface image and determining the target control image corresponding to the voice instruction, the method includes the following sub-steps:
Step 2033: divide the interface image into several regions.
In this embodiment, a fixed partitioning mode may be preset in the system; the interface image is divided according to the preset fixed partitioning mode, and region boundary lines are displayed in the interface image. For example, a preset fixed partitioning mode may divide the interface image into 9 regions distributed in a matrix. Alternatively, the interface image may be divided into several regions according to a received partition recognition instruction, with region boundary lines displayed in the interface image, wherein the partition recognition instruction contains the number of regions; for example, when the received partition recognition instruction specifies 4 regions, the interface image is divided equally or arbitrarily into 4 regions.
Step 2034: display a corresponding region identifier in each region.
In this embodiment, a region identifier may be a number, a letter, or the like, and each region identifier is associated and corresponds one-to-one with its region. As shown in Fig. 6a, interface image 1 is divided into 3 regions: region identifier 1 is displayed in the first region 61, region identifier 2 in the second region 62, and region identifier 3 in the third region 63. By setting region identifiers corresponding to the regions and displaying them in the corresponding regions, the accuracy of voice control can be improved.
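One way to sketch the fixed matrix partitioning and numeric region labeling described above (a 3x3 grid for the 9-region preset; the function name and the left-to-right, top-to-bottom numbering order are assumptions for illustration):

```python
def divide_into_regions(width, height, rows, cols):
    """Divide the interface image into rows*cols rectangular regions and
    assign each a numeric region identifier, returned as
    {identifier: (left, top, right, bottom)} in pixels."""
    regions = {}
    identifier = 1
    for r in range(rows):
        for c in range(cols):
            regions[identifier] = (c * width // cols, r * height // rows,
                                   (c + 1) * width // cols,
                                   (r + 1) * height // rows)
            identifier += 1
    return regions

# The preset fixed mode from the text: 9 regions in a 3x3 matrix.
regions = divide_into_regions(1920, 1080, rows=3, cols=3)
print(len(regions))   # 9
print(regions[1])     # (0, 0, 640, 360)
print(regions[9])     # (1280, 720, 1920, 1080)
```

The returned bounds can also drive the displayed region boundary lines mentioned in step 2033.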
In this embodiment, the voice instruction includes a region voice and a control voice, and the step of identifying, according to the received voice instruction, the control images contained in the interface image and determining the target control image corresponding to the voice instruction includes the following sub-steps:
Step 2035: determine the corresponding target region by parsing the region voice in the voice instruction.
The region voice in the voice instruction is parsed and matched against the set of region identifiers; when the match succeeds, the region corresponding to the matched region identifier is determined as the target region. As shown in Fig. 6a, when the target region corresponding to the region voice in the voice instruction is the second region 62, the second region 62 is determined as the target region.
Step 2036: recognize the target region in the interface image, and match a corresponding voice identifier to each control image contained in the target region, wherein the control images correspond one-to-one with the triggerable controls of the corresponding region in the current display interface.
The target region is a part of the interface image. The control images in the target region are obtained by image recognition technology and matched with corresponding voice identifiers, which are stored in the preset voice library; the control images correspond one-to-one with the triggerable controls of the corresponding region in the current display interface.
As shown in Fig. 6b, the control images in the target region, i.e. in the second region 62, are obtained by image recognition technology; the second region 62 contains the second control image 42, the voice identifier matched to the second control image 42 is "science", and the voice identifier "science" is stored in the preset voice library.
Step 2037: determine the control image corresponding to the voice identifier matched by the control voice in the voice instruction as the target control image.
The preset voice library contains the voice identifiers corresponding to all control images in the target region. The control voice in the voice instruction is parsed and matched against the voice identifiers in the preset voice library; if the match succeeds, the control image corresponding to the matched voice identifier is determined as the target control image.
As shown in Fig. 6b, in the second region 62 the voice identifier "science" corresponds to only one control image, namely the second control image 42, so the second control image 42 is determined as the target control image.
If more than one control image in the target region corresponds to the voice identifier, the description of step 2032 above may be referred to for further screening to determine a unique control image as the target control image, which is not repeated here.
Step 204: obtain the display resolution of the current display interface.
The display resolution is the resolution of the display when it shows an image, measured in pixels. The value of the display resolution refers to the number of horizontal pixels and vertical pixels in the entire effective area of the display. For example, a display resolution of 1920*1080 means that the current display interface shown by the display has 1920 horizontal pixels and 1080 vertical pixels.
Step 205: establish an image coordinate system corresponding to the interface image according to the display resolution.
In this embodiment, a rectangular coordinate system is established in units of pixels with the lower-left corner of the interface image as the origin. It can be understood that the unit length of the abscissa and the ordinate in the image coordinate system is the width of one pixel of the interface image. As shown in Fig. 7, the lower-left corner coordinate of interface image 1 is (0, 0), the lower-right corner coordinate is (1920, 0), the upper-left corner coordinate is (0, 1080), and the upper-right corner coordinate is (1920, 1080).
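Since raw screenshots are commonly addressed from the top-left corner with y growing downward, while the coordinate system above has its origin at the lower-left corner, a conversion along the following lines may be needed; this is an assumption about the capture pipeline, not something the patent specifies:

```python
def to_image_coords(px, py, display_width, display_height):
    """Convert a screenshot pixel position (assumed top-left origin,
    y growing downward) into the image coordinate system above
    (lower-left origin, y growing upward, units of one pixel)."""
    return (px, display_height - py)

# The corners of a 1920*1080 interface image, matching Fig. 7:
print(to_image_coords(0, 1080, 1920, 1080))  # (0, 0)       lower-left
print(to_image_coords(1920, 0, 1920, 1080))  # (1920, 1080) upper-right
```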
Step 206: determine the coordinates of the target control image according to the image coordinate system.
The coordinates of the target control image may be the coordinates of any point in the target control image, the coordinates of its edge, or, in particular, the coordinates of its center point. A control image is generally rectangular; by calculating the coordinates of its vertices, the height and width of the control image can be obtained, and the coordinates of its edge points and of its center point can be calculated.
As shown in Fig. 7, when the target control image is the second control image 42, the lower-left corner coordinate of the second control image 42 is (700, 428), the upper-left corner coordinate is (700, 612), the upper-right corner coordinate is (980, 612), and the lower-right corner coordinate is (980, 428); the calculated coordinate of the center point is (840, 520).
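The center-point computation for a rectangular control image follows directly from two opposite corners; with the corners of the second control image 42 given above, it reproduces the (840, 520) result (integer division is assumed here since coordinates are in whole pixels):

```python
def control_center(lower_left, upper_right):
    """Center point of a rectangular control image from two opposite
    corners, in the image coordinate system (units of one pixel)."""
    (x0, y0), (x1, y1) = lower_left, upper_right
    return ((x0 + x1) // 2, (y0 + y1) // 2)

# Second control image 42 from Fig. 7:
print(control_center((700, 428), (980, 612)))  # (840, 520)
```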
Step 207: according to the position of the target widget image in the interface image, the target position in the current display interface of the target triggerable control corresponding to the target widget image is determined.
Since the interface image corresponds one-to-one in size with the current display interface, the target widget image is unique, and the target triggerable control corresponding to the target widget image is also unique. Therefore, once the position of the target widget image in the interface image is determined, the target position of the target triggerable control in the current display interface can be determined.
Step 208: the cursor is moved to the target position.
According to the target position determined in the above steps, the cursor is moved to the target position corresponding to the target triggerable control; reference may be made to the description of step 104 above, which is not repeated here.
Preferably, after the step of moving the cursor to the target position, the method further includes the following sub-steps:
receiving an orientation voice instruction;
moving the cursor according to the orientation voice instruction.
An orientation voice instruction includes a direction instruction and a number instruction: the cursor moves, in the direction corresponding to the direction instruction, the number of steps corresponding to the number instruction. The unit step may be set to the spacing between adjacent triggerable controls, or to the width of one pixel, which is not limited here. As shown in Fig. 4, when the received orientation instruction is "right 1", the cursor moves from the position of the first control image 41 to the position of the second control image 42.
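The direction-plus-count structure above can be sketched as a hypothetical parser (the function name, the English direction words, and the pixel unit step are our illustrative assumptions; the patent leaves the unit step open):

```python
# Direction words mapped to unit vectors in the lower-left-origin
# image coordinate system (so "up" increases y).
DIRECTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def apply_orientation(cursor, instruction, unit_step=1):
    """Move the cursor per an orientation instruction like 'right 1':
    <direction word> <count>, moving count * unit_step pixels."""
    direction, count = instruction.split()
    dx, dy = DIRECTIONS[direction]
    n = int(count) * unit_step
    return (cursor[0] + dx * n, cursor[1] + dy * n)

# "right 1" with a unit step of one control width (280 px, per Fig. 7)
# carries the cursor from control image 41 to control image 42:
apply_orientation((700, 428), "right 1", unit_step=280)  # → (980, 428)
```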
Preferably, after the step of moving the cursor to the target position, the method further includes the following sub-step:
triggering the target triggerable control at the target position.
In the present embodiment, triggering the target triggerable control includes calling up the function menu corresponding to the target triggerable control, entering the interface linked to the target triggerable control, and the like. In mouse mode, the operation of triggering the target triggerable control includes operations such as a single left click, a double left click, or a single right click; in touch mode, where the display includes a touch screen or handwriting screen, the operation of triggering the target triggerable control includes operations such as tapping, double-tapping, or long-pressing the screen. When the target triggerable control is triggered, the interface image is canceled, i.e. the interface image is removed from the top layer of the current display interface.
Preferably, the voice control method further includes:
exiting voice control mode.
In the present embodiment, the operation of exiting voice control mode may be performed at any moment after voice control mode is started. The user may exit voice control mode by triggering the corresponding key of the terminal, or by issuing a voice-control-mode exit command to the voice input module of the terminal. The voice-control-mode exit command may be preset in advance by the terminal, or customized by the user and then stored in the terminal. When the operation of exiting voice control mode is received, the interface image is canceled, i.e. the interface image is removed from the top layer of the current display interface.
In the present embodiment, voice control mode is started and the current display interface is captured to obtain the interface image corresponding to the current display interface; the control images contained in the interface image are identified according to the received voice instruction, and the target widget image corresponding to the voice instruction is determined; the display resolution of the current display interface is obtained, an image coordinate system corresponding to the interface image is established according to the display resolution, and the coordinates of the target widget image are determined according to the image coordinate system; according to the position of the target widget image in the interface image, the target position in the current display interface of the target triggerable control corresponding to the target widget image is determined; finally, the cursor is moved to the target position. By combining image recognition technology with speech recognition technology, the embodiment of the present invention realizes voice-controlled cursor movement, thereby improving the interactivity between speech recognition technology and the operation interface.
It should be noted that, for simplicity of description, the method embodiments are expressed as a series of action combinations; however, those skilled in the art should understand that the embodiments of the present invention are not limited by the described sequence of actions, since according to the embodiments of the present invention some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Embodiment three:
Referring to Fig. 8, a structural block diagram of a voice control device according to embodiment three of the present invention is shown, which may specifically include the following modules:
Interface image obtaining module 301, configured to obtain an interface image corresponding to the current display interface;
Image recognition module 302, configured to identify, according to the received voice instruction, the control images contained in the interface image, and determine the target widget image corresponding to the voice instruction;
Target position determining module 303, configured to determine, according to the position of the target widget image in the interface image, the target position in the current display interface of the target triggerable control corresponding to the target widget image;
Cursor control module 304, configured to move the cursor to the target position.
The device of the present embodiment is used to execute the method steps in the above embodiments, and details are not described herein.
In the embodiment of the present invention, the interface image obtaining module obtains an interface image corresponding to the current display interface; the image recognition module then identifies, according to the received voice instruction, the control images contained in the interface image and determines the target widget image corresponding to the voice instruction; the target position determining module then determines, according to the position of the target widget image in the interface image, the target position in the current display interface of the target triggerable control corresponding to the target widget image; finally, the cursor control module moves the cursor to the target position. By combining image recognition technology with speech recognition technology, the embodiment of the present invention realizes voice-controlled cursor movement, thereby improving the interactivity between speech recognition technology and the operation interface.
Embodiment four:
Referring to Fig. 9, a structural block diagram of a voice control device according to embodiment four of the present invention is shown, which may specifically include the following modules:
Voice control starting module 401, configured to start voice control mode.
After the terminal starts voice control mode, the user can input voice instructions to the terminal to realize the voice control function. The user may start voice control mode by triggering the corresponding key of the terminal, or by issuing a voice-control-mode start command to the voice input module of the terminal. The voice-control-mode start command may be preset in advance by the terminal, or customized by the user and then stored in the terminal.
Interface screen capture module 402, configured to capture the current display interface to obtain an interface image corresponding to the current display interface.
After the terminal starts voice control mode, a full-screen screenshot program is started automatically to capture the current display interface; the resulting screenshot of the current display interface is the interface image corresponding to the current display interface. As an example, the interface image exactly covers the current display interface, and the transparency of the interface image is semi-transparent. For example, taking a transparency of 100 as the standard for a fully transparent state, semi-transparent can be understood as a transparency between 30 and 70. In the present embodiment, the interface image is obtained by a full-screen screenshot of the current display interface and completely covers the current display interface in a semi-transparent state, so that the user can operate the current display interface according to the information in the interface image, which plays the role of prompting and guidance.
Image recognition module 403, configured to identify, according to the received voice instruction, the control images contained in the interface image, and determine the target widget image corresponding to the voice instruction.
Referring to Fig. 10, as an example, the image recognition module 403 includes the following submodules:
Full-screen recognition submodule 4031, configured to identify the interface image and match a corresponding voice identifier to each control image contained in the interface image; wherein the control images correspond one-to-one with the triggerable controls in the current display interface.
After the terminal receives a voice instruction, image recognition is performed on the interface image. Image recognition refers to the technology of processing, analyzing, and understanding images with a computer in order to identify targets and objects of various patterns. The control images in the interface image are obtained by image recognition technology and correspond one-to-one with the triggerable controls in the current display interface; it can be understood that a control image in the interface image is the image of a triggerable control in the current display interface. When a control image contains text, the corresponding text is directly obtained by image recognition technology as the voice identifier of that control image and stored in a preset voice library; when a control image contains no text, the voice identifier corresponding to that control image is preset and stored in the preset voice library.
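The two-branch labeling rule above (OCR text when present, preset label otherwise) can be sketched as follows. This is a hypothetical outline: `ocr` and `preset_labels` are stand-ins for the recognition engine and the preconfigured label table, which the patent does not specify.

```python
def build_voice_library(control_images, ocr, preset_labels):
    """Assign each control image a voice identifier: the OCR'd text
    when the image contains text, otherwise a preset label keyed by
    the control's id. Returns the preset voice library as a dict."""
    library = {}
    for cid, image in control_images.items():
        text = ocr(image)                      # "" when no text is found
        library[cid] = text if text else preset_labels.get(cid)
    return library

# Toy usage with a fake OCR function:
controls = {"btn1": "image-with-text", "btn2": "icon-only"}
fake_ocr = lambda img: "Open" if img == "image-with-text" else ""
build_voice_library(controls, fake_ocr, {"btn2": "settings"})
# → {"btn1": "Open", "btn2": "settings"}
```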
Full-screen target widget image determining submodule 4032, configured to determine the control image corresponding to the voice identifier matched with the voice instruction as the target widget image.
The preset voice library contains the voice identifiers corresponding to all control images in the interface image. The control voice in the voice instruction is parsed and matched against the voice identifiers in the preset voice library; if the match succeeds, the control image corresponding to that voice identifier is determined as the target widget image.
Specifically, when the voice identifier corresponds to one control image, that control image is determined as the target widget image; when the voice identifier corresponds to two or more control images, the two or more control images in the interface image are numbered, a voice selection instruction containing a number is received, and the control image whose number matches the number in the voice selection instruction is determined as the target widget image.
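The disambiguation rule above can be sketched as a hypothetical resolver (names and return conventions are ours; the patent only specifies numbering duplicates and selecting by a spoken number):

```python
def resolve_target(controls, spoken_label, spoken_number=None):
    """controls: list of (voice_identifier, control_image) pairs.
    Returns the unique match; when several controls share the spoken
    label, selects by spoken_number (1-based, matching the on-screen
    numbering); returns None when the choice is still ambiguous."""
    matches = [img for label, img in controls if label == spoken_label]
    if len(matches) == 1:
        return matches[0]
    if spoken_number is not None and 1 <= spoken_number <= len(matches):
        return matches[spoken_number - 1]
    return None  # ambiguous: the device would display numbered candidates

controls = [("ok", "img_a"), ("cancel", "img_b"), ("ok", "img_c")]
resolve_target(controls, "cancel")   # → "img_b" (unique match)
resolve_target(controls, "ok", 2)    # → "img_c" (second numbered "ok")
resolve_target(controls, "ok")       # → None (needs a selection number)
```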
Referring to Fig. 11, as another example, the device further includes the following submodules:
Image dividing submodule 4033, configured to divide the interface image into several regions.
In the present embodiment, a fixed partitioning mode may be preset in the system, the interface image divided according to the preset fixed partitioning mode, and the region boundary lines displayed in the interface image; for example, a preset fixed partitioning mode may divide the interface image into 9 regions arranged in a matrix. Alternatively, the interface image may be divided into several regions according to a received partition identification instruction, with the region boundary lines displayed in the interface image, wherein the partition identification instruction includes the number of regions; for example, when the number of regions included in the received partition identification instruction is 4, the interface image is divided, equally or randomly, into 4 regions.
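The preset 3×3 matrix layout can be sketched as follows; this is an illustrative helper (the function name and the box representation are ours), computing region bounding boxes in the image coordinate system:

```python
def divide_regions(width, height, rows, cols):
    """Split an interface image into rows*cols rectangular regions,
    returned as (x0, y0, x1, y1) pixel boxes. Integer division keeps
    the boxes contiguous even when the size does not divide evenly."""
    boxes = []
    for r in range(rows):
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            y0, y1 = r * height // rows, (r + 1) * height // rows
            boxes.append((x0, y0, x1, y1))
    return boxes

# The 9-region matrix layout on a 1920*1080 interface image:
regions = divide_regions(1920, 1080, 3, 3)
# len(regions) == 9; the first region is (0, 0, 640, 360)
```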
Area identifier display submodule 4034, configured to display the corresponding area identifier in each region.
In the present embodiment, the area identifier may be a number, a letter, or the like, and the area identifiers are associated and in one-to-one correspondence with their regions.
In the present embodiment, the voice instruction includes an area voice and a control voice, and the image recognition module 403 includes the following submodules:
Target area determining submodule 4035, configured to determine the corresponding target area by parsing the area voice in the voice instruction.
The area voice in the voice instruction is parsed and matched against the set of area identifiers; when the match succeeds, the region corresponding to the area identifier matched with the area voice is determined as the target area.
Target area recognition submodule 4036, configured to identify the target area in the interface image and match a corresponding voice identifier to each control image contained in the target area; wherein the control images correspond one-to-one with the triggerable controls of the corresponding region in the current display interface.
The target area is a part of the interface image. The control images in the target area are obtained by image recognition technology and matched with corresponding voice identifiers, which are stored in the preset voice library; the control images correspond one-to-one with the triggerable controls of the corresponding region in the current display interface.
Regional target control image determining submodule 4037, configured to determine the control image corresponding to the voice identifier matched with the control voice in the voice instruction as the target widget image.
The preset voice library contains the voice identifiers corresponding to all control images in the target area. The control voice in the voice instruction is parsed and matched against the voice identifiers in the preset voice library; if the match succeeds, the control image corresponding to that voice identifier is determined as the target widget image. If a voice identifier corresponds to more than one control image in the target area, the control images in the target area of the interface image are numbered, a voice selection instruction containing a number is received, and the control image whose number matches the number in the voice selection instruction is determined as the target widget image.
Resolution obtaining module 404, configured to obtain the display resolution of the current display interface.
Display resolution is the resolution of a display when presenting an image, measured in pixels. The display resolution value refers to the number of horizontal and vertical pixels across the entire effective display area. For example, a display resolution of 1920*1080 means that the current display interface shown by the display has 1920 horizontal pixels and 1080 vertical pixels.
Image coordinate system establishing module 405, configured to establish an image coordinate system corresponding to the interface image according to the display resolution.
In the present embodiment, a rectangular coordinate system is established with the lower-left corner of the interface image as the origin, in units of pixels. It can be understood that the unit length of the abscissa and ordinate in the image coordinate system is the width of one pixel of the interface image.
Target widget image coordinate determining module 406, configured to determine the coordinates of the target widget image according to the image coordinate system.
The coordinates of the target widget image may be the coordinates of any point within the target widget image, the coordinates of the edges of the target widget image, or specifically the coordinates of the center point of the target widget image. A control image is generally rectangular; by calculating the coordinates of its vertices, the height and width of the control image can be obtained, and the coordinates of the edge points and the center point of the control image can then be calculated.
Target position determining module 407, configured to determine, according to the position of the target widget image in the interface image, the target position in the current display interface of the target triggerable control corresponding to the target widget image.
Since the interface image corresponds one-to-one in size with the current display interface, the target widget image is unique, and the target triggerable control corresponding to the target widget image is also unique. Therefore, once the position of the target widget image in the interface image is determined, the target position of the target triggerable control in the current display interface can be determined.
Cursor control module 408, configured to move the cursor to the target position.
According to the target position determined by the target position determining module 407, the cursor is moved to the target position corresponding to the target triggerable control.
Preferably, the device further includes the following modules:
an orientation voice receiving module, configured to receive an orientation voice instruction;
an orientation voice control module, configured to move the cursor according to the orientation voice instruction.
An orientation voice instruction includes a direction instruction and a number instruction: the cursor moves, in the direction corresponding to the direction instruction, the number of steps corresponding to the number instruction. The unit step may be set to the spacing between adjacent triggerable controls, or to the width of one pixel, which is not limited here.
Preferably, the device further includes the following module:
a trigger module, configured to trigger the target triggerable control at the target position.
In the present embodiment, triggering the target triggerable control includes calling up the function menu corresponding to the target triggerable control, entering the interface linked to the target triggerable control, and the like; the operation of triggering the target triggerable control includes operations such as clicking, double-clicking, or long-pressing. When the target triggerable control is triggered, the interface image is canceled, i.e. the interface image is removed from the top layer of the current display interface.
Preferably, the device further includes the following module:
an exit module, configured to exit voice control mode.
In the present embodiment, the operation of exiting voice control mode may be performed at any moment after voice control mode is started. The user may exit voice control mode by triggering the corresponding key of the terminal, or by issuing a voice-control-mode exit command to the voice input module of the terminal. The voice-control-mode exit command may be preset in advance by the terminal, or customized by the user and then stored in the terminal. When the operation of exiting voice control mode is received, the interface image is canceled, i.e. the interface image is removed from the top layer of the current display interface.
In embodiment four of the present invention, the voice control starting module starts voice control mode; the interface screen capture module captures the current display interface to obtain an interface image corresponding to the current display interface; the image recognition module identifies, according to the received voice instruction, the control images contained in the interface image and determines the target widget image corresponding to the voice instruction; the resolution obtaining module obtains the display resolution of the current display interface; the image coordinate system establishing module establishes an image coordinate system corresponding to the interface image according to the display resolution; the target widget image coordinate determining module determines the coordinates of the target widget image according to the image coordinate system; the target position determining module determines, according to the position of the target widget image in the interface image, the target position in the current display interface of the target triggerable control corresponding to the target widget image; finally, the cursor control module moves the cursor to the target position. By combining image recognition technology with speech recognition technology, the embodiment of the present invention realizes voice-controlled cursor movement, thereby improving the interactivity between speech recognition technology and the operation interface.
As the device embodiments are basically similar to the method embodiments, their description is relatively brief; for relevant details, reference is made to the corresponding parts of the method embodiments.
The voice control method and voice control device provided by the present invention have been introduced in detail above. Specific examples are used herein to illustrate the principles and implementations of the invention; the above descriptions of the embodiments are merely intended to help understand the method of the invention and its core concept. Meanwhile, for those skilled in the art, there will be changes in the specific implementation and scope of application according to the idea of the invention. In conclusion, the contents of this specification should not be construed as limiting the present invention.
Claims (10)
1. A voice control method, characterized by comprising:
obtaining an interface image corresponding to the current display interface;
identifying, according to a received voice instruction, the control images contained in the interface image, and determining a target widget image corresponding to the voice instruction;
determining, according to the position of the target widget image in the interface image, a target position in the current display interface of a target triggerable control corresponding to the target widget image;
moving the cursor to the target position.
2. The method according to claim 1, characterized in that the step of identifying, according to the received voice instruction, the control images contained in the interface image and determining the target widget image corresponding to the voice instruction comprises:
identifying the interface image, and matching a corresponding voice identifier to each control image contained in the interface image; wherein the control images correspond one-to-one with the triggerable controls in the current display interface;
determining the control image corresponding to the voice identifier matched with the voice instruction as the target widget image.
3. The method according to claim 1, characterized in that, before the step of identifying, according to the received voice instruction, the control images contained in the interface image and determining the target widget image corresponding to the voice instruction, the method comprises:
dividing the interface image into several regions;
displaying a corresponding area identifier in each region.
4. The method according to claim 3, characterized in that the voice instruction includes an area voice and a control voice; the step of identifying, according to the received voice instruction, the control images contained in the interface image and determining the target widget image corresponding to the voice instruction comprises:
determining a corresponding target area by parsing the area voice in the voice instruction;
identifying the target area in the interface image, and matching a corresponding voice identifier to each control image contained in the target area; wherein the control images correspond one-to-one with the triggerable controls of the corresponding region in the current display interface;
determining the control image corresponding to the voice identifier matched with the control voice in the voice instruction as the target widget image.
5. The method according to claim 1, 2 or 4, characterized in that the step of identifying, according to the received voice instruction, the control images contained in the interface image and determining the target widget image corresponding to the voice instruction further comprises:
when two or more control images correspond to the voice instruction, numbering the two or more control images in the interface image;
receiving a voice selection instruction containing a number;
determining the control image whose number matches the number in the voice selection instruction as the target widget image.
6. The method according to claim 1, 2 or 4, characterized in that the step of obtaining an interface image corresponding to the current display interface comprises:
starting voice control mode;
capturing the current display interface to obtain the interface image corresponding to the current display interface.
7. The method according to claim 1, 2 or 4, characterized in that, before the step of determining, according to the position of the target widget image in the interface image, the target position in the current display interface of the target triggerable control corresponding to the target widget image, the method comprises:
obtaining the display resolution of the current display interface;
establishing an image coordinate system corresponding to the interface image according to the display resolution;
determining the coordinates of the target widget image according to the image coordinate system.
8. The method according to claim 7, characterized in that, after the step of moving the cursor to the target position, the method further comprises:
receiving an orientation voice instruction;
moving the cursor according to the orientation voice instruction.
9. The method according to claim 1, characterized in that, after the step of moving the cursor to the target position, the method comprises:
triggering the target triggerable control at the target position.
10. A voice control device, characterized by comprising:
an interface image obtaining module, configured to obtain an interface image corresponding to the current display interface;
an image recognition module, configured to identify, according to a received voice instruction, the control images contained in the interface image, and determine a target widget image corresponding to the voice instruction;
a target position determining module, configured to determine, according to the position of the target widget image in the interface image, a target position in the current display interface of a target triggerable control corresponding to the target widget image;
a cursor control module, configured to move the cursor to the target position.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910473077.3A CN110136718A (en) | 2019-05-31 | 2019-05-31 | The method and apparatus of voice control |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110136718A true CN110136718A (en) | 2019-08-16 |
Family
ID=67579595
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910473077.3A Pending CN110136718A (en) | 2019-05-31 | 2019-05-31 | The method and apparatus of voice control |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110136718A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110675874A (en) * | 2019-09-29 | 2020-01-10 | 深圳欧博思智能科技有限公司 | Method for realizing interaction between virtual character and UI (user interface) based on intelligent sound box |
CN111263236A (en) * | 2020-02-21 | 2020-06-09 | 广州欢网科技有限责任公司 | Voice adaptation method and device for television application and voice control method |
CN111968639A (en) * | 2020-08-14 | 2020-11-20 | 北京小米松果电子有限公司 | Voice control method and device, electronic equipment and storage medium |
CN112445450A (en) * | 2019-08-30 | 2021-03-05 | 比亚迪股份有限公司 | Method and device for controlling terminal based on voice, storage medium and electronic equipment |
CN112732379A (en) * | 2020-12-30 | 2021-04-30 | 智道网联科技(北京)有限公司 | Operation method of application program on intelligent terminal, terminal and storage medium |
CN114467140A (en) * | 2020-08-05 | 2022-05-10 | 互动解决方案公司 | System for changing image based on voice |
WO2023103917A1 (en) * | 2021-12-09 | 2023-06-15 | 杭州逗酷软件科技有限公司 | Speech control method and apparatus, and electronic device and storage medium |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003263308A (en) * | 2002-12-27 | 2003-09-19 | Nec Infrontia Corp | Screen control device and method |
US20130046537A1 (en) * | 2011-08-19 | 2013-02-21 | Dolbey & Company, Inc. | Systems and Methods for Providing an Electronic Dictation Interface |
CN104184890A (en) * | 2014-08-11 | 2014-12-03 | 联想(北京)有限公司 | Information processing method and electronic device |
CN104360805A (en) * | 2014-11-28 | 2015-02-18 | 广东欧珀移动通信有限公司 | Application icon management method and application icon management device |
CN104965596A (en) * | 2015-07-24 | 2015-10-07 | 上海宝宏软件有限公司 | Voice control system |
CN107358953A (en) * | 2017-06-30 | 2017-11-17 | 努比亚技术有限公司 | Sound control method, mobile terminal and storage medium |
US9922651B1 (en) * | 2014-08-13 | 2018-03-20 | Rockwell Collins, Inc. | Avionics text entry, cursor control, and display format selection via voice recognition |
CN109213470A (en) * | 2018-09-11 | 2019-01-15 | 昆明理工大学 | A kind of cursor control method based on speech recognition |
CN109391833A (en) * | 2018-09-13 | 2019-02-26 | 苏宁智能终端有限公司 | A kind of sound control method and smart television of smart television |
CN109471678A (en) * | 2018-11-07 | 2019-03-15 | 苏州思必驰信息科技有限公司 | Voice midpoint controlling method and device based on image recognition |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112445450A (en) * | 2019-08-30 | 2021-03-05 | 比亚迪股份有限公司 | Method and device for controlling terminal based on voice, storage medium and electronic equipment |
CN110675874A (en) * | 2019-09-29 | 2020-01-10 | 深圳欧博思智能科技有限公司 | Smart-speaker-based method for interaction between a virtual character and a UI (user interface) |
CN111263236A (en) * | 2020-02-21 | 2020-06-09 | 广州欢网科技有限责任公司 | Voice adaptation method and device for television application and voice control method |
CN111263236B (en) * | 2020-02-21 | 2022-04-12 | 广州欢网科技有限责任公司 | Voice adaptation method and device for television application and voice control method |
CN114467140A (en) * | 2020-08-05 | 2022-05-10 | 互动解决方案公司 | System for changing image based on voice |
US11568877B2 (en) | 2020-08-05 | 2023-01-31 | Interactive Solutions Corp. | System to change image based on voice |
CN111968639A (en) * | 2020-08-14 | 2020-11-20 | 北京小米松果电子有限公司 | Voice control method and device, electronic equipment and storage medium |
CN112732379A (en) * | 2020-12-30 | 2021-04-30 | 智道网联科技(北京)有限公司 | Operation method of application program on intelligent terminal, terminal and storage medium |
CN112732379B (en) * | 2020-12-30 | 2023-12-15 | 智道网联科技(北京)有限公司 | Method for running application program on intelligent terminal, terminal and storage medium |
WO2023103917A1 (en) * | 2021-12-09 | 2023-06-15 | 杭州逗酷软件科技有限公司 | Speech control method and apparatus, and electronic device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110136718A (en) | The method and apparatus of voice control | |
US10929013B2 (en) | Method for adjusting input virtual keyboard and input apparatus | |
US7107079B2 (en) | Cellular phone set | |
CN106415472B (en) | Gesture control method and device, terminal equipment and storage medium | |
CN103488413B (en) | Touch control device, and method and device for displaying a 3D interface on a touch control device | |
CN103593136A (en) | Touch terminal, and one-hand operation method and device of large-screen touch terminal | |
WO2012169155A1 (en) | Information processing terminal and method, program, and recording medium | |
CN103257811A (en) | Picture display system and method based on touch screen | |
US10732808B2 (en) | Information processing device, information processing method, and program | |
US20170047065A1 (en) | Voice-controllable image display device and voice control method for image display device | |
CN105227985B (en) | Display device and control method thereof | |
KR20110025520A (en) | Apparatus and method for controlling a mobile terminal | |
CN110007826A (en) | Method and apparatus for voice-controlled cursor movement | |
CN107066176A (en) | Control method and device for a one-handed mode of a terminal | |
CN109165033B (en) | Application updating method and mobile terminal | |
EP2544082A1 (en) | Image display system, information processing apparatus, display apparatus, and image display method | |
CN106648402A (en) | Information sending method and device and information processing method and device | |
CN106354376A (en) | Information processing method and client terminal | |
CN111414115A (en) | Key control method, computer readable storage medium and terminal thereof | |
CN111367483A (en) | Interaction control method and electronic equipment | |
CN108845756B (en) | Touch operation method and device, storage medium and electronic equipment | |
CN102693084B (en) | Mobile terminal and operation response method thereof | |
CN113298212A (en) | Graphic code display method and device | |
CN106909272A (en) | Display control method and mobile terminal | |
CN103914228A (en) | Mobile terminal and touch screen operating method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20190816 |