CN110136718A - The method and apparatus of voice control - Google Patents


Info

Publication number
CN110136718A
CN110136718A (application CN201910473077.3A)
Authority
CN
China
Prior art keywords
image
control
interface
target
voice
Prior art date
Legal status
Pending
Application number
CN201910473077.3A
Other languages
Chinese (zh)
Inventor
童宗伟
Current Assignee
Shenzhen Core Electronics Co Ltd
Original Assignee
Shenzhen Core Electronics Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Core Electronics Co Ltd
Priority to CN201910473077.3A
Publication of CN110136718A
Legal status: Pending (current)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command

Abstract

Embodiments of the present invention provide a method and an apparatus of voice control, relating to the field of voice control technology. The method includes: obtaining an interface image corresponding to a current display interface; identifying, according to a received voice instruction, the control images contained in the interface image, and determining a target widget image corresponding to the voice instruction; determining, according to the position of the target widget image in the interface image, a target position, in the current display interface, of the triggerable control corresponding to the target widget image; and moving the cursor to the target position. The embodiments of the present invention realize voice-controlled cursor movement based on image recognition technology, thereby improving the interactivity between speech recognition technology and the operation interface.

Description

The method and apparatus of voice control
Technical field
The present invention relates to the field of voice control technology, and in particular to a method of voice control and an apparatus of voice control.
Background art
With the continuous development of communication technology, the ways of controlling a terminal are becoming richer and more intelligent. In human-computer interaction applications, speech recognition technology has gradually entered daily life; for example, existing smartphones, tablet computers and smart televisions are all equipped with speech recognition functions. Although existing speech recognition technology can perform relatively simple, single-instruction tasks within a specified range, such as the operation "open the camera", it is unrelated to the content shown on the current operation interface, offers little interactivity, cannot control the cursor to move within the operation interface, and cannot directly trigger the corresponding triggerable controls on the current operation interface.
Summary of the invention
In view of the above problems, embodiments of the present invention are proposed to provide a method of voice control and a corresponding apparatus of voice control that overcome, or at least partly solve, the above problems.
To solve the above problems, an embodiment of the present invention discloses a method of voice control, including:
obtaining an interface image corresponding to a current display interface;
identifying, according to a received voice instruction, the control images contained in the interface image, and determining a target widget image corresponding to the voice instruction;
determining, according to the position of the target widget image in the interface image, a target position, in the current display interface, of the triggerable control corresponding to the target widget image;
moving the cursor to the target position.
In a preferred embodiment, the step of identifying, according to the received voice instruction, the control images contained in the interface image and determining the target widget image corresponding to the voice instruction includes:
identifying the interface image, and matching a corresponding voice identifier to each control image contained in the interface image, wherein the control images correspond one-to-one to the triggerable controls in the current display interface;
determining the control image corresponding to the voice identifier matched by the voice instruction as the target widget image.
In a preferred embodiment, before the step of identifying, according to the received voice instruction, the control images contained in the interface image and determining the target widget image corresponding to the voice instruction, the method includes:
dividing the interface image into several regions;
displaying a corresponding region identifier in each region.
In a preferred embodiment, the voice instruction includes a region voice and a control voice, and the step of identifying, according to the received voice instruction, the control images contained in the interface image and determining the target widget image corresponding to the voice instruction includes:
determining the corresponding target region by parsing the region voice in the voice instruction;
identifying the target region in the interface image, and matching a corresponding voice identifier to each control image contained in the target region, wherein the control images correspond one-to-one to the triggerable controls of the corresponding region in the current display interface;
determining the control image corresponding to the voice identifier matched by the control voice in the voice instruction as the target widget image.
In a preferred embodiment, the step of identifying, according to the received voice instruction, the control images contained in the interface image and determining the target widget image corresponding to the voice instruction further includes:
when two or more control images correspond to the voice instruction, numbering the two or more control images in the interface image;
receiving a voice selection instruction containing a number;
determining the control image bearing the number indicated in the voice selection instruction as the target widget image.
In a preferred embodiment, the step of obtaining the interface image corresponding to the current display interface includes:
starting a voice control mode;
taking a screenshot of the current display interface to obtain the interface image corresponding to the current display interface.
In a preferred embodiment, before the step of determining, according to the position of the target widget image in the interface image, the target position, in the current display interface, of the triggerable control corresponding to the target widget image, the method includes:
obtaining the display resolution of the current display interface;
establishing an image coordinate system corresponding to the interface image according to the display resolution;
determining the coordinates of the target widget image according to the image coordinate system.
In a preferred embodiment, after the step of moving the cursor to the target position, the method further includes:
receiving an orientation voice instruction;
moving the cursor according to the orientation voice instruction.
In a preferred embodiment, after the step of moving the cursor to the target position, the method includes:
triggering the triggerable control at the target position.
To solve the above problems, an embodiment of the present invention discloses an apparatus for voice-controlled cursor movement, including:
an interface image obtaining module, configured to obtain an interface image corresponding to a current display interface;
an image recognition module, configured to identify, according to a received voice instruction, the control images contained in the interface image, and to determine a target widget image corresponding to the voice instruction;
a target position determining module, configured to determine, according to the position of the target widget image in the interface image, a target position, in the current display interface, of the triggerable control corresponding to the target widget image;
a cursor control module, configured to move the cursor to the target position.
Compared with the prior art, the embodiments of the present invention have the following advantages:
In the embodiments of the present invention, an interface image corresponding to the current display interface is obtained; the control images contained in the interface image are then identified according to the received voice instruction, and the target widget image corresponding to the voice instruction is determined; the target position, in the current display interface, of the triggerable control corresponding to the target widget image is determined according to the position of the target widget image in the interface image; and the cursor is finally moved to the target position. By combining image recognition technology with speech recognition technology, the embodiments of the present invention realize voice-controlled cursor movement and thereby improve the interactivity between speech recognition technology and the operation interface.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of a method of voice control according to Embodiment one of the present invention;
Fig. 2 is a flow chart of the steps of a method of voice control according to Embodiment two of the present invention;
Fig. 3 is a flow chart of the steps of one example in Embodiment two of the present invention;
Fig. 4a-4b are schematic diagrams of the interface image of the method of voice control corresponding to Fig. 3;
Fig. 5 is a flow chart of the steps of another example in Embodiment two of the present invention;
Fig. 6a-6b are schematic diagrams of the interface image of the method of voice control corresponding to Fig. 5;
Fig. 7 is a schematic diagram of the interface image of a method of voice control according to an embodiment of the present invention;
Fig. 8 is a structural block diagram of an apparatus of voice control according to Embodiment three of the present invention;
Fig. 9 is a structural block diagram of an apparatus of voice control according to Embodiment four of the present invention;
Fig. 10 is a structural block diagram of one example in Embodiment four of the present invention;
Fig. 11 is a structural block diagram of another example in Embodiment four of the present invention.
Detailed description of the embodiments
In order to make the above objects, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
One of the core concepts of the embodiments of the present invention is: obtaining an interface image corresponding to the current display interface; identifying, according to the received voice instruction, the control images contained in the interface image, and determining the target widget image corresponding to the voice instruction; determining, according to the position of the target widget image in the interface image, the target position, in the current display interface, of the triggerable control corresponding to the target widget image; and finally moving the cursor to the target position. Voice-controlled cursor movement is thereby realized, improving the interactivity between speech recognition technology and the operation interface.
In the following, the solutions of the present invention are described in detail through the following specific embodiments.
Embodiment one:
Referring to Fig. 1, a flow chart of the steps of Embodiment one of a method of voice control of the present invention is shown, which may specifically include the following steps:
Step 101: obtain an interface image corresponding to the current display interface.
In an embodiment of the present invention, the display of the terminal may support a mouse mode and/or a touch mode. The cursor, also known as a pointer, is used to display the position of the controlling input device in the operation interface; cursors are generally divided into explicit cursors and implicit cursors. In mouse mode, the cursor is generally an explicit cursor, i.e. it is displayed on the upper layer of the current display interface and can be moved to any position of the display. In touch mode, the display is a touch display, which includes a touch screen, a handwriting screen and the like; the cursor is generally an implicit cursor, which can be moved to any position of the display and becomes visible when certain conditions are met. The current display interface contains at least one triggerable control, which is used to interact with the user to realize a corresponding function. Specifically, a triggerable control may be a picture or text with a link, or a tool icon, etc. The interface image is an image whose size corresponds one-to-one to the current display interface; it may be obtained by taking a screenshot of the current display interface, or by copying the current display interface, and is presented semi-transparently on the top layer of the current display interface.
Step 102: identify, according to the received voice instruction, the control images contained in the interface image, and determine the target widget image corresponding to the voice instruction.
In an embodiment of the present invention, the terminal may implement a voice input function to receive voice instructions. According to the received voice instruction, the whole interface image or a part of it is processed by image recognition technology to identify the control images it contains; the control images correspond one-to-one to the triggerable controls of the corresponding part of the current display interface. When a control image contains text, the corresponding text is directly obtained through image recognition as the voice identifier of that control image and is stored in a preset voice library; when a control image contains no text, a corresponding voice identifier is preset for it and stored in the preset voice library. The voice instruction contains a control voice, which is matched against the voice identifiers in the preset voice library. If the match succeeds and the voice identifier corresponds to exactly one control image, that control image is taken as the target widget image. If the voice identifier corresponds to two or more control images, a first visual cue is applied to those control images in the interface image; the first visual cue may be enlarging, highlighting or numbering the control images. A voice selection instruction is then received so that a unique control image is determined as the target widget image, and the cursor is moved accordingly.
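As an illustration of this matching and disambiguation step, the following is a minimal sketch in Python. It assumes the control images have already been extracted and labelled; the `voice_library` mapping from voice identifier to control-image entries and the function names are hypothetical, and only the matching logic described above is modelled.

```python
def match_control(control_voice, voice_library):
    """Match the spoken control phrase against the preset voice library.

    voice_library maps a voice identifier (e.g. the text read from a
    control image) to a list of control-image entries; each entry
    carries at least its bounding box in the interface image.
    """
    candidates = voice_library.get(control_voice, [])
    if not candidates:
        return None                      # no match: the instruction is ignored
    if len(candidates) == 1:
        return candidates[0]             # unique match: this is the target widget image
    # Two or more matches: number them as the first visual cue and
    # wait for a voice selection instruction containing a number.
    for index, control in enumerate(candidates, start=1):
        control["number"] = index        # e.g. rendered beside or over the control image
    return candidates                    # caller resolves via the numbered selection


def resolve_by_number(candidates, selected_number):
    """Apply the voice selection instruction ("number 1", "number 2", ...)."""
    for control in candidates:
        if control.get("number") == selected_number:
            return control
    return None
```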
Step 103: determine, according to the position of the target widget image in the interface image, the target position, in the current display interface, of the triggerable control corresponding to the target widget image.
After the target widget image is determined, its position in the interface image is determined; the position of the target widget image in the interface image may be calculated by establishing an image coordinate system, a camera coordinate system, a world coordinate system or the like. Since the interface image corresponds one-to-one in size to the current display interface, the target widget image is unique, and the triggerable control corresponding to the target widget image is also unique. Therefore, once the position of the target widget image in the interface image is determined, the target position of the triggerable control in the current display interface can be determined.
Step 104: move the cursor to the target position.
In this embodiment, the target position is any position of the target triggerable control in the current display interface, i.e. the cursor is moved to any position of the target triggerable control. When the cursor is an implicit cursor, a visibility condition of the implicit cursor may also be preset; in this case the target position is the target triggerable control in the current display interface, a second visual cue is provided on the target triggerable control, and the second visual cue is the visible form of the implicit cursor. The second visual cue may be enlarging, highlighting or tinting the target triggerable control. It should be noted that the first visual cue and the second visual cue need to be distinguished; for example, when the first visual cue uses numbered display, the second visual cue may use a display mode other than numbering, such as enlarging, highlighting or tinting.
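For the mouse-mode case, moving the explicit cursor could look like the sketch below, which uses the third-party pyautogui library as an assumed mechanism; this is only one possible desktop implementation, not the specific mechanism of the embodiment, and the coordinate flip reflects the bottom-left-origin image coordinate system used later in this description.

```python
import pyautogui  # third-party library for programmatic mouse control

SCREEN_HEIGHT = pyautogui.size().height  # e.g. 1080 for a 1920*1080 display

def move_cursor_to(x_image, y_image):
    """Move the cursor to a point given in the bottom-left-origin image
    coordinate system; pyautogui uses a top-left origin, so y is flipped."""
    pyautogui.moveTo(x_image, SCREEN_HEIGHT - y_image, duration=0.2)

# Example: the center (840, 520) of the second control image from Fig. 7.
move_cursor_to(840, 520)
```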
In the embodiment of the present invention, an interface image corresponding to the current display interface is obtained; the control images contained in the interface image are then identified according to the received voice instruction, and the target widget image corresponding to the voice instruction is determined; the target position, in the current display interface, of the triggerable control corresponding to the target widget image is then determined according to the position of the target widget image in the interface image; and the cursor is finally moved to the target position. By combining image recognition technology with speech recognition technology, the embodiment of the present invention realizes voice-controlled cursor movement and thereby improves the interactivity between speech recognition technology and the operation interface.
Embodiment two:
Referring to Fig. 2, a flow chart of the steps of Embodiment two of a method of voice control of the present invention is shown, which may specifically include the following steps:
Step 201: start the voice control mode.
After the terminal starts the voice control mode, the user can input voice instructions to the terminal to realize the voice control function. The user may start the voice control mode by triggering a corresponding key of the terminal, or by issuing a voice-control-mode start command to the voice input module of the terminal. The voice-control-mode start command may be preset in advance by the terminal, or may be customized by the user and then stored in the terminal.
Step 202: take a screenshot of the current display interface to obtain the interface image corresponding to the current display interface.
After the terminal starts the voice control mode, a full-screen screenshot program is automatically started to take a screenshot of the current display interface; the obtained screenshot of the current display interface is the interface image corresponding to the current display interface. As an example, the interface image exactly covers the current display interface, and its transparency is semi-transparent. For example, taking a transparency of 100 as fully transparent, semi-transparent may be understood as a transparency between 30 and 70. In this embodiment, the interface image is obtained through a full-screen screenshot of the current display interface and completely covers the current display interface in a semi-transparent state, so that the user can operate the current display interface according to the information in the interface image; it thus plays a prompting and guiding role.
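The following is a minimal sketch, assuming a desktop environment where the Pillow library is available, of how such a full-screen capture might be taken; presenting it as a semi-transparent overlay is left as a comment because that step depends on the terminal's windowing system.

```python
from PIL import ImageGrab  # Pillow; grabs the full screen on desktop platforms

def capture_interface_image():
    """Take a full-screen screenshot to use as the interface image."""
    interface_image = ImageGrab.grab()   # size matches the display resolution
    # In the embodiment this image would then be shown semi-transparently
    # (transparency roughly 30-70 out of 100) on top of the current display
    # interface; that overlay step is platform-specific and omitted here.
    return interface_image

img = capture_interface_image()
print(img.size)  # e.g. (1920, 1080)
```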
Step 203: identify, according to the received voice instruction, the control images contained in the interface image, and determine the target widget image corresponding to the voice instruction.
Referring to Fig. 3, as an example, the step of identifying, according to the received voice instruction, the control images contained in the interface image and determining the target widget image corresponding to the voice instruction includes the following sub-steps:
Step 2031: identify the interface image, and match a corresponding voice identifier to each control image contained in the interface image, wherein the control images correspond one-to-one to the triggerable controls in the current display interface.
After the terminal receives the voice instruction, image recognition is performed on the interface image. Image recognition refers to the technology of processing, analyzing and understanding images with a computer in order to identify targets and objects of various patterns. The control images in the interface image are obtained through image recognition, and the control images correspond one-to-one to the triggerable controls in the current display interface; it can be understood that a control image in the interface image is the image of a triggerable control in the current display interface. When a control image contains text, the corresponding text is directly obtained through image recognition as the voice identifier of that control image and is stored in the preset voice library; when a control image contains no text, a corresponding voice identifier is preset for the control image and stored in the preset voice library.
As shown in Fig. 4a, the interface image 1 contains a first control image 41, a second control image 42 and a third control image 43, wherein the first control image 41 and the second control image 42 contain the same text "science", and the third control image 43 contains the text "technology". Through image recognition, the voice identifier matched to the first control image 41 and the second control image 42 is "science", and the voice identifier matched to the third control image 43 is "technology"; the voice identifiers "science" and "technology" are stored in the preset voice library.
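One way such a voice library could be built is sketched below, using the third-party pytesseract OCR wrapper as a stand-in for the image recognition step; the library choice, function name and confidence cutoff are assumptions for illustration, not part of the embodiment.

```python
import pytesseract                      # OCR wrapper around Tesseract
from pytesseract import Output

def build_voice_library(interface_image):
    """OCR the interface image and map recognized label text to bounding boxes."""
    data = pytesseract.image_to_data(interface_image, output_type=Output.DICT)
    voice_library = {}
    for i, text in enumerate(data["text"]):
        label = text.strip()
        if not label or float(data["conf"][i]) < 60:   # assumed confidence cutoff
            continue
        box = (data["left"][i], data["top"][i],
               data["width"][i], data["height"][i])
        voice_library.setdefault(label.lower(), []).append({"box": box})
    return voice_library

# e.g. build_voice_library(img) -> {"science": [{...}, {...}], "technology": [{...}]}
```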
Step 2032: determine the control image corresponding to the voice identifier matched by the voice instruction as the target widget image.
The preset voice library contains the voice identifiers corresponding to all the control images in the interface image. The control voice in the voice instruction is parsed and matched against the voice identifiers in the preset voice library; if the match succeeds, the control image corresponding to that voice identifier is determined as the target widget image.
Specifically, when the voice identifier corresponds to exactly one control image, that control image is determined as the target widget image. In Fig. 4a, the voice identifier "technology" corresponds to only one control image, i.e. the third control image 43, so the third control image 43 is determined as the target widget image.
When the voice identifier corresponds to two or more control images, the two or more control images are numbered in the interface image; a voice selection instruction containing a number is received; and the control image bearing the number indicated in the voice selection instruction is determined as the target widget image. In Fig. 4a, the voice identifier "science" corresponds to two control images, i.e. the first control image 41 and the second control image 42. In this case the two control images are numbered in the interface image 1; as shown in Fig. 4b, each number is displayed at a position corresponding to its control image, which may be beside the control image or may cover it. A voice selection instruction containing a number is then received; for example, if the voice selection instruction contains the number 1, the control image numbered 1, i.e. the first control image 41, is determined as the target widget image according to the number 1 in the voice selection instruction.
Referring to Fig. 5, as another example, before the step of identifying, according to the received voice instruction, the control images contained in the interface image and determining the target widget image corresponding to the voice instruction, the method includes the following sub-steps:
Step 2033: divide the interface image into several regions.
In this embodiment, a fixed partitioning mode may be preset in the system; the interface image is divided according to the preset fixed partitioning mode, and the region boundary lines are displayed in the interface image. For example, a fixed partitioning mode may be preset that divides the interface image into 9 regions arranged in a matrix. Alternatively, the interface image may be divided into several regions according to a received partition identification instruction, with the region boundary lines displayed in the interface image, wherein the partition identification instruction contains the number of regions; for example, when the received partition identification instruction indicates 4 regions, the interface image is divided equally or randomly into 4 regions.
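As an illustration, the sketch below divides an interface image of a given pixel size into an assumed rows x columns grid and returns the rectangle of each region; the 3x3 default mirrors the 9-region matrix example above, and the function name is only illustrative.

```python
def divide_into_regions(width, height, rows=3, cols=3):
    """Divide an interface image of the given pixel size into rows*cols
    rectangular regions, returned as {region_id: (x0, y0, x1, y1)}."""
    regions = {}
    region_id = 1
    for r in range(rows):
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            y0, y1 = r * height // rows, (r + 1) * height // rows
            regions[region_id] = (x0, y0, x1, y1)
            region_id += 1
    return regions

# Example: a 1920x1080 interface image split into the 9-region matrix mentioned above.
print(divide_into_regions(1920, 1080)[1])  # (0, 0, 640, 360)
```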
Step 2034: display the corresponding region identifier in each region.
In this embodiment, the region identifiers may be numbers, letters and the like, and each region identifier is associated with, and corresponds one-to-one to, its region. As shown in Fig. 6a, the interface image 1 is divided into 3 regions: region identifier 1 is displayed in the first region 61, region identifier 2 is displayed in the second region 62, and region identifier 3 is displayed in the third region 63. By setting region identifiers corresponding to the regions and displaying them in the corresponding regions, the accuracy of voice control can be improved.
In this embodiment, the voice instruction includes a region voice and a control voice, and the step of identifying, according to the received voice instruction, the control images contained in the interface image and determining the target widget image corresponding to the voice instruction includes the following sub-steps:
Step 2035: determine the corresponding target region by parsing the region voice in the voice instruction.
The region voice in the voice instruction is parsed and matched against the set of region identifiers; when the match succeeds, the region corresponding to the region identifier matched by the region voice is determined as the target region. As shown in Fig. 6a, when the target region corresponding to the region voice in the voice instruction is the second region 62, the second region 62 is determined as the target region.
Step 2036: identify the target region in the interface image, and match a corresponding voice identifier to each control image contained in the target region, wherein the control images correspond one-to-one to the triggerable controls of the corresponding region in the current display interface.
The target region is a part of the interface image. The control images in the target region are obtained through image recognition, and a corresponding voice identifier is matched to each control image and stored in the preset voice library; the control images correspond one-to-one to the triggerable controls of the corresponding region in the current display interface.
As shown in Fig. 6b, the control images in the target region, i.e. in the second region 62, are obtained through image recognition; the second region 62 contains the second control image 42, whose matched voice identifier is "science", and the voice identifier "science" is stored in the preset voice library.
Step 2037: determine the control image corresponding to the voice identifier matched by the control voice in the voice instruction as the target widget image.
The preset voice library contains the voice identifiers corresponding to all the control images in the target region. The control voice in the voice instruction is parsed and matched against the voice identifiers in the preset voice library; if the match succeeds, the control image corresponding to that voice identifier is determined as the target widget image.
As shown in Fig. 6b, when the voice identifier is "science", only one control image in the second region 62 corresponds to it, i.e. the second control image 42, so the second control image 42 is determined as the target widget image.
If more than one control image in the target region corresponds to the voice identifier, a unique control image can be further determined as the target widget image by screening, with reference to the description of step 2032 above, which is not repeated here.
Step 204: obtain the display resolution of the current display interface.
The display resolution is the resolution of the display when it displays an image, measured in pixels. The values of the display resolution refer to the numbers of horizontal pixels and vertical pixels in the entire effective area of the display. For example, a display resolution of 1920*1080 means that the current display interface shown by the display has 1920 horizontal pixels and 1080 vertical pixels.
Step 205: establish an image coordinate system corresponding to the interface image according to the display resolution.
In this embodiment, a rectangular coordinate system in units of pixels is established with the lower-left corner of the interface image as the origin; it can be understood that the unit length of the abscissa and the ordinate of the image coordinate system is the width of one pixel of the interface image. As shown in Fig. 7, the lower-left corner coordinate of the interface image 1 is (0, 0), the lower-right corner coordinate is (1920, 0), the upper-left corner coordinate is (0, 1080), and the upper-right corner coordinate is (1920, 1080).
Step 206: determine the coordinates of the target widget image according to the image coordinate system.
The coordinates of the target widget image may be the coordinates of any point in the target widget image, the coordinates of the edge of the target widget image, or specifically the coordinates of the center point of the target widget image. A control image is generally rectangular; by calculating the coordinates of its vertices, the height and width of the control image can be obtained, and the coordinates of its edge points and its center point can be calculated.
As shown in Fig. 7, when the target widget image is the second control image 42, the lower-left corner coordinate of the second control image 42 is (700, 428), the upper-left corner coordinate is (700, 612), the upper-right corner coordinate is (980, 612), and the lower-right corner coordinate is (980, 428); the calculated coordinate of the center point is (840, 520).
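The center-point calculation is straightforward; the sketch below reproduces it for the bounding box of Fig. 7 (the function name is only illustrative).

```python
def control_center(left, bottom, right, top):
    """Center of a rectangular control image in the bottom-left-origin
    image coordinate system; also returns its width and height."""
    width = right - left
    height = top - bottom
    center = ((left + right) / 2, (bottom + top) / 2)
    return center, width, height

center, w, h = control_center(700, 428, 980, 612)
print(center, w, h)  # (840.0, 520.0) 280 184
```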
Step 207: determine, according to the position of the target widget image in the interface image, the target position, in the current display interface, of the triggerable control corresponding to the target widget image.
Since the interface image corresponds one-to-one in size to the current display interface, the target widget image is unique, and the triggerable control corresponding to the target widget image is also unique. Therefore, once the position of the target widget image in the interface image is determined, the target position of the triggerable control in the current display interface can be determined.
Step 208: move the cursor to the target position.
According to the target position determined in the above steps, the cursor is moved to the target position corresponding to the target triggerable control; reference may be made to the description of step 104 above, which is not repeated here.
Preferably, after the step of moving the cursor to the target position, the method further includes the following sub-steps:
receiving an orientation voice instruction;
moving the cursor according to the orientation voice instruction.
The orientation voice instruction includes a direction instruction and a number instruction; the cursor moves, according to the received orientation voice instruction, by the number of steps corresponding to the number instruction in the direction corresponding to the direction instruction. The unit step may be set to the spacing between adjacent triggerable controls, or to the width of one pixel, which is not limited here. As shown in Fig. 4, when the received orientation instruction is "right 1", the cursor moves from the position of the first control image 41 to the position of the second control image 42.
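A minimal sketch of how such an orientation instruction might be interpreted is given below, assuming the instruction has already been transcribed to text of the form "<direction> <count>" and that one step equals the spacing between adjacent triggerable controls; both assumptions and the example coordinates are illustrative.

```python
# Direction words mapped to unit vectors (x grows to the right, y grows upward
# in the bottom-left-origin image coordinate system used above).
DIRECTIONS = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1)}

def apply_orientation(cursor, instruction, step_size):
    """Move the cursor according to an orientation voice instruction, e.g. "right 1"."""
    direction_word, count = instruction.split()
    dx, dy = DIRECTIONS[direction_word]
    steps = int(count)
    return (cursor[0] + dx * steps * step_size,
            cursor[1] + dy * steps * step_size)

# Illustrative coordinates: one control-spacing step to the right.
print(apply_orientation((200, 520), "right 1", step_size=640))  # (840, 520)
```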
Preferably, after the step of moving the cursor to the target position, the method further includes the following sub-step:
triggering the triggerable control at the target position.
In this embodiment, triggering the target triggerable control includes calling up the function menu corresponding to the triggerable control, following the link of the triggerable control to enter the target interface, and the like. In mouse mode, the operation of triggering the target triggerable control includes a single left click, a double left click, a single right click and the like; in touch mode, where the display includes a touch screen, a handwriting screen or the like, the operation of triggering the target triggerable control includes tapping the screen, double-tapping the screen, long-pressing the screen and the like. When the target triggerable control is triggered, the interface image is cancelled, i.e. the interface image is removed from the top layer of the current display interface.
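For the mouse-mode case, the trigger step could look like the following sketch, again using pyautogui as an assumed mechanism; touch-mode gestures would instead go through the platform's touch-injection facilities, which are not shown here.

```python
import pyautogui

def trigger_control(x, y, action="left_click"):
    """Trigger the target triggerable control at screen position (x, y)."""
    if action == "left_click":
        pyautogui.click(x, y)            # single left click
    elif action == "double_click":
        pyautogui.doubleClick(x, y)      # double left click
    elif action == "right_click":
        pyautogui.rightClick(x, y)       # single right click
    # After triggering, the semi-transparent interface image would be
    # removed from the top layer of the current display interface.
```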
Preferably, the method for the voice control further include:
Exit voice control mode.
In the present embodiment, the operation for exiting voice control mode can be when starting any one after voice control mode It carves and carries out, user can exit voice control mode by the corresponding key of triggering terminal, can also be by defeated to terminal speech Enter module sending voice control mode to exit command to exit voice control mode.The voice control mode exits command can be with By terminal preset in advance, storage is into terminal after can also being customized by the user setting.Voice control mode is exited when receiving Operation when, cancel interface image, i.e., interface image from the top layer of current display interface remove.
The present embodiment obtains and current display interface pair current display interface screenshotss by starting voice control mode The interface image answered;The control image for including in phonetic order identification interface image based on the received, and determining and phonetic order Corresponding target widget image;It is established by obtaining the display resolution of current display interface, and according to the display resolution Image coordinate system corresponding with interface image determines the coordinate of target widget image, according to target widget according to image coordinate system Position of the image in interface image determines that target corresponding with target widget image can trigger control in current display interface Target position, the cursor is finally moved to the target position, the embodiment of the present invention by by image recognition technology with Speech recognition technology combines, and realizes that voice control cursor is mobile, to improve the interactivity of speech recognition technology and operation interface.
It should be noted that, for simplicity of description, the method embodiments are described as a series of action combinations; however, those skilled in the art should understand that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention, some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Embodiment three:
Referring to Fig. 8, a structural block diagram of an apparatus of voice control according to Embodiment three of the present invention is shown, which may specifically include the following modules:
an interface image obtaining module 301, configured to obtain an interface image corresponding to the current display interface;
an image recognition module 302, configured to identify, according to the received voice instruction, the control images contained in the interface image, and to determine a target widget image corresponding to the voice instruction;
a target position determining module 303, configured to determine, according to the position of the target widget image in the interface image, the target position, in the current display interface, of the triggerable control corresponding to the target widget image;
a cursor control module 304, configured to move the cursor to the target position.
The apparatus of this embodiment is used to perform the method steps in the above embodiments, which are not repeated here.
In the embodiment of the present invention, the interface image obtaining module obtains the interface image corresponding to the current display interface; the image recognition module then identifies, according to the received voice instruction, the control images contained in the interface image and determines the target widget image corresponding to the voice instruction; the target position determining module then determines, according to the position of the target widget image in the interface image, the target position, in the current display interface, of the triggerable control corresponding to the target widget image; and finally the cursor control module moves the cursor to the target position. By combining image recognition technology with speech recognition technology, the embodiment of the present invention realizes voice-controlled cursor movement and thereby improves the interactivity between speech recognition technology and the operation interface.
Embodiment four:
Referring to Fig. 9, a structural block diagram of an apparatus of voice control according to Embodiment four of the present invention is shown, which may specifically include the following modules:
a voice control starting module 401, configured to start the voice control mode.
After the terminal starts the voice control mode, the user can input voice instructions to the terminal to realize the voice control function. The user may start the voice control mode by triggering a corresponding key of the terminal, or by issuing a voice-control-mode start command to the voice input module of the terminal. The voice-control-mode start command may be preset in advance by the terminal, or may be customized by the user and then stored in the terminal.
an interface screenshot module 402, configured to take a screenshot of the current display interface to obtain the interface image corresponding to the current display interface.
After the terminal starts the voice control mode, a full-screen screenshot program is automatically started to take a screenshot of the current display interface; the obtained screenshot of the current display interface is the interface image corresponding to the current display interface. As an example, the interface image exactly covers the current display interface, and its transparency is semi-transparent. For example, taking a transparency of 100 as fully transparent, semi-transparent may be understood as a transparency between 30 and 70. In this embodiment, the interface image is obtained through a full-screen screenshot of the current display interface and completely covers the current display interface in a semi-transparent state, so that the user can operate the current display interface according to the information in the interface image; it thus plays a prompting and guiding role.
an image recognition module 403, configured to identify, according to the received voice instruction, the control images contained in the interface image, and to determine a target widget image corresponding to the voice instruction.
Referring to Fig. 10, as an example, the image recognition module 403 includes the following sub-modules:
a full-screen identification sub-module 4031, configured to identify the interface image and match a corresponding voice identifier to each control image contained in the interface image, wherein the control images correspond one-to-one to the triggerable controls in the current display interface.
After the terminal receives the voice instruction, image recognition is performed on the interface image. Image recognition refers to the technology of processing, analyzing and understanding images with a computer in order to identify targets and objects of various patterns. The control images in the interface image are obtained through image recognition, and the control images correspond one-to-one to the triggerable controls in the current display interface; it can be understood that a control image in the interface image is the image of a triggerable control in the current display interface. When a control image contains text, the corresponding text is directly obtained through image recognition as the voice identifier of that control image and is stored in the preset voice library; when a control image contains no text, a corresponding voice identifier is preset for the control image and stored in the preset voice library.
a full-screen target widget image determining sub-module 4032, configured to determine the control image corresponding to the voice identifier matched by the voice instruction as the target widget image.
The preset voice library contains the voice identifiers corresponding to all the control images in the interface image. The control voice in the voice instruction is parsed and matched against the voice identifiers in the preset voice library; if the match succeeds, the control image corresponding to that voice identifier is determined as the target widget image.
Specifically, when the voice identifier corresponds to exactly one control image, that control image is determined as the target widget image; when the voice identifier corresponds to two or more control images, the two or more control images are numbered in the interface image, a voice selection instruction containing a number is received, and the control image bearing the number indicated in the voice selection instruction is determined as the target widget image.
Referring to Fig. 11, as another example, the apparatus further includes the following sub-modules:
an image dividing sub-module 4033, configured to divide the interface image into several regions.
In this embodiment, a fixed partitioning mode may be preset in the system; the interface image is divided according to the preset fixed partitioning mode, and the region boundary lines are displayed in the interface image. For example, a fixed partitioning mode may be preset that divides the interface image into 9 regions arranged in a matrix. Alternatively, the interface image may be divided into several regions according to a received partition identification instruction, with the region boundary lines displayed in the interface image, wherein the partition identification instruction contains the number of regions; for example, when the received partition identification instruction indicates 4 regions, the interface image is divided equally or randomly into 4 regions.
a region identifier display sub-module 4034, configured to display the corresponding region identifier in each region.
In this embodiment, the region identifiers may be numbers, letters and the like, and each region identifier is associated with, and corresponds one-to-one to, its region.
In this embodiment, the voice instruction includes a region voice and a control voice, and the image recognition module 403 includes the following sub-modules:
a target region determining sub-module 4035, configured to determine the corresponding target region by parsing the region voice in the voice instruction.
The region voice in the voice instruction is parsed and matched against the set of region identifiers; when the match succeeds, the region corresponding to the region identifier matched by the region voice is determined as the target region.
a target region identification sub-module 4036, configured to identify the target region in the interface image and match a corresponding voice identifier to each control image contained in the target region, wherein the control images correspond one-to-one to the triggerable controls of the corresponding region in the current display interface.
The target region is a part of the interface image. The control images in the target region are obtained through image recognition, and a corresponding voice identifier is matched to each control image and stored in the preset voice library; the control images correspond one-to-one to the triggerable controls of the corresponding region in the current display interface.
a regional target widget image determining sub-module 4037, configured to determine the control image corresponding to the voice identifier matched by the control voice in the voice instruction as the target widget image.
The preset voice library contains the voice identifiers corresponding to all the control images in the target region. The control voice in the voice instruction is parsed and matched against the voice identifiers in the preset voice library; if the match succeeds, the control image corresponding to that voice identifier is determined as the target widget image. If more than one control image in the target region corresponds to the voice identifier, the control images in the target region of the interface image are numbered, a voice selection instruction containing a number is received, and the control image bearing the number indicated in the voice selection instruction is determined as the target widget image.
a resolution obtaining module 404, configured to obtain the display resolution of the current display interface.
The display resolution is the resolution of the display when it displays an image, measured in pixels. The values of the display resolution refer to the numbers of horizontal pixels and vertical pixels in the entire effective area of the display. For example, a display resolution of 1920*1080 means that the current display interface shown by the display has 1920 horizontal pixels and 1080 vertical pixels.
an image coordinate system establishing module 405, configured to establish an image coordinate system corresponding to the interface image according to the display resolution.
In this embodiment, a rectangular coordinate system in units of pixels is established with the lower-left corner of the interface image as the origin; it can be understood that the unit length of the abscissa and the ordinate of the image coordinate system is the width of one pixel of the interface image.
a target widget image coordinate determining module 406, configured to determine the coordinates of the target widget image according to the image coordinate system.
The coordinates of the target widget image may be the coordinates of any point in the target widget image, the coordinates of the edge of the target widget image, or specifically the coordinates of the center point of the target widget image. A control image is generally rectangular; by calculating the coordinates of its vertices, the height and width of the control image can be obtained, and the coordinates of its edge points and its center point can be calculated.
a target position determining module 407, configured to determine, according to the position of the target widget image in the interface image, the target position, in the current display interface, of the triggerable control corresponding to the target widget image.
Since the interface image corresponds one-to-one in size to the current display interface, the target widget image is unique, and the triggerable control corresponding to the target widget image is also unique. Therefore, once the position of the target widget image in the interface image is determined, the target position of the triggerable control in the current display interface can be determined.
a cursor control module 408, configured to move the cursor to the target position.
According to the target position determined by the target position determining module 407, the cursor is moved to the target position corresponding to the target triggerable control.
Preferably, the apparatus further includes the following modules:
an orientation voice receiving module, configured to receive an orientation voice instruction;
an orientation voice control module, configured to move the cursor according to the orientation voice instruction.
The orientation voice instruction includes a direction instruction and a number instruction; the cursor moves, according to the received orientation voice instruction, by the number of steps corresponding to the number instruction in the direction corresponding to the direction instruction. The unit step may be set to the spacing between adjacent triggerable controls, or to the width of one pixel, which is not limited here.
Preferably, the apparatus further includes the following module:
a trigger module, configured to trigger the triggerable control at the target position.
In this embodiment, triggering the target triggerable control includes calling up the function menu corresponding to the triggerable control, following the link of the triggerable control to enter the target interface, and the like; the operation of triggering the target triggerable control includes clicking, double-clicking, long-pressing and the like. When the target triggerable control is triggered, the interface image is cancelled, i.e. the interface image is removed from the top layer of the current display interface.
Preferably, the apparatus further includes the following module:
an exit module, configured to exit the voice control mode.
In this embodiment, the operation of exiting the voice control mode may be performed at any time after the voice control mode is started. The user may exit the voice control mode by triggering a corresponding key of the terminal, or by issuing a voice-control-mode exit command to the voice input module of the terminal. The voice-control-mode exit command may be preset in advance by the terminal, or may be customized by the user and then stored in the terminal. When the operation of exiting the voice control mode is received, the interface image is cancelled, i.e. the interface image is removed from the top layer of the current display interface.
In Embodiment four of the present invention, the voice control starting module starts the voice control mode; the interface screenshot module takes a screenshot of the current display interface to obtain the interface image corresponding to the current display interface; the image recognition module identifies, according to the received voice instruction, the control images contained in the interface image and determines the target widget image corresponding to the voice instruction; the resolution obtaining module obtains the display resolution of the current display interface, the image coordinate system establishing module establishes an image coordinate system corresponding to the interface image according to the display resolution, and the target widget image coordinate determining module determines the coordinates of the target widget image according to the image coordinate system; the target position determining module determines, according to the position of the target widget image in the interface image, the target position, in the current display interface, of the triggerable control corresponding to the target widget image; and finally the cursor control module moves the cursor to the target position. By combining image recognition technology with speech recognition technology, the embodiment of the present invention realizes voice-controlled cursor movement and thereby improves the interactivity between speech recognition technology and the operation interface.
As for the apparatus embodiments, since they are basically similar to the method embodiments, the description is relatively simple; for relevant details, reference may be made to the partial description of the method embodiments.
The method of voice control and the apparatus of voice control provided by the present invention have been introduced in detail above. Specific examples are used herein to illustrate the principle and implementation of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core concept. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and application scope according to the concept of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A method of voice control, characterized by comprising:
obtaining an interface image corresponding to a current display interface;
identifying, according to a received voice instruction, the control images contained in the interface image, and determining a target widget image corresponding to the voice instruction;
determining, according to the position of the target widget image in the interface image, a target position, in the current display interface, of a triggerable control corresponding to the target widget image;
moving the cursor to the target position.
2. The method according to claim 1, characterized in that the step of identifying, according to the received voice instruction, the control images contained in the interface image and determining the target widget image corresponding to the voice instruction comprises:
identifying the interface image, and matching a corresponding voice identifier to each control image contained in the interface image, wherein the control images correspond one-to-one to the triggerable controls in the current display interface;
determining the control image corresponding to the voice identifier matched by the voice instruction as the target widget image.
3. The method according to claim 1, characterized in that before the step of identifying, according to the received voice instruction, the control images contained in the interface image and determining the target widget image corresponding to the voice instruction, the method comprises:
dividing the interface image into several regions;
displaying a corresponding region identifier in each region.
4. The method according to claim 3, wherein the voice instruction comprises a region voice and a control voice; the step of identifying, based on the received voice instruction, the control images included in the interface image and determining the target control image corresponding to the voice instruction comprises:
determining a corresponding target region by parsing the region voice in the voice instruction;
recognizing the target region in the interface image, and matching a corresponding voice identifier to each control image included in the target region; wherein the control images correspond one-to-one to the triggerable controls of the corresponding region in the current display interface;
determining the control image corresponding to the voice identifier matched with the control voice in the voice instruction as the target control image.
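For claim 4, a composite instruction must be split into its region part and its control part. The sketch below assumes a phrasing such as "region 3 settings", which is only an example wording and not mandated by the claim, and it reuses the (left, top, right, bottom) region boxes from the grid sketch above.

```python
# Hypothetical illustration of claim 4: parse "region <n> <control phrase>" and
# restrict matching to controls whose centre lies in that region.
import re


def parse_composite_instruction(voice_text):
    """Return (region_id, control_phrase) from e.g. 'region 3 settings'."""
    m = re.match(r"\s*region\s+(\d+)\s+(.*)", voice_text, flags=re.IGNORECASE)
    if not m:
        return None, voice_text.strip()
    return int(m.group(1)), m.group(2).strip()


def in_region(bbox, region_box):
    """True if the control's bounding-box centre lies inside the region box."""
    x, y, w, h = bbox
    cx, cy = x + w / 2, y + h / 2
    return region_box[0] <= cx < region_box[2] and region_box[1] <= cy < region_box[3]
```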
5. The method according to claim 1, 2 or 4, wherein the step of identifying, based on the received voice instruction, the control images included in the interface image and determining the target control image corresponding to the voice instruction further comprises:
when there are more than two control images corresponding to the voice instruction, numbering the corresponding control images in the interface image;
receiving a voice selection instruction containing a number;
determining the control image corresponding to the number in the voice selection instruction as the target control image.
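Claim 5's disambiguation step can be illustrated as follows; the numbering scheme (1..N in detection order) and a follow-up utterance containing a bare number are assumptions made for the sketch, not requirements of the claim.

```python
# Hypothetical illustration of claim 5: when several control images match the
# instruction, number them and resolve the choice from a follow-up selection.
def disambiguate(matching_controls, selection_text):
    """
    matching_controls: list of candidate controls; their list order defines
                       the displayed numbers 1..N.
    selection_text: transcription of the follow-up voice selection, e.g. "2".
    """
    numbered = {str(i + 1): c for i, c in enumerate(matching_controls)}
    digits = "".join(ch for ch in selection_text if ch.isdigit())
    return numbered.get(digits)
```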
6. The method according to claim 1, 2 or 4, wherein the step of obtaining the interface image corresponding to the current display interface comprises:
starting a voice control mode;
taking a screenshot of the current display interface to obtain the interface image corresponding to the current display interface.
7. The method according to claim 1, 2 or 4, wherein before the step of determining, according to the position of the target control image in the interface image, the target position, in the current display interface, of the target triggerable control corresponding to the target control image, the method comprises:
obtaining a display resolution of the current display interface;
establishing an image coordinate system corresponding to the interface image according to the display resolution;
determining coordinates of the target control image according to the image coordinate system.
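Claim 7's image coordinate system amounts to scaling positions found in the captured interface image to the display resolution. A minimal sketch of that mapping, with all parameter names invented for illustration:

```python
# Hypothetical illustration of claim 7: express the target control's centre,
# found in interface-image pixels, in the display's own coordinate system.
def image_to_screen(bbox, image_size, display_resolution):
    """
    bbox: (x, y, w, h) of the target control image in interface-image pixels.
    image_size: (img_w, img_h) of the captured interface image.
    display_resolution: (screen_w, screen_h) of the current display interface.
    Returns the target position in screen pixels.
    """
    x, y, w, h = bbox
    img_w, img_h = image_size
    screen_w, screen_h = display_resolution
    cx, cy = x + w / 2, y + h / 2            # centre in image coordinates
    return cx * screen_w / img_w, cy * screen_h / img_h
```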
8. The method according to claim 7, wherein after the step of moving the cursor to the target position, the method further comprises:
receiving a directional voice instruction;
moving the cursor according to the directional voice instruction.
9. The method according to claim 1, wherein after the step of moving the cursor to the target position, the method comprises:
triggering the target triggerable control at the target position.
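Claims 8 and 9 describe what happens after the cursor reaches the target position: fine adjustment by directional voice instructions, and triggering of the control. The sketch below is illustrative only, using pyautogui with an assumed 20-pixel nudge step and assumed command words, none of which are specified by the claims.

```python
# Hypothetical illustration of claims 8 and 9: nudge the cursor with directional
# voice instructions, then trigger (click) the control at the target position.
import pyautogui

STEP = 20  # assumed nudge distance in pixels; not specified by the patent

DIRECTIONS = {"up": (0, -STEP), "down": (0, STEP),
              "left": (-STEP, 0), "right": (STEP, 0)}


def handle_post_positioning(voice_text):
    word = voice_text.strip().lower()
    if word in DIRECTIONS:
        pyautogui.moveRel(*DIRECTIONS[word])   # claim 8: directional fine movement
    elif word in ("click", "confirm"):
        pyautogui.click()                      # claim 9: trigger the target control
```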
10. A voice control apparatus, characterized by comprising:
an interface image obtaining module, configured to obtain an interface image corresponding to a current display interface;
an image recognition module, configured to identify, based on a received voice instruction, control images included in the interface image, and determine a target control image corresponding to the voice instruction;
a target position determining module, configured to determine, according to a position of the target control image in the interface image, a target position, in the current display interface, of a target triggerable control corresponding to the target control image;
a cursor control module, configured to move the cursor to the target position.
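The apparatus of claim 10 can be read as four cooperating modules. The skeleton below is an illustrative composition only (class and method names are invented for the sketch); the recognizer is injected as a callable so the skeleton does not depend on any particular image-recognition technique.

```python
# Hypothetical skeleton mirroring the four modules named in claim 10.
import pyautogui


class VoiceControlApparatus:
    def __init__(self, recognize_controls):
        # recognize_controls: callable(interface_image) -> {label: (cx, cy)},
        # e.g. the locate_controls sketch shown earlier; injected for flexibility.
        self.recognize_controls = recognize_controls

    def obtain_interface_image(self):
        # Interface image obtaining module: screenshot of the current display.
        return pyautogui.screenshot()

    def determine_target_position(self, center, interface_image):
        # Target position determining module: image pixels -> screen pixels.
        img_w, img_h = interface_image.size
        screen_w, screen_h = pyautogui.size()
        return center[0] * screen_w / img_w, center[1] * screen_h / img_h

    def move_cursor(self, position):
        # Cursor control module.
        pyautogui.moveTo(*position)

    def handle_instruction(self, voice_text):
        # Image recognition module plus the end-to-end flow of the apparatus.
        image = self.obtain_interface_image()
        center = self.recognize_controls(image).get(voice_text.strip().lower())
        if center is not None:
            self.move_cursor(self.determine_target_position(center, image))
```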

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910473077.3A CN110136718A (en) 2019-05-31 2019-05-31 The method and apparatus of voice control

Publications (1)

Publication Number Publication Date
CN110136718A (en) 2019-08-16

Family

ID=67579595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910473077.3A Pending CN110136718A (en) 2019-05-31 2019-05-31 The method and apparatus of voice control

Country Status (1)

Country Link
CN (1) CN110136718A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003263308A (en) * 2002-12-27 2003-09-19 Nec Infrontia Corp Screen control device and method
US20130046537A1 (en) * 2011-08-19 2013-02-21 Dolbey & Company, Inc. Systems and Methods for Providing an Electronic Dictation Interface
CN104184890A (en) * 2014-08-11 2014-12-03 联想(北京)有限公司 Information processing method and electronic device
CN104360805A (en) * 2014-11-28 2015-02-18 广东欧珀移动通信有限公司 Application icon management method and application icon management device
CN104965596A (en) * 2015-07-24 2015-10-07 上海宝宏软件有限公司 Voice control system
CN107358953A (en) * 2017-06-30 2017-11-17 努比亚技术有限公司 Sound control method, mobile terminal and storage medium
US9922651B1 (en) * 2014-08-13 2018-03-20 Rockwell Collins, Inc. Avionics text entry, cursor control, and display format selection via voice recognition
CN109213470A (en) * 2018-09-11 2019-01-15 昆明理工大学 A kind of cursor control method based on speech recognition
CN109391833A (en) * 2018-09-13 2019-02-26 苏宁智能终端有限公司 A kind of sound control method and smart television of smart television
CN109471678A (en) * 2018-11-07 2019-03-15 苏州思必驰信息科技有限公司 Voice midpoint controlling method and device based on image recognition

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112445450A (en) * 2019-08-30 2021-03-05 比亚迪股份有限公司 Method and device for controlling terminal based on voice, storage medium and electronic equipment
CN110675874A (en) * 2019-09-29 2020-01-10 深圳欧博思智能科技有限公司 Method for realizing interaction between virtual character and UI (user interface) based on intelligent sound box
CN111263236A (en) * 2020-02-21 2020-06-09 广州欢网科技有限责任公司 Voice adaptation method and device for television application and voice control method
CN111263236B (en) * 2020-02-21 2022-04-12 广州欢网科技有限责任公司 Voice adaptation method and device for television application and voice control method
CN114467140A (en) * 2020-08-05 2022-05-10 互动解决方案公司 System for changing image based on voice
US11568877B2 (en) 2020-08-05 2023-01-31 Interactive Solutions Corp. System to change image based on voice
CN111968639A (en) * 2020-08-14 2020-11-20 北京小米松果电子有限公司 Voice control method and device, electronic equipment and storage medium
CN112732379A (en) * 2020-12-30 2021-04-30 智道网联科技(北京)有限公司 Operation method of application program on intelligent terminal, terminal and storage medium
CN112732379B (en) * 2020-12-30 2023-12-15 智道网联科技(北京)有限公司 Method for running application program on intelligent terminal, terminal and storage medium
WO2023103917A1 (en) * 2021-12-09 2023-06-15 杭州逗酷软件科技有限公司 Speech control method and apparatus, and electronic device and storage medium

Similar Documents

Publication Publication Date Title
CN110136718A (en) The method and apparatus of voice control
US10929013B2 (en) Method for adjusting input virtual keyboard and input apparatus
US7107079B2 (en) Cellular phone set
CN106415472B (en) Gesture control method and device, terminal equipment and storage medium
CN103488413B (en) Touch control device and show control method and the device at 3D interface on touch control device
CN103593136A (en) Touch terminal, and one-hand operation method and device of large-screen touch terminal
WO2012169155A1 (en) Information processing terminal and method, program, and recording medium
CN103257811A (en) Picture display system and method based on touch screen
US10732808B2 (en) Information processing device, information processing method, and program
US20170047065A1 (en) Voice-controllable image display device and voice control method for image display device
CN105227985B (en) Show equipment and its control method
KR20110025520A (en) Apparatus and method for controlling a mobile terminal
CN110007826A (en) The mobile method and apparatus of voice control cursor
CN107066176A (en) A kind of control method and device of the singlehanded pattern of terminal
CN109165033B (en) Application updating method and mobile terminal
EP2544082A1 (en) Image display system, information processing apparatus, display apparatus, and image display method
CN106648402A (en) Information sending method and device and information processing method and device
CN106354376A (en) Information processing method and client terminal
CN111414115A (en) Key control method, computer readable storage medium and terminal thereof
CN111367483A (en) Interaction control method and electronic equipment
CN108845756B (en) Touch operation method and device, storage medium and electronic equipment
CN102693084B (en) Mobile terminal and the method for response operation thereof
CN113298212A (en) Graphic code display method and device
CN106909272A (en) A kind of display control method and mobile terminal
CN103914228A (en) Mobile terminal and touch screen operating method thereof

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190816)