CN106293600A

CN106293600A - Voice control method and system

Info

Publication number: CN106293600A
Application number: CN201610641425.XA
Authority: CN
Inventors: 张瀚林
Original assignee: Samsung Electronics China R&D Center; Samsung Electronics Co Ltd
Current assignee: Samsung Electronics China R&D Center; Samsung Electronics Co Ltd
Priority date: 2016-08-05
Filing date: 2016-08-05
Publication date: 2017-01-04

Abstract

The invention discloses a voice control method for controlling an APP. The method includes: A, as the user operates the controls of an APP interface, intercepting the action corresponding to each operation and the coordinate position at which that action occurs on the APP interface; B, for the action corresponding to each operation and the coordinate position at which that action occurs on the APP interface, establishing a uniquely corresponding speech recognition label, forming a label record; C, according to the speech recognition label content read aloud by the user, finding the action corresponding to that speech recognition label and the coordinate position at which that action occurs on the APP interface; D, performing the corresponding action at that coordinate position on the APP interface. The invention also discloses a voice control system. With the invention, each interface within a third-party program can be controlled and operated.

Description

Voice control method and system

Technical field

The present invention relates to the computer field, and in particular to a voice control method and system.

Background art

Voice assistants bring us great convenience. Through a voice assistant, we can use voice control to open the computer application programs (APPs) installed on the system.

At present, the voice assistant software supported by most popular manufacturers does not support voice-controlled operation of third-party software well. It can only perform simple operations such as opening a third-party application, and it cannot control or operate the individual interfaces inside a third-party program. In addition, some inventions extract interface element labels and save them in a runtime library, then match a label during speech recognition and perform a predefined action. Such an invention needs, on the one hand, to extract interface element labels and, on the other hand, to predefine basic operations. When some interface elements are similar or identical, different interface elements can easily end up with the same label; when an interface element has no label, or its label is not text, the interface element label cannot be extracted. Moreover, because such an invention requires the actions of basic operations to be predefined, it can only perform predefined actions.

Summary of the invention

The object of the present invention is to provide a voice control method and system capable of controlling and operating each interface within a third-party program.

To achieve the above object, the invention provides a voice control method for controlling a computer application program (APP), the method including:

A, as the user operates the controls of an APP interface, intercepting the action corresponding to each operation and the coordinate position at which that action occurs on the APP interface;

B, for the action corresponding to each operation and the coordinate position at which that action occurs on the APP interface, establishing a uniquely corresponding speech recognition label, forming a label record;

C, according to the speech recognition label content read aloud by the user, finding the action corresponding to that speech recognition label and the coordinate position at which that action occurs on the APP interface;

D, performing the corresponding action at that coordinate position on the APP interface.

To achieve the above object, the invention also provides a voice control system for controlling a computer application program (APP), the system including:

an interception module, which, as the user operates the controls of an APP interface, intercepts the action corresponding to each operation and the coordinate position at which that action occurs on the APP interface;

a label recognition module, which, for the action corresponding to each operation and the coordinate position at which that action occurs on the APP interface, establishes a uniquely corresponding speech recognition label, forming a label record, and which, according to the speech recognition label content read aloud by the user, finds the action corresponding to that speech recognition label and the coordinate position at which that action occurs on the APP interface;

an action control module, which performs the corresponding action at that coordinate position on the APP interface.

In summary, the voice control method and device provided by the embodiments of the invention use speech recognition to define a custom speech recognition label for each action at the moment the action is intercepted. In this way, the system captures every operation the user performs and the position on the screen at which it occurs. Moreover, because the speech recognition labels are user-defined, identical labels and unobtainable labels can be avoided entirely. In addition, the invention works by capturing actions rather than by recognizing operable areas in screenshots, so it does not need to store a large number of pictures or apply image recognition to find the operable areas in each picture. It therefore avoids occupying storage space, reducing system execution efficiency, and wasting power.

Brief description of the drawings

Fig. 1 is a schematic flowchart of the voice control method of a preferred embodiment of the invention.

Fig. 2 is a schematic structural diagram of the voice control system of an embodiment of the invention.

Detailed description of the embodiments

To make the objects, technical solutions, and advantages of the invention clearer, the solution of the invention is described in further detail below with reference to the drawings and embodiments.

The voice control method of the invention mainly comprises two stages: the first is the speech recognition label generation stage, and the second is the speech recognition control stage.

In the first stage, the user opens the voice assistant software and uses it to open a third-party APP. The user then operates the interface controls on the operation interface of the third-party APP while the voice assistant software runs in the background, capturing, intercepting, and recording the action of each step the user performs (for example, clicking a button) and the coordinate position (X, Y) at which that action occurs on the screen. The user then defines a custom speech recognition label for the action, and the text obtained by speech recognition is stored in the database as the content of that label. This completes the creation of one speech recognition label.

In the second stage, the label records stored in the database are used to display the content of each speech recognition label at a suitable coordinate position near the corresponding operable control element of the third-party APP view. When the user reads aloud the speech recognition label content shown for an interface control, speech recognition produces the corresponding text label, which is then matched in the database to obtain the action associated with the label and the position on the screen at which that action occurred. With this information, the voice assistant software instructs the system to automatically perform the associated action at that coordinate position on the screen, thereby achieving voice control of the third-party APP.
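As an illustration only, the two stages can be sketched in Python as a store of label records that is written during label generation and queried during voice control. The names `LabelRecord` and `LabelStore`, the in-memory dictionary standing in for the database, and the sample coordinates are assumptions made for this sketch; they do not come from the specification.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class LabelRecord:
    """One recorded operation (hypothetical layout, not taken from the tables)."""
    label: str                 # text obtained by speech recognition, e.g. "Button1"
    action: str                # intercepted action, e.g. "Click"
    position: Tuple[int, int]  # (X, Y) where the action occurred on the screen


class LabelStore:
    """In-memory stand-in for the database of label records."""

    def __init__(self) -> None:
        self._records: Dict[str, LabelRecord] = {}

    def record(self, label: str, action: str, position: Tuple[int, int]) -> LabelRecord:
        # First stage: bind a user-defined label to an intercepted action and position.
        if label in self._records:
            raise ValueError(f"label {label!r} is already bound")
        rec = LabelRecord(label, action, position)
        self._records[label] = rec
        return rec

    def lookup(self, spoken_text: str) -> Optional[LabelRecord]:
        # Second stage: match the recognized text against the stored labels.
        return self._records.get(spoken_text)


store = LabelStore()
store.record("Button1", "Click", (120, 340))  # sample values chosen for the sketch

hit = store.lookup("Button1")
print(hit.action, hit.position)  # Click (120, 340)
```

Rejecting a duplicate label in `record` is one possible way to keep each label uniquely bound to a single action and position; a real system might instead prompt the user for another name.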

Fig. 1 is a schematic flowchart of the voice control method of a preferred embodiment of the invention. As shown in Fig. 1, the method comprises the following steps:

A1, acquiring the APP name and the speech recognition label page number of the current page on which the APP interface control is located, and adding them to the label record;

Here, an APP interface control is a visual graphical "element" that can be placed on a form, such as a button or a text edit box. Most controls either perform a function directly or respond through an "event" that causes code to run.

B1, calculating, from the coordinate position at which the action occurs on the APP interface, the coordinate position at which the label record is to be displayed on the APP interface, and adding that display coordinate position to the label record;

C1, according to the APP name and the current-page speech recognition label page number, finding all label records that match that APP name and speech recognition label page number, and displaying each label record at its corresponding coordinate position on the APP interface;

D, performing the corresponding action at the coordinate position on the APP interface.

This completes the voice control method of the invention. The first stage comprises steps A1, A, B, and B1 and is the speech recognition label generation stage; the second stage comprises steps C1, C, and D and is the speech recognition control stage. Note that the preferred embodiment adds a speech recognition label page number to each speech recognition label, and each page number corresponds to one page of the APP interface. When two speech recognition labels are identical, the page number distinguishes the actions and action positions recorded in the different label records. If the user gives every speech recognition label a different name when defining it, so that each label name uniquely corresponds to one action and one action position, the speech recognition label page number is not needed.
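One way to picture how the page number separates identical labels is to key each label record on the APP name, the page number, and the label text together. The following Python sketch assumes that composite key; the function names and the coordinates are hypothetical.

```python
from typing import Dict, Optional, Tuple

Position = Tuple[int, int]
# Key: (APP name, label page number, label text); value: (action, (X, Y)).
Key = Tuple[str, int, str]

records: Dict[Key, Tuple[str, Position]] = {}


def add_record(app: str, page: int, label: str, action: str, position: Position) -> None:
    key = (app, page, label)
    if key in records:
        # The same label may repeat across pages, but not within one page.
        raise ValueError(f"{label!r} is already used on page {page} of {app}")
    records[key] = (action, position)


def find(app: str, page: int, label: str) -> Optional[Tuple[str, Position]]:
    return records.get((app, page, label))


# "Button1" is recorded on two different pages without conflict.
add_record("APP_XXX", 1, "Button1", "Click", (100, 200))
add_record("APP_XXX", 2, "Button1", "Click", (300, 400))

print(find("APP_XXX", 1, "Button1"))  # ('Click', (100, 200))
print(find("APP_XXX", 2, "Button1"))  # ('Click', (300, 400))
```

If every label name were globally unique, the page component of the key could be dropped, which matches the remark that the page number is then unnecessary.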

Further, after step B1 is performed, the method also includes: step B2, for the next page of the APP interface reached by the user's operation of an APP interface control, acquiring the next-page speech recognition label page number, adding that next-page page number to the current label record, and also adding it to a new label record; steps A1, A, B, and B1 are then repeated to form all label records matching the next-page speech recognition label page number.

When step C is performed, the method also includes checking whether the current speech recognition label contains a next-page speech recognition label page number. If it does, then after step D is performed, the method moves to that next-page page number and repeats steps C1, C, and D on the APP interface corresponding to that page number.
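The page-following behaviour can be sketched as a loop that tracks the current page and switches to the next-page number stored in a record after its action has been performed. This is a minimal Python sketch under assumptions: the `next_page` field name, the choice to ignore unknown labels, and the coordinates are all illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Position = Tuple[int, int]


@dataclass
class LabelRecord:
    action: str
    position: Position
    next_page: Optional[int] = None  # set when the action jumps to another page


def run_commands(
    records: Dict[Tuple[int, str], LabelRecord],
    start_page: int,
    spoken_labels: List[str],
) -> List[Tuple[int, str, Position]]:
    """Replay spoken labels, following next-page numbers between pages."""
    page = start_page
    performed: List[Tuple[int, str, Position]] = []
    for label in spoken_labels:
        rec = records.get((page, label))
        if rec is None:
            continue  # label not recorded on this page: ignored in this sketch
        performed.append((page, rec.action, rec.position))  # step D
        if rec.next_page is not None:
            page = rec.next_page  # continue with the records of the next page
    return performed


records = {
    (1, "Button1"): LabelRecord("Click", (10, 20), next_page=2),
    (2, "Button1"): LabelRecord("Click", (30, 40)),
    (2, "Button2"): LabelRecord("Click", (50, 60)),
}

trace = run_commands(records, 1, ["Button1", "Button2"])
print(trace)  # [(1, 'Click', (10, 20)), (2, 'Click', (50, 60))]
```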

Based on the same inventive concept, the invention provides a voice control system for controlling an APP. As shown in Fig. 2, the system includes:

an interception module 201, which, as the user operates the controls of an APP interface, intercepts the action corresponding to each operation and the coordinate position at which that action occurs on the APP interface;

a label recognition module 202, which, for the action corresponding to each operation and the coordinate position at which that action occurs on the APP interface, establishes a uniquely corresponding speech recognition label, forming a label record, and which, according to the speech recognition label content read aloud by the user, finds the action corresponding to that speech recognition label and the coordinate position at which that action occurs on the APP interface;

an action control module 203, which performs the corresponding action at the coordinate position on the APP interface.

The label recognition module 202 is further configured to, before the interception module intercepts the action corresponding to each operation and the coordinate position at which that action occurs on the APP interface, acquire the APP name and the speech recognition label page number of the current page on which the APP interface control is located, and add them to the label record.

The label recognition module 202 is further configured to, after establishing the uniquely corresponding speech recognition label for the action of each operation and its coordinate position on the APP interface and forming the label record, calculate, from the coordinate position at which the action occurs on the APP interface, the coordinate position at which the label record is to be displayed on the APP interface, and add that display coordinate position to the label record.

The label recognition module 202 is further configured to, before finding the action corresponding to the speech recognition label read aloud by the user and the coordinate position at which that action occurs on the APP interface, find, according to the APP name and the current-page speech recognition label page number, all label records that match that APP name and speech recognition label page number, and display each label record at its corresponding coordinate position on the APP interface.

The label recognition module 202 is further configured to, after calculating the display coordinate position of the label record and adding it to the label record, acquire the next-page speech recognition label page number for the next page of the APP interface reached by the user's operation of an APP interface control, add that page number to the current label record, and add it to a new label record.

The label recognition module 202 is further configured to, when finding the action corresponding to the speech recognition label read aloud by the user and the coordinate position at which that action occurs on the APP interface, check whether the current speech recognition label contains a next-page speech recognition label page number and, if it does, move to that next-page page number after the action control module 203 has performed the corresponding action at the coordinate position on the APP interface.

The system also includes a speech recognition module 204, which receives the speech recognition label read aloud by the user, converts it into a text speech recognition label, and sends it to the label recognition module 202, which uses it to establish the uniquely corresponding speech recognition label for the action of each operation and the coordinate position at which that action occurs on the APP interface.
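The cooperation of modules 201 to 204 might be wired together roughly as in the Python sketch below. The class and method names are hypothetical, the speech recognition module 204 is reduced to a plain text argument, and the action control module only logs what it would perform; a real system would hook platform input events and inject them back.

```python
from typing import Callable, Dict, List, Optional, Tuple

Position = Tuple[int, int]


class InterceptionModule:
    """Stands in for module 201: reports each user action and where it happened."""

    def __init__(self, on_action: Callable[[str, Position], None]) -> None:
        self._on_action = on_action

    def user_did(self, action: str, position: Position) -> None:
        self._on_action(action, position)


class LabelModule:
    """Stands in for module 202: builds label records and looks them up."""

    def __init__(self) -> None:
        self._pending: Optional[Tuple[str, Position]] = None
        self._records: Dict[str, Tuple[str, Position]] = {}

    def on_action(self, action: str, position: Position) -> None:
        self._pending = (action, position)

    def on_label_text(self, text: str) -> None:
        # Receives the text that module 204 would produce from the user's speech.
        if self._pending is None:
            raise RuntimeError("no intercepted action is waiting for a label")
        self._records[text] = self._pending
        self._pending = None

    def find(self, text: str) -> Optional[Tuple[str, Position]]:
        return self._records.get(text)


class ActionModule:
    """Stands in for module 203: performs an action at a coordinate."""

    def __init__(self) -> None:
        self.performed: List[Tuple[str, Position]] = []

    def perform(self, action: str, position: Position) -> None:
        # This sketch only logs the action instead of injecting an input event.
        self.performed.append((action, position))


labels = LabelModule()
interceptor = InterceptionModule(labels.on_action)
executor = ActionModule()

# Recording: the user clicks, then names the action "Button1".
interceptor.user_did("Click", (40, 80))
labels.on_label_text("Button1")

# Control: the recognized text "Button1" is looked up and replayed.
match = labels.find("Button1")
if match is not None:
    executor.perform(*match)

print(executor.performed)  # [('Click', (40, 80))]
```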

To explain the invention clearly, it is described below stage by stage. The purpose of the voice control method of the invention is to control a third-party APP.

First stage: speech recognition label generation stage

(1) When APP_XXX needs to be opened, the user, with the voice control system already open, reads aloud "open APP_XXX";

(2) The speech recognition module recognizes the speech and opens APP_XXX. By default, the page 1 interface of APP_XXX is opened;

(3) The label recognition module obtains the APP name "APP_XXX" and the speech recognition label page number 1 corresponding to the page 1 interface, and adds them to the label record;

(4) The voice control system displays a pop-up asking the user whether to record a speech recognition label, and the user chooses to record one.

(5) The user operates an APP interface control on the page 1 interface. Assuming the control is a button, the user clicks it; the button's click event is intercepted by the interception module, which obtains the click action (Click) and the coordinate position (X0, Y0) at which the click occurs on the page 1 interface and sends them to the label recognition module to be added to the label record;

(6) At the same time, the speech recognition module is started. The user reads aloud a custom speech recognition label "Button1"; after recognizing the spoken "Button1", the speech recognition module generates the text speech recognition label "Button1" and sends it to the label recognition module, which adds it to the label record and establishes a unique correspondence between "Button1" and "Click" and (X0, Y0).

In addition, the label recognition module calculates the display position (x0, y0) of the label record from the click coordinates (X0, Y0) and adds it to the label record. (x0, y0) is generally placed near (X0, Y0), so that the user can clearly see the one-to-one correspondence between speech recognition labels and label records.
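The text does not say how (x0, y0) is derived from (X0, Y0), only that it generally lies nearby. One plausible rule, sketched below in Python, is to shift the click point by a fixed offset and clamp the result so the label stays on screen. The offset, the label size, and the screen size are all assumed values chosen for illustration.

```python
from typing import Tuple

Point = Tuple[int, int]


def label_display_position(
    click: Point,
    screen: Tuple[int, int],
    label_size: Tuple[int, int] = (80, 24),
    offset: Tuple[int, int] = (12, -30),
) -> Point:
    """Place the label near the click point (X, Y), kept inside the screen."""
    click_x, click_y = click
    screen_w, screen_h = screen
    label_w, label_h = label_size
    # Shift by the offset, then clamp so the whole label remains visible.
    x = min(max(click_x + offset[0], 0), screen_w - label_w)
    y = min(max(click_y + offset[1], 0), screen_h - label_h)
    return (x, y)


print(label_display_position((100, 200), (360, 640)))  # (112, 170)
print(label_display_position((350, 10), (360, 640)))   # (280, 0)
```

The second call shows the clamping: a click near the top-right corner would otherwise push the label off screen.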

The label record generated above is shown in Table 1:

Table 1

(7) After the label record is generated, the click event of "Button1" continues to execute, and the page jumps to the page 2 interface;

(8) The user reads aloud "page 2". After recognizing the spoken "page 2", the speech recognition module sends it to the label recognition module, which obtains the speech recognition label page number 2 corresponding to the page 2 interface and appends it to the label record of Table 1 as the next-page speech recognition label page number to which the current page jumps, as shown in Table 1'. A new label record is also created, and the speech recognition label page number 2 is added to it.

Table 1 '

Next, the speech recognition labels of the page 2 interface are generated in the same way as the speech recognition label "Button1" of the page 1 interface.

(9) The voice control system displays a pop-up asking the user whether to record a speech recognition label, and the user chooses to record one.

(10) When the user clicks a button on the page 2 interface, the button's click event is intercepted by the interception module, which obtains the click action (Click) and the coordinate position (X1, Y1) at which the click occurs on the page 2 interface and sends them to the label recognition module to be added to the new label record;

At the same time, the speech recognition module is started. The user reads aloud a custom speech recognition label "Button1"; after recognizing the spoken "Button1", the speech recognition module generates the text speech recognition label "Button1" and sends it to the label recognition module, which adds it to the label record and establishes a unique correspondence between "Button1" and "Click" and (X1, Y1).

In addition, the label recognition module calculates the display position (x1, y1) of the label record from the click coordinates (X1, Y1) and adds it to the new label record. (x1, y1) is generally placed near (X1, Y1), so that speech recognition labels and label records correspond one to one.

The label record generated above is shown in Table 2:

Table 2

(11) When the user clicks another button on the page 2 interface, the button's click event is intercepted by the interception module, which obtains the click action (Click) and the coordinate position (X2, Y2) at which the click occurs on the page 2 interface and sends them to the label recognition module to be added to a new label record;

At the same time, the speech recognition module is started. The user reads aloud a custom speech recognition label "Button2"; after recognizing the spoken "Button2", the speech recognition module generates the text speech recognition label "Button2" and sends it to the label recognition module, which adds it to the label record and establishes a unique correspondence between "Button2" and "Click" and (X2, Y2).

In addition, the label recognition module calculates the display position (x2, y2) of the label record from the click coordinates (X2, Y2) and adds it to the new label record. (x2, y2) is generally placed near (X2, Y2), so that speech recognition labels and label records correspond one to one.

The label record generated above is shown in Table 3:

Table 3

Proceeding by analogy with the above, the operations performed on each interface of the third-party APP are intercepted, and label records with corresponding speech recognition labels are generated.

Second stage: speech recognition control stage

(3) The label recognition module obtains the APP name "APP_XXX" and the speech recognition label page number 1 corresponding to the page 1 interface;

(4) Using the APP name "APP_XXX" and the current-page speech recognition label page number 1, the label recognition module finds all label records that match "APP_XXX" and speech recognition label page number 1. According to Table 1', one label record matches, so that label record is displayed at coordinate position (x0, y0) on the page 1 interface.

(5) The user reads aloud the speech recognition label "Button1" shown on the label record. After recognizing the spoken "Button1", the speech recognition module generates the text speech recognition label "Button1" and sends it to the label recognition module, which uses "Button1" to find the corresponding action "Click" and the coordinate position (X0, Y0) at which that action occurs on the page 1 interface.

(6) The label recognition module passes the action "Click" corresponding to the speech recognition label and the coordinate position (X0, Y0) at which that action occurs on the page 1 interface to the action control module, which performs the click on button "Button1" at position (X0, Y0).

(7) After the action control module performs the click on button "Button1", the page jumps to the page 2 interface.

(8) Because the label recognition module finds from the label record of Table 1' that the next page is the page 2 interface, it looks up the label records corresponding to the page 2 interface, namely the label records of Table 2 and Table 3.

Next, the controls of the page 2 interface are controlled in the same way as the controls of the page 1 interface.

(9) The label records of Table 2 and Table 3 show that the page 2 interface has two speech recognition labels, "Button1" and "Button2". The user chooses to read aloud the speech recognition label "Button2" shown on the label record. After recognizing the spoken "Button2", the speech recognition module generates the text speech recognition label "Button2" and sends it to the label recognition module, which uses "Button2" to find the corresponding action "Click" and the coordinate position (X2, Y2) at which that action occurs on the page 2 interface.

The label recognition module passes the action "Click" corresponding to the speech recognition label and the coordinate position (X2, Y2) at which that action occurs on the page 2 interface to the action control module, which performs the click on button "Button2" at position (X2, Y2).

Proceeding by analogy with the above, the controls on each interface of the third-party APP are controlled automatically by voice.

The interface operation process in the above embodiment is only one example application scenario. In this embodiment, every step of the voice operation is broken down and performed one step at a time; in practice the process can be simplified. For example, the spoken input can be reduced to a phrase that follows a grammar rule such as "Page xx, Button xx, Next page xx", merging several read-aloud steps into one. All of this can be user-defined. Likewise, whether a pop-up button is shown, or some other way of controlling label recording is used, can be user-defined.
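A grammar of the "Page xx, Button xx, Next page xx" kind could be parsed from the recognized text along the lines of the Python sketch below. The exact grammar is left open in the text, so the comma separator, the three keywords, and the numeric arguments are assumptions of this sketch.

```python
import re
from typing import List, Tuple

# One phrase: a keyword followed by a number, e.g. "Page 1" or "Next page3".
PHRASE = re.compile(r"\s*(next\s+page|page|button)\s*(\d+)\s*", re.IGNORECASE)


def parse_command(text: str) -> List[Tuple[str, int]]:
    """Split one utterance into (keyword, number) steps."""
    steps: List[Tuple[str, int]] = []
    for part in text.split(","):
        m = PHRASE.fullmatch(part)
        if m is None:
            raise ValueError(f"unrecognized phrase: {part.strip()!r}")
        keyword = " ".join(m.group(1).lower().split())  # normalize case and spacing
        steps.append((keyword, int(m.group(2))))
    return steps


print(parse_command("Page 1, Button 2, Next page 3"))
# [('page', 1), ('button', 2), ('next page', 3)]
```

Each resulting step could then be resolved against the label records in turn, which is how several read-aloud steps collapse into a single utterance.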

The method of the invention globally monitors touch-screen and key events in the system. As soon as it detects that the user is operating the terminal, it intercepts the system behavior and generates a custom label, binding the custom label to the action. The binding is stored in the database, and as long as the positions of the interface controls in the system do not change, recording a custom label needs to happen only once. After that, the APP can be voice-controlled every time it is used. If the position of an interface control changes, the custom label must be recorded again.
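Storing the bindings once and overwriting them when a control moves can be sketched with Python's built-in `sqlite3` module. The table layout, column names, and the use of an in-memory SQLite database are assumptions for illustration; the text only says that the records are stored in a database.

```python
import sqlite3
from typing import Optional, Tuple

conn = sqlite3.connect(":memory:")  # a file path would persist across runs
conn.execute(
    """
    CREATE TABLE label_record (
        app    TEXT    NOT NULL,
        page   INTEGER NOT NULL,
        label  TEXT    NOT NULL,
        action TEXT    NOT NULL,
        x      INTEGER NOT NULL,
        y      INTEGER NOT NULL,
        PRIMARY KEY (app, page, label)
    )
    """
)


def save(app: str, page: int, label: str, action: str, x: int, y: int) -> None:
    # INSERT OR REPLACE overwrites the old row when a label is recorded again.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO label_record VALUES (?, ?, ?, ?, ?, ?)",
            (app, page, label, action, x, y),
        )


def load(app: str, page: int, label: str) -> Optional[Tuple[str, int, int]]:
    return conn.execute(
        "SELECT action, x, y FROM label_record WHERE app = ? AND page = ? AND label = ?",
        (app, page, label),
    ).fetchone()


save("APP_XXX", 1, "Button1", "Click", 100, 200)  # recorded once
save("APP_XXX", 1, "Button1", "Click", 140, 260)  # control moved: recorded again

print(load("APP_XXX", 1, "Button1"))  # ('Click', 140, 260)
```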

The voice control method and device of the invention are applicable to various mobile terminals and PCs, mainly in the areas of speech recognition and voice control. The invention can be used to operate an APP by voice, freeing the user's hands and making the device more intelligent. It is particularly suitable for smart devices, such as smart watches, whose small operation interface makes them inconvenient to operate.

The beneficial effects of the invention are as follows.

First, all of the user's touch-screen and key operations can be intercepted, captured, and recorded, together with the screen coordinate position at which each operation occurs, and bound to a custom speech recognition label. During voice control, the user reads the custom label aloud, the database is queried for the operation bound to that label, and the system is then instructed directly to perform the associated operation at that coordinate position on the screen, so the screen or keys need not be operated by hand. This achieves voice control.

Second, once a custom label has been recorded, the action is bound to the label. During voice control, reading the custom label content aloud is therefore enough to look up the corresponding action by label. The action is then carried out neither by the user's hand nor by any other physical means; instead, the system is notified and performs it automatically, for example by clicking a certain position on the touch screen.

The above are only preferred embodiments of the invention and are not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims

1. A voice control method for controlling a computer application program (APP), the method including:

A, as the user operates the controls of an APP interface, intercepting the action corresponding to each operation and the coordinate position at which that action occurs on the APP interface;

B, for the action corresponding to each operation and the coordinate position at which that action occurs on the APP interface, establishing a uniquely corresponding speech recognition label, forming a label record;

C, according to the speech recognition label content read aloud by the user, finding the action corresponding to that speech recognition label and the coordinate position at which that action occurs on the APP interface;

D, performing the corresponding action at the coordinate position on the APP interface.

2. The method of claim 1, characterized in that:

before step A, the method also includes: step A1, acquiring the APP name and the speech recognition label page number of the current page on which the APP interface control is located, and adding them to the label record;

after step B, the method also includes: step B1, calculating, from the coordinate position at which the action occurs on the APP interface, the coordinate position at which the label record is to be displayed on the APP interface, and adding that display coordinate position to the label record;

before step C, the method also includes: step C1, according to the APP name and the current-page speech recognition label page number, finding all label records that match that APP name and speech recognition label page number, and displaying each label record at its corresponding coordinate position on the APP interface.

3. The method of claim 2, characterized in that, after step B1 is performed, the method also includes:

step B2, for the next page of the APP interface reached by the user's operation of an APP interface control, acquiring the next-page speech recognition label page number, adding that next-page speech recognition label page number to the current label record, and adding it to a new label record, then repeating steps A1, A, B, and B1 to form all label records matching the next-page speech recognition label page number.

4. The method of claim 3, characterized in that, when step C is performed, the method also includes checking whether the current speech recognition label contains a next-page speech recognition label page number and, if it does, moving to that next-page speech recognition label page number after step D is performed, and then repeating steps C1, C, and D on the APP interface corresponding to the next-page speech recognition label page number.

5. The method of claim 1, characterized in that establishing the uniquely corresponding speech recognition label for the action corresponding to each operation and the coordinate position at which that action occurs on the APP interface includes:

receiving the speech recognition label read aloud by the user, converting it into a text speech recognition label, and establishing the uniquely corresponding speech recognition label for the action corresponding to each operation and the coordinate position at which that action occurs on the APP interface.

6. A voice control system for controlling a computer application program (APP), the system including:

an interception module, which, as the user operates the controls of an APP interface, intercepts the action corresponding to each operation and the coordinate position at which that action occurs on the APP interface;

a label recognition module, which, for the action corresponding to each operation and the coordinate position at which that action occurs on the APP interface, establishes a uniquely corresponding speech recognition label, forming a label record, and which, according to the speech recognition label content read aloud by the user, finds the action corresponding to that speech recognition label and the coordinate position at which that action occurs on the APP interface;

7. The system of claim 6, characterized in that:

the label recognition module is further configured to, before the interception module intercepts the action corresponding to each operation and the coordinate position at which that action occurs on the APP interface, acquire the APP name and the speech recognition label page number of the current page on which the APP interface control is located, and add them to the label record;

the label recognition module is further configured to, after establishing the uniquely corresponding speech recognition label for the action of each operation and its coordinate position on the APP interface and forming the label record, calculate, from the coordinate position at which the action occurs on the APP interface, the coordinate position at which the label record is to be displayed on the APP interface, and add that display coordinate position to the label record;

the label recognition module is further configured to, before finding the action corresponding to the speech recognition label read aloud by the user and the coordinate position at which that action occurs on the APP interface, find, according to the APP name and the current-page speech recognition label page number, all label records that match that APP name and speech recognition label page number, and display each label record at its corresponding coordinate position on the APP interface.

8. The system of claim 7, characterized in that the label recognition module is further configured to, after calculating the display coordinate position of the label record and adding it to the label record, acquire the next-page speech recognition label page number for the next page of the APP interface reached by the user's operation of an APP interface control, add that next-page speech recognition label page number to the current label record, and add it to a new label record.

9. The system of claim 8, characterized in that the label recognition module is further configured to, when finding the action corresponding to the speech recognition label read aloud by the user and the coordinate position at which that action occurs on the APP interface, check whether the current speech recognition label contains a next-page speech recognition label page number and, if it does, move to that next-page speech recognition label page number after the action control module has performed the corresponding action at the coordinate position on the APP interface.

10. The system of claim 6, characterized in that the system also includes a speech recognition module, which receives the speech recognition label read aloud by the user, converts it into a text speech recognition label, and sends it to the label recognition module, which uses it to establish the uniquely corresponding speech recognition label for the action corresponding to each operation and the coordinate position at which that action occurs on the APP interface.