CN109407946B - A Method of Target Selection in Graphical Interface Based on Speech Recognition - Google Patents


Info

Publication number
CN109407946B
CN109407946B (application CN201811056705.XA)
Authority
CN
China
Prior art keywords
marked
word
user
words
screen
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811056705.XA
Other languages
Chinese (zh)
Other versions
CN109407946A (en)
Inventor
殷继彬
谢海浪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN201811056705.XA priority Critical patent/CN109407946B/en
Publication of CN109407946A publication Critical patent/CN109407946A/en
Application granted granted Critical
Publication of CN109407946B publication Critical patent/CN109407946B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/451Execution arrangements for user interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention relates to a graphical-interface target selection method based on speech recognition, belonging to the field of voice target selection. The method comprises the following steps: create a speech mark lexicon and mark points; associate the mark points with mark words; when the user speaks a mark word and keeps voicing, generate a circle centered on the corresponding mark point whose radius grows for as long as the user keeps voicing; divide the circle into several arc segments; associate the arc segments with mark words; when the user speaks a mark word, generate a circle centered on the midpoint of the corresponding arc segment, with radius equal to the distance to the adjacent arc segment; divide that circle into several regions; associate the regions with mark words; when the user speaks a mark word, perform target selection with the center point of the corresponding region as the selection point. The invention emphasizes intuitive visual feedback, so the user always knows when to speak which command without having to learn voice commands repeatedly, which greatly simplifies the use of smart devices.

Description

Graphical interface target selection method based on voice recognition
Technical Field
The invention relates to the field of voice target selection, in particular to a graphical interface target selection method based on voice recognition.
Background
After many years of development, speech recognition has moved from the laboratory into practical use, has become a landmark technology of the information industry, and has gradually entered everyday human-computer interaction: smartphones, tablet computers, smart televisions, in-vehicle tablets, smart bracelets, smart watches and similar devices now commonly ship with a speech recognition function. Applications that control a smart device by voice can be built on this technology, so users no longer need to press physical keys and can operate the device by voice commands alone, which is of great significance to many disabled users. However, existing voice target selection methods can only select from preset targets; they cannot select an arbitrary point appearing on the screen, and they give the user no intuitive visual feedback during selection. The user therefore does not know when a command may be spoken, whether a spoken command was valid, or what commands are available. For example, when a target on the screen must be selected and several targets are identical, a voice selection system that hears "select a certain target" typically filters all matching targets, assigns the identical targets distinct names, generates a new set of selectable command sentences, and waits for the user to speak again. The user cannot tell when the previous command has finished, when the next one may be spoken, or what the new commands are, which harms the user experience.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a voice target selection method that is more intuitive, selects targets more efficiently, and is more convenient to use.
The technical scheme of the invention is as follows: a graphical interface target selection method based on voice recognition comprises the following steps:
Step1, create a speech mark lexicon in the smart device and set at least one class of mark words in it, the class containing at least one mark word;
Step2, create several mark points on the screen of the smart device and display them on the screen;
Step3, associate the mark points on the screen with a class of mark words from the speech mark lexicon, and display the mark words of that class around their corresponding mark points;
Step4, determine whether the user has spoken one of the mark words displayed in Step3, and whether the user keeps voicing. If no mark word has been spoken, the voicing check is skipped and the system waits until one is spoken. If the user speaks a mark word and the target to be selected lies exactly at the corresponding mark point, the user stops voicing and the target is selected with that mark point as the selection point. If the user speaks a mark word but the target lies away from the corresponding mark point, the user keeps voicing, and a circle is generated that is centered on that mark point and whose radius grows continuously for as long as the user keeps voicing;
Step5, cancel the associations made in Step3, clear the mark words displayed in Step3, and clear the mark points created in Step2;
Step6, divide the circle generated in Step4 into several arc segments, associate the arc segments with a class of mark words from the speech mark lexicon, and display the mark words of that class around their corresponding arc segments;
Step7, determine whether the user has spoken one of the mark words displayed in Step6. If not, the system waits until one is spoken, returning to Step2 on timeout. If the user speaks a mark word, generate a circle centered on the midpoint of the corresponding arc segment, with radius equal to the distance from that midpoint to its intersection with the adjacent arc segment;
Step8, cancel the associations made in Step6, clear the mark words displayed in Step6, and clear the circle generated in Step4;
Step9, divide the circle generated in Step7 into several regions, associate each region with a class of mark words from the speech mark lexicon, and display the mark words of that class inside their corresponding regions;
Step10, determine whether the user has spoken one of the mark words displayed in Step9. If not, the system waits until one is spoken, returning to Step2 on timeout. If the user speaks a mark word, perform target selection with the center point of the corresponding region as the selection point.
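The Step7 radius, the distance from the midpoint of the chosen arc segment to the boundary it shares with its neighbour, can be sketched as follows. The function name and example values are illustrative, not from the patent. For a circle of radius R divided into n equal arc segments, midpoint and boundary are separated by half a segment, i.e. an angle of pi/n:

```python
import math

# Distance from an arc segment's midpoint to the boundary point it
# shares with the adjacent segment, for a circle of radius R split into
# n equal arc segments (a sketch; `step7_radius` is an illustrative name).
# The chord subtending an angle t on a circle of radius R is 2*R*sin(t/2),
# and here t = pi/n (half a segment).
def step7_radius(R, n=9):
    return 2 * R * math.sin(math.pi / (2 * n))
```

With R = 120 and nine segments this gives a new circle of radius about 41.7, so each refinement stage shrinks the search area substantially.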
Specifically, the smart devices in the above method are computers and smartphones with a speech recognition function.
Specifically, the speech mark lexicon in Step1 comprises number mark words, letter mark words and text mark words, or user-defined mark words.
Specifically, creating the mark points in Step2 means dividing the screen of the smart device into several blocks and taking the center point of each block as a mark point.
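A minimal sketch of this block division, assuming the screen is split into an n_cols × n_rows grid of equal blocks whose centers become the mark points; the function name and the 3 × 3 default are illustrative, not from the patent:

```python
# Compute mark points as the centers of equal screen blocks (a sketch).
def create_mark_points(width, height, n_cols=3, n_rows=3):
    block_w, block_h = width / n_cols, height / n_rows
    points = []
    for row in range(n_rows):
        for col in range(n_cols):
            # Center of the block in column `col`, row `row`.
            points.append((col * block_w + block_w / 2,
                           row * block_h + block_h / 2))
    return points

# A 1920x1080 screen split 3x3 yields 9 mark points, e.g. the first
# block's center is at (320, 180).
points = create_mark_points(1920, 1080)
```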
Specifically, continuous voicing in Step4 means repeating the same mark word until the generated circle approaches the target to be selected; the user then stops voicing and the radius of the circle stops increasing.
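The radius growth during continuous voicing could be modeled as below. The linear model and the 20-pixel step per utterance are assumptions for illustration only; the patent says only that the radius keeps increasing while the user keeps voicing:

```python
# Radius of the feedback circle after the user has repeated the mark
# word `repetitions` times (hypothetical linear model).
def circle_radius(repetitions, growth_per_utterance=20.0):
    return repetitions * growth_per_utterance
```

Saying "3" six times ("333333") would then yield a circle of radius 120 pixels under these assumed parameters.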
Specifically, the associations with a class of mark words from the speech mark lexicon in Step3, Step6 and Step9 are all random: the classes of mark words associated in the three steps may be the same or different, and the choice of mark word within a class during association is also random.
The beneficial effects of the invention are as follows: the graphical-interface target selection method based on speech recognition emphasizes visual feedback to the user. A circle whose radius keeps growing while the user keeps voicing captures the approximate position of the target point, and further circles generated from that position narrow the selection. The user thus receives visual feedback throughout, always knows which command to speak and when, and no longer needs to learn voice commands repeatedly, which makes smart devices easier to use, shortens target selection time, and improves selection precision.
Detailed Description
The present invention will be described in further detail with reference to specific examples.
Example 1: a graphical interface target selection method based on voice recognition comprises the following steps:
Step1, create a speech mark lexicon in the smart device and set at least one class of mark words in it, the class containing at least one mark word; the lexicon comprises number mark words, letter mark words and text mark words, or user-defined mark words;
Step2, create several mark points on the screen of the smart device: divide the screen into several blocks, take the center point of each block as a mark point, and display the mark points on the screen;
Step3, associate the mark points on the screen with a class of mark words from the speech mark lexicon, and display the mark words of that class around their corresponding mark points;
Step4, determine whether the user has spoken one of the mark words displayed in Step3, and whether the user keeps voicing. If no mark word has been spoken, the voicing check is skipped and the system waits until one is spoken. If the user speaks a mark word and the target to be selected lies exactly at the corresponding mark point, the user stops voicing and the target is selected with that mark point as the selection point. If the target lies away from the corresponding mark point, the user keeps voicing, continuous voicing here meaning repetition of the same mark word, and a circle centered on that mark point is generated whose radius grows with the voicing, until the circle approaches the target; the user then stops voicing and the radius stops increasing.
Step5, cancel the associations made in Step3, clear the mark words displayed in Step3, and clear the mark points created in Step2;
Step6, divide the circle generated in Step4 into several arc segments, associate the arc segments with a class of mark words from the speech mark lexicon, and display the mark words of that class around their corresponding arc segments;
Step7, determine whether the user has spoken one of the mark words displayed in Step6. If not, the system waits until one is spoken, returning to Step2 on timeout. If the user speaks a mark word, generate a circle centered on the midpoint of the corresponding arc segment, with radius equal to the distance from that midpoint to its intersection with the adjacent arc segment;
Step8, cancel the associations made in Step6, clear the mark words displayed in Step6, and clear the circle generated in Step4;
Step9, divide the circle generated in Step7 into several regions, associate each region with a class of mark words from the speech mark lexicon, and display the mark words of that class inside their corresponding regions;
Step10, determine whether the user has spoken one of the mark words displayed in Step9. If not, the system waits until one is spoken, returning to Step2 on timeout. If the user speaks a mark word, perform target selection with the center point of the corresponding region as the selection point.
The smart devices in this method are computers and smartphones with a speech recognition function. The associations in Step3, Step6 and Step9 with a class of mark words from the speech mark lexicon are all random: the classes used in the three steps may be the same or different, and the choice of mark word within a class during association is also random.
Example 2: the graphical-interface target selection method based on speech recognition is described in further detail through the example of a user who, on a computer with a speech recognition function, selects by voice a target folder that appears at a random position on the screen.
Firstly, a speech mark lexicon is created in the smart device, and the "number" class mark words are set in it: "1", "2", "3", "4", "5", "6", "7", "8", "9", "0";
Secondly, the length and the width of the screen are each divided into three equal parts, splitting the screen into nine rectangles of equal area; the intersection of the diagonals of each rectangle is taken as a mark point, and the 9 mark points are displayed on the screen;
Thirdly, the 9 mark points on the screen are associated with the "number" class mark words "1", "2", "3", "4", "5", "6", "7", "8" and "9" of the lexicon, and each mark word is displayed as the background of its rectangular screen region;
Fourthly, suppose the folder appears in the upper-right corner of the screen, directly under the mark point associated with the mark word "3": the user speaks "3" and stops voicing, and target selection is performed with that mark point as the selection point. Suppose instead the folder is in the upper-right corner but not under that mark point, only within its region: the user speaks "3" and keeps voicing, i.e. "333333 …"; a circle is generated centered on that mark point whose radius grows with the voicing, and when the arc of the circle nears the target file the user stops voicing and the radius stops increasing;
Fifthly, the association between the mark words and the mark points of the third step is cancelled, the mark words displayed in the third step are cleared, and the mark points created in the second step are cleared;
Sixthly, taking as the starting point the intersection of the upper arc of the circle with the vertical line through its center, the arc is divided into nine equal segments; the segments are associated with the "number" class mark words "1", "2", "3", "4", "5", "6", "7", "8" and "9" of the lexicon, and each mark word is displayed above its arc segment;
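The arc division in the sixth step above can be sketched as follows, assuming the segments run clockwise from the top of the circle and using standard mathematical coordinates (y grows upward); on a screen where y grows downward, the sign of the y term would flip. All names are illustrative:

```python
import math

# Midpoints of the nine equal arc segments of a circle, starting from
# the topmost point of the circle and proceeding clockwise (a sketch).
def arc_segment_midpoints(cx, cy, r, n=9):
    seg = 2 * math.pi / n
    midpoints = []
    for i in range(n):
        # Angular midpoint of segment i, measured clockwise from the top.
        theta = math.pi / 2 - (i + 0.5) * seg
        midpoints.append((cx + r * math.cos(theta),
                          cy + r * math.sin(theta)))
    return midpoints
```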
Seventhly, suppose the file lies just under the 2nd arc segment of the circle from the sixth step: the user speaks the mark word "2", and a circle is generated centered on the midpoint of that arc segment, with radius equal to the distance from the midpoint to its intersection with the adjacent arc segment;
Eighthly, the association between the mark words and the arc segments of the sixth step is cancelled, the mark words displayed in the sixth step are cleared, and the circle generated in the fourth step is cleared;
Ninthly, the left horizontal radius of the circle generated in the seventh step is rotated clockwise by 40 degrees nine times in succession, dividing the circle evenly into nine sectors; the sectors are intersected with a concentric circle of 1/3 the radius, and removing the intersected parts yields a ring evenly divided into nine regions; adding the concentric circle itself divides the circle into ten regions in total. Each region is then associated with the "number" class mark words "1", "2", "3", "4", "5", "6", "7", "8", "9" and "0" of the lexicon, and each mark word is displayed as the background of its region;
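The ten-region division of the ninth step can be sketched as follows. The representative point of each ring region is taken here as the sector's angular midpoint at the ring's mid-radius, and the inner region's as the circle center; these representative points, the rotation direction, and all names are assumptions for illustration:

```python
import math

# Representative center points of the ten regions: nine 40-degree ring
# sectors between radius r/3 and r, plus the inner concentric circle.
def region_centers(cx, cy, r):
    centers = []
    inner = r / 3
    mid_r = (inner + r) / 2              # radial midpoint of the ring
    for i in range(9):                   # nine 40-degree sectors
        # Angular midpoint of sector i, starting from the left radius.
        theta = math.radians(180 + (i + 0.5) * 40)
        centers.append((cx + mid_r * math.cos(theta),
                        cy + mid_r * math.sin(theta)))
    centers.append((cx, cy))             # the inner concentric circle
    return centers
```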
Tenthly, suppose the file lies within the region of the circle associated with the mark word "1" in the ninth step: the user speaks "1" as displayed on the screen, and target selection is performed with the center point of that region as the selection point.
When a target is selected, the coordinates of the corresponding mark point or region center are obtained as soon as the mark word is spoken; the system then receives the coordinates and moves the cursor to that position. Even when no target is present on the graphical interface, the cursor can therefore be moved to a blank area by the same method for a right-click or similar operation. Because the selectable area is narrowed step by step, targets can be chosen more accurately: selection accuracy improves and selection time is saved.
The graphical-interface target selection method based on speech recognition of this embodiment is suited to disabled users, or users whose hands are otherwise occupied or impaired, selecting an arbitrary target anywhere on the screen.
The above examples only describe preferred embodiments of the invention and do not limit its scope; any modification or improvement of the technical solution made by those skilled in the art without departing from the spirit of the invention falls within the protection scope defined by the claims.

Claims (6)

1. A graphical-interface target selection method based on speech recognition, characterized by comprising the following steps:
Step1: create a speech mark lexicon in the smart device and set at least one class of mark words in it, the class containing at least one mark word;
Step2: create several mark points on the screen of the smart device and display them on the screen;
Step3: associate the mark points on the screen with a class of mark words from the speech mark lexicon, and display the mark words of that class around their corresponding mark points;
Step4: determine whether the user has spoken one of the mark words displayed on the screen in Step3, and whether the user keeps voicing; if no mark word has been spoken, the voicing check is skipped and the system waits until one is spoken; if the user speaks a mark word and the target to be selected lies exactly at the corresponding mark point, the user stops voicing and target selection is performed with that mark point as the selection point; if the user speaks a mark word and the target lies away from the corresponding mark point, the user keeps voicing, and a circle is generated centered on that mark point whose radius grows as the user keeps voicing, until the circle approaches the target, whereupon the user stops voicing and the radius stops increasing;
Step5: cancel the association between the mark words and the mark points of Step3, clear the mark words displayed in Step3, and clear the mark points created in Step2;
Step6: divide the circle generated in Step4 into several arc segments, associate the arc segments with a class of mark words from the speech mark lexicon, and display the mark words of that class around their corresponding arc segments;
Step7: determine whether the user has spoken one of the mark words displayed in Step6; if not, the system waits until one is spoken, returning to Step2 on timeout; if the user speaks a mark word, generate a circle centered on the midpoint of the corresponding arc segment, with radius equal to the distance from that midpoint to its intersection with the adjacent arc segment;
Step8: cancel the association between the mark words and the arc segments of Step6, clear the mark words displayed in Step6, and clear the circle generated in Step4;
Step9: divide the circle generated in Step7 into several regions, associate each region with a class of mark words from the speech mark lexicon, and display the mark words of that class inside their corresponding regions;
Step10: determine whether the user has spoken one of the mark words displayed in Step9; if not, the system waits until one is spoken, returning to Step2 on timeout; if the user speaks a mark word, perform target selection with the center point of the corresponding region as the selection point.
2. The graphical-interface target selection method based on speech recognition according to claim 1, wherein the smart device in Step1-Step10 is a computer having a speech recognition function.
3. The graphical-interface target selection method based on speech recognition according to claim 1 or 2, wherein the speech mark lexicon in Step1 comprises number mark words, letter mark words and text mark words.
4. The graphical-interface target selection method based on speech recognition according to claim 1 or 2, wherein creating the mark points in Step2 means dividing the screen of the smart device into several blocks and taking the center point of each block as a mark point.
5. The graphical-interface target selection method based on speech recognition according to claim 1 or 2, wherein continuous voicing in Step4 means repeating the same mark word until the generated circle approaches the target to be selected, whereupon the user stops voicing and the radius stops increasing.
6. The graphical-interface target selection method based on speech recognition according to claim 1 or 2, wherein the associations with a class of mark words from the speech mark lexicon in Step3, Step6 and Step9 are all random, that is, the classes of mark words associated in the three steps may be the same or different, and the choice of mark word within the class during association is also random.
CN201811056705.XA 2018-09-11 2018-09-11 A Method of Target Selection in Graphical Interface Based on Speech Recognition Active CN109407946B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811056705.XA CN109407946B (en) 2018-09-11 2018-09-11 A Method of Target Selection in Graphical Interface Based on Speech Recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811056705.XA CN109407946B (en) 2018-09-11 2018-09-11 A Method of Target Selection in Graphical Interface Based on Speech Recognition

Publications (2)

Publication Number Publication Date
CN109407946A CN109407946A (en) 2019-03-01
CN109407946B true CN109407946B (en) 2021-05-14

Family

ID=65464748

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811056705.XA Active CN109407946B (en) 2018-09-11 2018-09-11 A Method of Target Selection in Graphical Interface Based on Speech Recognition

Country Status (1)

Country Link
CN (1) CN109407946B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113539253B (en) * 2020-09-18 2024-05-14 厦门市和家健脑智能科技有限公司 Audio data processing method and device based on cognitive assessment
CN115248650B (en) * 2022-06-24 2024-05-24 南京伟柏软件技术有限公司 Screen reading method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1667700A (en) * 2004-03-10 2005-09-14 微软公司 New-word pronunciation learning using a pronunciation graph
CN102547463A (en) * 2011-12-15 2012-07-04 Tcl集团股份有限公司 Method and device for locating interface focus of TV set, and TV set
CN103680498A (en) * 2012-09-26 2014-03-26 华为技术有限公司 Speech recognition method and speech recognition equipment
CN103905636A (en) * 2014-03-03 2014-07-02 联想(北京)有限公司 Information processing method and electronic device
CN105100460A (en) * 2015-07-09 2015-11-25 上海斐讯数据通信技术有限公司 Method and system for controlling intelligent terminal by use of sound

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080158261A1 (en) * 1992-12-14 2008-07-03 Eric Justin Gould Computer user interface for audio and/or video auto-summarization

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on a multi-functional stylus whiteboard system for video conferencing and collaborative work; Ding Huaidong, Yin Jibin; Journal of Kunming University of Science and Technology (Science and Technology Edition); 2004-08-31; pp. 1-6 *

Also Published As

Publication number Publication date
CN109407946A (en) 2019-03-01

Similar Documents

Publication Publication Date Title
US9558737B2 (en) System and method for audibly presenting selected text
Christian et al. A comparison of voice controlled and mouse controlled web browsing
JP6203288B2 (en) Speech recognition system and method
JP4829901B2 (en) Method and apparatus for confirming manually entered indeterminate text input using speech input
EP2460155B1 (en) Method for improving speech recognition accuracy by use of geographic information
CN109407946B (en) A Method of Target Selection in Graphical Interface Based on Speech Recognition
JP2010524138A (en) Multiple mode input method editor
WO2004023455A2 (en) Methods, systems, and programming for performing speech recognition
EP1604350A2 (en) Methods, systems, and programming for performing speech recognition
WO2007055470A1 (en) Input/output apparatus based on voice recognition, and method thereof
CN104239579A (en) Method for constructing multi-language phonetic symbol database, multi-language phonetic notation method and device
CN103903618A (en) Voice input method and electronic device
DE112015003357B4 (en) Method and system for recognizing a spoken announcement containing a sequence of words
Liesenfeld et al. Bottom-up discovery of structure and variation in response tokens (‘backchannels’) across diverse languages
KR102091684B1 (en) Voice recognition text correction method and a device implementing the method
CN103064529A (en) Input method suitable for the blind on Android platform
JP2014235356A (en) Candidate selection device and candidate selection method using voice recognition
CN104007952A (en) Input method, device and electronic device
KR102446676B1 (en) Smart table for processing voice recognition and remote control based on ai and iot and operating method therefor
CN106354752A (en) Character searching method and device and electronic equipment
KR20150072625A (en) Method and apparatus for controlling pointer by voice recognition
CN101145084A (en) Phonetic association single-stroke input method
CN107066080B (en) Chinese character pronunciation, chinese character and symbol coding input method
KR101651909B1 (en) Voice recognition text correction method and a device implementing the method
KR102605774B1 (en) Smart Glass and Voice Recognition System having the same

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant