WO2018112856A1 - Voice control-based position location method, apparatus, user equipment and computer program product - Google Patents

Voice control-based position location method, apparatus, user equipment and computer program product

Info

Publication number
WO2018112856A1
WO2018112856A1 PCT/CN2016/111591 CN2016111591W WO2018112856A1 WO 2018112856 A1 WO2018112856 A1 WO 2018112856A1 CN 2016111591 W CN2016111591 W CN 2016111591W WO 2018112856 A1 WO2018112856 A1 WO 2018112856A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice control
display interface
control instruction
location
current display
Prior art date
Application number
PCT/CN2016/111591
Other languages
English (en)
French (fr)
Inventor
骆磊
黄晓庆
Original Assignee
深圳前海达闼云端智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海达闼云端智能科技有限公司
Priority to CN201680002796.1A priority Critical patent/CN107077319A/zh
Priority to PCT/CN2016/111591 priority patent/WO2018112856A1/zh
Publication of WO2018112856A1 publication Critical patent/WO2018112856A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Definitions

  • the present application relates to the field of communications technologies, and in particular, to a voice control-based position location method, apparatus, user equipment, and computer program product.
  • speech recognition can already perform limited operations, such as adding an alarm clock, adding a schedule, checking the weather, telling a story, chatting, and the like.
  • locating content from a received voice instruction is currently highly limited, so operations on the user equipment cannot be completed efficiently in cooperation with the user.
  • the present application provides a voice control-based position location method, apparatus, user equipment, robot and computer program product, mainly intended to improve the applicability of voice-based positioning.
  • a voice control-based position location method comprising: receiving a voice control instruction; determining, based on image analysis technology, the position in the current display interface of the content indicated by the voice control instruction; and locating to the determined position.
  • the voice control instruction includes positioning content and instruction content; determining the position in the current display interface of the content in the voice control instruction comprises: determining the position in the current display interface of the content indicated by the positioning content in the voice control instruction; the method further includes: controlling the user equipment according to the determined position and the instruction content.
  • Locating to the determined location includes moving a cursor in the user device to the location.
  • Determining, by the image analysis technology, the location of the content indicated by the voice control instruction in the current display interface comprising: determining, according to an image analysis technique, a text of the content indicated by the voice control instruction or an indicated icon is currently displayed The location in the interface.
  • determining, based on image analysis technology, the position in the current display interface of the content indicated by the voice control instruction includes: searching the text information displayed on the current display interface for the text of the indicated content and taking the position of the found text as the position of the indicated content in the current display interface; or, when the text of the indicated content cannot be found in the text information displayed on the display interface, determining the position of the indicated content in the current display interface based on image analysis technology.
  • the method also includes triggering the interactive button when the content indicated by the voice control instruction is on an interactive button.
  • Determining, by the image analysis technology, the location of the content indicated by the voice control instruction in the current display interface comprising: using a center point of the interaction button as a location of the content indicated by the voice control instruction in the current display interface; Triggering the interactive button includes: triggering a center position of the interactive button.
  • a user equipment system comprising: a display, a memory, one or more processors, and one or more modules, the one or more modules being stored in the memory and configured to be executed by the one or more processors, the one or more modules comprising instructions for performing the steps of any of the methods described above.
  • the computer program product comprises a computer program embodied in a computer readable storage medium, the computer program comprising instructions for causing the electronic device to perform various steps of any of the methods described above.
  • a position control device based on voice control comprising: a receiving module, configured to receive a voice control instruction; and a determining module, configured to determine, according to an image analysis technique, the voice control instruction The location of the content in the current display interface; an execution module for locating to the determined location.
  • the voice control instruction includes: positioning content and instruction content; the determining module is specifically configured to determine a location of the content indicated by the positioning content in the voice control instruction in the current display interface; The user equipment is controlled according to the determined location and the content of the instruction.
  • the execution module is specifically configured to move a cursor in the user equipment to the location.
  • the determining module is specifically configured to determine, according to an image analysis technology, a text of the content indicated by the voice control instruction or a position of the indicated icon in the current display interface.
  • the determining module is configured to search for text of the content indicated by the voice control instruction in the text information displayed by the current display interface, and determine a location where the found text is located as the content indicated by the voice control instruction. In the current display interface position; or when the text of the content indicated by the voice control instruction cannot be found in the text information displayed on the display interface, the content indicated by the voice control instruction is determined based on the image analysis technology. Shows the location in the interface.
  • the execution module is further configured to trigger the interactive button when the content indicated by the voice control instruction is located on an interactive button.
  • the determining module is specifically configured to use a center point of the interactive button as a position of the content indicated by the voice control instruction in the current display interface; and the executing module is specifically configured to trigger a center position of the interactive button.
  • the technical solutions proposed by the foregoing embodiments of the present application use voice control commands in place of the traditional click-and-slide operations the user would otherwise perform on the user equipment; only a very small command vocabulary needs to be recognized, such as "open", "click ...", "input ...", "swipe up/down", to obtain highly accurate voice control, and the content named in a command can be located precisely from the command itself.
  • this addresses the strong limitation of current voice-instruction positioning, which prevents the user equipment from being operated efficiently in cooperation with the user, without modifying the original system and applications and without a complex cloud-side semantic understanding module, giving a good user experience.
  • FIG. 1 is a flowchart of a method for position location based on voice control according to Embodiment 1 of the present application;
  • FIG. 2 is a flowchart of a method for implementing a user equipment unlocking function based on voice control according to Embodiment 2 of the present application;
  • FIG. 3 is a schematic diagram of unlocking a user equipment based on voice control according to Embodiment 2 of the present application;
  • FIG. 4 is a schematic diagram of a current interface of a user that implements positioning based on voice control according to Embodiment 3 of the present application;
  • FIG. 5 is a flowchart of implementing user equipment control based on voice control according to Embodiment 3 of the present application;
  • FIG. 6 is a schematic diagram of a current interface of a user that implements positioning based on voice control according to Embodiment 3 of the present application;
  • FIG. 7 is a schematic structural diagram of a user equipment according to Embodiment 5 of the present application.
  • FIG. 8 is a schematic structural diagram of a user equipment according to Embodiment 5 of the present application.
  • FIG. 9 is a schematic structural diagram of a user equipment according to Embodiment 5 of the present application.
  • the solution in the embodiment of the present application can be applied to various scenarios, and the solution in the embodiment of the present application can be implemented in various computer languages, such as an object-oriented programming language Java.
  • a first embodiment of the present application provides a location control method based on voice control. As shown in FIG. 1 , the specific processing flow is as follows:
  • step 11 the user inputs a voice control command.
  • the user can input voice control commands through an audio device such as a microphone.
  • step 12 the user equipment receives the voice control instruction.
  • step 13 the user equipment determines the location of the content indicated by the voice control instruction in the current display interface based on the image analysis technique.
  • as an alternative implementation, the process in step 13 of determining, based on image analysis technology, the position in the current display interface of the content indicated by the voice control instruction may also be carried out cooperatively by an electronic system formed by the user equipment and a server.
  • in a specific implementation, the user equipment captures the entire current display interface and sends the captured interface together with the voice control instruction to the server; the server receives them, uses image analysis technology to determine, within the received display image, the position corresponding to the received voice control instruction, and sends the determined position back to the user equipment.
  • the user equipment receives the position obtained by the server from analyzing the display image and, starting from a preset coordinate origin in the current display interface, matches the current display interface against the position sent by the server to obtain the position of the voice control instruction in the current display interface.
  • alternatively, the user equipment may capture the entire current display interface, send the display image and the voice control instruction to the server, and receive a position-setting instruction from the server, where the position-setting instruction is sent after the server has analyzed the display interface and obtained the position of the voice control instruction in the current display interface; the user equipment then reads the coordinates contained in the instruction and, starting from the preset coordinate origin in the current display interface, obtains the position of those coordinates in the current display interface.
  • in a specific implementation, the preset coordinate origin in the current display interface may be taken as the starting point, and the image corresponding to the keyword may be determined within the display image of the current display interface.
  • the voice control instruction may include positioning content and/or instruction content when determining a corresponding location with the voice control instruction in the current display interface.
  • the text of the content indicated by the voice control instruction or the position of the indicated icon in the current display interface may be determined based on image analysis techniques.
  • searching for the text of the content indicated by the voice control instruction in the text information displayed on the current display interface and determining the location of the found text as the position of the content indicated by the voice control instruction in the current display interface;
  • the position of the content indicated by the voice control instruction in the current display interface is determined based on the image analysis technique.
  • Step 14 Position to the determined position.
  • the voice control instruction includes the positioning content and the instruction content; and determining the location of the content indicated by the positioning content in the voice control instruction in the current display interface, the method further includes:
  • the user equipment is controlled based on the determined location and the content of the instruction.
  • the cursor in the user device can be moved to a location.
  • the method may further include:
  • the interactive button is triggered when the content indicated by the voice control instruction is on an interactive button.
  • the center point of the interactive button may be used as the position of the content indicated by the voice control instruction in the current display interface, and the center position of the interactive button is triggered.
  • the second embodiment of the present application further describes a location control method based on voice control in a specific example.
  • the user equipment unlocking function is implemented by using the technical solution proposed in this application. As shown in FIG. 2, the specific processing flow is as follows:
  • step 21 the user invokes the voice input function by touching the user equipment.
  • the user can wake up the user equipment by touching the user equipment screen, and then call up the voice input function, or wake up the user equipment through the Home button, or directly call the voice input function by touch, or omit the step 21, It is not specifically limited herein.
  • step 22 the user sends an unlocked voice control command.
  • the unlocked voice control command may be a direct unlocking vocabulary, or may be a left-sliding unlocking block, a sliding unlocking block, or the like. It is not specifically limited herein.
  • the user equipment unlocking function is described in detail by sliding to the left as an example. In a specific implementation, it may also be a method of sliding to the right, sliding upward, sliding downward, turning, or folding. It is not specifically limited herein.
  • Step 23 The user equipment receives the unlocked voice control command sent by the user.
  • Step 24 The user equipment parses the received voice control command, and obtains the indicated content as unlocking.
  • Step 25 The user equipment unlocks according to the indicated content, and determines the location of the unlocking point.
  • Unlocking can be done by sliding the slider to the left or to the right, but in either case, you need to find the unlock point.
  • the sliding point is swiped to the left as an example for detailed description.
  • the user equipment unlocks according to the acquired keyword, and determines the position of the unlocking point corresponding to the keyword sliding in the current display interface based on the image analysis technology in the image of the current display interface.
  • step 26 the unlocking point is triggered to slide to the left to unlock the user equipment.
  • the third embodiment of the present application further elaborates a location control method based on voice control to implement a method for controlling a user equipment.
  • the current display interface of the user equipment is an application program that contains multiple contacts.
  • the contact confirmation function in an application is implemented by the technical solution proposed in the present application, as shown in FIG. 5, and the specific processing flow is as follows:
  • step 51 the user sends a voice control command that clicks on user A.
  • the user wants to contact user A through an application program; accordingly, the user sends a voice control command to click user A.
  • Step 52 The user equipment receives the voice control instruction of clicking user A.
  • Step 53 The voice recognition module in the user equipment recognizes the voice control command as the text "click user A".
  • Step 54 The voice parsing module in the user equipment parses out a click operation, with the click target being user A.
  • Step 55 The processing module in the user equipment obtains the display interface of the current user equipment, and matches the user A in the display interface of the current user equipment based on the image analysis technology.
  • in this embodiment, the processing module of the user equipment itself matches user A in the current display interface based on image analysis technology.
  • the processing module may also take a screenshot of the current display interface of the user equipment and upload it to the server; the server matches the keyword "user A" in the received screenshot, obtains the location of user A, and transmits the location to the processing module.
  • when the screenshot is transmitted, it may be transmitted in compressed form; no specific limitation is made here.
  • Step 56 The user equipment determines the location of the user A according to a predefined coordinate origin.
  • the lower left corner of the user equipment screen is defined as coordinates (0, 0).
  • the horizontal axis is the X axis and the vertical axis is the Y axis.
  • the current screen resolution is 1080x1920.
  • the text "user A" analyzed in step 55 occupies the X-axis interval 240-420 and the Y-axis interval 1300-1400 in the image, so the pixel to click is the center point of this rectangle, in this case (330, 1350).
  • step 57 the user A is located.
  • the center point of the location where the user A is located is (330, 1350), and the cursor of the user equipment can be located at the (330, 1350).
  • Step 58 Trigger an interactive button according to the control content included in the received voice control instruction.
  • since the received voice control command contains the word "click", after locating to the determined position the processing module of the user equipment clicks user A in accordance with the click in the voice control command.
  • in step 58, the processing module of the user equipment clicks the pixel point (330, 1350) in accordance with the click in the voice control command.
  • contacts in the address book are stored in text form, but the application functions in some applications are presented to the user in a graphical manner for easy identification and aesthetics.
  • the mobile phone is unlocked in the second embodiment, and the unlocking module is also displayed in a graphical manner.
  • the text matching icon is taken as an example in the fourth embodiment of the present application, and the processing flow is as follows:
  • step one the user sends a searched voice control command.
  • step two the user equipment receives the searched voice control command.
  • Step 3 The voice recognition module in the user equipment recognizes the voice control command as a text search.
  • Step 4 The processing module in the user equipment obtains the current display interface of the user equipment, and matches the search in the current display interface of the user equipment based on the image analysis technology.
  • the processing module of the user equipment itself matches the search in the current display interface of the user equipment.
  • the processing module may also take a screenshot of the current display interface of the user equipment, and upload the current display interface after the screenshot to the server, and the server performs matching according to the keyword search in the received screenshot to obtain the location of the search.
  • the server transmits the location to the processing module.
  • as in the third embodiment, this example is described taking the case where the user equipment performs the positioning itself.
  • based on image analysis technology, the user equipment searches the captured screen image for the text "search" and analyzes it against the system's preset graphics library for "search"; if only the text "search" is found and no search-related graphic is matched, the center pixel is clicked as in the third embodiment; if the text is not found but a search-related graphic (the magnifier icon at the top right) is matched, the center pixel of the magnifier is clicked; if both the text and a search-related graphic are found, it is further analyzed whether there is text around each of them (a match with surrounding text is judged to be ordinary content), the object with no surrounding text is judged to be the click target, and its center pixel is clicked.
  • step six the user equipment determines the obtained location according to a predefined coordinate origin.
  • the control of the user equipment is implemented after the location corresponding to the voice control instruction is determined.
  • the cursor may be moved. At this position, the voice control command input by the user is waited for, or other commands are operated accordingly, and are not specifically limited herein.
  • a fifth embodiment of the present application provides a user equipment, including:
  • a display, a memory, one or more processors, and one or more modules, the one or more modules being stored in the memory and configured to be executed by the one or more processors, the one or more modules including instructions for performing the various steps in the first embodiment of the method; details are not repeated here.
  • the memory may be a volatile memory, such as a random-access memory (RAM), or a non-volatile memory, such as a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD), or a combination of memories of the above kinds.
  • the processor can be a central processing unit (CPU) or a combination of a CPU and a hardware chip.
  • the processor can also be a network processor (NP), or a combination of a CPU and an NP, or a combination of an NP and a hardware chip.
  • the hardware chip may be one or a combination of the following: an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), and a complex programmable logic device (CPLD).
  • one or more modules in the user equipment proposed in Embodiment 5 of the present application may be corresponding functions of the device module provided in the foregoing embodiment.
  • the logical structure of the computing node of the control method of the user equipment provided by the embodiment of the present application is introduced by using FIG. 7 as an example.
  • the computing node may be a user equipment, and the user equipment may specifically be a desktop computer, a notebook computer, a smart phone or a tablet computer.
  • the hardware layer of the user equipment includes a central processing unit (CPU), a graphics processing unit (GPU), and the like, and may further include a memory and an input/output device (Input Device).
  • the input device may include a keyboard, a mouse, a touch screen, etc.
  • the output device may include a display device such as a liquid crystal display (LCD), a cathode ray tube (CRT), a holographic image (Holographic), Projector, etc.
  • the core library layer is the core part of the operating system, including input/output services, core services, graphics device interfaces, and graphics engine (Graphics Engine) for CPU and GPU graphics processing.
  • the graphics engine may include a 2D engine, a 3D engine, a composition, a frame buffer, and the like.
  • the core library layer also includes input method services. Among them, the input method service includes the input method service provided by the terminal.
  • the terminal further includes a driving layer, a frame layer, and an application layer.
  • the driver layer may include a CPU driver, a GPU driver, a display controller driver, a Trust Zone Driver, and the like.
  • the framework layer may include a graphic service (Graphic Service), a system service (System service), a web service (Web Service), and a customer service (Customer Service); and the graphic service may include, for example, a widget (widget) or a canvas (Canvas). , Views, Render Script, etc.
  • the application layer may include a desktop, a media player, a browser, and the like.
  • the user equipment proposed by the embodiment of the present application includes at least one processor 201, at least one network interface 204 or other user interface 203, a memory 205, and at least one communication bus 202.
  • Communication bus 202 is used to implement connection communication between these components.
  • the user device 200 optionally includes a user interface 203, including a display (such as the LCD, CRT, Holographic or Projector shown in FIG. 7), a keyboard or a pointing device (eg, a mouse, a trackball ( Trackball), touchpad or touch screen, etc.).
  • the memory 205 may include read only memory and random access memory, and provides the processor 201 with program instructions and data stored in the memory 205.
  • a portion of the memory 205 may also include non-volatile random access memory (NVRAM).
  • the memory 205 stores the following elements, executable modules or data structures, or a subset thereof, or their extended set:
  • the operating system 2051 includes various system program instructions that can be run, for example, at the framework layer, core library layer, driver layer, etc., as shown in FIG. 8, for implementing various basic services and processing hardware-based tasks.
  • the applications 2052 include various applications, such as the desktop (launcher), media player, browser and input method applications shown in FIG. 8, for implementing various application services.
  • the memory 205 may also be referred to as a storage area, used to store data and programs and to store the operating system.
  • the processor 201 is configured to execute the method steps stored in the memory 205, and the processor 201 is configured to execute the method steps in the first embodiment of the method according to the obtained program instructions, and details are not described herein.
  • the user equipment applied in the method for controlling a user equipment may be a mobile phone, a tablet computer, a personal digital assistant (PDA), or the like.
  • FIG. 9 it is a schematic diagram of one of the structural components of the user equipment 300 .
  • the user equipment 300 mainly includes a memory 320, a processor 360, and an input unit 330, and the input unit 330 is configured to receive a generated event when the user performs an operation on the terminal.
  • the memory 320 is used to store program instructions for the operating system and various applications.
  • processor 360 can be referred to the detailed description of the processor 201 described above, and details are not described herein.
  • the memory 320 may be the internal memory of the user equipment 300, and may be divided into three storage spaces: a secure memory arranged in a first running environment, a non-secure memory arranged in a second environment, and a shared memory that can be accessed by applications or hardware in both the first and the second running environment.
  • the secure memory, non-secure memory and shared memory may be divided into spaces of the same size, or into different sizes according to the different data and input events to be stored.
  • the input unit 330 in the user device can be used to receive numeric or character information input by the user, as well as to generate signal inputs related to user settings and function control of the user device 300.
  • the input unit 330 may include a touch panel 331.
  • the touch panel 331 can collect the user's operations on it (such as operations performed by the user with a finger, a stylus or any other suitable object or accessory on the touch panel 331) and, according to preset program instructions, drive the corresponding connection device of the touch panel 331.
  • the touch panel 331 can include two parts: a touch detection device and a touch controller.
  • the touch detection device detects the touch orientation of the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 360, and can receive and execute commands sent by the processor 360.
  • the touch panel 331 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic waves.
  • the input unit 330 may further include other input devices 332, which may include, but are not limited to, physical keyboards, function keys (such as volume control buttons, switch buttons, etc.), trackballs, mice, joysticks, and the like. One or more of them.
  • the user device 300 can also include a display unit 340 that can be used to display information entered by the user or information provided to the user and various menu interfaces of the user device 300.
  • the display unit 340 can include a display panel 341.
  • the display panel 341 can be configured in the form of a liquid crystal display (LCD) or an organic light-emitting diode (OLED).
  • the processor 360 is the control center of the user device 300; it connects the various parts of the entire handset using various interfaces and lines, and performs the various functions of the user device 300 and processes data by running or executing the software programs and/or modules stored in the memory 320.
  • in this way the user equipment 300 is monitored as a whole.
  • the optional user device 300 can also include an RF circuit 310, a WIFI module 380 for providing wireless connectivity, and a power source 390 and an audio circuit 370 for providing sound input and output.
  • a sixth embodiment of the present application provides a computer program product, the computer program product comprising a computer program embedded in a computer-readable storage medium, the computer program comprising instructions for causing the electronic device to perform the steps of the technical solutions proposed in any of the first to fourth embodiments.
  • a seventh embodiment of the present application provides a location control apparatus based on voice control, including:
  • the receiving module is configured to receive a voice control instruction.
  • a determining module configured to determine, according to an image analysis technique, a location of the content indicated by the voice control instruction in the current display interface.
  • An execution module for locating to the determined location.
  • the voice control instruction includes: positioning content and instruction content; the determining module is specifically configured to determine a location of the content indicated by the positioning content in the voice control instruction in a current display interface; It is further configured to control the user equipment according to the determined location and the instruction content.
  • the execution module is specifically configured to move a cursor in the user equipment to the location.
  • the determining module is specifically configured to determine, according to an image analysis technology, a text of the content indicated by the voice control instruction or a position of the indicated icon in the current display interface.
  • the determining module is configured to search the text information displayed on the current display interface for the text of the content indicated by the voice control instruction, and to determine the location of the found text as the position in the current display interface of the content indicated by the voice control instruction; or, when the text of the content indicated by the voice control instruction cannot be found in the text information displayed on the display interface, to determine, based on image analysis technology, the location of the indicated content in the current display interface.
  • the executing module is further configured to trigger the interactive button when the content indicated by the voice control instruction is located on an interactive button.
  • the determining module is specifically configured to use a center point of the interactive button as a location of the content indicated by the voice control instruction in the current display interface; and the executing module is specifically configured to trigger a center of the interactive button position.
  • another embodiment of the present application further provides an electronic system including a user equipment and a server; the user equipment includes a display, a memory, one or more processors, and a communication unit; the server includes a memory, one or more processors, and a communication unit; each communication unit is configured to communicate with external devices; the system further includes one or more modules, the one or more modules being stored in a memory of the user equipment or the server and configured to be executed by the respective processor, the one or more modules including instructions for performing the steps in the technical solutions set forth in any of the first to fourth embodiments.
  • the user equipment here can also be a robot.
  • embodiments of the present application can be provided as a method, apparatus (device), or computer program product.
  • the present application can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment in combination of software and hardware.
  • the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, read-only optical disks, optical storage, etc.) including computer usable program code.
  • the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising the instruction device.
  • the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A voice control-based position location method, apparatus, user equipment and computer program product. The method comprises: receiving a voice control instruction (12); determining, based on image analysis technology, the position in the current display interface of the content indicated by the voice control instruction (13); and locating to the determined position (14). The solution addresses the problem that positioning by voice control instructions is highly limited and cannot efficiently cooperate with the user to complete operations on the user equipment.

Description

Voice control-based position location method, apparatus, user equipment and computer program product
Technical Field
This application relates to the field of communications technologies, and in particular to a voice control-based position location method, apparatus, user equipment and computer program product.
Background
With the continuous development and application of speech recognition technology, speech recognition can already perform a limited set of operations, such as adding an alarm, adding a calendar entry, checking the weather, telling a story, chatting, and so on.
However, for an application that has already been finalized, only the developer of that application can build a dedicated, complex voice interface for relatively simple actions. Such operations mostly target a single common behavior and, limited by the current state of speech recognition technology, cannot be made sufficiently intelligent. Existing UI interfaces, meanwhile, can only be operated through clicks, slides and similar gestures; they cannot be located accurately by voice and then operated on.
Therefore, locating content from a received voice instruction is currently highly limited, so the user equipment cannot be operated efficiently in cooperation with the user.
Summary of the Invention
This application provides a voice control-based position location method, apparatus, user equipment, robot and computer program product, mainly intended to improve the applicability of voice-based positioning.
A voice control-based position location method comprises: receiving a voice control instruction; determining, based on image analysis technology, the position in the current display interface of the content indicated by the voice control instruction;
and locating to the determined position.
The voice control instruction comprises positioning content and instruction content. Determining the position in the current display interface of the content in the voice control instruction comprises: determining the position in the current display interface of the content indicated by the positioning content in the voice control instruction. The method further comprises: controlling the user equipment according to the determined position and the instruction content.
Locating to the determined position comprises: moving a cursor in the user equipment to the position.
Determining, based on image analysis technology, the position in the current display interface of the content indicated by the voice control instruction comprises: determining, based on image analysis technology, the position in the current display interface of the text of the indicated content or of the indicated icon.
Determining, based on image analysis technology, the position in the current display interface of the content indicated by the voice control instruction comprises: searching the text information shown on the current display interface for the text of the indicated content and taking the position of the found text as the position of the indicated content in the current display interface; or, when the text of the indicated content cannot be found in the text information shown on the display interface, determining the position of the indicated content in the current display interface based on image analysis technology.
The method further comprises: triggering an interactive button when the content indicated by the voice control instruction lies on that button.
Determining, based on image analysis technology, the position in the current display interface of the content indicated by the voice control instruction comprises: taking the center point of the interactive button as the position of the indicated content in the current display interface; triggering the interactive button comprises: triggering the center position of the button.
A user equipment system comprises: a display, a memory, one or more processors, and one or more modules, the one or more modules being stored in the memory and configured to be executed by the one or more processors, the one or more modules comprising instructions for performing each step of any of the methods described above.
The computer program product comprises a computer program embodied in a computer-readable storage medium, the computer program comprising instructions for causing the electronic device to perform each step of any of the methods described above.
A voice control-based position location apparatus comprises: a receiving module for receiving a voice control instruction; a determining module for determining, based on image analysis technology, the position in the current display interface of the content indicated by the voice control instruction; and an execution module for locating to the determined position.
The voice control instruction comprises positioning content and instruction content. The determining module is specifically configured to determine the position in the current display interface of the content indicated by the positioning content in the voice control instruction; the execution module is further configured to control the user equipment according to the determined position and the instruction content.
The execution module is specifically configured to move a cursor in the user equipment to the position.
The determining module is specifically configured to determine, based on image analysis technology, the position in the current display interface of the text of the indicated content or of the indicated icon.
The determining module is specifically configured to search the text information shown on the current display interface for the text of the content indicated by the voice control instruction and take the position of the found text as the position of the indicated content in the current display interface; or, when that text cannot be found in the text information shown on the display interface, to determine the position of the indicated content in the current display interface based on image analysis technology.
The execution module is further configured to trigger an interactive button when the content indicated by the voice control instruction lies on that button.
The determining module is specifically configured to take the center point of the interactive button as the position of the indicated content in the current display interface; the execution module is specifically configured to trigger the center position of the button.
With the technical solutions proposed in the above embodiments of this application, voice control commands replace the traditional click-and-slide operations the user would otherwise perform on the user equipment. Only a very small command vocabulary needs to be recognized, such as "open", "click ...", "input ..." and "swipe up/down", to obtain highly accurate voice control commands, and the content named in a command can be located precisely from the command itself. This addresses the strong limitation of current voice-instruction positioning, which prevents the user equipment from being operated efficiently in cooperation with the user, without modifying the original system and applications and without a complex cloud-side semantic understanding module, giving a good user experience.
Brief Description of the Drawings
FIG. 1 is a flowchart of the voice control-based position location method proposed in Embodiment 1 of this application;
FIG. 2 is a flowchart of the method for unlocking a user equipment based on voice control proposed in Embodiment 2 of this application;
FIG. 3 is a schematic diagram of the unlocking slide of a user equipment based on voice control proposed in Embodiment 2 of this application;
FIG. 4 is a schematic diagram of the user's current interface for positioning based on voice control proposed in Embodiment 3 of this application;
FIG. 5 is a flowchart of controlling a user equipment based on voice control proposed in Embodiment 3 of this application;
FIG. 6 is a schematic diagram of the user's current interface for positioning based on voice control proposed in Embodiment 3 of this application;
FIG. 7 is a schematic structural diagram of a user equipment proposed in Embodiment 5 of this application;
FIG. 8 is a schematic structural diagram of a user equipment proposed in Embodiment 5 of this application;
FIG. 9 is a schematic structural diagram of a user equipment proposed in Embodiment 5 of this application.
Detailed Description
The solutions in the embodiments of this application can be applied in a variety of scenarios, and they can be implemented in various computer languages, for example the object-oriented programming language Java.
To make the technical solutions and advantages of the embodiments of this application clearer, exemplary embodiments of this application are described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of this application, not an exhaustive list of all of them. It should be noted that, where no conflict arises, the embodiments of this application and the features in those embodiments may be combined with each other.
Embodiment 1
Embodiment 1 of this application proposes a voice control-based position location method. As shown in FIG. 1, the processing flow is as follows:
Step 11: the user inputs a voice control instruction.
The user may input the voice control instruction through an audio device such as a microphone.
Step 12: the user equipment receives the voice control instruction.
Step 13: the user equipment determines, based on image analysis technology, the position in the current display interface of the content indicated by the voice control instruction.
As an alternative implementation, the process in step 13 of determining, based on image analysis technology, the position in the current display interface of the content indicated by the voice control instruction may also be carried out cooperatively by an electronic system formed by the user equipment and a server.
In a specific implementation, the user equipment captures the entire current display interface and sends the captured interface together with the voice control instruction to the server. The server receives them, uses image analysis technology to determine, within the received display image, the position corresponding to the received voice control instruction, and sends the determined position back to the user equipment. The user equipment receives the position obtained by the server from analyzing the display image and, starting from a preset coordinate origin in the current display interface, matches the current display interface against the position sent by the server to obtain the position of the voice control instruction in the current display interface.
In this approach, the user equipment may also capture the entire current display interface, send the display image and the voice control instruction to the server, and receive a position-setting instruction from the server, where the position-setting instruction is sent after the server has analyzed the display interface and obtained the position of the voice control instruction in the current display interface. The user equipment reads the coordinates contained in the instruction and, starting from the preset coordinate origin in the current display interface, obtains the position of those coordinates in the current display interface.
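As an illustration of the device-server coordination just described, the following sketch (in Java, the language the description itself mentions) shows one possible shape of the client-side flow. The RemoteAnalysisClient interface and the zero origin are assumptions made for this example only; they are not part of the patent or of any existing SDK.

    // Minimal sketch of the device-side half of the cooperation described above.
    public final class RemotePositioningFlow {

        /** A point in the current display interface. */
        public record ScreenPoint(int x, int y) {}

        /** Hypothetical server that analyses a screenshot against a command text. */
        public interface RemoteAnalysisClient {
            ScreenPoint locate(byte[] screenshotPng, String commandText);
        }

        /** Maps a point reported in image coordinates onto the display,
         *  starting from the preset coordinate origin of the interface. */
        static ScreenPoint toDisplayCoordinates(ScreenPoint imagePoint, int originX, int originY) {
            return new ScreenPoint(originX + imagePoint.x(), originY + imagePoint.y());
        }

        public static ScreenPoint locateOnDevice(RemoteAnalysisClient server,
                                                 byte[] screenshotPng,
                                                 String commandText) {
            // 1) Upload the captured interface and the recognized command text.
            ScreenPoint fromServer = server.locate(screenshotPng, commandText);
            // 2) Match the returned position against the current display interface,
            //    taking (0, 0) as a placeholder for the preset origin.
            return toDisplayCoordinates(fromServer, 0, 0);
        }
    }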
In a specific implementation, the preset coordinate origin in the current display interface may be taken as the starting point, and the image corresponding to the keyword may be determined within the display image of the current display interface.
In a specific implementation, when determining the position in the current display interface that corresponds to the voice control instruction, the voice control instruction may contain positioning content and/or instruction content.
Specifically, the position in the current display interface of the text of the content indicated by the voice control instruction, or of the indicated icon, may be determined based on image analysis technology.
Specifically, the text of the content indicated by the voice control instruction is searched for in the text information shown on the current display interface, and the position of the found text is taken as the position of the indicated content in the current display interface; or
when the text of the content indicated by the voice control instruction cannot be found in the text information shown on the display interface, the position of the indicated content in the current display interface is determined based on image analysis technology.
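The "text first, image analysis second" rule above can be summarized in a short sketch. The TextIndex and IconMatcher abstractions are assumed stand-ins for the interface's text information and for the image-analysis step; they are not a real API.

    import java.util.Optional;

    public final class ContentLocator {

        public record Point(int x, int y) {}

        public interface TextIndex {
            Optional<Point> find(String text);                          // text currently shown on the interface
        }

        public interface IconMatcher {
            Optional<Point> match(byte[] screenshot, String keyword);   // image-analysis fallback
        }

        public static Optional<Point> locate(String keyword, TextIndex displayedText,
                                             IconMatcher imageAnalysis, byte[] screenshot) {
            // 1) Look the keyword up in the text information shown on the current interface.
            Optional<Point> byText = displayedText.find(keyword);
            if (byText.isPresent()) {
                return byText;
            }
            // 2) Only when the text cannot be found, fall back to image analysis.
            return imageAnalysis.match(screenshot, keyword);
        }
    }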
Step 14: locate to the determined position.
Specifically, where the voice control instruction includes positioning content and instruction content, and the position in the current display interface of the content indicated by the positioning content is determined, the method further comprises:
controlling the user equipment according to the determined position and the instruction content.
In a preferred implementation, the cursor in the user equipment may be moved to the position.
Further, after step 14, the method may also comprise:
triggering an interactive button when the content indicated by the voice control instruction lies on that button.
Specifically, when determining the position, the center point of the interactive button may be taken as the position of the indicated content in the current display interface, and the center position of the button is triggered.
Embodiment 2
Embodiment 2 of this application further elaborates the voice control-based position location method with a specific example, in which the technical solution proposed in this application implements the unlocking function of a user equipment. As shown in FIG. 2, the processing flow is as follows:
Step 21: the user calls up the voice input function by touching the user equipment.
In step 21, the user may wake the user equipment by touching its screen and then call up the voice input function, may wake it through the Home key, may call up the voice input function directly by touch, or step 21 may be omitted altogether; no specific limitation is made here.
Step 22: the user issues an unlock voice control instruction.
In the technical solution proposed in Embodiment 2, the unlock voice control instruction may simply be the word "unlock", or may be an instruction such as "slide the unlock block to the left" or "slide the unlock block"; no specific limitation is made here. As shown in FIG. 3, the unlock function is described in detail taking a leftward slide as an example; in a specific implementation it may also be a rightward slide, an upward slide, a downward slide, a circle, a zig-zag line or the like, and no specific limitation is made here.
Step 23: the user equipment receives the unlock voice control instruction issued by the user.
Step 24: the user equipment parses the received voice control instruction and obtains that the indicated content is unlocking.
Step 25: the user equipment unlocks according to the indicated content and determines the position of the unlock point.
Unlocking may be achieved by sliding a slider to the left, to the right, or in some other way, but whichever way is used the unlock point must first be found. The technical solution of this embodiment is elaborated taking a leftward slide of the unlock point as an example.
In step 25, the user equipment unlocks according to the obtained keyword: within the image of the current display interface it determines, based on image analysis technology, the position in the current display interface of the unlock point corresponding to the sliding keyword.
Step 26: the unlock point is triggered and slid to the left to unlock the user equipment.
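For illustration only, once the unlock point has been located, the leftward slide of step 26 could be synthesized roughly as follows. The GestureInjector interface is an assumption made for this sketch; real platforms expose gesture injection in different ways.

    public final class VoiceUnlock {

        public interface GestureInjector {
            void swipe(int fromX, int fromY, int toX, int toY, long durationMs);
        }

        public static void unlockBySlidingLeft(GestureInjector injector,
                                               int unlockPointX, int unlockPointY,
                                               int slideDistancePx) {
            // Trigger the unlock point and drag it to the left.
            injector.swipe(unlockPointX, unlockPointY,
                           unlockPointX - slideDistancePx, unlockPointY, 300L);
        }
    }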
Embodiment 3
Embodiment 3 of this application further elaborates the voice control-based position location method with a specific example of controlling a user equipment. As shown in FIG. 4, the current display interface of the user equipment is an application containing several contacts. The contact-confirmation function in the application is implemented with the technical solution proposed in this application; as shown in FIG. 5, the processing flow is as follows:
Step 51: the user issues the voice control instruction "click user A".
In the technical solution proposed in Embodiment 3, the user wants to contact user A through the application; following this embodiment, the user issues the voice control instruction "click user A".
Step 52: the user equipment receives the voice control instruction "click user A".
Step 53: the voice recognition module in the user equipment recognizes the voice control instruction as the text "click user A".
Step 54: the voice parsing module in the user equipment parses out a click operation, with the click target being user A.
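A minimal sketch of the parsing in steps 53-54, assuming the small command vocabulary mentioned in the summary ("open", "click ...", "input ...", slide); the Command record and the keyword list are illustrative only, not the actual parsing module.

    public final class CommandParser {

        public record Command(String action, String target) {}

        private static final String[] ACTIONS = { "click", "open", "input", "slide" };

        public static Command parse(String recognizedText) {
            String text = recognizedText.trim();
            String lower = text.toLowerCase();
            for (String action : ACTIONS) {
                if (lower.startsWith(action)) {
                    // Everything after the action word is the positioning content,
                    // e.g. "click user A" -> action "click", target "user A".
                    return new Command(action, text.substring(action.length()).trim());
                }
            }
            return new Command("locate", text);   // no action word: treat the whole text as a target
        }
    }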
Step 55: the processing module in the user equipment obtains the current display interface of the user equipment and, based on image analysis technology, matches "user A" in that interface.
In step 55, Embodiment 3 takes the case where the user equipment's own processing module matches "user A" in the current display interface based on image analysis technology.
In a specific implementation, the processing module may instead take a screenshot of the current display interface and upload it to the server; the server matches the keyword "user A" in the received screenshot, obtains the position of user A, and transmits the position to the processing module. The screenshot may be transmitted in compressed form; no specific limitation is made here.
Step 56: the user equipment determines the position of user A according to a predefined coordinate origin.
Suppose the lower-left corner of the user equipment screen is defined as coordinate (0, 0); as shown in FIG. 6, the horizontal axis is the X axis and the vertical axis is the Y axis. Suppose the current screen resolution is 1080x1920, and that the text "user A" analyzed in step 55 occupies the X interval 240-420 and the Y interval 1300-1400 in the image; the pixel to click is then the center point of this rectangle, in this example (330, 1350).
Step 57: locate to user A.
Continuing the example of step 56, the center point of the position of user A is (330, 1350), so the cursor of the user equipment can be placed at (330, 1350).
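The arithmetic of steps 55-57 can be checked with a few lines of Java; the rectangle values are the ones given above, under the bottom-left origin assumed in step 56.

    public final class CenterPoint {

        public record Rect(int xMin, int yMin, int xMax, int yMax) {}

        public static int[] center(Rect box) {
            int cx = (box.xMin() + box.xMax()) / 2;   // (240 + 420) / 2 = 330
            int cy = (box.yMin() + box.yMax()) / 2;   // (1300 + 1400) / 2 = 1350
            return new int[] { cx, cy };
        }

        public static void main(String[] args) {
            int[] p = center(new Rect(240, 1300, 420, 1400));
            System.out.println("Click point: (" + p[0] + ", " + p[1] + ")");   // (330, 1350)
        }
    }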
Step 58: an interactive button is triggered according to the control content contained in the received voice control instruction.
Since the received voice control instruction contains the word "click", after locating to the determined position the processing module of the user equipment clicks user A in accordance with the click in the voice control command.
In step 58, the processing module of the user equipment clicks the pixel (330, 1350) in accordance with the click in the voice control command.
Embodiment 4
Normally, the contacts in an address book are stored as text, but some application functions are, for ease of recognition and for aesthetics, presented to the user graphically; the phone unlock in Embodiment 2, for example, likewise presents the unlock module graphically. On this basis, Embodiment 4 of this application elaborates further, taking the matching of text against an icon as an example. The processing flow is as follows:
Step 1: the user issues the voice control instruction "search".
Step 2: the user equipment receives the "search" voice control instruction.
Step 3: the voice recognition module in the user equipment recognizes the voice control instruction as the text "search".
Step 4: the processing module in the user equipment obtains the current display interface of the user equipment and, based on image analysis technology, matches "search" in that interface.
In the step above, Embodiment 4 takes the case where the user equipment's own processing module matches "search" in the current display interface based on image analysis technology. In a specific implementation, the processing module may instead take a screenshot of the current display interface and upload it to the server; the server matches the keyword "search" in the received screenshot, obtains the position of "search", and transmits the position to the processing module.
Specifically, as in Embodiment 3, the case where the user equipment itself performs the positioning is elaborated here. Based on image analysis technology, the user equipment searches the captured screen image for the text "search" and analyzes it against the system's preset graphics library for "search". If only the text "search" is found and no search-related graphic is matched, the center pixel is clicked, as in Embodiment 3. If the text is not found but a search-related graphic is matched (the magnifier icon at the top right), the center pixel of the magnifier is clicked. If both the text "search" and a search-related graphic are found, as with the magnifier shown in FIG. 6, it is further analyzed whether there is text around the "search" text and around the magnifier graphic (a match with surrounding text is judged to be ordinary content); the object with no surrounding text is judged to be the click target, and its center pixel is clicked.
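The decision rules of this embodiment can be sketched as follows; the Candidate type and the hasNearbyText flag are assumptions introduced for the example, standing in for whatever the image-analysis step actually reports.

    import java.util.Optional;

    public final class SearchTargetChooser {

        public record Candidate(int centerX, int centerY, boolean hasNearbyText) {}

        public static Optional<Candidate> choose(Optional<Candidate> textMatch,
                                                 Optional<Candidate> iconMatch) {
            if (textMatch.isPresent() && iconMatch.isEmpty()) {
                return textMatch;                 // only the text "search" was found
            }
            if (textMatch.isEmpty() && iconMatch.isPresent()) {
                return iconMatch;                 // only the search-related graphic was found
            }
            if (textMatch.isPresent() && iconMatch.isPresent()) {
                // Both found: a match with surrounding text is judged to be ordinary
                // content, so the object without surrounding text is the click target.
                if (!iconMatch.get().hasNearbyText()) return iconMatch;
                if (!textMatch.get().hasNearbyText()) return textMatch;
            }
            return Optional.empty();
        }
    }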
Step 6: the user equipment determines the obtained position according to the predefined coordinate origin.
Steps 1 to 6 above are described taking as an example the control of the user equipment after the position corresponding to the voice control instruction has been determined. In a specific implementation, after step 6 the cursor may instead be moved to the position to wait for a further voice control instruction from the user, or other instructions may be acted on accordingly; no specific limitation is made here.
Embodiment 5
Embodiment 5 of this application proposes a user equipment, comprising:
a display, a memory, one or more processors, and one or more modules, the one or more modules being stored in the memory and configured to be executed by the one or more processors, the one or more modules including instructions for performing each step of method Embodiment 1. Details are not repeated here.
In the technical solution proposed in Embodiment 5, the memory may be a volatile memory such as a random-access memory (RAM), or a non-volatile memory such as a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD), or a combination of memories of the above kinds.
The processor may be a central processing unit (CPU), or a combination of a CPU and a hardware chip.
The processor may also be a network processor (NP), or a combination of a CPU and an NP, or a combination of an NP and a hardware chip.
The hardware chip may be one or a combination of the following: an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), and a complex programmable logic device (CPLD).
Optionally, the one or more modules in the user equipment proposed in Embodiment 5 may have the corresponding functions of the apparatus modules proposed in the foregoing embodiments.
Further, in the technical solution proposed in Embodiment 5, FIG. 7 is used as an example to introduce the logical structure of a computing node for the user-equipment control method provided by the embodiments of this application. The computing node may be a user equipment, which may specifically be a desktop computer, a notebook computer, a smartphone or a tablet computer. As shown in FIG. 7, the hardware layer of the user equipment includes a central processing unit (CPU), a graphics processing unit (GPU) and the like, and may further include a memory, input/output devices, a network interface and so on. Input devices may include a keyboard, a mouse, a touch screen and the like; output devices may include display devices such as a liquid crystal display (LCD), a cathode ray tube (CRT), holographic imaging or a projector. An operating system (such as Android) and some applications run above the hardware layer. The core library layer is the core part of the operating system and includes input/output services, core services, the graphics device interface, and the graphics engine that performs CPU and GPU graphics processing; the graphics engine may include a 2D engine, a 3D engine, a compositor, a frame buffer and so on. The core library layer also includes the input method service, which includes the input method service supplied with the terminal. In addition, the terminal includes a driver layer, a framework layer and an application layer. The driver layer may include a CPU driver, a GPU driver, a display controller driver, a Trust Zone driver and the like. The framework layer may include a graphic service, a system service, a web service and a customer service; the graphic service may include, for example, widgets, canvas, views, Render Script and the like. The application layer may include a launcher, a media player, a browser and the like.
As shown in FIG. 8, the user equipment 200 proposed in the embodiments of this application includes: at least one processor 201, at least one network interface 204 or other user interface 203, a memory 205, and at least one communication bus 202. The communication bus 202 is used to implement connection and communication between these components. The user equipment 200 optionally contains a user interface 203, including a display (for example the LCD, CRT, holographic display or projector shown in FIG. 7) and a keyboard or pointing device (for example a mouse, a trackball, a touchpad or a touch screen).
The memory 205 may include read-only memory and random-access memory, and provides the processor 201 with the program instructions and data stored in the memory 205. A part of the memory 205 may also include non-volatile random-access memory (NVRAM).
In some implementations, the memory 205 stores the following elements, executable modules or data structures, or a subset of them, or an extended set of them:
an operating system 2051, containing various system program instructions which may run, for example, at the framework layer, core library layer and driver layer shown in FIG. 8, for implementing various basic services and handling hardware-based tasks;
applications 2052, containing various applications, for example the launcher, media player, browser and input method applications shown in FIG. 8, for implementing various application services.
In the embodiments of this application, the memory 205 may also be called a storage area, used to store data and programs and to store the operating system.
The processor 201 invokes the program instructions stored in the memory 205 and is configured to execute, according to the obtained program instructions, each method step of method Embodiment 1 above; details are not repeated here.
The user equipment to which the method for controlling a user equipment proposed in the embodiments of this application is applied may be a mobile phone, a tablet computer, a personal digital assistant (PDA) or the like. FIG. 9 is a schematic diagram of one possible structure of a user equipment 300.
The user equipment 300 mainly includes a memory 320, a processor 360 and an input unit 330; the input unit 330 is used to receive the events generated when the user operates the terminal, and the memory 320 is used to store the program instructions of the operating system and of the various applications.
It will be understood that for the specific functions of the processor 360 reference may be made to the detailed description of the processor 201 above, which is not repeated here.
The memory 320 may be the internal memory of the user equipment 300 and may be divided into three storage spaces: a secure memory arranged in a first running environment, a non-secure memory arranged in a second environment, and a shared memory that can be accessed by applications or hardware in both the first and the second running environment. The secure memory, non-secure memory and shared memory may be divided into spaces of the same size, or into different sizes according to the different data and input events to be stored.
The input unit 330 in the user equipment may be used to receive digit or character information entered by the user and to produce signal inputs related to user settings and function control of the user equipment 300. Specifically, in the embodiments of this application the input unit 330 may include a touch panel 331. The touch panel 331 can collect the user's operations on it (for example operations performed with a finger, a stylus or any other suitable object or accessory on the touch panel 331) and, according to preset program instructions, drive the corresponding connection device of the touch panel 331. Optionally, the touch panel 331 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal produced by the touch operation and passes the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 360, and can receive and execute commands sent by the processor 360. The touch panel 331 may be implemented in various types such as resistive, capacitive, infrared and surface acoustic wave. Besides the touch panel 331, the input unit 330 may also include other input devices 332, which may include but are not limited to one or more of a physical keyboard, function keys (such as volume control keys or a power key), a trackball, a mouse, a joystick and the like.
The user equipment 300 may also include a display unit 340, which may be used to display information entered by the user or provided to the user, as well as the various menu interfaces of the user equipment 300. The display unit 340 may include a display panel 341, which may optionally be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
In the embodiments of this application, the touch display screen comprises different display areas, and each display area may contain interface elements such as the icon of at least one application and/or widget desktop controls.
The processor 360 is the control center of the user equipment 300; it connects the various parts of the whole handset through various interfaces and lines, and performs the various functions of the user equipment 300 and processes data by running or executing the software programs and/or modules stored in the memory 320, thereby monitoring the user equipment 300 as a whole.
Optionally, the user equipment 300 may also include an RF circuit 310, a WIFI module 380 for providing wireless connectivity, a power supply 390, and an audio circuit 370 for providing sound input and output.
Embodiment 6
Embodiment 6 of this application proposes a computer program product, the computer program product comprising a computer program embedded in a computer-readable storage medium, the computer program comprising instructions for causing the electronic device to perform each step of the technical solutions proposed in any of Embodiments 1 to 4 above.
Embodiment 7
Embodiment 7 of this application proposes a voice control-based position location apparatus, comprising:
a receiving module, configured to receive a voice control instruction;
a determining module, configured to determine, based on image analysis technology, the position in the current display interface of the content indicated by the voice control instruction;
an execution module, configured to locate to the determined position.
The voice control instruction comprises positioning content and instruction content. The determining module is specifically configured to determine the position in the current display interface of the content indicated by the positioning content in the voice control instruction; the execution module is further configured to control the user equipment according to the determined position and the instruction content.
Specifically, the execution module is configured to move a cursor in the user equipment to the position.
Specifically, the determining module is configured to determine, based on image analysis technology, the position in the current display interface of the text of the indicated content or of the indicated icon.
Specifically, the determining module is configured to search the text information shown on the current display interface for the text of the content indicated by the voice control instruction and take the position of the found text as the position of the indicated content in the current display interface; or, when that text cannot be found in the text information shown on the display interface, to determine, based on image analysis technology, the position of the indicated content in the current display interface.
Optionally, the execution module is further configured to trigger an interactive button when the content indicated by the voice control instruction lies on that button.
Specifically, the determining module is configured to take the center point of the interactive button as the position of the indicated content in the current display interface, and the execution module is configured to trigger the center position of the button.
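Purely as a structural sketch, the three modules of this apparatus could be expressed as interfaces such as the following; the names and signatures are illustrative, not an existing API.

    public interface VoicePositioningApparatus {

        interface ReceivingModule {
            String receiveVoiceControlInstruction();          // recognized instruction text
        }

        interface DeterminingModule {
            int[] determinePosition(String instruction);      // [x, y] in the current display interface
        }

        interface ExecutionModule {
            void locateTo(int x, int y);                      // e.g. move the cursor to the position
            void triggerInteractiveButton(int centerX, int centerY);   // trigger the button's center
        }
    }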
Correspondingly, another embodiment of this application further provides an electronic system comprising a user equipment and a server. The user equipment comprises a display, a memory, one or more processors and a communication unit; the server comprises a memory, one or more processors and a communication unit; each communication unit is used to communicate with external devices. The system further comprises one or more modules stored in the memory of the user equipment or of the server and configured to be executed by the corresponding processor, the one or more modules including instructions for performing each step of the technical solutions proposed in any of Embodiments 1 to 4 above.
The user equipment here may also be a robot.
With the technical solutions proposed in the above embodiments of this application, voice control commands replace the traditional click-and-slide operations the user would otherwise perform on the user equipment. Only a very small command vocabulary needs to be recognized, such as "open", "click ...", "input ..." and "swipe up/down", to obtain highly accurate voice control commands, without modifying the original system and applications and without a complex cloud-side semantic understanding module, giving a good user experience.
Those skilled in the art will understand that the embodiments of this application may be provided as a method, an apparatus (device) or a computer program product. Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM and optical storage) containing computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of the method, the apparatus (device) and the computer program product according to the embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus which implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of this application have been described, those skilled in the art, once aware of the basic inventive concept, may make additional changes and modifications to these embodiments. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of this application.
Obviously, those skilled in the art may make various changes and variations to this application without departing from its spirit and scope. If these modifications and variations of this application fall within the scope of the claims of this application and their technical equivalents, this application is also intended to include them.

Claims (19)

  1. A voice control-based position location method, characterized by comprising:
    receiving a voice control instruction;
    determining, based on image analysis technology, the position in the current display interface of the content indicated by the voice control instruction;
    locating to the determined position.
  2. The method according to claim 1, characterized in that the voice control instruction comprises: positioning content and instruction content;
    the determining of the position in the current display interface of the content in the voice control instruction comprises:
    determining the position in the current display interface of the content indicated by the positioning content in the voice control instruction;
    the method further comprises:
    controlling a user equipment according to the determined position and the instruction content.
  3. The method according to claim 1, characterized in that locating to the determined position comprises:
    moving a cursor in a user equipment to the position.
  4. The method according to claim 1, characterized in that the determining, based on image analysis technology, of the position in the current display interface of the content indicated by the voice control instruction comprises:
    determining, based on image analysis technology, the position in the current display interface of the text of the content indicated by the voice control instruction or of the indicated icon.
  5. The method according to claim 1, characterized in that determining, based on image analysis technology, the position in the current display interface of the content indicated by the voice control instruction comprises:
    searching the text information shown on the current display interface for the text of the content indicated by the voice control instruction, and taking the position of the found text as the position in the current display interface of the content indicated by the voice control instruction; or
    when the text of the content indicated by the voice control instruction cannot be found in the text information shown on the display interface, determining, based on image analysis technology, the position in the current display interface of the content indicated by the voice control instruction.
  6. The method according to claim 1, characterized in that the method further comprises:
    triggering an interactive button when the content indicated by the voice control instruction lies on that button.
  7. The method according to claim 6, characterized in that the determining, based on image analysis technology, of the position in the current display interface of the content indicated by the voice control instruction comprises:
    taking the center point of the interactive button as the position in the current display interface of the content indicated by the voice control instruction;
    and triggering the interactive button comprises:
    triggering the center position of the interactive button.
  8. A user equipment, characterized by comprising:
    a display, a memory, one or more processors, and one or more modules, the one or more modules being stored in the memory and configured to be executed by the one or more processors, the one or more modules comprising instructions for performing each step of the method according to any one of claims 1-7.
  9. The user equipment according to claim 8, characterized in that the user equipment system comprises a robot.
  10. A voice control-based position location apparatus, characterized by comprising:
    a receiving module, configured to receive a voice control instruction;
    a determining module, configured to determine, based on image analysis technology, the position in the current display interface of the content indicated by the voice control instruction;
    an execution module, configured to locate to the determined position.
  11. The apparatus according to claim 10, characterized in that the voice control instruction comprises: positioning content and instruction content;
    the determining module is specifically configured to determine the position in the current display interface of the content indicated by the positioning content in the voice control instruction;
    the execution module is further configured to control a user equipment according to the determined position and the instruction content.
  12. The apparatus according to claim 11, characterized in that the execution module is specifically configured to move a cursor in the user equipment to the position.
  13. The apparatus according to claim 10, characterized in that the determining module is specifically configured to determine, based on image analysis technology, the position in the current display interface of the text of the content indicated by the voice control instruction or of the indicated icon.
  14. The apparatus according to claim 10, characterized in that the determining module is specifically configured to search the text information shown on the current display interface for the text of the content indicated by the voice control instruction and to take the position of the found text as the position in the current display interface of the content indicated by the voice control instruction; or, when the text of the content indicated by the voice control instruction cannot be found in the text information shown on the display interface, to determine, based on image analysis technology, the position in the current display interface of the content indicated by the voice control instruction.
  15. The apparatus according to claim 10, characterized in that the execution module is further configured to trigger an interactive button when the content indicated by the voice control instruction lies on that button.
  16. The apparatus according to claim 15, characterized in that the determining module is specifically configured to take the center point of the interactive button as the position in the current display interface of the content indicated by the voice control instruction;
    the execution module is specifically configured to trigger the center position of the interactive button.
  17. A computer program product, the computer program product comprising a computer program embedded in a computer-readable storage medium, the computer program comprising instructions for causing an electronic device to perform each step of the method according to any one of claims 1-7.
  18. An electronic system, characterized by comprising a user equipment and a server; the user equipment comprising a display, a memory, one or more processors, and a communication unit; the server comprising a memory, one or more processors, and a communication unit; each communication unit being used to communicate with external devices; the system further comprising one or more modules, the one or more modules being stored in the memory of the user equipment or of the server and configured to be executed by the corresponding processor, the one or more modules comprising instructions for performing each step of the method according to any one of claims 1-7.
  19. The electronic system according to claim 18, characterized in that the user equipment is a robot.
PCT/CN2016/111591 2016-12-22 2016-12-22 基于语音控制的位置定位方法、装置、用户设备及计算机程序产品 WO2018112856A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201680002796.1A CN107077319A (zh) 2016-12-22 2016-12-22 基于语音控制的位置定位方法、装置、用户设备及计算机程序产品
PCT/CN2016/111591 WO2018112856A1 (zh) 2016-12-22 2016-12-22 基于语音控制的位置定位方法、装置、用户设备及计算机程序产品

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/111591 WO2018112856A1 (zh) 2016-12-22 2016-12-22 基于语音控制的位置定位方法、装置、用户设备及计算机程序产品

Publications (1)

Publication Number Publication Date
WO2018112856A1 true WO2018112856A1 (zh) 2018-06-28

Family

ID=59624485

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/111591 WO2018112856A1 (zh) 2016-12-22 2016-12-22 基于语音控制的位置定位方法、装置、用户设备及计算机程序产品

Country Status (2)

Country Link
CN (1) CN107077319A (zh)
WO (1) WO2018112856A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107077319A (zh) * 2016-12-22 2017-08-18 深圳前海达闼云端智能科技有限公司 基于语音控制的位置定位方法、装置、用户设备及计算机程序产品

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111324213A (zh) * 2018-12-13 2020-06-23 青岛海信移动通信技术股份有限公司 终端的信息输入方法和终端
CN109671432A (zh) * 2018-12-25 2019-04-23 斑马网络技术有限公司 语音定位处理方法、装置、定位设备及车辆
CN110085224B (zh) * 2019-04-10 2021-06-01 深圳康佳电子科技有限公司 智能终端全程语音操控处理方法、智能终端及存储介质
CN115145529B (zh) * 2019-08-09 2023-05-09 华为技术有限公司 语音控制设备的方法及电子设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011039222A (ja) * 2009-08-10 2011-02-24 Nec Corp 音声認識システム、音声認識方法および音声認識プログラム
CN104899003A (zh) * 2015-06-12 2015-09-09 广州视源电子科技股份有限公司 终端控制方法和系统
CN104965596A (zh) * 2015-07-24 2015-10-07 上海宝宏软件有限公司 语音操控系统
CN105551492A (zh) * 2015-12-04 2016-05-04 青岛海信传媒网络技术有限公司 语音控制的方法、装置与终端
CN105677152A (zh) * 2015-12-31 2016-06-15 宇龙计算机通信科技(深圳)有限公司 一种语音触屏操作处理的方法、装置以及终端

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105635776B (zh) * 2014-11-06 2019-03-01 深圳Tcl新技术有限公司 虚拟操作界面遥控控制方法及系统
CN107077319A (zh) * 2016-12-22 2017-08-18 深圳前海达闼云端智能科技有限公司 基于语音控制的位置定位方法、装置、用户设备及计算机程序产品

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011039222A (ja) * 2009-08-10 2011-02-24 Nec Corp 音声認識システム、音声認識方法および音声認識プログラム
CN104899003A (zh) * 2015-06-12 2015-09-09 广州视源电子科技股份有限公司 终端控制方法和系统
CN104965596A (zh) * 2015-07-24 2015-10-07 上海宝宏软件有限公司 语音操控系统
CN105551492A (zh) * 2015-12-04 2016-05-04 青岛海信传媒网络技术有限公司 语音控制的方法、装置与终端
CN105677152A (zh) * 2015-12-31 2016-06-15 宇龙计算机通信科技(深圳)有限公司 一种语音触屏操作处理的方法、装置以及终端

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107077319A (zh) * 2016-12-22 2017-08-18 深圳前海达闼云端智能科技有限公司 基于语音控制的位置定位方法、装置、用户设备及计算机程序产品

Also Published As

Publication number Publication date
CN107077319A (zh) 2017-08-18

Similar Documents

Publication Publication Date Title
US9152529B2 (en) Systems and methods for dynamically altering a user interface based on user interface actions
WO2018112856A1 (zh) 基于语音控制的位置定位方法、装置、用户设备及计算机程序产品
EP3028136B1 (en) Visual confirmation for a recognized voice-initiated action
US8749499B2 (en) Touch screen for bridging multi and/or single touch points to applications
US10416777B2 (en) Device manipulation using hover
US9632693B2 (en) Translation of touch input into local input based on a translation profile for an application
JP2016509301A (ja) ユーザーが生成した知識による協調学習
CN106843715A (zh) 用于远程化的应用的触摸支持
CN103729065A (zh) 触控操作映射到实体按键的系统及方法
JP2016506564A (ja) スワイプストローク入力及び連続的な手書き
US20140354554A1 (en) Touch Optimized UI
US10346599B2 (en) Multi-function button for computing devices
US20160350136A1 (en) Assist layer with automated extraction
US10754452B2 (en) Unified input and invoke handling
KR102210238B1 (ko) 폼 프로세싱
US10152308B2 (en) User interface display testing system
KR20160016526A (ko) 정보 제공하는 방법 및 이를 위한 전자기기
WO2018177156A1 (zh) 一种移动终端的操作方法及移动终端
US10970476B2 (en) Augmenting digital ink strokes
US10466863B1 (en) Predictive insertion of graphical objects in a development environment
KR20200009090A (ko) 그래픽 키보드로부터 어플리케이션 피처들의 액세스
US20180090027A1 (en) Interactive tutorial support for input options at computing devices
KR20150128406A (ko) 음성 인식 정보를 표시하는 방법 및 장치
US10254858B2 (en) Capturing pen input by a pen-aware shell
US11620030B2 (en) Coherent gestures on touchpads and touchscreens

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16924502

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.10.2019)

122 Ep: pct application non-entry in european phase

Ref document number: 16924502

Country of ref document: EP

Kind code of ref document: A1