CN116501284A - Voice control method, device, equipment, storage medium and program product

Info

Publication number: CN116501284A
Application number: CN202310483673.6A
Authority: CN
Inventors: 华鲸州; 欧阳能钧; 刘卫; 刘嵘
Original assignee: Apollo Zhilian Beijing Technology Co Ltd
Current assignee: Apollo Zhilian Beijing Technology Co Ltd
Priority date: 2023-04-28
Filing date: 2023-04-28
Publication date: 2023-07-28

Abstract

The present disclosure provides a voice control method, apparatus, device, storage medium and program product, and relates to the field of computer technology, in particular to the field of artificial intelligence. A specific implementation is as follows: when a display interface includes a web page container, acquiring a first coordinate of a first preset position of the web page container; acquiring text information and a second coordinate of each web page type control in the web page container; determining a third coordinate of each web page type control relative to the display interface based on the first coordinate and the second coordinate of each web page type control; and sending the text information and the third coordinate of each web page type control to a voice control module, so that the voice control module performs voice control on the web page type controls based on the text information and the third coordinates. The technical solution provided by the present disclosure enables voice control of web page type controls in the display interface and improves the user experience.

Description

Voice control method, device, equipment, storage medium and program product

Technical Field

The present disclosure relates to the field of computer technology, in particular to the field of artificial intelligence, and more specifically to a voice control method, apparatus, device, storage medium, and program product.

Background

An important function of a voice operating system is "what you see is what you can say". This function depends on the accessibility ("barrier-free") interface of the system: it obtains, through the accessibility interface, the text information of the system controls on the screen display interface and combines it with the user's natural language input to automatically click the system control that the user wants to click. However, the accessibility interface of the voice operating system cannot acquire control data from a web page being displayed; it can only acquire text information of the native controls of the voice operating system.

Disclosure of Invention

The present disclosure provides a method, apparatus, device, storage medium, and program product for voice control.

According to an aspect of the present disclosure, there is provided a voice control method including:

under the condition that a webpage container is included in a display interface, acquiring a first coordinate of a first preset position of the webpage container; the first coordinates are coordinates of the first preset position relative to a second preset position of the display interface;

acquiring text information and second coordinates of each webpage type control in the webpage container; the second coordinates are coordinates of the webpage type control relative to the first preset position;

determining a third coordinate of each web page type control relative to the display interface based on the first coordinate and the second coordinate of each web page type control;

and sending the text information and the third coordinates of each webpage type control to a voice control module so that the voice control module carries out voice control on the webpage type controls based on the text information and the third coordinates of each webpage type control.

According to another aspect of the present disclosure, there is provided a voice control apparatus including:

the first acquisition module is used for acquiring a first coordinate of a first preset position of the webpage container under the condition that the webpage container is included in the display interface; the first coordinates are coordinates of the first preset position relative to a second preset position of the display interface;

the second acquisition module is used for acquiring the text information and the second coordinates of each webpage type control in the webpage container; the second coordinates are coordinates of the webpage type control relative to the first preset position;

the coordinate determining module is used for determining a third coordinate of each webpage type control relative to the display interface based on the first coordinate and the second coordinate of each webpage type control;

and the voice control module is used for sending the text information and the third coordinates of each webpage type control to the voice control module so that the voice control module carries out voice control on the webpage type control based on the text information and the third coordinates of each webpage type control.

According to another aspect of the present disclosure, there is provided an electronic device including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the aspects.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method according to any one of the aspects.

According to another aspect of the present disclosure, there is provided a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps in the method according to any one of the aspects.

In the embodiment of the disclosure, under the condition that a webpage container is included in a display interface, acquiring a first coordinate of a first preset position of the webpage container; the first coordinate is the coordinate of the first preset position relative to the second preset position of the display interface; acquiring text information and second coordinates of each webpage type control in a webpage container; the second coordinates are coordinates of the webpage type control relative to the first preset position; determining a third coordinate of each web page type control relative to the display interface based on the first coordinate and the second coordinate of each web page type control; and sending the text information and the third coordinates of each webpage type control to the voice control module so that the voice control module carries out voice control on the webpage type control based on the text information and the third coordinates of each webpage type control. In this way, the coordinates of each webpage type control relative to the first preset position can be obtained, the coordinates of the first preset position of the webpage container can be obtained, the coordinates of each webpage type control relative to the display interface can be determined, and the text information of the webpage type control can be obtained. Therefore, the data support can be provided for the voice control module to realize the voice control of the webpage type control, so that the voice control of the webpage type control of the display interface is realized, and the user experience is improved.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is a flow chart of a voice control method provided according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a first coordinate and a second coordinate provided in accordance with an embodiment of the present disclosure;

FIG. 3 is a flow chart of determining second coordinates of a web page type control provided according to an embodiment of the present disclosure;

FIG. 4 is a block diagram of a voice control apparatus provided in accordance with an embodiment of the present disclosure;

FIG. 5 is a block diagram of an electronic device for implementing a voice control method of an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information comply with relevant laws and regulations and do not violate public order and good customs.

The following describes a voice control method, apparatus, device, storage medium and program product provided by embodiments of the present disclosure with reference to the accompanying drawings.

FIG. 1 is a flowchart of a voice control method according to an embodiment of the present disclosure. The voice control method may be applied to an electronic device on which a voice operating system is installed, for example an in-vehicle screen or a smart speaker with a screen, or any other electronic device on which a voice operating system is installed; for example, the method may be used to implement the "what you see is what you can say" function in the voice operating system. As shown in FIG. 1, the voice control method provided by the embodiment of the present disclosure may include the following processes:

S101, in a case where the display interface includes a web page container, acquiring a first coordinate of a first preset position of the web page container.

The first coordinate is a coordinate of the first preset position relative to the second preset position of the display interface.

In the embodiment of the present disclosure, when performing voice control on a display interface that includes a web page container, the coordinates of a preset position (i.e., the first preset position) of the web page container, namely the first coordinate, may be acquired first. The first coordinate may be understood as the coordinate of the first preset position of the web page container relative to a preset position (i.e., the second preset position) of the display interface. The web page container may be an area on the web page for accommodating other elements (e.g., text, pictures, forms, etc.). The first preset position and the second preset position may be set according to actual needs; for example, each may be set to the upper left corner. Equivalently, the first coordinate may be understood as the coordinate of the first preset position computed with the second preset position of the display interface as the coordinate origin. For example, FIG. 2 shows a schematic diagram of the first coordinate and the second coordinate. In FIG. 2, the display interface includes a system interface and a web page container; the coordinates (0, 0) represent the second preset position of the display interface, and the coordinates (x1, y1) represent the first preset position relative to the second preset position of the display interface, i.e., the first coordinate.

S102, acquiring text information and second coordinates of each webpage type control in the webpage container.

The second coordinates are coordinates of the webpage type control relative to the first preset position.

In the embodiment of the present disclosure, after the first coordinate of the first preset position of the web page container is acquired, control data, that is, text information and coordinates (that is, second coordinates) of the web page type control in the web page container may be acquired. Illustratively, given that there are typically one or more controls (i.e., web page type controls) in the web page container, the text information and coordinates (i.e., second coordinates) of each web page type control in the web page container may be obtained, and the text information may be text content displayed by the web page type control region.

S103, determining a third coordinate of each web page type control relative to the display interface based on the first coordinate and the second coordinate of each web page type control.

In the embodiment of the present disclosure, after the text information and the second coordinate of each web page type control in the web page container are acquired, the coordinate of each web page type control relative to the display interface, namely the third coordinate, is determined based on the first coordinate and the second coordinate of that control. In this way, the coordinate of each web page type control in the web page container relative to the display interface, that is, relative to the second preset position of the display interface, can be obtained. For example, still referring to FIG. 2 and taking web page button A as the web page type control: if the second coordinate of web page button A relative to the first preset position of the web page container is (x2, y2), and the first coordinate of the first preset position of the web page container is (x1, y1), then the third coordinate of web page button A relative to the display interface may be determined as (x1 + x2, y1 + y2). The same operation may be performed for each web page type control to obtain its third coordinate relative to the display interface.
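The coordinate sum described above can be illustrated with a minimal Python sketch. It assumes that both preset positions are upper left corners and that both coordinate systems share the same axes and units; the names `Point` and `to_display_coords` and the numeric values are hypothetical and do not come from the disclosure.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: float
    y: float


def to_display_coords(first: Point, second: Point) -> Point:
    """Offset a container-relative coordinate by the container's own position.

    first:  first coordinate (container's preset position on the display).
    second: second coordinate (control position inside the container).
    """
    return Point(first.x + second.x, first.y + second.y)


# Hypothetical values: the container's first preset position is at (100, 50)
# on the display, and web page button A sits at (20, 30) inside the container.
container_origin = Point(100, 50)
button_a = Point(20, 30)
third = to_display_coords(container_origin, button_a)
print(third)
```

If the container were scaled or scrolled relative to the display, a plain sum would presumably no longer be enough; the sketch deliberately covers only the simple offset case given in the example.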

S104, sending the text information and the third coordinate of each web page type control to the voice control module, so that the voice control module performs voice control on the web page type controls based on the text information and the third coordinate of each web page type control.

In the embodiment of the present disclosure, after the third coordinate of each web page type control relative to the display interface is determined, the text information and the third coordinate of each web page type control may be sent to a voice control module, which may be, for example, the "what you see is what you can say" functional module in the voice operating system. That functional module can thereby acquire the text information and coordinates of each web page type control in the web page container of the display interface, which provides data support for the user to control the web page type controls in the web page container by voice and realizes voice control of the web page type controls.
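The disclosure does not specify how the voice control module uses the received data. As one possible illustration only, the Python sketch below assumes the module matches the recognized utterance against the control text by case-insensitive substring search and returns the matching control, whose third coordinate could then be used as the click position. `ControlTarget`, `find_target`, and the matching rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ControlTarget:
    text: str  # text displayed by the control
    x: float   # third coordinate, relative to the display interface
    y: float


def find_target(utterance: str, controls: List[ControlTarget]) -> Optional[ControlTarget]:
    """Return the first control whose text appears in the utterance.

    Substring matching is an assumed rule for illustration; a real module
    might use fuzzy matching or natural language understanding instead.
    """
    spoken = utterance.casefold()
    for control in controls:
        label = control.text.strip().casefold()
        if label and label in spoken:
            return control
    return None


controls = [ControlTarget("Search", 120, 80), ControlTarget("Play music", 300, 400)]
target = find_target("please play music", controls)
if target is not None:
    print(f"click at ({target.x}, {target.y})")
```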

In one possible implementation manner, the specific implementation manner of obtaining the first coordinate of the first preset position of the web page container in the step above may be:

acquiring a first coordinate of a first preset position of a webpage container through a system interface;

the specific implementation manner of obtaining the text information and the second coordinates of each web page type control in the web page container in the above steps may be:

and acquiring the text information and the second coordinates of each webpage type control in the webpage container through a preset script.

In the embodiment of the present disclosure, the first coordinate of the first preset position of the web page container may be acquired through a system interface, which may be, for example, the accessibility ("barrier-free") interface of the system. The text information and the second coordinates of each web page type control in the web page container may be acquired through a preset script. It can be understood that the preset scripts correspond one-to-one to the web page type controls; that is, the text information and second coordinate of each web page type control can be acquired through the preset script corresponding to that control.

In one possible implementation manner, the voice control method provided by the embodiment of the disclosure further includes the following processing:

acquiring text information and fourth coordinates of each system type control in the display interface through a system interface; the fourth coordinate is a coordinate of the system type control relative to the display interface;

and sending the text information and the fourth coordinates of each system type control to a voice control module.

In the embodiment of the present disclosure, control data of the system's native controls in the display interface, that is, system type controls, may also be acquired, namely their text information and coordinates (i.e., fourth coordinates). The text information of each system type control in the display interface and its fourth coordinate, that is, its coordinate relative to the second preset position of the display interface, may be acquired through a system interface, which may be, for example, the accessibility ("barrier-free") interface of the system. The text information and fourth coordinates of each system type control may then be sent to the voice control module, which may be, for example, the "what you see is what you can say" functional module in the voice operating system. In this way, that functional module can acquire not only the text information and coordinates of the web page type controls in the web page container of the display interface, but also the text information and fourth coordinates of each system type control in the display interface. This provides data support for the user to control the system type controls in the display interface by voice, so that voice control of system type controls is realized in addition to voice control of web page type controls.
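Because the fourth coordinates of system type controls and the third coordinates of web page type controls are both expressed relative to the display interface, the two sets of data can be handed to the voice control module in a single list. The Python sketch below shows one hypothetical way to do so; the `ControlRecord` type, the `kind` tag, and `build_payload` are assumptions made for illustration, not structures named in the disclosure.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (text, x, y), with x and y already relative to the display interface.
RawControl = Tuple[str, float, float]


@dataclass(frozen=True)
class ControlRecord:
    text: str
    x: float
    y: float
    kind: str  # "system" or "web"; an illustrative tag, not from the disclosure


def build_payload(system_controls: List[RawControl],
                  web_controls: List[RawControl]) -> List[ControlRecord]:
    """Combine system controls (fourth coordinates) and web controls (third coordinates)."""
    payload = [ControlRecord(t, x, y, "system") for t, x, y in system_controls]
    payload += [ControlRecord(t, x, y, "web") for t, x, y in web_controls]
    return payload


# Hypothetical data: one native "Back" button and one web page "Search" button.
payload = build_payload([("Back", 10, 10)], [("Search", 120, 80)])
for record in payload:
    print(record)
```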

In one possible implementation, the preset script may be a JavaScript script. JavaScript is a programming language that can be used to develop interactive and dynamic web applications. JavaScript scripts are commonly used to create dynamic web pages and interactive web applications in web browsers; JavaScript is a client-side scripting language that can be executed directly in the user's web browser without processing by a server. It can be understood that a JavaScript script may be injected into each web page type control. In this way, acquisition of the text information and coordinates of the web page type controls is supported, an extra compilation process is avoided, the load on the server side is reduced, and the response speed is improved.

In a further possible implementation manner, in the case that the preset script is a JavaScript script, the specific implementation manner of obtaining, by the preset script, the text information and the second coordinates of each web page type control in the web page container may include the following processing:

injecting JavaScript scripts into each webpage type control;

running JavaScript of each webpage type control, and acquiring a root node of a webpage container through a webpage container interface;

acquiring control information of each control in the webpage container based on the root node; the control information comprises text information and coordinates corresponding to each control;

based on the control information of each control in the webpage container, the text information and the second coordinates of each webpage type control are determined.

In the embodiment of the present disclosure, if the preset script is a JavaScript script, then when the text information and the second coordinates of each web page type control in the web page container are acquired through the preset script, the JavaScript script may first be injected into each web page type control. The JavaScript script of each web page type control may then be run, and the root node of the web page container is acquired through a web page container interface (for example, document.body). The root node carries the text information corresponding to each web page type control in the web page container and the coordinates of each control relative to the first preset position of the web page container, so the control information of each control can be acquired from it. The text information and the second coordinates of each web page type control may then be determined based on the control information of each control in the web page container. For example, each control in the web page container obtained from the root node may be determined to be a web page type control, and the text information and coordinates of each such control may be determined to be the text information and second coordinates of the web page type control. In this way, by running the JavaScript script of the web page type control, the text information and second coordinates of each web page type control can be obtained from the root node of the web page container to provide data support for subsequent voice control.
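The collection of control information from the root node can be pictured as a tree walk. The sketch below models the web page layout with a plain Python data structure instead of a real browser DOM, so that it stays self-contained; the `Node` class and `collect_controls` function are hypothetical stand-ins, and in an actual web page the injected JavaScript script would read the equivalent data from the page itself.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Simplified stand-in for an element in the web page layout."""
    text: str = ""
    x: float = 0.0  # relative to the container's first preset position
    y: float = 0.0
    children: List["Node"] = field(default_factory=list)


def collect_controls(root: Node) -> List[Tuple[str, float, float]]:
    """Depth-first walk from the root node, recording text and coordinates of every node."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        found.append((node.text, node.x, node.y))
        # Push children in reverse so they are visited in document order.
        stack.extend(reversed(node.children))
    return found


# Hypothetical layout: a root containing a button and a wrapper with a nested button.
root = Node(children=[
    Node("Search", 20, 30),
    Node("", 0, 100, [Node("Login", 40, 110)]),
])
for info in collect_controls(root):
    print(info)
```

The walk records every node, including the root and the empty wrapper; deciding which of them count as web page type controls is a separate step.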

In a further possible embodiment, determining text information and second coordinates for each web page type control based on control information for each control in the web page container includes:

based on the control information of each control in the webpage container, determining the control meeting the preset condition as the webpage type control in each control in the webpage container;

and determining the control information corresponding to each control meeting the preset condition as the text information and the second coordinate of each webpage type control.

In the embodiment of the present disclosure, when determining the text information and the second coordinates of each web page type control based on the control information of each control in the web page container, the controls in the web page container may be screened to determine the web page type controls and their corresponding text information and coordinates. For example, the controls in the web page container may be screened based on their control information, so that the controls that meet a preset condition are determined to be web page type controls. The preset condition may include, for example, that the control area has text content and that the control is not a transparent invalid control. After the web page type controls meeting the preset condition are screened out, the control information corresponding to each of them, that is, its text information and coordinates, may be obtained and determined to be the text information and second coordinate of that web page type control. For example, FIG. 3 is a flowchart of determining the second coordinate of a web page type control according to an embodiment of the present disclosure. As shown in FIG. 3, the JavaScript script of the web page type control may be run, that is, the JavaScript script starts to execute, and the root node of the web page layout of the web page container is then obtained through the web page container interface document. The root node of the web page layout is then traversed to obtain all the controls in the web page layout of the web page container, together with the text information and coordinates of each of them. Next, all the controls in the web page layout are filtered to remove transparent invalid controls without text content, and the remaining controls are determined to be web page type controls. The control information corresponding to each control meeting the preset condition is determined to be the text information and second coordinate of each web page type control, which are returned to the "what you see is what you can say" functional module. In this way, controls that do not meet the preset condition, such as transparent invalid controls without text content, are filtered out, which provides a more accurate data basis for the subsequent determination of the third coordinates of the web page type controls relative to the display interface, avoids unnecessary coordinate determination, reduces resource overhead, and improves voice control efficiency.
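The screening step can be sketched as a simple filter over the collected control information. In the Python sketch below, the preset condition is assumed to mean "has non-blank text and is not fully transparent"; representing transparency with an `opacity` field, and the names `ControlInfo`, `meets_preset_condition`, and `select_web_controls`, are illustrative assumptions rather than details given in the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ControlInfo:
    text: str
    x: float  # coordinates relative to the container's first preset position
    y: float
    opacity: float = 1.0  # assumed stand-in for "transparent"; 0.0 means invisible


def meets_preset_condition(info: ControlInfo) -> bool:
    """Assumed preset condition: the control shows text and is not transparent."""
    return bool(info.text.strip()) and info.opacity > 0.0


def select_web_controls(infos: List[ControlInfo]) -> List[ControlInfo]:
    """Keep only the controls that qualify as web page type controls."""
    return [info for info in infos if meets_preset_condition(info)]


# Hypothetical control information gathered from the root node.
collected = [
    ControlInfo("Search", 20, 30),
    ControlInfo("", 0, 100),              # no text content
    ControlInfo("Hidden", 5, 5, 0.0),     # transparent, treated as invalid
    ControlInfo("Login", 40, 110),
]
web_controls = select_web_controls(collected)
print([c.text for c in web_controls])
```

Only the controls that survive this filter need their second coordinates converted to third coordinates, which is where the reduction in processing described above would come from.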

In one possible implementation manner, the determining, based on the first coordinate and the second coordinate of each web page type control, a specific implementation manner of the third coordinate of each web page type control with respect to the display interface may be:

and carrying out coordinate conversion on the second coordinate based on the first coordinate to obtain a third coordinate of each webpage type control relative to the display interface.

In the embodiment of the disclosure, when determining the third coordinate of each web page type control relative to the display interface based on the first coordinate and the second coordinate of each web page type control, the second coordinate of each web page type control may be subjected to coordinate transformation based on the first coordinate of the first preset position of the web page container, so as to obtain the third coordinate of each web page type control relative to the display interface. For example, assuming that the first preset position and the second preset position are both the upper left corner, for a certain web page type control, a coordinate sum of the first coordinate and the second coordinate of the web page type control may be calculated, and the coordinate sum is determined as the third coordinate of the web page type control. Therefore, the determined third coordinate of the webpage type control can be more accurate through coordinate conversion, and the condition that voice control is wrong due to coordinate errors is avoided, so that the accuracy rate and the control efficiency of voice control can be further improved.

The embodiment of the disclosure also provides a voice control device. Fig. 4 is a block diagram of a voice control apparatus according to an embodiment of the present disclosure, and as shown in fig. 4, a voice control apparatus 400 includes:

a first obtaining module 410, configured to obtain, in a case that a web page container is included in a display interface, a first coordinate of a first preset position of the web page container; the first coordinates are coordinates of the first preset position relative to a second preset position of the display interface;

a second obtaining module 420, configured to obtain text information and a second coordinate of each web page type control in the web page container; the second coordinates are coordinates of the webpage type control relative to the first preset position;

a coordinate determining module 430, configured to determine a third coordinate of each web page type control with respect to the display interface based on the first coordinate and the second coordinate of each web page type control;

and the voice control module 440 is configured to send the text information and the third coordinates of each web page type control to the voice control module, so that the voice control module performs voice control on the web page type control based on the text information and the third coordinates of each web page type control.

In one possible implementation manner, the first obtaining module is configured to:

acquiring a first coordinate of a first preset position of the webpage container through a system interface;

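the second obtaining module is configured to:

acquiring the text information and the second coordinates of each web page type control in the web page container through a preset script.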

In one possible implementation, the voice control apparatus 400 further includes:

the third acquisition module is used for acquiring the text information and the fourth coordinate of each system type control in the display interface through the system interface; the fourth coordinate is a coordinate of the system type control relative to the display interface;

and the sending module is used for sending the text information of each system type control and the fourth coordinate to the voice control module.

In one possible implementation, the preset script is a JavaScript script; the second obtaining module 420 includes:

the script injection unit is used for injecting the JavaScript script into each webpage type control;

the first obtaining unit is used for running the JavaScript script of each webpage type control and obtaining a root node of the webpage container through the webpage container interface;

the second acquisition unit is used for acquiring control information of each control in the webpage container based on the root node; the control information comprises text information and coordinates corresponding to each control;

and the determining unit is used for determining the text information and the second coordinates of each webpage type control based on the control information of each control in the webpage container.

In one possible implementation manner, the determining unit includes:

the first determining subunit is configured to determine, based on the control information of each control in the web page container, that a control that meets a preset condition is a web page type control in each control in the web page container;

and the second determining subunit is used for determining the control information corresponding to each control meeting the preset condition as the text information and the second coordinate of each webpage type control.

In one possible implementation manner, the coordinate determining module is configured to:

and carrying out coordinate conversion on the second coordinates based on the first coordinates to obtain third coordinates of each webpage type control relative to the display interface.

The specific manner in which each module of the apparatus in the above embodiments performs its operations has been described in detail in connection with the embodiments of the method, and will not be described in detail here.

According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.

Fig. 5 illustrates a schematic block diagram of an example electronic device 500 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in Fig. 5, the device 500 includes a computing unit 501 that can perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. The RAM 503 can also store various programs and data required for the operation of the device 500. The computing unit 501, ROM 502, and RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

Various components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, etc.; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508 such as a magnetic disk, an optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The computing unit 501 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the respective methods and processes described above, such as a voice control method. For example, in some embodiments, the voice control method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into RAM 503 and executed by computing unit 501, one or more steps of the voice control method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the voice control method by any other suitable means (e.g. by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), the Internet, and blockchain networks.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the high management difficulty and weak service scalability of traditional physical hosts and virtual private server (VPS) services. The server may also be a server of a distributed system or a server that incorporates a blockchain.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.

The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims

1. A voice control method, comprising:

2. The method of claim 1, wherein the obtaining the first coordinate of the first preset location of the web page container comprises:

the obtaining the text information and the second coordinates of each webpage type control in the webpage container includes:

3. The method of claim 2, wherein the method further comprises:

acquiring text information and fourth coordinates of each system type control in the display interface through the system interface; the fourth coordinate is a coordinate of the system type control relative to the display interface;

and sending the text information of each system type control and the fourth coordinate to the voice control module.

4. The method of claim 2, wherein the preset script is a JavaScript script; the obtaining, by a preset script, text information and second coordinates of each web page type control in the web page container includes:

injecting the JavaScript script into each webpage type control;

running the JavaScript script of each webpage type control, and acquiring a root node of the webpage container through the webpage container interface;

and determining the text information and the second coordinates of each webpage type control based on the control information of each control in the webpage container.

5. The method of claim 4, wherein the determining text information and second coordinates for each of the web page type controls based on control information for each of the controls in the web page container comprises:

determining, based on the control information of each control in the webpage container, which of those controls meet a preset condition, each control that meets the preset condition being a webpage type control;

and determining the control information corresponding to each control meeting the preset conditions as the text information and the second coordinate of each webpage type control.

6. The method of claim 1, wherein the determining a third coordinate of each of the web page type controls relative to the display interface based on the first coordinates and the second coordinates of each of the web page type controls comprises:

7. A voice control apparatus comprising:

8. The apparatus of claim 7, wherein the first acquisition module is configured to:

the second obtaining module is configured to:

9. The apparatus of claim 8, wherein the apparatus further comprises:

10. The apparatus of claim 8, wherein the preset script is a JavaScript script; the second acquisition module comprises:

11. The apparatus of claim 10, wherein the determining unit comprises:

12. The apparatus of claim 7, wherein the coordinate determination module is configured to:

13. An electronic device, comprising:

at least one processor; and

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.

14. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-6.

15. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps in the method according to any of claims 1-6.