CN112199623A - Script execution method and device, electronic equipment and storage medium - Google Patents

Script execution method and device, electronic equipment and storage medium

Info

Publication number
CN112199623A
CN112199623A
Authority
CN
China
Prior art keywords
script
scene
target
feature
application program
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011056691.9A
Other languages
Chinese (zh)
Other versions
CN112199623B (en)
Inventor
杨阳
田发景
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Pateo Electronic Equipment Manufacturing Co Ltd
Original Assignee
Shanghai Pateo Electronic Equipment Manufacturing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Pateo Electronic Equipment Manufacturing Co Ltd
Priority to CN202011056691.9A
Publication of CN112199623A
Application granted
Publication of CN112199623B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/958Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue

Abstract

The application provides a script execution method and apparatus, an electronic device, and a storage medium. The method includes: acquiring voice information of a user; determining a current scene according to the voice information; and sending a script acquisition request to a server, where the script acquisition request includes a name and an identifier of at least one target application program for realizing the current scene, and is used for requesting the server to detect, according to the name and the identifier of the target application program, whether a first scene script corresponding to the current scene exists; and receiving the first scene script returned by the server, and parsing and running the first scene script. The method helps the client run a pre-configured scene script that matches the target application program, thereby improving the success rate of running pre-configured scene scripts, meeting user requirements, and improving user experience.

Description

Script execution method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a script execution method and apparatus, an electronic device, and a storage medium.
Background
With the development of artificial intelligence, more and more related technologies are applied in daily life, such as automatic driving and speech recognition. Interaction between machines and humans frees people's hands and brings great convenience to daily life and work. The prior art can already accomplish certain tasks through speech, for example, invoking and executing a corresponding pre-configured scene scheme (scene script) by voice to help a user achieve a goal. However, applications installed locally on a terminal are updated from time to time; if a pre-configured scene scheme created before an application update continues to be executed, it is likely to fail at runtime, so the success rate of running pre-configured scene schemes is difficult to guarantee.
Disclosure of Invention
In view of the above problems, the present application provides a script execution method, an apparatus, an electronic device, and a storage medium, which are beneficial to improving the success rate of running a pre-configured scene script and improving user experience.
In order to achieve the above object, a first aspect of the embodiments of the present application provides a script execution method, which is applied to a client, and the method includes:
acquiring voice information of a user;
determining a current scene according to the voice information;
sending a script acquisition request to a server; the script acquisition request comprises a name and an identification of at least one target application program for realizing the current scene, and is used for requesting the server to detect whether a first scene script corresponding to the current scene exists according to the name and the identification of the target application program;
and receiving the first scene script returned by the server, analyzing and running the first scene script.
In an embodiment of the first aspect, the method further comprises:
under the condition that the first scene script does not exist at the server, recording the current scene by operating the target application program by the user so as to generate a second scene script corresponding to the current scene;
and uploading the second scene script to a server, so that the server stores the second scene script in association with the name and the identifier of the target application program.
In another implementation manner of the first aspect, the recording the current scene by operating the target application program by the user to generate a second scene script corresponding to the current scene includes:
acquiring a preset script template corresponding to the current scene; the preset script template comprises the content to be recorded of the current scene, which is realized by the operation of the target application program by the user;
and recording the content to be recorded to obtain the second scene script.
In another implementation manner of the first aspect, the content to be recorded includes one or more items of a trigger event of a user on an operation interface of the target application program, trigger time of the trigger event, a type of the trigger event, coordinates of a trigger position, a type of a triggered control, and a hierarchical relationship of a view in which the control is located;
the recording the content to be recorded to obtain the second scene script includes:
and recording one or more items of the trigger event, the trigger time, the type of the trigger event, the coordinates of the trigger position, the type of the control and the hierarchical relationship of each operation interface aiming at each operation interface of the current scene realized by the target application program operated by a user, and recording the operation sequence of each operation interface to obtain the second scene script.
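The recording flow above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the `record_trigger_event` helper, the JSON field names, and the example coordinates and view paths are all assumed for illustration.

```python
import json
import time

def record_trigger_event(script, interface_id, event_type, position,
                         control_type, view_hierarchy, trigger_time=None):
    """Append one trigger event to the scene script being recorded.

    The recorded items mirror the content to be recorded described above:
    trigger time, event type, trigger-position coordinates, control type,
    and the control's place in the view hierarchy. Field names are
    illustrative; the patent does not specify a serialization format.
    """
    event = {
        "trigger_time": trigger_time if trigger_time is not None else time.time(),
        "event_type": event_type,          # e.g. "click" or "slide"
        "position": position,              # (x, y) coordinates of the trigger
        "control_type": control_type,      # e.g. "button", "text_display"
        "view_hierarchy": view_hierarchy,  # path of the control in the view tree
    }
    # Interfaces are kept in operation order; events within an interface
    # are kept in the order they were triggered.
    script.setdefault("interfaces", [])
    if not script["interfaces"] or script["interfaces"][-1]["id"] != interface_id:
        script["interfaces"].append({"id": interface_id, "events": []})
    script["interfaces"][-1]["events"].append(event)
    return script

# Recording a two-interface example: tap a photo button, then tap "save".
scene_script = {"scene": "take_photo", "app": {"name": "camera", "version": "1.0.0"}}
record_trigger_event(scene_script, "photo_ui", "click", (540, 1800), "button",
                     "root/frame/shutter_bar/photo_button", trigger_time=0.0)
record_trigger_event(scene_script, "preview_ui", "click", (900, 120), "text_display",
                     "root/frame/toolbar/save", trigger_time=1.5)
serialized = json.dumps(scene_script)  # the "second scene script" to upload
```

Because interfaces and events are appended in the order they occur, the operation sequence of the interfaces is captured implicitly by list order.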
In another implementation manner of the first aspect, before sending the script obtaining request to the server, the method further includes:
detecting whether a plurality of application programs for realizing the current scene exist locally;
if so, acquiring the priority of each application program in the plurality of application programs;
and determining the target application program from the plurality of application programs according to the priority of each application program.
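The local selection step above might look like the following sketch, where the scene-to-application mapping, the priority values, and the application names are illustrative assumptions:

```python
def select_target_app(installed_apps, scene, scene_to_apps, priority):
    """Pick the target application for a scene from locally installed apps.

    `scene_to_apps` maps a scene to candidate app names (in the spirit of
    Table 1); `priority` maps an app name to a priority value (higher
    wins), e.g. set by the user or derived from usage frequency.
    """
    candidates = [a for a in scene_to_apps.get(scene, []) if a in installed_apps]
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]  # only one local app realizes the scene
    return max(candidates, key=lambda a: priority.get(a, 0))

apps_on_device = {"Meituan", "Eleme", "BaiduWaimai", "WeChat"}
mapping = {"order_takeout": ["Meituan", "Eleme", "BaiduWaimai"]}
usage_priority = {"Meituan": 37, "Eleme": 12, "BaiduWaimai": 3}
target = select_target_app(apps_on_device, "order_takeout", mapping, usage_priority)
```

Here `target` resolves to the highest-priority candidate among the installed applications that can realize the scene.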
In another implementation manner of the first aspect, the parsing and executing the first scenario script includes:
performing reverse analysis on the first scene script to obtain the analyzed first scene script;
determining a trigger event to be executed of each operation interface according to one or more items of the trigger time, the type of the trigger event, the coordinates of the trigger position, the type of the control and the hierarchical relationship;
acquiring the time difference between every two adjacent trigger events to be executed according to the trigger time;
and executing the trigger event to be executed of each operation interface according to the operation sequence and the time difference so as to finish the operation of the analyzed first scene script.
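The parse-and-replay steps can be illustrated with a short sketch. The `execute_event` callback and the script layout are assumptions; the patent only requires replaying trigger events in operation order while reproducing the recorded time differences between adjacent events:

```python
import time

def replay_script(parsed_script, execute_event, sleep=time.sleep):
    """Replay the trigger events of a parsed scene script in operation
    order, waiting the recorded time difference between adjacent events.

    `execute_event` stands in for whatever event-injection mechanism the
    client uses (e.g. dispatching a tap at the recorded coordinates).
    """
    events = [e for ui in parsed_script["interfaces"] for e in ui["events"]]
    events.sort(key=lambda e: e["trigger_time"])
    previous_time = None
    for event in events:
        if previous_time is not None:
            # Reproduce the recorded pacing between adjacent trigger events.
            sleep(event["trigger_time"] - previous_time)
        execute_event(event)
        previous_time = event["trigger_time"]

# Dry run: collect executed events and requested delays instead of sleeping.
executed = []
delays = []
demo = {"interfaces": [
    {"id": "photo_ui", "events": [{"trigger_time": 0.0, "event_type": "click"}]},
    {"id": "preview_ui", "events": [{"trigger_time": 1.5, "event_type": "click"}]},
]}
replay_script(demo, executed.append, sleep=delays.append)
```

Injecting the `sleep` dependency keeps the pacing logic testable without real waits.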
In another implementation manner of the first aspect, the determining a current scene according to the speech information includes:
recognizing the voice information to obtain a voice recognition result;
and performing intention recognition and slot filling based on the voice recognition result to determine the current scene.
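The two outputs of this step, an intent (the scene) and filled slots, can be shown with a toy sketch. Real systems use trained NLU models for classification and sequence labeling; the keyword and regex matching below, and all names in it, are only illustrative assumptions:

```python
import re

def determine_scene(text):
    """Toy intent recognition + slot filling over recognized text.

    Returns (intent, slots); intent is None when no keyword set matches.
    """
    intents = {
        "buy_coffee": ["buy", "coffee"],
        "play_music": ["play", "song"],
    }
    for intent, keywords in intents.items():
        if all(k in text.lower() for k in keywords):
            slots = {}
            # Crude slot filler for a delivery address, e.g. "... send it to the company".
            m = re.search(r"send (?:it )?to (?:the )?(\w+)", text.lower())
            if m:
                slots["delivery_address"] = m.group(1)
            return intent, slots
    return None, {}

scene, slots = determine_scene("Help me buy a cup of coffee and send it to the company")
```

The recognized intent selects the current scene, and the filled slots (here, the delivery address) parameterize how the scene script is run.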
In another implementation manner of the first aspect, the recognizing the speech information to obtain a speech recognition result includes:
performing voice extraction on the voice information to obtain M segments of voice information; M is an integer greater than 1;
acquiring a feature sequence of each of the M segments of voice information to obtain M feature sequences;
generating a first feature sequence according to the M feature sequences;
extracting P1 target feature subsequences and P2 target features from the first feature sequence, respectively; P1 and P2 are integers greater than 1;
calculating the similarity between each target feature in the P1 target feature subsequences and the P2 target features to obtain P2 similarities of each target feature;
acquiring an updated feature subsequence of each target feature subsequence in the P1 target feature subsequences based on the P2 similarity of each target feature;
generating a second feature sequence according to the updated feature subsequence;
matching the second feature sequence with feature sequences corresponding to a plurality of texts stored in a corpus to obtain the voice recognition result; the corpus is used for storing the plurality of texts and the feature sequences corresponding to each text in the plurality of texts.
In another implementation manner of the first aspect, the obtaining an updated feature subsequence of each of the P1 target feature subsequences based on the P2 similarities of each target feature includes:
normalizing the P2 similarity of each target feature to obtain P2 weights of each target feature;
calculating P2 output features of each target feature based on the each target feature and P2 weights of the each target feature;
and summing the P2 output features of each target feature to obtain an updated feature of each target feature, and forming the updated feature subsequence by the updated features of each target feature.
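The steps above read as an attention-style update of each feature subsequence. The sketch below is one plausible reading under stated assumptions: it uses a dot-product similarity and a softmax normalization (the text names neither), and forms each output feature by scaling one of the P2 target features by its weight before summing:

```python
import math

def updated_subsequence(subsequence, target_features):
    """Attention-style update of one target feature subsequence.

    For each feature in the subsequence: compute its similarity with each
    of the P2 target features, normalize the P2 similarities into P2
    weights, scale each target feature by its weight to get P2 output
    features, and sum them into the updated feature.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    updated = []
    for feat in subsequence:
        sims = [dot(feat, t) for t in target_features]         # P2 similarities
        exps = [math.exp(s - max(sims)) for s in sims]
        weights = [e / sum(exps) for e in exps]                # P2 weights (softmax)
        outputs = [[w * x for x in t] for w, t in zip(weights, target_features)]
        updated.append([sum(col) for col in zip(*outputs)])    # sum of P2 outputs
    return updated

# One subsequence of two 2-d features, with P2 = 2 target features.
sub = [[1.0, 0.0], [0.0, 1.0]]
targets = [[1.0, 0.0], [0.0, 1.0]]
second = updated_subsequence(sub, targets)
```

Each updated feature is a convex combination of the target features, weighted toward the targets most similar to the original feature, which is what lets the second feature sequence emphasize globally relevant acoustic features before corpus matching.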
A second aspect of the embodiments of the present application provides a script execution apparatus, including:
the voice acquisition module is used for acquiring voice information of a user;
the scene determining module is used for determining a current scene according to the voice information;
the scene script request module is used for sending a script acquisition request to the server; the script acquisition request comprises a name and an identification of at least one target application program for realizing the current scene, and is used for requesting the server to detect whether a first scene script corresponding to the current scene exists according to the name and the identification of the target application program;
and the scene script execution module is used for receiving the first scene script returned by the server, analyzing and running the first scene script.
A third aspect of embodiments of the present application provides an electronic device, which includes an input device, an output device, and a processor, and is adapted to implement one or more instructions; and a computer storage medium storing one or more instructions adapted to be loaded by the processor and to perform the steps of:
acquiring voice information of a user;
determining a current scene according to the voice information;
sending a script acquisition request to a server; the script acquisition request comprises a name and an identification of at least one target application program for realizing the current scene, and is used for requesting the server to detect whether a first scene script corresponding to the current scene exists according to the name and the identification of the target application program;
and receiving the first scene script returned by the server, analyzing and running the first scene script.
A fourth aspect of embodiments of the present application provides a computer storage medium having one or more instructions stored thereon, the one or more instructions adapted to be loaded by a processor and to perform the following steps:
acquiring voice information of a user;
determining a current scene according to the voice information;
sending a script acquisition request to a server; the script acquisition request comprises a name and an identification of at least one target application program for realizing the current scene, and is used for requesting the server to detect whether a first scene script corresponding to the current scene exists according to the name and the identification of the target application program;
and receiving the first scene script returned by the server, analyzing and running the first scene script.
The above scheme of the present application includes at least the following beneficial effects. Compared with the prior art, the embodiment of the application acquires the voice information of a user; determines a current scene according to the voice information; sends a script acquisition request to a server, where the script acquisition request includes the name and identifier of at least one target application program for realizing the current scene and is used for requesting the server to detect, according to the name and identifier of the target application program, whether a first scene script corresponding to the current scene exists; and receives the first scene script returned by the server, and parses and runs it. Because the name and identifier of the target application program are used to ask the server whether a first scene script corresponding to the current scene exists, the first scene script, if present, can be directly acquired and run to complete the scene the user wants to realize. This helps the client run a pre-configured scene script that matches the target application program, improves the success rate of running pre-configured scene scripts, meets user requirements, and improves user experience.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic diagram of a network system architecture according to an embodiment of the present application;
fig. 2 is a flowchart illustrating a script execution method according to an embodiment of the present application;
fig. 3A is an exemplary diagram of an operation interface provided in an embodiment of the present application;
fig. 3B is an exemplary diagram of another operation interface provided in an embodiment of the present application;
fig. 3C is an exemplary diagram of another operation interface provided in an embodiment of the present application;
fig. 3D is an exemplary diagram of another operation interface provided in an embodiment of the present application;
fig. 3E is an exemplary diagram of another operation interface provided in an embodiment of the present application;
fig. 4 is an exemplary diagram of M feature sequences provided in an embodiment of the present application;
FIG. 5 is a flowchart illustrating another script execution method according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a script execution device according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of another script execution apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "comprising" and "having," and any variations thereof, as appearing in the specification, claims and drawings of this application, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus. Furthermore, the terms "first," "second," and "third," etc. are used to distinguish between different objects and are not used to describe a particular order.
An embodiment of the present application provides a script execution method, which can be implemented based on the network system architecture shown in fig. 1. As shown in fig. 1, the network system architecture includes a server and at least one client, and the client is connected to the server through a wired or wireless network. The client includes at least a communication module, a processing module, a display module, and a voice input module; the server includes at least a communication module and a processing module. The communication modules of both the client and the server provide data protocol interfaces, and communication is carried out based on these interfaces. Specifically, the user utters voice information for realizing a current scene, for example: "help me buy a cup of coffee and send it to the company". The voice input module of the client acquires this voice information, and the processing module performs speech recognition, semantic analysis, intent recognition, and so on, to determine the scene the user wants to realize. The client then interacts with the server through the communication module so that the server can detect whether a scene script corresponding to the current scene is stored; a scene script refers to program instructions that can realize a specific scene. The server searches based on the request submitted by the client. If a corresponding scene script exists at the server, the server delivers the scene script corresponding to the current scene to the client through the communication module, and the client parses, installs, and runs it, automatically completing the current scene the user wants to realize, for example: open a food ordering application - select a takeaway merchant - place an order for coffee - select the company as the delivery address - submit payment.
Optionally, if the server does not detect the scene script corresponding to the current scene, the client may simulate and record the implementation of the current scene to generate the scene script corresponding to the current scene, and send the generated scene script to the server for storage, so that the client may run the adapted scene script subsequently when the current scene is implemented. Optionally, the server may be a cloud, and the client may be an application client installed on a mobile phone, a wearable device, a vehicle-mounted terminal, or the like, or may refer to a mobile phone, a wearable device, a vehicle-mounted terminal, or the like.
Based on the network system shown in fig. 1, the following describes in detail the script execution method provided by the embodiment of the present application with reference to other drawings.
Referring to fig. 2, fig. 2 is a flowchart illustrating a script execution method according to an embodiment of the present application, where the method is applied to a client, and as shown in fig. 2, the method includes steps S21-S24:
s21, acquiring the voice information of the user;
s22, determining the current scene according to the voice information;
in this embodiment of the present application, the user may wake up the terminal device, and then input the voice information, or directly input the voice information without waking up the terminal device, for example: when a vehicle is driven, the vehicle-mounted terminal is directly spoken to play a certain song A. After acquiring the voice information of the user, the client may identify the voice information to obtain a voice identification result, and then perform intent Recognition and slot filling based on the voice identification result to determine a current scene, that is, determine what purpose the user wants to achieve, optionally, an ASR (Automatic Speech Recognition) technology may be used herein to identify the voice information of the user, identify the voice information as text information, and then perform intent Recognition and slot filling on the voice identification result by using an NLU (natural language understanding), the NLU performs text classification on the input text information to identify an intent, and performs slot filling by sequence labeling, where the intent Recognition and slot filling may be processed as separate tasks or as a joint task, which is not limited herein.
S23, sending a script acquisition request to the server; the script acquisition request comprises a name and an identification of at least one target application program for realizing the current scene, and is used for requesting the server to detect whether a first scene script corresponding to the current scene exists according to the name and the identification of the target application program;
in a specific embodiment of the present application, the first scenario script refers to a scenario script that is generated in advance and stored in the server and corresponds to a current scenario. After determining the current scene, the client may determine a target application needed to implement the current scene, where the target application may be one or multiple, for example: the current scene of 'helping me buy a cup of coffee and send the cup of coffee to a company' needs to be realized by using a food ordering application program, the client can determine a target application program according to a mapping relation between a preset scene and the application program for realizing the scene, then determine the name and the identification of the target application program, carry the name and the identification in a script acquisition request and send the script acquisition request to the server, and the server searches for a first scene script.
The mapping relationship between scenes and the application programs that realize them may be as shown in Table 1. When the current scene needs multiple target application programs, the script acquisition request should include the name and identifier of each of the multiple target application programs. For example, the scene "share a Taobao link to a WeChat friend" in Table 1 needs two target application programs, Taobao and WeChat. Of course, Table 1 is only an example and does not limit the present application.
Scene                                     Application program
Buying snacks                             BESTORE
Hotel booking                             Ctrip
Buying coffee                             Meituan
Sharing a Taobao link to a WeChat friend  Taobao + WeChat
TABLE 1
Because the server stores the names and identifiers of multiple application programs in association with their corresponding scene scripts, if the server can find the name and identifier of the target application program, a first scene script corresponding to the current scene exists. Optionally, the identifier of the target application program may be its version number, for example version 7.0.12 or version 7.0.13. The server stores scene scripts keyed by name + version number, so that the application program currently installed on the terminal has exactly one corresponding scene script at the server. This avoids the situation in which the client obtains a mismatched scene script because the operation interface was adjusted after the application program updated its version, which would cause runtime errors.
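The server-side association can be sketched as a store keyed by (name, version, scene). The class and method names below are assumptions; the patent only requires that each installed version resolve to a unique script:

```python
class ScriptStore:
    """Server-side scene script store keyed by app name + version number.

    A single (name, version, scene) key maps to at most one scene script,
    so the version installed on the terminal never receives a script
    recorded against a different operation interface layout.
    """

    def __init__(self):
        self._scripts = {}

    def save(self, app_name, version, scene, script):
        self._scripts[(app_name, version, scene)] = script

    def find_first_scene_script(self, app_name, version, scene):
        # Returns None when no script matches; the server then answers
        # the client with a "not found" response message.
        return self._scripts.get((app_name, version, scene))

store = ScriptStore()
store.save("Meituan", "7.0.12", "buy_coffee", "script-for-7.0.12")
store.save("Meituan", "7.0.13", "buy_coffee", "script-for-7.0.13")
# A client running 7.0.13 never receives the 7.0.12 script:
hit = store.find_first_scene_script("Meituan", "7.0.13", "buy_coffee")
miss = store.find_first_scene_script("Meituan", "7.0.14", "buy_coffee")
```

A miss is exactly the case in which the client falls back to recording a second scene script and uploading it with the name and version for storage.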
S24, receiving the first scene script returned by the server, analyzing and running the first scene script.
In this embodiment of the present application, if the server has the first scene script corresponding to the current scene, the server returns the first scene script to the client, and the client performs reverse analysis on the first scene script and runs the first scene script to implement the current scene, for example: turn on a certain music player-search for a certain song-play.
In one possible embodiment, the method further comprises:
receiving a response message returned by the server aiming at the script acquisition request; the response message is used for indicating that the first scene script does not exist at the server;
under the condition that the first scene script does not exist at the server, recording the current scene by operating the target application program by the user so as to generate a second scene script corresponding to the current scene;
and uploading the second scene script to a server, so that the server stores the second scene script in association with the name and the identifier of the target application program.
The second scene script is generated by recording the user operating the target application program to realize the current scene. If the first scene script does not exist at the server, the server returns a response message to the client indicating its absence. Upon receiving the response message, the user can manually operate the target application program to realize the current scene, for example, manually operating Meituan to buy coffee. The client can record the whole process of the user operating the target application program, or call a third-party recording tool to do so, to generate a second scene script, and then upload the second scene script together with the name and identifier of the target application program to the server for storage. In this embodiment, when the server has no first scene script corresponding to the current scene, the process of realizing the current scene by operating the target application program is recorded to generate a second scene script, which is uploaded to the server for storage. This enriches the scene scripts stored by the server and effectively avoids the situation in which the server cannot find a corresponding scene script when a user wants to realize a certain scene.
In a possible implementation manner, the recording the current scene by operating the target application program by the user to generate a second scene script corresponding to the current scene includes:
acquiring a preset script template corresponding to the current scene; the preset script template comprises the content to be recorded of the current scene, which is realized by the operation of the target application program by the user;
and recording the content to be recorded to obtain the second scene script.
Specifically, when recording the user operating the target application program to realize the current scene, a preset script template corresponding to the current scene needs to be determined. The preset script template defines the content to be recorded when the user operates the target application program to realize the current scene, and the content to be recorded includes a trigger event of the user on an operation interface of the target application program, the trigger time of the trigger event, the type of the event, the coordinates of the trigger position, the type of the triggered control, and the hierarchical relationship of the view in which the control is located. The type of the trigger event includes, but is not limited to, clicking and sliding, and the type of the control includes, but is not limited to, buttons and text display controls. For each operation interface through which the user operates the target application program to realize the current scene, the trigger event, its trigger time, its type, the coordinates of the trigger position, the type of the triggered control, and the hierarchical relationship of the control within the whole operation interface view are recorded, and the operation sequence of the operation interfaces is recorded, to obtain the second scene script.
For example, if the current scene is taking a photo, the target application program may be a "camera". As shown in fig. 3A, the terminal is currently on the desktop homepage; the user taps the camera application and enters its photo operation interface, as shown in fig. 3B. The client detects that the target application program has been opened and can begin recording. As shown in fig. 3C, when the user taps the photo button on the photo operation interface, the client records the user-triggered event, i.e., tapping the photo button, and records one or more of the time the photo button was tapped, the event type (i.e., a tap), the coordinates of the tapped position, the tapped control (i.e., the photo button), and the hierarchical relationship of the button within the whole photo operation interface, where the recorded items include at least the trigger time. This completes the recording of one trigger event on one operation interface; similarly, when multiple trigger events occur on one operation interface, each is recorded according to the content to be recorded. Fig. 3D shows the next operation interface entered after the user taps the photo button. When the user taps the text display control "save", the client records the trigger time and event type of this trigger event. After saving the photo, the target application program returns to the photo operation interface shown in fig. 3C. As shown in fig. 3E, when the user taps the return button at the lower left corner of the interface (this trigger event is also recorded), the target application program exits and returns to the desktop homepage shown in fig. 3A, completing the photo-taking scene. The sequence from the photo operation interface in fig. 3B through the tap of the return button is the operation sequence of the operation interfaces.
It can be understood that when the trigger event is a slide, the slide track needs to be recorded; in this case the coordinates of the trigger position are the coordinates of the start point and the end point of the slide track. When the user clicks a text display control, the text on the control also needs to be recorded, for example the "save" described above. This embodiment facilitates recording and learning the content to be recorded on each of the multiple operation interfaces of the target application program, enriches and updates the scene scripts stored on the server side in time, and effectively avoids errors in later runs.
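As an illustration, the content to be recorded for one trigger event, and its grouping per operation interface in operation order, can be sketched as a small data structure. This is a hypothetical sketch; the class names, field names, and example values below are assumptions, not part of the patent:

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class TriggerEvent:
    event_type: str                       # e.g. "click" or "slide"
    trigger_time: float                   # timestamp when the event fired
    position: Tuple[int, int]             # coordinates of the trigger position
    control_type: Optional[str] = None    # e.g. "button", "text_display"
    view_hierarchy: Optional[str] = None  # path of the control in the interface view tree
    slide_track: Optional[List[Tuple[int, int]]] = None  # start/end points for a slide
    control_text: Optional[str] = None    # text on a text-display control, e.g. "save"

@dataclass
class SceneScript:
    scene: str
    app_name: str
    app_id: str
    # one event list per operation interface, kept in operation order
    interfaces: List[List[TriggerEvent]] = field(default_factory=list)

    def record(self, interface_index: int, event: TriggerEvent) -> None:
        while len(self.interfaces) <= interface_index:
            self.interfaces.append([])
        self.interfaces[interface_index].append(event)

# recording the photo scene of Figs. 3B-3D
script = SceneScript(scene="take_photo", app_name="camera", app_id="com.example.camera")
script.record(0, TriggerEvent("click", time.time(), (540, 1750), "button", "root/preview/shutter"))
script.record(1, TriggerEvent("click", time.time(), (900, 1600), "text_display", "root/review/save",
                              control_text="save"))
```

The second scene script is then the list of per-interface event lists plus their order, which is what the client later uploads to the server.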
In a possible implementation manner, before sending the script obtaining request to the server, the method further includes:
detecting whether a plurality of application programs for realizing the current scene exist locally;
if so, acquiring the priority of each application program in the plurality of application programs;
and determining the target application program from the plurality of application programs according to the priority of each application program.
Specifically, a plurality of application programs capable of implementing the current scene may exist on the terminal. For example, if the current scene is ordering take-out, several similar application programs such as Meituan, Ele.me, and Baidu Takeout may all implement it, and the target application program is then determined according to the priority of each of these application programs. Optionally, the priority of each application program may be set by the user; for example, if the user is accustomed to ordering take-out on Meituan, Meituan can be set as the preferred application program for ordering take-out. Optionally, the priority of each application program may be calculated by the client according to the user's behavior habits: as shown in table 2, the client may retrieve how frequently the user used each take-out application program within a period of time (the length of which is customizable) and determine the application program with the highest usage frequency, or with a usage frequency greater than or equal to a preset value, as the target application program. In this embodiment, when multiple application programs can implement the current scene, determining the target application program according to their priorities improves the user experience.
TABLE 2 (rendered as an image in the original publication; it lists the usage frequency of each take-out application program over the retrieval period)
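The priority-based selection described above can be sketched as follows. The function name, the `preferred` and `min_count` parameters, and the example usage log are illustrative assumptions:

```python
from collections import Counter

def choose_target_app(candidates, usage_log, preferred=None, min_count=None):
    """Pick the target application among several that realize the scene.

    preferred: an app the user set manually (takes precedence);
    usage_log: names of apps the user launched over a recent period;
    min_count: optional preset threshold on the usage frequency.
    """
    if preferred in candidates:
        return preferred
    counts = Counter(app for app in usage_log if app in candidates)
    if not counts:
        return candidates[0]  # no history: fall back to the first candidate
    app, freq = counts.most_common(1)[0]
    if min_count is not None and freq < min_count:
        return candidates[0]
    return app

apps = ["Meituan", "Ele.me", "Baidu Takeout"]
log = ["Meituan", "Ele.me", "Meituan", "Meituan"]
target = choose_target_app(apps, log)  # highest usage frequency wins
```

A user-set preference, when present, overrides the frequency statistics, matching the two optional priority sources in the text.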
In one possible embodiment, the parsing and executing the first scenario script includes:
performing reverse analysis on the first scene script to obtain the analyzed first scene script;
determining a trigger event to be executed of each operation interface according to one or more items of the trigger time, the type of the trigger event, the coordinates of the trigger position, the type of the control and the hierarchical relationship;
acquiring the time difference between every two adjacent trigger events to be executed according to the trigger time;
and executing the trigger event to be executed of each operation interface according to the operation sequence and the time difference so as to finish the operation of the analyzed first scene script.
Specifically, the trigger events to be executed are the trigger events recorded when the second scene script was recorded; as the recording embodiment above shows, once the first scene script is parsed, the specific content to operate is one or more of the items recorded at recording time. Continuing with fig. 3B: in the current scene of taking a photo, assuming the photo operation interface is currently running, the client can determine from one or more of the recorded click time, event type, click coordinates, and so on in the first scene script that the current trigger event to be executed is a click on the photo button. Moreover, the time difference between every two adjacent trigger events to be executed can be calculated from their trigger times. For example, if two trigger events exist on the operation interface of fig. 3B, there are two trigger events to be executed; if the first has a trigger time of 10:25 and the second 10:27, the client calculates the time difference between them as 2 minutes. Note that "adjacent" here refers to adjacency in trigger time; the two events are not required to be on the same operation interface.
The trigger events to be executed on all operation interfaces are therefore executed according to the operation sequence of the interfaces recorded during recording and the calculated time differences, which completes the execution of the first scene script. For example, for two adjacent trigger events to be executed on two operation interfaces, after the event on the first interface is executed, the client waits 2 minutes before executing the event on the next interface. In this embodiment, running the first scene script according to the content recorded during recording genuinely frees the user's hands; executing every two adjacent trigger events with the recorded time difference lets the user follow what operation is currently being performed, giving a better user experience; and because the first scene script corresponds exactly to the target application program, the success rate of the run is ensured.
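The timed replay might look like the following sketch, where `dispatch` (a hypothetical callback that injects one event into the target application) and the `speedup` parameter are assumptions added for illustration and testability:

```python
import time

def replay(script_events, dispatch, speedup=1.0):
    """Replay recorded trigger events in order, preserving the time
    differences between adjacent events.

    script_events: (trigger_time, event) pairs across all operation
    interfaces, sorted by trigger time;
    dispatch: callback that injects one event into the target application;
    speedup: divides the waits (e.g. for testing).
    """
    previous = None
    for trigger_time, event in script_events:
        if previous is not None:
            gap = (trigger_time - previous) / speedup
            if gap > 0:
                time.sleep(gap)  # wait out the recorded time difference
        dispatch(event)
        previous = trigger_time

executed = []
events = [(0.0, "click photo button"), (120.0, "click save"), (121.0, "click return")]
replay(events, executed.append, speedup=100000.0)
```

The per-event wait is what makes the replay legible to a watching user, at the cost of total run time equal to the original recording.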
In a possible implementation manner, the recognizing the speech information to obtain a speech recognition result includes:
carrying out voice extraction on the voice information to obtain M sections of voice information; m is an integer greater than 1;
acquiring a characteristic sequence of the M segments of voice information to obtain M segments of characteristic sequences;
generating a first characteristic sequence according to the M sections of characteristic sequences;
truncating P1 target feature subsequences and P2 target features from the first feature sequence respectively; p1 and P2 are integers greater than 1;
calculating the similarity between each target feature in the P1 target feature subsequences and the P2 target features to obtain P2 similarities of each target feature;
acquiring an updated feature subsequence of each target feature subsequence in the P1 target feature subsequences based on the P2 similarity of each target feature;
generating a second feature sequence according to the updated feature subsequence;
matching the second feature sequence with feature sequences corresponding to a plurality of texts stored in a corpus to obtain the voice recognition result; the corpus is used for storing the plurality of texts and the feature sequences corresponding to each text in the plurality of texts.
Specifically, the user's voice information is input into a voice activity detection model for human-voice extraction to obtain M segments of voice information, and a neural network model performs feature extraction on the M segments to obtain M feature sequences, as shown in fig. 4. If M is 3, feature sequence segment 1 corresponds to 20 audio frames, giving segment 1 = [K1, K2, K3, …, K20]; feature sequence segment 2 corresponds to 30 audio frames, giving segment 2 = [K21, K22, K23, …, K50]; and feature sequence segment 3 corresponds to 50 audio frames, giving segment 3 = [K51, K52, K53, …, K100]. Splicing the three feature sequences yields the first feature sequence [K1, K2, K3, …, K100]. Because the splice point between two adjacent feature sequences may be unsmooth, the first feature sequence needs to be smoothed: P1 target feature subsequences are intercepted from the first feature sequence around the splice positions, together with P2 target features, i.e., the overlapping portions of the target feature subsequences. The similarity between each target feature in the P1 target feature subsequences and the P2 target features is then calculated with a similarity algorithm, yielding the P2 similarities of each target feature; the similarity algorithm may be the Euclidean distance, cosine similarity, and so on.
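The first step of this process, splicing the M per-segment feature sequences into the first feature sequence, can be sketched with plain lists; the `splice` helper and the symbolic frame labels are hypothetical:

```python
def splice(segments):
    """Concatenate the M per-segment feature sequences into the first feature sequence."""
    first = []
    for seg in segments:
        first.extend(seg)
    return first

seg1 = [f"K{i}" for i in range(1, 21)]    # 20 audio frames
seg2 = [f"K{i}" for i in range(21, 51)]   # 30 audio frames
seg3 = [f"K{i}" for i in range(51, 101)]  # 50 audio frames
first_sequence = splice([seg1, seg2, seg3])  # [K1, K2, ..., K100]
```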
The P2 similarities of each target feature are normalized to obtain P2 weights for that target feature; each target feature is multiplied by its P2 weights to obtain P2 output features, which are summed to give the updated feature of that target feature. The updated features of the target features form an updated feature subsequence, the updated feature subsequences are spliced into a second feature sequence, the second feature sequence is matched against the feature sequences of the texts stored in the corpus, and the text with the highest matching degree is taken as the speech recognition result of the voice information.
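The smoothing update can be sketched as an attention-style weighted sum. Note this is an interpretation: the patent's literal wording ("multiplying each target feature by the P2 weights") is ambiguous, and this sketch takes the weighted sum over the P2 target features; the function names and the use of cosine similarity (one of the two similarity measures the text allows) are assumptions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def update_feature(target, overlap_features):
    """Compute one updated feature from a target feature and the P2
    target features (the overlapping portions at the splice points).

    The P2 similarities are normalized into P2 weights, each overlap
    feature is scaled by its weight, and the scaled features are summed.
    """
    sims = [cosine(target, f) for f in overlap_features]
    total = sum(sims) or 1.0
    weights = [s / total for s in sims]        # normalization -> P2 weights
    updated = [0.0] * len(target)
    for w, f in zip(weights, overlap_features):
        for i, v in enumerate(f):
            updated[i] += w * v                # weighted sum -> updated feature
    return updated

smoothed = update_feature([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Applying this update to every target feature in a subsequence produces the updated feature subsequence, and splicing those gives the smoother second feature sequence.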
In this embodiment, during speech recognition, the spliced portions of the extracted human-voice feature sequence are processed after splicing, so that the spliced second feature sequence is smoother and the speech recognition accuracy is improved.
It can be seen that, in the embodiment of the present application, the client acquires the user's voice information; determines the current scene according to the voice information; sends a script acquisition request to the server, the request comprising the name and identification of at least one target application program for realizing the current scene and requesting the server to detect, according to that name and identification, whether a first scene script corresponding to the current scene exists; and receives, parses, and runs the first scene script returned by the server. Using the name and identification of the target application program to ask the server whether the first scene script exists, and directly acquiring and running it when it does, completes the current scene the user wants to realize, helps the client run the pre-configured scene script corresponding to the target application program, improves the success rate of running pre-configured scene scripts, meets the user's needs, and improves the user experience.
Referring to fig. 5, fig. 5 is a flowchart illustrating another script execution method provided by the embodiment of the present application, which is applied to a client, and as shown in fig. 5, the method includes steps S51-S56:
S51, acquiring the voice information of the user;
S52, determining the current scene according to the voice information;
S53, sending a script acquisition request to the server; the script acquisition request comprises a name and an identification of at least one target application program for realizing the current scene, and is used for requesting the server to detect whether a first scene script corresponding to the current scene exists according to the name and the identification of the target application program;
S54, under the condition that the first scene script exists at the server, receiving the first scene script returned by the server, and parsing and running the first scene script;
S55, under the condition that the first scene script does not exist at the server, recording the current scene realized by the user operating the target application program, so as to generate a second scene script corresponding to the current scene;
S56, uploading the second scene script to the server, so that the server stores the second scene script in association with the name and the identification of the target application program.
The specific implementation of steps S51-S56 is described in the embodiment shown in fig. 2, and can achieve the same or similar advantages, and will not be described herein again.
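The branching flow of steps S53-S56 can be sketched end to end; the `FakeServer` class and the `record`/`run` callbacks below are hypothetical stand-ins for the real server interface and the recording module:

```python
class FakeServer:
    """Stand-in server storing scene scripts keyed by scene, app name, and app id."""
    def __init__(self):
        self.store = {}
    def fetch(self, scene, app_name, app_id):
        return self.store.get((scene, app_name, app_id))
    def upload(self, scene, app_name, app_id, script):
        self.store[(scene, app_name, app_id)] = script

def run_scene(scene, app_name, app_id, server, record, run):
    """S53/S54: request and run the first scene script if the server has it;
    S55/S56: otherwise record a second scene script and upload it."""
    script = server.fetch(scene, app_name, app_id)
    if script is not None:
        run(script)                                     # S54: parse and run
        return script
    script = record(scene)                              # S55: record the user's operation
    server.upload(scene, app_name, app_id, script)      # S56: store for later reuse
    return script

server = FakeServer()
ran = []
first = run_scene("take_photo", "camera", "cam.id", server,
                  record=lambda s: ["click shutter"], run=ran.append)
# the second request now finds the uploaded script and runs it
second = run_scene("take_photo", "camera", "cam.id", server,
                   record=lambda s: ["unused"], run=ran.append)
```

The first call takes the S55/S56 branch (record and upload), and the second takes the S54 branch (fetch and run), which is the reuse the embodiment is designed for.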
Based on the description of the above-mentioned embodiment of the script execution method, please refer to fig. 6, fig. 6 is a schematic structural diagram of a script execution device provided in the embodiment of the present application, and as shown in fig. 6, the device includes:
the voice acquisition module 61 is used for acquiring voice information of a user;
a scene determining module 62, configured to determine a current scene according to the voice information;
a scenario script request module 63, configured to send a script obtaining request to the server; the script acquisition request comprises a name and an identification of at least one target application program for realizing the current scene, and is used for requesting the server to detect whether a first scene script corresponding to the current scene exists according to the name and the identification of the target application program;
and the scene script execution module 64 is configured to receive the first scene script returned by the server, analyze the first scene script, and run the first scene script.
In one possible embodiment, as shown in fig. 7, the apparatus further includes a scene script recording module 65; the scene script recording module 65 is configured to:
under the condition that the first scene script does not exist at the server, recording the current scene realized by the user operating the target application program, so as to generate a second scene script corresponding to the current scene;
and uploading the second scene script to a server, so that the server stores the second scene script in association with the name and the identifier of the target application program.
In a possible implementation manner, in recording the current scene by operating the target application program by the user to generate the second scene script corresponding to the current scene, the scene script recording module 65 is specifically configured to:
acquiring a preset script template corresponding to the current scene; the preset script template comprises the content to be recorded when the user operates the target application program to realize the current scene;
and recording the content to be recorded to obtain the second scene script.
In a possible implementation manner, the content to be recorded includes one or more items of a trigger event of a user on an operation interface of the target application program, trigger time of the trigger event, a type of the trigger event, coordinates of a trigger position, a type of a triggered control, and a hierarchical relationship of a view where the control is located; in terms of recording the content to be recorded to obtain the second scene script, the scene script recording module 65 is specifically configured to:
and recording one or more items of the trigger event, the trigger time, the type of the trigger event, the coordinates of the trigger position, the type of the control and the hierarchical relationship of each operation interface aiming at each operation interface of the current scene realized by the target application program operated by a user, and recording the operation sequence of each operation interface to obtain the second scene script.
In a possible implementation, the scenario script request module 63 is further configured to:
detecting whether a plurality of application programs for realizing the current scene exist locally;
if so, acquiring the priority of each application program in the plurality of application programs;
and determining the target application program from the plurality of application programs according to the priority of each application program.
In a possible implementation manner, in terms of parsing and executing the first scenario script, the scenario script execution module 64 is specifically configured to:
performing reverse analysis on the first scene script to obtain the analyzed first scene script;
determining a trigger event to be executed of each operation interface according to one or more items of the trigger time, the type of the trigger event, the coordinates of the trigger position, the type of the control and the hierarchical relationship;
acquiring the time difference between every two adjacent trigger events to be executed according to the trigger time;
and executing the trigger event to be executed of each operation interface according to the operation sequence and the time difference so as to finish the operation of the analyzed first scene script.
In one possible implementation, in determining the current scene according to the voice information, the scene determining module 62 is specifically configured to:
recognizing the voice information to obtain a voice recognition result;
and performing intention recognition and slot filling based on the voice recognition result to determine the current scene.
In a possible implementation manner, in recognizing the speech information to obtain a speech recognition result, the scenario determination module 62 is specifically configured to:
carrying out voice extraction on the voice information to obtain M sections of voice information; m is an integer greater than 1;
acquiring a characteristic sequence of the M segments of voice information to obtain M segments of characteristic sequences;
generating a first characteristic sequence according to the M sections of characteristic sequences;
truncating P1 target feature subsequences and P2 target features from the first feature sequence respectively; p1 and P2 are integers greater than 1;
calculating the similarity between each target feature in the P1 target feature subsequences and the P2 target features to obtain P2 similarities of each target feature;
acquiring an updated feature subsequence of each target feature subsequence in the P1 target feature subsequences based on the P2 similarity of each target feature;
generating a second feature sequence according to the updated feature subsequence;
matching the second feature sequence with feature sequences corresponding to a plurality of texts stored in a corpus to obtain the voice recognition result; the corpus is used for storing the plurality of texts and the feature sequences corresponding to each text in the plurality of texts.
In one possible implementation, in obtaining the updated feature subsequence of each of the P1 target feature subsequences based on the P2 similarity of each target feature, the scene determination module 62 is specifically configured to:
normalizing the P2 similarity of each target feature to obtain P2 weights of each target feature;
calculating P2 output features of each target feature based on the each target feature and P2 weights of the each target feature;
and summing the P2 output features of each target feature to obtain an updated feature of each target feature, and forming the updated feature subsequence by the updated features of each target feature.
According to an embodiment of the present application, the units of the script execution apparatus shown in fig. 6 or fig. 7 may be respectively or entirely combined into one or several other units to form the script execution apparatus, or some unit(s) of the script execution apparatus may be further split into multiple units with smaller functions to form the script execution apparatus, which may implement the same operation without affecting implementation of technical effects of embodiments of the present application. The units are divided based on logic functions, and in practical application, the functions of one unit can be realized by a plurality of units, or the functions of a plurality of units can be realized by one unit. In other embodiments of the present application, the script execution apparatus may also include other units, and in practical applications, these functions may also be implemented by being assisted by other units, and may be implemented by cooperation of a plurality of units.
According to another embodiment of the present application, the script execution apparatus shown in fig. 6 or fig. 7 may be constructed, and the script execution method of the embodiments of the present application implemented, by running a computer program (including program code) capable of executing the steps of the corresponding method shown in fig. 2 or fig. 5 on a general-purpose computing device, such as a computer, comprising processing elements such as a Central Processing Unit (CPU) and storage elements such as a random access memory (RAM) and a read-only memory (ROM). The computer program may, for example, be recorded on a computer-readable recording medium, and loaded into and executed in the above computing device via that medium.
Based on the description of the method embodiment and the device embodiment, the embodiment of the application further provides an electronic device. Referring to fig. 8, the electronic device includes at least a processor 81, an input device 82, an output device 83, and a computer storage medium 84. The processor 81, input device 82, output device 83, and computer storage medium 84 within the electronic device may be connected by a bus or other means.
A computer storage medium 84 may be stored in the memory of the electronic device, the computer storage medium 84 being for storing a computer program comprising program instructions, the processor 81 being for executing the program instructions stored by the computer storage medium 84. The processor 81 (or CPU) is a computing core and a control core of the electronic device, and is adapted to implement one or more instructions, and in particular, is adapted to load and execute the one or more instructions so as to implement a corresponding method flow or a corresponding function.
In one embodiment, the processor 81 of the electronic device provided in the embodiment of the present application may be configured to perform a series of script execution processes:
acquiring voice information of a user;
determining a current scene according to the voice information;
sending a script acquisition request to a server; the script acquisition request comprises a name and an identification of at least one target application program for realizing the current scene, and is used for requesting the server to detect whether a first scene script corresponding to the current scene exists according to the name and the identification of the target application program;
and receiving the first scene script returned by the server, analyzing and running the first scene script.
In yet another embodiment, the processor 81 is further configured to:
under the condition that the first scene script does not exist at the server, recording the current scene realized by the user operating the target application program, so as to generate a second scene script corresponding to the current scene;
and uploading the second scene script to a server, so that the server stores the second scene script in association with the name and the identifier of the target application program.
In another embodiment, the executing, by the processor 81, the recording of the current scene by the user operating the target application program to generate a second scene script corresponding to the current scene includes:
acquiring a preset script template corresponding to the current scene; the preset script template comprises the content to be recorded when the user operates the target application program to realize the current scene;
and recording the content to be recorded to obtain the second scene script.
In another embodiment, the content to be recorded includes one or more items of a trigger event of a user on an operation interface of the target application program, trigger time of the trigger event, a type of the trigger event, coordinates of a trigger position, a type of a triggered control, and a hierarchical relationship of a view where the control is located; the processor 81 executes the recording of the content to be recorded to obtain the second scene script, including:
and recording one or more items of the trigger event, the trigger time, the type of the trigger event, the coordinates of the trigger position, the type of the control and the hierarchical relationship of each operation interface aiming at each operation interface of the current scene realized by the target application program operated by a user, and recording the operation sequence of each operation interface to obtain the second scene script.
In yet another embodiment, before sending the script obtaining request to the server, the processor 81 is further configured to:
detecting whether a plurality of application programs for realizing the current scene exist locally;
if so, acquiring the priority of each application program in the plurality of application programs;
and determining the target application program from the plurality of application programs according to the priority of each application program.
In yet another embodiment, processor 81 performs the parsing and executing the first scenario script, including:
performing reverse analysis on the first scene script to obtain the analyzed first scene script;
determining a trigger event to be executed of each operation interface according to one or more items of the trigger time, the type of the trigger event, the coordinates of the trigger position, the type of the control and the hierarchical relationship;
acquiring the time difference between every two adjacent trigger events to be executed according to the trigger time;
and executing the trigger event to be executed of each operation interface according to the operation sequence and the time difference so as to finish the operation of the analyzed first scene script.
In another embodiment, the processor 81 performs the determining the current scene according to the voice information, including:
recognizing the voice information to obtain a voice recognition result;
and performing intention recognition and slot filling based on the voice recognition result to determine the current scene.
In another embodiment, the processor 81 performs the recognition on the speech information to obtain a speech recognition result, including:
carrying out voice extraction on the voice information to obtain M sections of voice information; m is an integer greater than 1;
acquiring a characteristic sequence of the M segments of voice information to obtain M segments of characteristic sequences;
generating a first characteristic sequence according to the M sections of characteristic sequences;
truncating P1 target feature subsequences and P2 target features from the first feature sequence respectively; p1 and P2 are integers greater than 1;
calculating the similarity between each target feature in the P1 target feature subsequences and the P2 target features to obtain P2 similarities of each target feature;
acquiring an updated feature subsequence of each target feature subsequence in the P1 target feature subsequences based on the P2 similarity of each target feature;
generating a second feature sequence according to the updated feature subsequence;
matching the second feature sequence with feature sequences corresponding to a plurality of texts stored in a corpus to obtain the voice recognition result; the corpus is used for storing the plurality of texts and the feature sequences corresponding to each text in the plurality of texts.
In another embodiment, the processor 81 performs the obtaining of the updated feature subsequence of each of the P1 target feature subsequences based on the P2 similarity of each target feature, including:
normalizing the P2 similarity of each target feature to obtain P2 weights of each target feature;
calculating P2 output features of each target feature based on the each target feature and P2 weights of the each target feature;
and summing the P2 output features of each target feature to obtain an updated feature of each target feature, and forming the updated feature subsequence by the updated features of each target feature.
By way of example, the client may include, but is not limited to, a processor 81, an input device 82, an output device 83, and a computer storage medium 84. Those skilled in the art will appreciate that the schematic diagram is merely an example of a client and does not limit it; a client may include more or fewer components than those shown, combine some components, or use different components.
It should be noted that, since the processor 81 of the client executes the computer program to implement the steps in the script execution method, the embodiments of the script execution method are all applicable to the client, and all can achieve the same or similar beneficial effects.
An embodiment of the present application further provides a computer storage medium (Memory), which is a Memory device in an electronic device and is used to store programs and data. It is understood that the computer storage medium herein may include a built-in storage medium in the terminal, and may also include an extended storage medium supported by the terminal. The computer storage medium provides a storage space that stores an operating system of the terminal. Also stored in this memory space are one or more instructions, which may be one or more computer programs (including program code), suitable for loading and execution by processor 81. The computer storage medium may be a high-speed RAM memory, or may be a non-volatile memory (non-volatile memory), such as at least one disk memory; alternatively, it may be at least one computer storage medium located remotely from the processor 81. In one embodiment, one or more instructions stored in a computer storage medium may be loaded and executed by processor 81 to perform the corresponding steps described above with respect to the script execution method.
Illustratively, the computer program of the computer storage medium includes computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, and the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like.
It should be noted that, since the computer program of the computer storage medium implements the steps of the script execution method when executed by the processor, all the embodiments of the script execution method are applicable to the computer storage medium, and can achieve the same or similar advantages.
The foregoing detailed description of the embodiments of the present application illustrates the principles and implementations of the present application; the above description of the embodiments is provided only to help understand the method and core concept of the present application. Meanwhile, a person skilled in the art may, according to the idea of the present application, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (12)

1. A script execution method applied to a client, the method comprising:
acquiring voice information of a user;
determining a current scene according to the voice information;
sending a script acquisition request to a server; the script acquisition request comprises a name and an identification of at least one target application program for realizing the current scene, and is used for requesting the server to detect whether a first scene script corresponding to the current scene exists according to the name and the identification of the target application program;
and receiving the first scene script returned by the server, and parsing and running the first scene script.
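The script acquisition request of claim 1 carries the current scene together with the name and identification of each target application program. A minimal sketch of how a client might serialize such a request, assuming a JSON payload; the field names ("scene", "targets", "name", "id") are illustrative, not terms fixed by the patent:

```python
import json

def build_script_request(scene, target_apps):
    """Serialize a script acquisition request.

    target_apps: list of (name, identifier) pairs for the application
    programs that realize the current scene.
    """
    return json.dumps({
        "scene": scene,
        "targets": [{"name": name, "id": app_id} for name, app_id in target_apps],
    })

# Example: request the scene script for a hypothetical navigation scene.
req = build_script_request("navigate_home", [("MapApp", "com.example.map")])
```

The server would look up a stored first scene script keyed by this name/identification pair and return it, or report that none exists.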
2. The method of claim 1, further comprising:
in a case where the first scene script does not exist at the server, recording the current scene realized by the user operating the target application program, so as to generate a second scene script corresponding to the current scene;
and uploading the second scene script to the server, so that the server stores the second scene script in association with the name and the identifier of the target application program.
3. The method according to claim 2, wherein the recording the current scene realized by the user operating the target application program to generate a second scene script corresponding to the current scene comprises:
acquiring a preset script template corresponding to the current scene; wherein the preset script template comprises the content to be recorded for the current scene as realized by the user operating the target application program;
and recording the content to be recorded to obtain the second scene script.
4. The method according to claim 3, wherein the content to be recorded comprises one or more of: a trigger event of the user on an operation interface of the target application program, a trigger time of the trigger event, a type of the trigger event, coordinates of a trigger position, a type of a triggered control, and a hierarchical relationship of a view where the control is located;
the recording the content to be recorded to obtain the second scene script comprises:
and recording, for each operation interface of the current scene realized by the user operating the target application program, one or more of the trigger event, the trigger time, the type of the trigger event, the coordinates of the trigger position, the type of the control, and the hierarchical relationship of the operation interface, and recording the operation sequence of the operation interfaces, to obtain the second scene script.
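The per-interface record that claim 4 enumerates (trigger event, trigger time, event type, trigger coordinates, control type, view hierarchy) can be sketched as a simple data structure kept in operation order. The class and field names below are illustrative assumptions, not terms defined by the patent:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class TriggerEvent:
    trigger_time: float          # trigger time of the trigger event
    event_type: str              # type of the trigger event, e.g. "click"
    position: Tuple[int, int]    # coordinates of the trigger position
    control_type: str            # type of the triggered control
    view_hierarchy: List[str]    # hierarchical relationship of the view

@dataclass
class SceneScript:
    scene: str
    app_name: str
    app_id: str
    events: List[TriggerEvent] = field(default_factory=list)

    def record(self, event: TriggerEvent) -> None:
        # Appending in arrival order preserves the operation sequence
        # of the interfaces, as claim 4 requires.
        self.events.append(event)

# Example: record two operations of a hypothetical navigation scene.
script = SceneScript("navigate_home", "MapApp", "com.example.map")
script.record(TriggerEvent(0.0, "click", (120, 640), "Button", ["root", "toolbar"]))
script.record(TriggerEvent(1.5, "click", (300, 900), "ListItem", ["root", "results"]))
```

Serializing such a structure yields the second scene script that is uploaded to the server in claim 2.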
5. The method according to any one of claims 1-4, wherein before sending the script fetch request to the server, the method further comprises:
detecting whether a plurality of application programs for realizing the current scene exist locally;
if so, acquiring the priority of each application program in the plurality of application programs;
and determining the target application program from the plurality of application programs according to the priority of each application program.
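The selection in claim 5 reduces to taking the maximum over per-application priorities. A minimal sketch; treating a larger number as higher priority is an assumption, since the patent does not fix the ordering convention:

```python
def select_target_app(apps):
    """Pick the target application program for the current scene.

    apps: list of (name, priority) pairs for the locally detected
    application programs that can realize the current scene.
    Returns the name of the highest-priority application, or None
    if no such application exists.
    """
    if not apps:
        return None
    return max(apps, key=lambda app: app[1])[0]

target = select_target_app([("MapAppA", 2), ("MapAppB", 5), ("MapAppC", 1)])
```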
6. The method of claim 4, wherein parsing and running the first scenario script comprises:
reversely parsing the first scene script to obtain a parsed first scene script;
determining a trigger event to be executed for each operation interface according to one or more of the trigger time, the type of the trigger event, the coordinates of the trigger position, the type of the control, and the hierarchical relationship;
acquiring, according to the trigger times, the time difference between every two adjacent trigger events to be executed;
and executing the trigger event to be executed of each operation interface according to the operation sequence and the time differences, to complete the running of the parsed first scene script.
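The replay loop of claim 6 executes the events in operation order and waits the recorded time difference between adjacent events. A minimal sketch, with the event executor passed in as a callback (an illustrative interface, not one defined by the patent):

```python
import time

def replay(events, execute):
    """Replay recorded trigger events.

    events: (trigger_time, payload) pairs sorted in operation sequence;
    execute: callback that performs one trigger event on its interface.
    """
    previous = None
    for trigger_time, payload in events:
        if previous is not None:
            # Wait the time difference between the two adjacent
            # trigger events to be executed.
            time.sleep(max(0.0, trigger_time - previous))
        execute(payload)
        previous = trigger_time

# Example: replay three recorded operations, collecting them in order.
executed = []
replay([(0.00, "open_app"), (0.01, "tap_search"), (0.02, "confirm")],
       executed.append)
```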
7. The method of claim 1, wherein determining the current scene from the speech information comprises:
recognizing the voice information to obtain a voice recognition result;
and performing intention recognition and slot filling based on the voice recognition result to determine the current scene.
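Claim 7 determines the current scene by intent recognition and slot filling on the speech recognition result. A toy rule-based sketch of that step; production systems use trained NLU models, and the patterns, intent names, and slot names here are illustrative assumptions:

```python
import re

# Illustrative intent patterns with named groups as slots.
INTENT_PATTERNS = {
    "navigate": re.compile(r"navigate to (?P<destination>.+)"),
    "play_music": re.compile(r"play (?P<song>.+)"),
}

def parse_utterance(text):
    """Return (intent, slots) for a recognized utterance, or (None, {})."""
    for intent, pattern in INTENT_PATTERNS.items():
        match = pattern.match(text.lower())
        if match:
            return intent, match.groupdict()  # intent plus filled slots
    return None, {}

intent, slots = parse_utterance("Navigate to the airport")
```

The resulting intent and slot values together identify the current scene for which a scene script is requested.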
8. The method of claim 7, wherein the recognizing the voice information to obtain a voice recognition result comprises:
performing voice extraction on the voice information to obtain M segments of voice information, where M is an integer greater than 1;
acquiring a feature sequence of each of the M segments of voice information to obtain M feature sequences;
generating a first feature sequence from the M feature sequences;
truncating P1 target feature subsequences and P2 target features from the first feature sequence respectively; p1 and P2 are integers greater than 1;
calculating the similarity between each target feature in the P1 target feature subsequences and the P2 target features to obtain P2 similarities of each target feature;
acquiring an updated feature subsequence of each target feature subsequence in the P1 target feature subsequences based on the P2 similarities of each target feature;
generating a second feature sequence according to the updated feature subsequence;
matching the second feature sequence with feature sequences corresponding to a plurality of texts stored in a corpus to obtain the voice recognition result; the corpus is used for storing the plurality of texts and the feature sequences corresponding to each text in the plurality of texts.
9. The method of claim 8, wherein the acquiring updated feature subsequences of each of the P1 target feature subsequences based on the P2 similarities of each target feature comprises:
normalizing the P2 similarities of each target feature to obtain P2 weights of each target feature;
calculating P2 output features of each target feature based on the each target feature and P2 weights of the each target feature;
and summing the P2 output features of each target feature to obtain an updated feature of each target feature, and forming the updated feature subsequence by the updated features of each target feature.
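Claims 8 and 9 update each feature of a target feature subsequence by its similarities to the P2 target features: normalize the similarities into weights, scale each target feature by its weight, and sum the scaled features. A minimal sketch; dot-product similarity and softmax normalization are assumptions, since the claims specify only "similarity" and "normalizing":

```python
import math

def update_subsequence(subsequence, targets):
    """Compute the updated feature subsequence of claim 9.

    subsequence: features of one of the P1 target feature subsequences;
    targets: the P2 target features (all features are equal-length lists).
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    updated = []
    for feature in subsequence:
        sims = [dot(feature, t) for t in targets]    # P2 similarities
        exps = [math.exp(s) for s in sims]
        total = sum(exps)
        weights = [e / total for e in exps]          # P2 weights (normalized)
        # P2 output features (weight * target feature), summed
        # elementwise into one updated feature.
        updated.append([sum(w * t[i] for w, t in zip(weights, targets))
                        for i in range(len(feature))])
    return updated

new_sub = update_subsequence([[1.0, 0.0], [0.0, 1.0]],
                             [[1.0, 0.0], [0.0, 1.0]])
```

Each updated feature is a convex combination of the target features, so with the basis-vector inputs above the components of every updated feature sum to 1.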
10. A script execution apparatus, characterized in that the apparatus comprises:
the voice acquisition module is used for acquiring voice information of a user;
the scene determining module is used for determining a current scene according to the voice information;
the scene script request module is used for sending a script acquisition request to the server; the script acquisition request comprises a name and an identification of at least one target application program for realizing the current scene, and is used for requesting the server to detect whether a first scene script corresponding to the current scene exists according to the name and the identification of the target application program;
and the scene script execution module is used for receiving the first scene script returned by the server, and parsing and running the first scene script.
11. An electronic device comprising an input device and an output device, further comprising:
a processor adapted to implement one or more instructions; and
a computer storage medium having stored thereon one or more instructions adapted to be loaded by the processor and to perform the method of any of claims 1-9.
12. A computer storage medium having stored thereon one or more instructions adapted to be loaded by a processor and to perform the method of any of claims 1-9.
CN202011056691.9A 2020-09-29 2020-09-29 Script execution method and device, electronic equipment and storage medium Active CN112199623B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011056691.9A CN112199623B (en) 2020-09-29 2020-09-29 Script execution method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112199623A true CN112199623A (en) 2021-01-08
CN112199623B CN112199623B (en) 2024-02-27

Family

ID=74008119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011056691.9A Active CN112199623B (en) 2020-09-29 2020-09-29 Script execution method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112199623B (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106971009A (en) * 2017-05-11 2017-07-21 网易(杭州)网络有限公司 Speech data library generating method and device, storage medium, electronic equipment
CN108305626A (en) * 2018-01-31 2018-07-20 百度在线网络技术(北京)有限公司 The sound control method and device of application program
CN111002996A (en) * 2019-12-10 2020-04-14 广州小鹏汽车科技有限公司 Vehicle-mounted voice interaction method, server, vehicle and storage medium
CN110992944A (en) * 2019-12-17 2020-04-10 广州小鹏汽车科技有限公司 Error correction method for voice navigation, voice navigation device, vehicle and storage medium
CN110992932A (en) * 2019-12-18 2020-04-10 睿住科技有限公司 Self-learning voice control method, system and storage medium
CN111009245A (en) * 2019-12-18 2020-04-14 腾讯科技(深圳)有限公司 Instruction execution method, system and storage medium
CN111161726A (en) * 2019-12-24 2020-05-15 广州索答信息科技有限公司 Intelligent voice interaction method, equipment, medium and system
CN111192586A (en) * 2020-01-08 2020-05-22 北京松果电子有限公司 Voice recognition method and device, electronic equipment and storage medium
CN111341326A (en) * 2020-02-18 2020-06-26 RealMe重庆移动通信有限公司 Voice processing method and related product
CN111341315A (en) * 2020-03-06 2020-06-26 腾讯科技(深圳)有限公司 Voice control method, device, computer equipment and storage medium
CN111696526A (en) * 2020-06-22 2020-09-22 北京达佳互联信息技术有限公司 Method for generating voice recognition model, voice recognition method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG Shijin et al.: "Word-Lattice-Based Phoneme Recognition and Its Application in Language Identification", Journal of Chinese Information Processing, vol. 22, no. 2, pages 124-128 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113190295A (en) * 2021-04-28 2021-07-30 维沃移动通信(深圳)有限公司 Information processing method, processing device and electronic equipment
CN113377432A (en) * 2021-06-17 2021-09-10 上海臣星软件技术有限公司 Advertisement running method, device, equipment and storage medium
CN113572669A (en) * 2021-06-29 2021-10-29 青岛海尔科技有限公司 Scene processing method and device, intelligent gateway and processor
CN113572669B (en) * 2021-06-29 2023-06-20 青岛海尔科技有限公司 Scene processing method and device, intelligent gateway and processor
CN115002274A (en) * 2022-05-07 2022-09-02 Oppo广东移动通信有限公司 Control method and device, electronic equipment and computer readable storage medium
CN115002274B (en) * 2022-05-07 2024-02-20 Oppo广东移动通信有限公司 Control method and device, electronic equipment and computer readable storage medium
CN115296947B (en) * 2022-06-28 2024-01-26 青岛海尔科技有限公司 Control command response method and device, storage medium and electronic device
CN115296947A (en) * 2022-06-28 2022-11-04 青岛海尔科技有限公司 Control command response method and device, storage medium and electronic device
CN115064167A (en) * 2022-08-17 2022-09-16 广州小鹏汽车科技有限公司 Voice interaction method, server and storage medium
CN115064167B (en) * 2022-08-17 2022-12-13 广州小鹏汽车科技有限公司 Voice interaction method, server and storage medium
WO2024066391A1 (en) * 2022-09-30 2024-04-04 中兴通讯股份有限公司 Intent-based driving method and apparatus for telecommunication network management product
CN116610345B (en) * 2023-07-21 2023-10-20 上海柯林布瑞信息技术有限公司 Application program upgrading method and device based on execution record table
CN116610345A (en) * 2023-07-21 2023-08-18 上海柯林布瑞信息技术有限公司 Application program upgrading method and device based on execution record table
CN117274467A (en) * 2023-11-23 2023-12-22 长江水利委员会长江科学院 Dam safety monitoring multi-physical field cloud image online visualization method and system
CN117274467B (en) * 2023-11-23 2024-01-30 长江水利委员会长江科学院 Dam safety monitoring multi-physical field cloud image online visualization method and system
CN117435191A (en) * 2023-12-20 2024-01-23 杭银消费金融股份有限公司 Program processing method and device based on customized requirements
CN117435191B (en) * 2023-12-20 2024-03-26 杭银消费金融股份有限公司 Program processing method and device based on customized requirements

Also Published As

Publication number Publication date
CN112199623B (en) 2024-02-27

Similar Documents

Publication Publication Date Title
CN112199623B (en) Script execution method and device, electronic equipment and storage medium
CN107481720B (en) Explicit voiceprint recognition method and device
US20170164049A1 (en) Recommending method and device thereof
CN110020009B (en) Online question and answer method, device and system
CN110740389B (en) Video positioning method, video positioning device, computer readable medium and electronic equipment
US11144811B2 (en) Aspect pre-selection using machine learning
US20200311071A1 (en) Method and system for identifying core product terms
CA3052846A1 (en) Character recognition method, device, electronic device and storage medium
CN110209658B (en) Data cleaning method and device
CN110610698B (en) Voice labeling method and device
CN109947971B (en) Image retrieval method, image retrieval device, electronic equipment and storage medium
CN111177329A (en) User interaction method of intelligent terminal, intelligent terminal and storage medium
CN110059172B (en) Method and device for recommending answers based on natural language understanding
CN115422334A (en) Information processing method, device, electronic equipment and storage medium
CN111243604A (en) Training method for speaker recognition neural network model supporting multiple awakening words, speaker recognition method and system
CN112837683B (en) Voice service method and device
EP3800561A1 (en) Electronic device and control method for electronic device
CN111209351A (en) Object relation prediction method and device, object recommendation method and device, electronic equipment and medium
CN111444321A (en) Question answering method, device, electronic equipment and storage medium
CN108509442B (en) Search method and apparatus, server, and computer-readable storage medium
CN110675865B (en) Method and apparatus for training hybrid language recognition models
CN112148962B (en) Method and device for pushing information
CN111090769A (en) Song recommendation method, device, equipment and computer storage medium
CN112309387A (en) Method and apparatus for processing information
CN112309389A (en) Information interaction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 201821 room 208, building 4, No. 1411, Yecheng Road, Jiading Industrial Zone, Jiading District, Shanghai

Applicant after: Botai vehicle networking technology (Shanghai) Co.,Ltd.

Address before: Room 208, building 4, 1411 Yecheng Road, Jiading Industrial Zone, Jiading District, Shanghai, 201800

Applicant before: SHANGHAI PATEO ELECTRONIC EQUIPMENT MANUFACTURING Co.,Ltd.

GR01 Patent grant