CN111367492A - Webpage display method and device and storage medium

Info

Publication number
CN111367492A
Authority
CN
China
Prior art keywords
target
voice
terminal
activity scene
server
Prior art date
Legal status
Granted
Application number
CN202010144256.5A
Other languages
Chinese (zh)
Other versions
CN111367492B (en)
Inventor
梁宇轩
Current Assignee
Shenzhen Tencent Information Technology Co Ltd
Original Assignee
Shenzhen Tencent Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Tencent Information Technology Co Ltd
Priority to CN202010144256.5A
Publication of CN111367492A
Application granted
Publication of CN111367492B
Current legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16: Sound input; Sound output
    • G06F 3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35: Clustering; Classification
    • G06F 16/353: Clustering; Classification into predefined classes
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223: Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application provides a webpage display method, a webpage display device and a storage medium, which belong to the field of artificial intelligence. In the method, a terminal sends a first voice to a server; the server determines, from an activity scene library, a target activity scene matching the first voice, where the target activity scene includes a plurality of target elements and a plurality of target actions corresponding to the target elements; for each of the target actions, the server determines, from an action model library, a target action model matching the target action, so as to obtain a plurality of target action models; the server sends the target elements, the target actions and the target action models to the terminal; and the terminal displays a webpage corresponding to the target activity scene according to the target elements, the target actions and the target action models. The method and the device help to improve the flexibility of the webpage display mode.

Description

Webpage display method and device and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a method and an apparatus for displaying a web page, and a storage medium.
Background
With the development of internet and information technology, network operation activities (such as advertisements and lottery activities) are increasingly popular, and the network operation activities are usually displayed in web pages.
Currently, a user typically triggers a terminal to present a web page via a keyboard and/or mouse. Such web page presentation is less flexible.
Disclosure of Invention
The application provides a webpage display method and device and a storage medium, which are beneficial to improving the flexibility of a webpage display mode. The technical scheme is as follows:
in a first aspect, a method for displaying a webpage is provided, where the method includes:
receiving a first voice sent by a terminal;
determining a target activity scene matched with the first voice from an activity scene library, wherein the target activity scene comprises a plurality of target elements and a plurality of target actions corresponding to the plurality of target elements;
for each target action in the target actions, determining a target action model matched with the target action from an action model library to obtain a plurality of target action models;
and sending the target elements, the target actions and the target action models to the terminal so that the terminal can display the webpage corresponding to the target activity scene according to the target elements, the target actions and the target action models.
In a second aspect, a method for displaying a webpage is provided, the method comprising:
sending a first voice to a server;
receiving a plurality of target elements, a plurality of target actions and a plurality of target action models sent by the server, wherein the plurality of target elements and the plurality of target actions belong to a target activity scene matched with the first voice, and the plurality of target action models are matched with the plurality of target actions;
and displaying a webpage corresponding to the target activity scene according to the target elements, the target actions and the target action models.
In a third aspect, an apparatus for displaying a web page is provided, the apparatus comprising:
the receiving module is used for receiving a first voice sent by the terminal;
a first determining module, configured to determine a target activity scene matching the first voice from an activity scene library, where the target activity scene includes a plurality of target elements and a plurality of target actions corresponding to the plurality of target elements;
the second determining module is used for determining a target action model matched with the target action from an action model library for each target action in the target actions to obtain a plurality of target action models;
and the sending module is used for sending the target elements, the target actions and the target action models to the terminal so that the terminal can display the webpage corresponding to the target activity scene according to the target elements, the target actions and the target action models.
In a fourth aspect, a device for displaying web pages is provided, the device comprising:
the sending module is used for sending the first voice to the server;
a receiving module, configured to receive a plurality of target elements, a plurality of target actions, and a plurality of target action models sent by the server, where the plurality of target elements and the plurality of target actions belong to a target activity scene matched with the first voice, and the plurality of target action models are matched with the plurality of target actions;
and the display module is used for displaying the webpage corresponding to the target activity scene according to the target elements, the target actions and the target action models.
In a fifth aspect, there is provided a web page presentation apparatus, the apparatus comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the method of the first aspect.
In a sixth aspect, there is provided a web page presentation apparatus, comprising a processor and a memory, wherein at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the method according to the second aspect.
In a seventh aspect, a computer-readable storage medium is provided, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, which is loaded and executed by a processor to implement the method of the above-mentioned aspects.
In an eighth aspect, a webpage display system is provided, where the system includes a server and a terminal;
in an implementation manner, the server includes the apparatus for displaying a webpage of the third aspect, and the terminal includes the apparatus for displaying a webpage of the fourth aspect;
in another implementation manner, the server includes the webpage display apparatus of the fifth aspect, and the terminal includes the webpage display apparatus of the sixth aspect.
The beneficial effects brought by the technical solutions provided in the present application are as follows:
the application provides a webpage display method and device and a storage medium, after a terminal sends a first voice to a server, the server determines a target activity scene matched with the first voice from an activity scene library, the target activity scene comprises a plurality of target elements and a plurality of target actions corresponding to the target elements, for each target action in the target actions, the server determines a target action model matched with the target action from an action model library to obtain a plurality of target action models, sends the target elements, the target actions and the target action models to the terminal, and the terminal displays a webpage corresponding to the target activity scene according to the target elements, the target actions and the target action models. Because the terminal displays the webpage based on the voice, compared with a mode that the webpage is displayed by triggering the terminal through a keyboard and/or a mouse, the flexibility of the webpage display mode is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required for describing the embodiments are briefly introduced below. Apparently, the drawings in the following description show only some embodiments of the present application, and those skilled in the art may derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic illustration of an implementation environment to which various embodiments of the present application relate;
FIG. 2 is a schematic diagram of an audio API provided by an embodiment of the present application;
fig. 3 is a schematic interaction diagram of a terminal and a server according to an embodiment of the present disclosure;
fig. 4 is a flowchart of a method for displaying a webpage according to an embodiment of the present application;
FIG. 5 is a flowchart of another method for displaying a webpage according to an embodiment of the present application;
FIG. 6 is a flow chart of a method for analyzing a first voice according to an embodiment of the present application;
fig. 7 is a schematic diagram of a method for displaying a webpage provided in an embodiment of the present application;
FIG. 8 is a block diagram of a device for displaying web pages provided by an embodiment of the present application;
FIG. 9 is a block diagram of another apparatus for displaying web pages provided by an embodiment of the present application;
FIG. 10 is a block diagram of another apparatus for displaying web pages provided by an embodiment of the present application;
FIG. 11 is a block diagram of another apparatus for displaying web pages provided by an embodiment of the present application;
FIG. 12 is a block diagram of a web page display apparatus according to an embodiment of the present disclosure;
fig. 13 is a schematic structural diagram of a web page display apparatus according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of another apparatus for displaying a webpage according to an embodiment of the present application.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Detailed Description
In order to make the principle, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, a schematic diagram of an implementation environment according to various embodiments of the present application is shown. The implementation environment includes: a terminal 110 and a server 120. The terminal 110 and the server 120 may be communicatively connected through a wired network or a wireless network. The wireless network may include but is not limited to: a Wireless Fidelity (WiFi) network, a Bluetooth network, an infrared network, a Zigbee network, or a data network, and the wired network may be a Universal Serial Bus (USB) network.
The terminal 110 may be an electronic device with a webpage display function, for example, the terminal 110 may be a smart phone, a tablet computer, a notebook computer, or a desktop computer. Optionally, a browser may be installed in the terminal 110, and the terminal 110 may display a webpage through the browser, where the webpage may be an H5 webpage. In the embodiment of the present application, the terminal 110 may support various types of browsers, for example, an IE (Internet Explorer) browser, an Edge browser, a Firefox browser, a Chrome browser, a Safari browser, an Opera browser, an iOS Safari browser, an Opera Mini browser, an Android browser, a Chrome for Android browser, and the like. The server 120 may be one server, a server cluster composed of several servers, or a cloud computing service center. As shown in fig. 1, in the embodiment of the present application, the terminal 110 is a desktop computer, and the server 120 is one server.
In this embodiment, the terminal 110 may collect voice, and the voice may be the voice of a user. For convenience of description, this embodiment refers to the voice collected by the terminal 110 as a first voice. After the terminal 110 collects the first voice, the first voice may be sent to the server 120. After the server 120 receives the first voice, a target activity scene matching the first voice may be determined from an activity scene library, where the target activity scene includes a plurality of target elements and a plurality of target actions corresponding to the plurality of target elements. For each target action in the plurality of target actions, the server 120 may determine a target action model matching the target action from an action model library to obtain a plurality of target action models. Then the server 120 sends the plurality of target elements, the plurality of target actions and the plurality of target action models to the terminal 110, and the terminal 110 displays a webpage corresponding to the target activity scene according to the target elements, the target actions and the target action models. In this way, the terminal 110 can display the webpage based on the voice, which helps to improve the flexibility of the webpage display mode.
The method and the device of the present application are applicable to scenarios in which a user triggers the terminal to display a webpage of an operation activity. For example, while the user browses a webpage, the terminal displays operation activity information in the webpage (for example, in a pop-up window), and the user can trigger, through voice, the terminal to display the webpage of the operation activity. For example, the operation activity may be a lottery activity for mobile phone A, the first voice may be "I want to draw a lottery, I want to win mobile phone A", and the target activity scene matching the first voice may be the lottery activity scene of mobile phone A. The target element may be a webpage element of the lottery activity scene (e.g., an animation or a virtual object, where the virtual object may be a virtual character), and the target action model may be a model according to which the target element performs a target action (e.g., a lottery drawing action). The terminal 110 may display a webpage corresponding to the lottery activity scene of mobile phone A, that is, a lottery webpage of mobile phone A, according to a plurality of target elements, a plurality of target actions corresponding to the plurality of target elements, and a plurality of target action models matching the plurality of target actions.
Optionally, the terminal 110 may have a voice collecting component, and the terminal 110 may collect the first voice through the voice collecting component. The voice collecting component may be connected to the terminal 110 in a pluggable manner, or the voice collecting component may be embedded in the terminal 110, which is not limited in this embodiment of the present application. Optionally, the voice collecting component may be a microphone, and the microphone may communicate with the terminal 110 through an audio Application Programming Interface (API). For example, please refer to fig. 2, which illustrates a schematic diagram of an audio API provided in an embodiment of the present application. As shown in fig. 2, the audio API includes an audio node, and a microphone node, a filtering node, a data processing node, a volume setting node, and an audio buffering node that are respectively connected to the audio node. The audio node is configured to implement intercommunication among the microphone node, the filtering node, the data processing node, the volume setting node, and the audio buffering node. The microphone node is configured to connect to a microphone, the filtering node is configured to filter the voice collected by the microphone, the volume setting node is configured to set the volume of the voice collected by the microphone, the audio buffering node is configured to buffer the voice collected by the microphone, and the data processing node is configured to process the voice collected by the microphone so that it can be played.
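As an illustration of how such a node graph could be set up in a browser, the following TypeScript sketch uses the standard Web Audio API. The patent does not prescribe a concrete implementation; the filter type, gain value and buffer size below are assumptions.

```typescript
// Minimal sketch of the audio node graph of fig. 2 using the Web Audio API.
// The filter type, gain value and buffer size are assumptions, not taken from the patent.
async function buildAudioGraph(): Promise<AudioContext> {
  const ctx = new AudioContext(); // plays the role of the central "audio node"
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

  const micNode = ctx.createMediaStreamSource(stream); // microphone node
  const filterNode = ctx.createBiquadFilter();         // filtering node
  filterNode.type = "lowpass";
  const volumeNode = ctx.createGain();                  // volume setting node
  volumeNode.gain.value = 1.0;

  // data processing + audio buffering: collect raw samples of the first voice
  const processingNode = ctx.createScriptProcessor(4096, 1, 1);
  const bufferedSamples: Float32Array[] = [];
  processingNode.onaudioprocess = (event) => {
    bufferedSamples.push(new Float32Array(event.inputBuffer.getChannelData(0)));
  };

  micNode.connect(filterNode);
  filterNode.connect(volumeNode);
  volumeNode.connect(processingNode);
  processingNode.connect(ctx.destination);
  return ctx;
}
```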
Optionally, the terminal 110 and the server 120 may establish a communication connection before performing communication, and may disconnect the communication connection after the communication is completed. The communication connection may be a socket connection based on the above-mentioned communication networks, where a socket is a protocol-independent network programming interface. For example, referring to fig. 3, which shows an interaction diagram of the terminal 110 and the server 120 provided in an embodiment of the present application, as shown in fig. 3, the server 120 listens on a socket and waits to receive a connection request, the terminal 110 may create a socket and send the connection request to the server 120, and the server 120 creates the socket connection after receiving the connection request sent by the terminal 110, so that the socket connection between the terminal 110 and the server 120 is successfully established. Then, the terminal 110 and the server 120 transmit data streams (which may be the data stream of the first voice, the data stream of the target elements, the data stream of the target actions, and the data stream of the target action models) based on the socket connection, and after the data stream transmission is completed, the terminal 110 and the server 120 each close the socket connection.
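The patent only speaks of a generic socket connection; as one browser-side analogue, the following hedged sketch uses a WebSocket for the connect, transfer and close steps described above. The URL and payload handling are hypothetical.

```typescript
// Hypothetical sketch of the connect -> transfer -> close flow between terminal and server.
// The server URL is a placeholder; the real endpoint is not specified in the patent.
function sendFirstVoice(firstVoice: ArrayBuffer, onPayload: (data: unknown) => void): void {
  const socket = new WebSocket("wss://server.example/voice"); // server listens and waits for requests
  socket.binaryType = "arraybuffer";

  socket.onopen = () => socket.send(firstVoice); // data stream of the first voice

  socket.onmessage = (event) => {
    onPayload(event.data); // data streams of target elements, actions and action models
    socket.close();        // each side closes the connection after the transfer completes
  };
}
```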
Referring to fig. 4, a flowchart of a method for displaying a webpage according to an embodiment of the present application is shown, where the method for displaying a webpage can be used in the implementation environment shown in fig. 1, and referring to fig. 4, the method can include the following steps:
step 401, the terminal sends a first voice to the server.
Step 402, the server receives the first voice sent by the terminal.
In step 403, the server determines a target activity scene matching the first voice from the activity scene library, where the target activity scene includes a plurality of target elements and a plurality of target actions corresponding to the plurality of target elements.
The target activity scene may be, for example, a lottery activity scene, an advertisement scene of a certain activity, a certain game scene, a voting activity scene, etc., the target element may be, for example, an animation, a virtual object, etc. in the target activity scene, and the target element may be a planar element or a three-dimensional (3D) element, and the target action corresponding to each target element may be an action performed by the target element.
In step 404, the server determines a target action model matching the target action from the action model library for each target action of the target actions, so as to obtain a plurality of target action models.
The target action model may be a model according to which the target element executes the target action, the target action model may include model parameters, and the target element executes the target action according to the model parameters in the target action model.
Step 405, the server sends the target elements, the target actions and the target action models to the terminal.
Step 406, the terminal receives the target elements, the target actions and the target action models sent by the server.
Step 407, the terminal displays a webpage corresponding to the target activity scene according to the target elements, the target actions and the target action models.
Optionally, the plurality of target elements correspond to the plurality of target actions one to one, the plurality of target action models are matched with the plurality of target actions, and the terminal may input the target action corresponding to each target element into the corresponding target action model, so that the target element executes the target action according to the output of the target action model, thereby displaying the web page corresponding to the target activity scene. It is easy to understand that the web page corresponding to the target activity scene may further include a static element (for example, a picture, a text, and the like), and the terminal may display the static element according to the resource file of the web page corresponding to the target activity scene.
To sum up, in the webpage display method provided in the embodiment of the present application, after the terminal sends the first voice to the server, the server determines the target activity scene matched with the first voice from the activity scene library, where the target activity scene includes a plurality of target elements and a plurality of target actions corresponding to the plurality of target elements, and for each target action in the plurality of target actions, the server determines a target action model matched with the target action from the action model library to obtain a plurality of target action models, and sends the plurality of target elements, the plurality of target actions and the plurality of target action models to the terminal, and the terminal displays the webpage corresponding to the target activity scene according to the plurality of target elements, the plurality of target actions and the plurality of target action models. Because the terminal displays the webpage based on the voice, compared with a mode that the webpage is displayed by triggering the terminal through a keyboard and/or a mouse, the flexibility of the webpage display mode is improved.
Referring to fig. 5, a flowchart of another method for displaying a webpage according to an embodiment of the present application is shown, where the method for displaying a webpage can be used in the implementation environment shown in fig. 1, and referring to fig. 5, the method can include the following steps:
step 501, the terminal collects a first voice.
Optionally, the terminal may have a voice collecting component, and the terminal may collect the first voice through the voice collecting component, and the first voice may be a voice of the user. Illustratively, the voice capture component may be a microphone, and the terminal may call an audio API of the microphone to capture the first voice through the microphone.
Optionally, the terminal may be equipped with a browser, and before the first voice is collected, the terminal may display a web page through the browser, and the user may speak according to the content in the web page, so that the first voice collected by the terminal may be related to the content in the web page. The web page may be any web page, such as a game page, a news page, a video page, or an introduction page of a certain product, and the web page may include operation information, such as lottery information, advertisement information, voting information, or game push information, and the user may speak according to the operation information.
Exemplarily, lottery information for mobile phone A, "mobile phone A is currently in a hot lottery draw, and you are invited to participate in the draw", is displayed in the webpage, and the first voice may be, for example, "I want to draw a lottery, I want to win mobile phone A". Further exemplarily, the webpage displays advertisement information "buy the same item X as star B", and the first voice may be, for example, "go to buy item X". As another example, the webpage displays voting information "yy is participating in the xx vote; enter the voting interface to vote", and the first voice may be, for example, "go to vote". Also exemplarily, the webpage displays a game push message "your friend has an outstanding battle record in game C; go challenge him", and the first voice may be, for example, "to play game C". It should be noted that the operation activity information listed here is only an example; in practical applications, the operation activity information may also include a picture, and the picture may be a dynamic picture or a static picture, which is not limited in this embodiment of the application.
Step 502, the terminal sends the first voice to the server.
Alternatively, the terminal may send the first voice to the server through a socket connection with the server.
Step 503, the server receives the first voice sent by the terminal.
Alternatively, the server may receive the first voice transmitted by the terminal through a socket connection with the terminal.
Step 504, the server analyzes the first voice to obtain a key sentence of the first voice.
Optionally, the server may have a voice index analysis unit, and the server analyzes the first voice through the voice index analysis unit to obtain the key sentence of the first voice.
For example, please refer to fig. 6, which shows a flowchart of a method for analyzing a first voice by a server according to an embodiment of the present application, and referring to fig. 6, the method may include:
sub-step 5041, split the first speech into a plurality of speech segments.
Optionally, the lengths (i.e., durations) of the plurality of voice segments may be equal, and the server may evenly split the first voice into a plurality of voice segments of equal length according to the length of the first voice. Alternatively, the server may split the first voice into a plurality of voice segments according to the semantics of the first voice, and keep the parts of the same word in the same voice segment, for example, keep both characters of the word "mobile phone" (which, in the original Chinese, consists of the characters for "hand" and "machine") in the same voice segment, which is not limited in this embodiment of the present application.
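A minimal sketch of the equal-length split of sub-step 5041 follows, assuming the first voice is available as a flat array of samples; the default segment count is an assumption, not a value mandated by the patent.

```typescript
// Sketch of sub-step 5041: evenly split the first voice into voice segments of equal length.
// The number of segments is an assumption; the patent does not fix a value.
function splitIntoSegments(samples: Float32Array, segmentCount = 6): Float32Array[] {
  const segmentLength = Math.ceil(samples.length / segmentCount);
  const segments: Float32Array[] = [];
  for (let start = 0; start < samples.length; start += segmentLength) {
    segments.push(samples.subarray(start, start + segmentLength));
  }
  return segments;
}
```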
Substep 5042, recognizing the plurality of speech segments to obtain a plurality of text segments.
Optionally, for each speech segment, the server may recognize the speech segment based on at least one speech recognition platform, and because there is a difference in speech recognition technologies adopted by different speech recognition platforms, at least one text segment may be obtained by recognizing the speech segment based on at least one speech recognition platform, and thus a plurality of text segments may be obtained by recognizing the plurality of speech segments.
For example, the server splits the first voice to obtain voice segments 1 to 6 (that is, voice segment 1, voice segment 2, voice segment 3, voice segment 4, voice segment 5, and voice segment 6), and taking the example that the server recognizes the voice segments based on one voice recognition platform, the server recognizes the voice segments 1 to 6 to obtain text segments 1 to 6 (that is, text segment 1, text segment 2, text segment 3, text segment 4, text segment 5, and text segment 6).
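The following sketch illustrates sub-step 5042 under the assumption of a generic `RecognitionPlatform` interface; the patent does not name any concrete speech recognition platform or its API.

```typescript
// Sketch of sub-step 5042: recognise every voice segment on at least one platform.
// RecognitionPlatform is a hypothetical interface, not a real library API.
interface RecognitionPlatform {
  recognize(segment: Float32Array): Promise<string>;
}

async function recognizeSegments(
  segments: Float32Array[],
  platforms: RecognitionPlatform[],
): Promise<string[]> {
  const textSegments: string[] = [];
  for (const segment of segments) {
    for (const platform of platforms) {
      // different platforms may yield different text for the same voice segment
      textSegments.push(await platform.recognize(segment));
    }
  }
  return textSegments;
}
```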
Substep 5043, processing the plurality of text segments to obtain a key sentence.
Optionally, the server may determine at least one target text segment belonging to a target category from the plurality of text segments, and process the at least one target text segment into a key sentence, where the target category is a category of the target activity scene. Optionally, the server may arrange the at least one target text segment according to a grammar rule to obtain a key sentence.
Optionally, the server may have a Bayesian classifier for calculating the probability that a feature belongs to the target category. For each of the plurality of text segments, the server may input the text segment into the Bayesian classifier, calculate, through the Bayesian classifier, the probability that the text segment belongs to the target category, and determine, according to the probability that each of the plurality of text segments belongs to the target category, at least one target text segment belonging to the target category from the plurality of text segments. Optionally, the server may determine, as a target text segment belonging to the target category, a text segment, of the plurality of text segments, whose probability of belonging to the target category is greater than a preset probability value.
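A hedged sketch of this selection step: the classifier itself is abstracted behind a hypothetical interface, and the preset probability value of 0.5 is an assumption.

```typescript
// Sketch of selecting target text segments whose probability of belonging to the
// target category exceeds a preset probability value.
// BayesianClassifier is a placeholder interface; 0.5 is an assumed threshold.
interface BayesianClassifier {
  probabilityOfTargetCategory(textSegment: string): number;
}

function selectTargetTextSegments(
  textSegments: string[],
  classifier: BayesianClassifier,
  presetProbability = 0.5,
): string[] {
  return textSegments.filter(
    (segment) => classifier.probabilityOfTargetCategory(segment) > presetProbability,
  );
}
```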
Or, optionally, the server may have a scene classification model, and for each text segment in the plurality of text segments, the server may classify the text segment by using the scene classification model to obtain a category of the text segment, and determine, according to the category of the plurality of text segments, a target text segment belonging to a target category from the plurality of text segments. The scene classification model can be obtained by the server through training of a machine learning algorithm, or obtained by other equipment through training of the machine learning algorithm and transplanted into the server. Optionally, the scene classification model is obtained by the server through training of a machine learning algorithm, and the server may train the scene classification model before classifying the text segments through the scene classification model.
Optionally, the process of training the scene classification model by the server may include the following. The server obtains a training sample set, where the training sample set includes a plurality of sample text segments belonging to different scenes and a labeled category of each sample text segment (a category obtained by labeling the sample text segment, which may be labeled manually or by machine). For each sample text segment, the server inputs the sample text segment into an initial classification model and determines the output category of the initial classification model as the predicted category of the sample text segment. If the predicted category of the sample text segment is different from the labeled category of the sample text segment, the server corrects the model parameters of the initial classification model to obtain a corrected classification model, inputs the sample text segment into the corrected classification model, and determines the output category of the corrected classification model as the predicted category of the sample text segment. If the predicted category of the sample text segment is still different from the labeled category of the sample text segment, the server continues to correct the model parameters of the corrected classification model until the predicted category of the sample text segment is the same as the labeled category of the sample text segment. This process is repeated until, for each sample text segment in the plurality of sample text segments, the predicted category is the same as the labeled category, and the classification model for which the predicted category and the labeled category of each sample text segment are the same is determined as the scene classification model.
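The correction loop described above can be sketched as follows, assuming a generic `ClassificationModel` with a parameter-correction step; the actual model structure and correction rule are not specified in the patent.

```typescript
// Rough sketch of the training loop: correct the model until, for every sample
// text segment, the predicted category equals the labeled category.
// ClassificationModel and its correct() step are assumptions.
interface ClassificationModel {
  predict(sampleTextSegment: string): string;
  correct(sampleTextSegment: string, labeledCategory: string): void; // adjust model parameters
}

function trainSceneClassificationModel(
  model: ClassificationModel,
  trainingSampleSet: Array<{ sampleTextSegment: string; labeledCategory: string }>,
): ClassificationModel {
  let allMatch = false;
  while (!allMatch) {
    allMatch = true;
    for (const { sampleTextSegment, labeledCategory } of trainingSampleSet) {
      while (model.predict(sampleTextSegment) !== labeledCategory) {
        model.correct(sampleTextSegment, labeledCategory); // correct model parameters
        allMatch = false;
      }
    }
  }
  return model; // scene classification model
}
```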
Step 505, the server determines a target activity scene matched with the first voice from an activity scene library according to the key sentence of the first voice, wherein the target activity scene comprises a plurality of target elements and a plurality of target actions corresponding to the plurality of target elements.
Optionally, the server may have an activity scene library, where the activity scene library may include a plurality of different activity scenes, each activity scene may correspond to at least one key sentence, the server may determine, from the activity scene library, an activity scene matching the key sentence of the first voice, and determine the activity scene matching the key sentence as a target activity scene matching the first voice. Each activity scene in the activity scene library may include a plurality of elements and a plurality of actions corresponding to the plurality of elements, for example, each activity scene may include a mapping relationship of elements and actions, and the target activity scene may include a plurality of target elements and a plurality of target actions corresponding to the plurality of target elements.
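One way to picture the activity scene library is as a lookup from key sentences to scenes, each scene carrying its element-to-action mapping. The sketch below assumes exact key-sentence matching, whereas the patent only requires that the scene match the key sentence.

```typescript
// Sketch of the activity scene library lookup in step 505.
// Exact string matching of key sentences is an assumption.
interface ActivityScene {
  name: string;
  keySentences: string[];
  elementActions: Array<{ targetElement: string; targetAction: string }>; // element-action mapping
}

function findTargetActivityScene(
  activitySceneLibrary: ActivityScene[],
  keySentence: string,
): ActivityScene | undefined {
  return activitySceneLibrary.find((scene) => scene.keySentences.includes(keySentence));
}
```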
Alternatively, the target event scene may be, for example, a lottery event scene, an advertisement scene of a certain event, a certain game scene, a voting event scene, and the like, the target elements may be, for example, animations, virtual objects, and the like in the target event scene, and the target elements may be planar elements, 3D elements, and the like, and the target action corresponding to each target element may be an action performed by the target element.
For example, the first voice may be "I want to draw a lottery, I want to win mobile phone A", and the target activity scene matching the first voice may be the lottery activity scene of mobile phone A; as another example, the first voice may be "to purchase item X", and the target activity scene matching the first voice may be a purchase scene of item X; as another example, the first voice may be "go to vote", and the target activity scene matching the first voice may be the xx voting activity scene; as another example, the first voice may be "to play game C", and the target activity scene matching the first voice may be a game scene of game C.
Step 506, the server determines a target action model matched with the target action from the action model library for each target action in the plurality of target actions to obtain a plurality of target action models.
Optionally, the server may have an action model library, where the action model library may include a plurality of action models, each action model may include model parameters, each action model may correspond to one action, and each action model is used for the element corresponding to the action to perform the action. For each target action in the plurality of target actions, the server may determine, from the action model library, a target action model that matches (i.e., corresponds to) the target action, thereby obtaining a plurality of target action models, where each target action model is used for the target element corresponding to the corresponding target action to perform the target action.
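Step 506 can likewise be sketched as a lookup over the action model library; `ActionModel` and its parameter field are placeholders rather than structures defined by the patent.

```typescript
// Sketch of step 506: for each target action, find the matching target action model.
// ActionModel is a placeholder type; its parameters are assumptions.
interface ActionModel {
  action: string;
  modelParameters: Record<string, number>;
}

function findTargetActionModels(
  actionModelLibrary: ActionModel[],
  targetActions: string[],
): ActionModel[] {
  return targetActions.map((targetAction) => {
    const model = actionModelLibrary.find((m) => m.action === targetAction);
    if (!model) throw new Error(`no action model matches target action "${targetAction}"`);
    return model;
  });
}
```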
And step 507, the server determines a second voice matched with the first voice from the active voice library according to the key sentence of the first voice.
Alternatively, the server may have an active speech library, the active speech library may include a plurality of different active speeches, each of the active speeches may correspond to at least one key sentence, the server may determine an active speech matching the key sentence of the first speech from the active speech library, and determine the active speech matching the key sentence as the second speech matching the first speech.
For example, the first voice may be "I want to draw a lottery, I want to win mobile phone A", and the second voice matching the first voice may be "start lottery"; for another example, the first voice may be "to purchase item X", and the second voice matching the first voice may be "welcome to purchase item X"; for another example, the first voice may be "go to vote", and the second voice matching the first voice may be "welcome to vote for yy"; further exemplarily, the first voice may be "to play game C", and the second voice matching the first voice may be the start music of game C.
Step 508, the server matches the second voice with the target activity scenario.
After the server determines a second voice matching the first voice, the second voice may be matched with the target activity scenario to associate the second voice with the target activity scenario.
Optionally, the server may detect whether the second voice matches the target activity scenario, and if the second voice matches the target activity scenario, the server may bind the second voice to the target activity scenario, and if the second voice does not match the target activity scenario, the server may re-perform steps 507 and 508 until the second voice matching the target activity scenario is determined.
For example, the first voice may be "I want to draw a lottery, I want to win mobile phone A", the target activity scene matching the first voice may be the lottery activity scene of mobile phone A, the second voice may match the target activity scene if the second voice is "start lottery", and the second voice may not match the target activity scene if the second voice is "go to purchase item X". As another example, the first voice may be "go to purchase item X", the target activity scene matching the first voice may be a purchase scene of item X, the second voice may match the target activity scene if the second voice is "welcome to purchase item X", and the second voice may not match the target activity scene if the second voice is "welcome to vote for yy".
Step 509, the server sends the target elements, the target actions, the target action models and the second voice to the terminal.
Optionally, the server may send the plurality of target elements, the plurality of target actions, the plurality of target action models, and the second voice to the terminal through a socket connection with the terminal.
Optionally, the target elements, the target actions and the target action models may correspond to each other one by one, and the server may send the correspondence between the target elements, the target actions and the target action models to the terminal. Optionally, the server may send the corresponding relationship and the second voice to the terminal at the same time, or send the corresponding relationship and the second voice to the terminal in multiple times, which is not limited in this embodiment of the application.
As will be understood by those skilled in the art, the server transmitting the target element, the target action and the target action model to the terminal means that the server transmits data of the target element, data of the target action and data of the target action model to the terminal.
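As one possible shape of the data sent in step 509 (and received in step 510), the payload could carry the element-action-model correspondence plus the second voice. All field names below are assumptions, and `ActionModel` refers to the placeholder type sketched earlier.

```typescript
// Hypothetical payload structure for steps 509/510; field names are assumptions.
interface TargetScenePayload {
  targetActivityScene: string; // e.g. "lottery activity scene of mobile phone A"
  entries: Array<{
    targetElement: string;          // target element
    targetAction: string;           // target action corresponding to the element
    targetActionModel: ActionModel; // target action model matching the action (see earlier sketch)
  }>;
  secondVoiceUrl?: string;     // second voice matched with the target activity scene
}
```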
Step 510, the terminal receives the target elements, the target actions, the target action models and the second voice sent by the server.
Optionally, the terminal may receive the target elements, the target actions, the target action models, and the second voice sent by the server through a socket connection with the server.
Optionally, the multiple target elements, the multiple target actions, and the multiple target action models may correspond one to one, and the terminal may receive a correspondence relationship between the multiple target elements, the multiple target actions, and the multiple target action models, which is sent by the server.
It will be understood by those skilled in the art that, corresponding to step 509, the terminal receiving the target elements, the target actions and the target action models sent by the server means that the terminal receives the data of the target elements, the data of the target actions and the data of the target action models sent by the server.
Step 511, the terminal displays the web page corresponding to the target activity scene according to the target elements, the target actions and the target action models, and plays the second voice in the process of displaying the web page corresponding to the target activity scene.
Optionally, the multiple target elements, the multiple target actions, and the multiple target action models may correspond to one another, and the terminal may input the target action corresponding to each target element into the corresponding target action model, so that the target element executes the target action according to the output of the target action model, thereby displaying a web page corresponding to a target activity scene, where the web page may be a dynamic page. Optionally, the terminal may display, through a plurality of threads, a webpage corresponding to the target activity scene according to the plurality of target elements, the plurality of target actions, and the plurality of target action models at the same time, the plurality of threads may correspond to the plurality of target elements one to one, and each thread inputs a target action corresponding to a corresponding target element into a corresponding target action model, so that the target element executes the target action according to an output of the target action model. Optionally, the terminal may play the second voice in the process of displaying the webpage corresponding to the target activity scene, so that the terminal may provide a visual and auditory webpage display atmosphere to the user at the same time. It is easy to understand that the web page corresponding to the target activity scene may further include a static element (for example, a picture, a text, and the like), and the terminal may display the static element according to the resource file of the web page corresponding to the target activity scene, which is not limited in this embodiment of the application.
For example, the terminal may play the second voice "start lottery" in the process of displaying the webpage corresponding to the lottery activity scene of mobile phone A (i.e., the lottery page of mobile phone A); for another example, the terminal may play the second voice "welcome to purchase item X" in the process of displaying the webpage corresponding to the purchase scene of item X (i.e., the purchase page of item X); for another example, the terminal may play the second voice "welcome to vote for yy" in the process of displaying the webpage corresponding to the xx voting activity scene (i.e., the xx voting activity page); for another example, the terminal may play the start music of game C in the process of displaying the webpage corresponding to the game scene of game C (i.e., the game page of game C).
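A hedged sketch of step 511 on the terminal side follows: browsers do not give page script real threads, so one animation loop per target element stands in for the per-element threads described above, and the action model's output is reduced to a 2D offset purely for illustration.

```typescript
// Sketch of step 511: feed each element's target action into its target action model
// and apply the model output frame by frame, while the second voice plays.
// The evaluate() signature and the 2D-offset output are assumptions.
interface PresentationEntry {
  element: HTMLElement;
  targetAction: string;
  targetActionModel: { evaluate(action: string, timeMs: number): { x: number; y: number } };
}

function presentTargetScene(entries: PresentationEntry[], secondVoiceUrl?: string): void {
  if (secondVoiceUrl) {
    void new Audio(secondVoiceUrl).play(); // play the second voice during presentation
  }
  for (const entry of entries) {
    const step = (timeMs: number) => {
      const output = entry.targetActionModel.evaluate(entry.targetAction, timeMs);
      entry.element.style.transform = `translate(${output.x}px, ${output.y}px)`;
      requestAnimationFrame(step); // one loop per target element, approximating a thread
    };
    requestAnimationFrame(step);
  }
}
```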
It should be noted that the webpage display method provided in the embodiment of the present application is only exemplary, and when the terminal displays the webpage corresponding to the target activity scene, besides playing the second voice, vibration (for example, vibration of the webpage, or vibration of the body of the terminal when the terminal is a mobile terminal such as a mobile phone) may be performed according to the content of the target activity scene, so as to provide a webpage display atmosphere in the visual, auditory and tactile senses to the user, improve the participation degree of activity experience, and improve the interactivity between the voice and the operation activity.
To sum up, in the webpage display method provided in the embodiment of the present application, after the terminal sends the first voice to the server, the server determines the target activity scene matched with the first voice from the activity scene library, where the target activity scene includes a plurality of target elements and a plurality of target actions corresponding to the plurality of target elements, and for each target action in the plurality of target actions, the server determines a target action model matched with the target action from the action model library to obtain a plurality of target action models, and sends the plurality of target elements, the plurality of target actions and the plurality of target action models to the terminal, and the terminal displays the webpage corresponding to the target activity scene according to the plurality of target elements, the plurality of target actions and the plurality of target action models. Because the terminal shows the webpage based on the voice, compared with a mode that the webpage is shown by triggering the terminal through a keyboard and/or a mouse, the flexibility of the webpage showing mode is improved, the interestingness of interaction between the user and the terminal can be reflected, and the probability of triggering the operation activity page is improved.
The sequence of the steps of the webpage displaying method provided in the embodiment of the present application can be appropriately adjusted, and the steps can be correspondingly increased or decreased according to the situation, and any method that can be easily conceived by a person skilled in the art within the technical scope disclosed in the present application shall be covered within the protection scope of the present application, and therefore, no further description is given.
Referring to fig. 7, which shows a schematic diagram of a webpage display method provided in an embodiment of the present application, as shown in fig. 7, when the webpage display method is executed, the browser of the terminal first displays a webpage, and the user can speak according to the content in the webpage. The terminal opens the microphone to collect the first voice (i.e., the voice of the user) and then sends the first voice to the server. The server analyzes the first voice to obtain a key sentence of the first voice, analyzes the activity scene according to the key sentence of the first voice, and determines a target activity scene matching the first voice (the target activity scene includes a plurality of target elements and a plurality of target actions corresponding to the target elements). The server then matches the action models, determines a target action model matching each target action, and sends the plurality of target elements, the plurality of target actions and the plurality of target action models to the terminal. The terminal triggers, through threads and according to the target elements, the target action models to execute the target actions, thereby displaying the webpage corresponding to the target activity scene. As shown in fig. 7, after the server analyzes the first voice to obtain the key sentence of the first voice, a second voice matching the first voice may also be determined according to the key sentence; the second voice is sent to the terminal after being matched with the target activity scene, and the terminal may trigger the voice model to play the second voice in the process of displaying the webpage corresponding to the target activity scene.
The following are apparatus embodiments of the present application, which may be used to implement the method embodiments of the present application. For details not disclosed in the apparatus embodiments of the present application, reference is made to the method embodiments of the present application.
Referring to fig. 8, which shows a block diagram of a web page presentation apparatus 800 according to an embodiment of the present application, the web page presentation apparatus 800 may be a program component in a server, and referring to fig. 8, the web page presentation apparatus 800 may include, but is not limited to:
a receiving module 810, configured to receive a first voice sent by a terminal;
a first determining module 820, configured to determine a target activity scene matching the first voice from an activity scene library, where the target activity scene includes a plurality of target elements and a plurality of target actions corresponding to the plurality of target elements;
a second determining module 830, configured to determine, for each target action in the multiple target actions, a target action model matched with the target action from the action model library, so as to obtain multiple target action models;
a first sending module 840, configured to send the target elements, the target actions, and the target action models to a terminal, so that the terminal displays a webpage corresponding to the target activity scene according to the target elements, the target actions, and the target action models.
Optionally, referring to fig. 9, which shows a block diagram of another apparatus 800 for displaying a webpage according to an embodiment of the present application, referring to fig. 9, on the basis of fig. 8, the apparatus 800 further includes:
a third determining module 850, configured to determine a second voice matching the first voice from the active voice library;
a matching module 860 for matching the second speech with the target activity scenario;
the second sending module 870 is configured to send the second voice to the terminal, so that the terminal plays the second voice in the process of displaying the webpage corresponding to the target activity scene.
Optionally, referring to fig. 10, which shows a block diagram of a further webpage displaying apparatus 800 provided in an embodiment of the present application, referring to fig. 10, on the basis of fig. 9, the apparatus 800 further includes:
an analysis module 880, configured to analyze the first voice to obtain a key sentence of the first voice;
a first determining module 820 for:
determining an activity scene matched with the key sentence from an activity scene library;
determining the activity scene matched with the key sentence as a target activity scene matched with the first voice;
a third determining module 850 for:
determining active voice matched with the key sentence from an active voice library;
the active speech that matches the key sentence is determined to be a second speech that matches the first speech.
Optionally, an analyzing module 880 for:
splitting a first voice into a plurality of voice fragments;
recognizing the voice fragments to obtain a plurality of character fragments;
and processing the plurality of character fragments to obtain the key sentence.
Optionally, an analyzing module 880 for:
determining at least one target text segment belonging to a target category from the plurality of text segments, wherein the target category is the category of the target activity scene;
and processing the at least one target text segment into the key sentence.
To sum up, according to the webpage display apparatus provided in the embodiment of the present application, after receiving a first voice sent by a terminal, a server determines a target activity scene matched with the first voice from an activity scene library, where the target activity scene includes a plurality of target elements and a plurality of target actions corresponding to the plurality of target elements, and for each target action in the plurality of target actions, the server determines a target action model matched with the target action from an action model library to obtain a plurality of target action models, and sends the plurality of target elements, the plurality of target actions and the plurality of target action models to the terminal, and the terminal displays a webpage corresponding to the target activity scene according to the plurality of target elements, the plurality of target actions and the plurality of target action models. Because the terminal displays the webpage based on the voice, compared with a mode that the webpage is displayed by triggering the terminal through a keyboard and/or a mouse, the flexibility of the webpage display mode is improved.
Referring to fig. 11, which shows a block diagram of a web page presentation apparatus 1100 according to an embodiment of the present application, the web page presentation apparatus 1100 may be a program component in a terminal, and referring to fig. 11, the web page presentation apparatus 1100 may include, but is not limited to:
a sending module 1110, configured to send a first voice to a server;
a first receiving module 1120, configured to receive a plurality of target elements, a plurality of target actions and a plurality of target action models, which are sent by a server, where the plurality of target elements and the plurality of target actions belong to a target activity scene matched with a first voice, and the plurality of target action models are matched with the plurality of target actions;
a displaying module 1130, configured to display a web page corresponding to the target activity scene according to the target elements, the target actions, and the target action models.
Optionally, referring to fig. 12, which shows a block diagram of another web page display apparatus 1100 provided in the embodiment of the present application, referring to fig. 12, the apparatus 1100 further includes:
a second receiving module 1140, configured to receive a second voice sent by the server, where the second voice is matched with the first voice, and the second voice is matched with the target activity scene;
the playing module 1150 is configured to play the second voice in the process of displaying the webpage corresponding to the target activity scene.
In summary, according to the webpage display apparatus provided in the embodiments of the present application, after sending the first voice to the server, the terminal receives the plurality of target elements, the plurality of target actions, and the plurality of target action models sent by the server, where the plurality of target elements and the plurality of target actions belong to a target activity scene matched with the first voice and the plurality of target action models are matched with the plurality of target actions. The terminal then displays the webpage corresponding to the target activity scene according to the received target elements, target actions, and target action models. Because the terminal displays the webpage based on voice, the flexibility of the webpage display mode is improved compared with a mode in which the webpage is displayed by triggering the terminal through a keyboard and/or a mouse.
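By way of illustration, a terminal-side counterpart could look like the following Python sketch, which turns the received target elements, target actions, and target action models into a simple HTML page and plays the second voice, if any, while the page is shown. The HTML layout and the play_voice placeholder are illustrative assumptions only; the embodiments do not prescribe a concrete rendering format.

```python
from typing import List, Optional


def play_voice(voice: bytes) -> None:
    """Placeholder for playing the second voice through the terminal's speaker."""
    pass  # a real terminal would hand the audio to its playback stack


def render_web_page(target_elements: List[str], target_actions: List[str],
                    target_action_models: List[dict]) -> str:
    """Build a simple HTML page binding each target action and its model to its element."""
    rows = []
    for element, action, model in zip(target_elements, target_actions,
                                      target_action_models):
        rows.append(f'<div data-action="{action}" data-model="{model}">{element}</div>')
    return "<html><body>\n" + "\n".join(rows) + "\n</body></html>"


def on_server_response(payload: dict, second_voice: Optional[bytes] = None) -> str:
    """Display the web page for the target activity scene; play the second voice if present."""
    html = render_web_page(payload["target_elements"], payload["target_actions"],
                           payload["target_action_models"])
    if second_voice is not None:
        play_voice(second_voice)
    return html
```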
With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and will not be elaborated here.
Please refer to fig. 13, which illustrates a schematic structural diagram of a web page display apparatus 1300 according to an embodiment of the present application. The apparatus 1300 may be, for example: a terminal such as a smart phone, a tablet computer, a notebook computer or a desktop computer. The apparatus 1300 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, etc.
In general, the apparatus 1300 includes: a processor 1301 and a memory 1302.
The processor 1301 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. The processor 1301 may be implemented in at least one hardware form of Digital Signal Processing (DSP), a Field-Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA). The processor 1301 may also include a main processor and a coprocessor: the main processor, also called a Central Processing Unit (CPU), is configured to process data in an awake state; the coprocessor is a low-power processor configured to process data in a standby state. In some embodiments, the processor 1301 may be integrated with a Graphics Processing Unit (GPU), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1301 may further include an Artificial Intelligence (AI) processor configured to handle computing operations related to machine learning.
Memory 1302 may include one or more computer-readable storage media, which may be non-transitory. The memory 1302 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1302 is used to store at least one instruction for execution by processor 1301 to implement the methods provided by embodiments of the present application.
In some embodiments, the apparatus 1300 may further include: a peripheral interface 1303 and at least one peripheral. Processor 1301, memory 1302, and peripheral interface 1303 may be connected by a bus or signal line. Each peripheral device may be connected to the peripheral device interface 1303 via a bus, signal line, or circuit board. The peripheral device may include: at least one of radio frequency circuitry 1304, display screen 1305, camera assembly 1306, audio circuitry 1307, positioning assembly 1308, or power supply 1309.
Peripheral interface 1303 may be used to connect at least one peripheral associated with I/O to processor 1301 and memory 1302. In some embodiments, processor 1301, memory 1302, and peripheral interface 1303 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1301, the memory 1302, and the peripheral device interface 1303 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 1304 is configured to receive and transmit Radio Frequency (RF) signals, also called electromagnetic signals. The radio frequency circuit 1304 communicates with communication networks and other communication devices through electromagnetic signals, converting an electrical signal into an electromagnetic signal for transmission, or converting a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1304 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1304 may communicate with other terminals through at least one wireless communication protocol, including, but not limited to, a metropolitan area network, mobile communication networks of various generations (2G, 3G, 4G, and 5G), a wireless local area network, and/or a WiFi network. In some embodiments, the radio frequency circuit 1304 may also include circuitry related to Near Field Communication (NFC), which is not limited in the present application.
The display 1305 is used to display a User Interface (UI). The UI may include graphics, text, icons, video, and any combination thereof. When the display 1305 is a touch display screen, the display 1305 also has the ability to capture touch signals on or over its surface. The touch signal may be input to the processor 1301 as a control signal for processing. At this point, the display 1305 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 1305, disposed on the front panel of the apparatus 1300; in other embodiments, there may be at least two displays 1305, respectively disposed on different surfaces of the apparatus 1300 or in a folded design; in still other embodiments, the display 1305 may be a flexible display disposed on a curved surface or a folded surface of the apparatus 1300. The display 1305 may even be arranged in a non-rectangular irregular figure, that is, an irregularly shaped screen. The display 1305 may be a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED) display, or the like.
The camera assembly 1306 is used to capture images or video. Optionally, the camera assembly 1306 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, or a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize a panoramic shooting function, a Virtual Reality (VR) shooting function, or other fusion shooting functions. In some embodiments, the camera assembly 1306 may also include a flash lamp. The flash lamp may be a single-color-temperature flash lamp or a dual-color-temperature flash lamp. A dual-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp and can be used for light compensation at different color temperatures.
The audio circuit 1307 may include a microphone and a speaker. The microphone is configured to collect sound waves from the user and the environment and convert them into electrical signals, which are input to the processor 1301 for processing or input to the radio frequency circuit 1304 for voice communication; the microphone may communicate with the processor 1301 and the radio frequency circuit 1304 through an audio API. For stereo capture or noise reduction purposes, multiple microphones may be placed at different locations of the apparatus 1300. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is configured to convert electrical signals from the processor 1301 or the radio frequency circuit 1304 into sound waves. The speaker may be a traditional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert an electrical signal into sound waves audible to humans, or convert an electrical signal into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuit 1307 may also include a headphone jack.
The positioning component 1308 is configured to determine the current geographic location of the apparatus 1300 for navigation or Location Based Services (LBS). The positioning component 1308 may be a positioning component based on the Global Positioning System (GPS) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 1309 is configured to supply power to the various components in the apparatus 1300. The power supply 1309 may be an alternating current supply, a direct current supply, a disposable battery, or a rechargeable battery. When the power supply 1309 includes a rechargeable battery, the rechargeable battery may support wired charging or wireless charging. The rechargeable battery may also be used to support fast charging technology.
In some embodiments, the apparatus 1300 further includes one or more sensors 1310. The one or more sensors 1310 include, but are not limited to: acceleration sensor 1311, gyro sensor 1312, pressure sensor 1313, fingerprint sensor 1314, optical sensor 1315, and proximity sensor 1316.
The acceleration sensor 1311 can detect the magnitude of acceleration in three coordinate axes of the coordinate system established with the apparatus 1300. For example, the acceleration sensor 1311 may be used to detect components of gravitational acceleration in three coordinate axes. The processor 1301 may control the display screen 1305 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1311. The acceleration sensor 1311 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 1312 may detect a body direction and a rotation angle of the apparatus 1300, and the gyro sensor 1312 may cooperate with the acceleration sensor 1311 to collect a 3D motion of the user with respect to the apparatus 1300. Processor 1301, based on the data collected by gyroscope sensor 1312, may perform the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensors 1313 may be disposed on the side bezel of the device 1300 and/or underneath the display 1305. When the pressure sensor 1313 is disposed on the side frame of the apparatus 1300, a user's holding signal of the apparatus 1300 may be detected, and the processor 1301 performs left-right hand recognition or shortcut operation according to the holding signal acquired by the pressure sensor 1313. When the pressure sensor 1313 is disposed at a lower layer of the display screen 1305, the processor 1301 controls an operability control on the UI interface according to a pressure operation of the user on the display screen 1305. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 1314 is used for collecting the fingerprint of the user, and the processor 1301 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 1314, or the fingerprint sensor 1314 identifies the identity of the user according to the collected fingerprint. When the identity of the user is identified as a trusted identity, the processor 1301 authorizes the user to perform relevant sensitive operations, including unlocking a screen, viewing encrypted information, downloading software, paying, changing settings, and the like. The fingerprint sensor 1314 may be disposed on the front, back, or side of the device 1300. When a physical button or vendor Logo is provided on the device 1300, the fingerprint sensor 1314 may be integrated with the physical button or vendor Logo.
The optical sensor 1315 is used to collect the ambient light intensity. In one embodiment, the processor 1301 may control the display brightness of the display screen 1305 according to the ambient light intensity collected by the optical sensor 1315. Specifically, when the ambient light intensity is high, the display brightness of the display screen 1305 is increased; when the ambient light intensity is low, the display brightness of the display screen 1305 is reduced. In another embodiment, the processor 1301 can also dynamically adjust the shooting parameters of the camera assembly 1306 according to the ambient light intensity collected by the optical sensor 1315.
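By way of illustration, the brightness control described above may be sketched as a simple mapping from ambient light intensity to display brightness; the linear mapping and the value ranges below are illustrative assumptions only.

```python
def adjust_display_brightness(ambient_lux: float,
                              min_brightness: float = 0.1,
                              max_brightness: float = 1.0,
                              max_lux: float = 1000.0) -> float:
    """Map ambient light intensity to display brightness: brighter surroundings give a
    brighter screen, dimmer surroundings give a dimmer screen (illustrative linear mapping)."""
    ratio = min(max(ambient_lux / max_lux, 0.0), 1.0)
    return min_brightness + ratio * (max_brightness - min_brightness)
```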
The proximity sensor 1316, also known as a distance sensor, is typically disposed on the front panel of the apparatus 1300. The proximity sensor 1316 is configured to capture the distance between the user and the front of the apparatus 1300. In one embodiment, when the proximity sensor 1316 detects that the distance between the user and the front surface of the apparatus 1300 gradually decreases, the processor 1301 controls the display 1305 to switch from the screen-on state to the screen-off state; when the proximity sensor 1316 detects that the distance between the user and the front surface of the apparatus 1300 gradually increases, the processor 1301 controls the display 1305 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the configuration shown in fig. 13 is not intended to be limiting of the apparatus 1300, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
Referring to fig. 14, a schematic structural diagram of a web page display apparatus 1400 provided in an embodiment of the present application is shown, where the apparatus 1400 may be a server. Illustratively, as shown in FIG. 14, the apparatus 1400 includes a Central Processing Unit (CPU) 1401, a system Memory 1404 including a Random-Access Memory (RAM) 1402 and a Read-Only Memory (ROM) 1403, and a system bus 1405 connecting the system Memory 1404 and the CPU 1401. The apparatus 1400 also includes a basic Input/Output (I/O) system 1406, which facilitates transfer of information between devices within the computer, and a mass storage device 1407 for storing an operating system 1413, application programs 1414, and other program modules 1415.
The basic input/output system 1406 includes a display 1408 for displaying information and an input device 1409, such as a mouse or a keyboard, through which a user inputs information. The display 1408 and the input device 1409 are both connected to the central processing unit 1401 through an input/output controller 1410 connected to the system bus 1405. The basic input/output system 1406 may also include the input/output controller 1410 for receiving and processing input from a number of other devices, such as a keyboard, a mouse, or an electronic stylus. Similarly, the input/output controller 1410 also provides output to a display screen, a printer, or another type of output device.
The mass storage device 1407 is connected to the central processing unit 1401 through a mass storage controller (not shown) connected to the system bus 1405. The mass storage device 1407 and its associated computer-readable media provide non-volatile storage for the apparatus 1400. That is, the mass storage device 1407 may include a computer readable medium (not shown) such as a hard disk or CD-ROM drive.
Without loss of generality, computer-readable storage media may comprise computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other solid-state memory technologies, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media are not limited to the foregoing. The system memory 1404 and the mass storage device 1407 described above may be collectively referred to as memory.
According to various embodiments of the present application, the apparatus 1400 may also be operated by means of a remote computer connected through a network, such as the Internet. That is, the apparatus 1400 may be connected to the network 1412 through the network interface unit 1411 coupled to the system bus 1405, or may be connected to other types of networks or remote computer systems (not shown) using the network interface unit 1411.
The memory further includes one or more programs, and the one or more programs are stored in the memory and configured to be executed by the CPU to implement the method provided by the embodiment of the present application.
An embodiment of the present application provides a web page presentation system, which may include a terminal and a server. In one possible implementation manner, the terminal may include the apparatus 1100 shown in fig. 11 or fig. 12, and the server may include the apparatus 800 shown in any one of fig. 8 to 10; in another possible implementation manner, the terminal may be the apparatus 1300 shown in fig. 13, and the server may be the apparatus 1400 shown in fig. 14.
Also provided in embodiments of the present application is a computer-readable storage medium having at least one instruction, at least one program, code set, or instruction set stored therein, where the at least one instruction, the at least one program, code set, or instruction set is loaded by a processor and executed to implement the methods shown in fig. 4 to 6.
In this application, the terms "first," "second," "third," "fourth," "fifth," "sixth," "seventh," "eighth," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The term "plurality" means two or more unless expressly limited otherwise. The term "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following objects. The term "at least one of A or B" likewise describes an association relationship between associated objects and indicates that three relationships may exist; for example, at least one of A or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. Similarly, "at least one of A, B, or C" indicates that seven relationships may exist: A alone, B alone, C alone, A and B together, A and C together, B and C together, and A, B, and C together.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A webpage display method is characterized by comprising the following steps:
receiving a first voice sent by a terminal;
determining a target activity scene matched with the first voice from an activity scene library, wherein the target activity scene comprises a plurality of target elements and a plurality of target actions corresponding to the plurality of target elements;
for each target action in the target actions, determining a target action model matched with the target action from an action model library to obtain a plurality of target action models;
and sending the target elements, the target actions and the target action models to the terminal so that the terminal can display the webpage corresponding to the target activity scene according to the target elements, the target actions and the target action models.
2. The method of claim 1, wherein after receiving the first voice transmitted by the terminal, the method further comprises:
determining, from an active voice library, a second voice matched with the first voice, wherein the second voice is matched with the target activity scene;
and sending the second voice to the terminal so that the second voice is played in the process that the terminal displays the webpage corresponding to the target activity scene.
3. The method of claim 2,
after receiving the first voice sent by the terminal, the method further comprises:
analyzing the first voice to obtain a key sentence of the first voice;
the determining a target activity scenario matching the first voice from an activity scenario library comprises:
determining an activity scene matched with the key sentence from the activity scene library;
determining the activity scene matched with the key sentence as a target activity scene matched with the first voice;
the determining a second voice matching the first voice from an active voice library comprises:
determining active voice matched with the key sentence from the active voice library;
and determining the active voice matched with the key sentence as a second voice matched with the first voice.
4. The method of claim 3, wherein analyzing the first speech to obtain the key sentence of the first speech comprises:
splitting the first voice into a plurality of voice segments;
recognizing the plurality of voice segments to obtain a plurality of text segments;
and processing the plurality of text segments to obtain the key sentence.
5. The method of claim 4, wherein the processing the plurality of text segments to obtain the key sentence comprises:
determining at least one target text segment belonging to a target category from the plurality of text segments, wherein the target category is a category of the target activity scene;
and processing the at least one target text segment into the key sentence.
6. A webpage display method is characterized by comprising the following steps:
sending a first voice to a server;
receiving a plurality of target elements, a plurality of target actions and a plurality of target action models sent by the server, wherein the plurality of target elements and the plurality of target actions belong to a target activity scene matched with the first voice, and the plurality of target action models are matched with the plurality of target actions;
and displaying a webpage corresponding to the target activity scene according to the target elements, the target actions and the target action models.
7. The method of claim 6, wherein after sending the first voice to the server, the method further comprises:
receiving second voice sent by the server, wherein the second voice is matched with the first voice, and the second voice is matched with the target activity scene;
and playing the second voice in the process of displaying the webpage corresponding to the target activity scene.
8. A web page presentation apparatus comprising modules for performing the method of any one of claims 1 to 7.
9. An apparatus for web page presentation, the apparatus comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the method of any one of claims 1 to 7.
10. A computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the method of any of claims 1 to 7.
CN202010144256.5A 2020-03-04 2020-03-04 Webpage display method and device and storage medium Active CN111367492B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010144256.5A CN111367492B (en) 2020-03-04 2020-03-04 Webpage display method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010144256.5A CN111367492B (en) 2020-03-04 2020-03-04 Webpage display method and device and storage medium

Publications (2)

Publication Number Publication Date
CN111367492A true CN111367492A (en) 2020-07-03
CN111367492B CN111367492B (en) 2023-07-18

Family

ID=71206644

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010144256.5A Active CN111367492B (en) 2020-03-04 2020-03-04 Webpage display method and device and storage medium

Country Status (1)

Country Link
CN (1) CN111367492B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104159159A (en) * 2014-05-30 2014-11-19 腾讯科技(深圳)有限公司 Interactive method and system based on video as well as terminal and server
WO2018006489A1 (en) * 2016-07-06 2018-01-11 深圳Tcl数字技术有限公司 Terminal voice interaction method and device
CN107507615A (en) * 2017-08-29 2017-12-22 百度在线网络技术(北京)有限公司 Interface intelligent interaction control method, device, system and storage medium
CN107919129A (en) * 2017-11-15 2018-04-17 百度在线网络技术(北京)有限公司 Method and apparatus for controlling the page
CN108022586A (en) * 2017-11-30 2018-05-11 百度在线网络技术(北京)有限公司 Method and apparatus for controlling the page
CN108538300A (en) * 2018-02-27 2018-09-14 科大讯飞股份有限公司 Sound control method and device, storage medium, electronic equipment
CN109522083A (en) * 2018-11-27 2019-03-26 四川长虹电器股份有限公司 A kind of intelligent page response interactive system and method

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112258240A (en) * 2020-10-30 2021-01-22 北京达佳互联信息技术有限公司 Content display method, device, terminal, server and storage medium

Also Published As

Publication number Publication date
CN111367492B (en) 2023-07-18

Similar Documents

Publication Publication Date Title
CN109151593B (en) Anchor recommendation method, device and storage medium
CN110572716B (en) Multimedia data playing method, device and storage medium
WO2022057435A1 (en) Search-based question answering method, and storage medium
CN111104980B (en) Method, device, equipment and storage medium for determining classification result
CN110769313B (en) Video processing method and device and storage medium
CN110139143B (en) Virtual article display method, device, computer equipment and storage medium
CN110933468A (en) Playing method, playing device, electronic equipment and medium
CN108922531B (en) Slot position identification method and device, electronic equipment and storage medium
CN111027490B (en) Face attribute identification method and device and storage medium
CN111370025A (en) Audio recognition method and device and computer storage medium
CN111613213B (en) Audio classification method, device, equipment and storage medium
CN111241499A (en) Application program login method, device, terminal and storage medium
CN110798327B (en) Message processing method, device and storage medium
CN111437600A (en) Plot showing method, plot showing device, plot showing equipment and storage medium
CN111628925A (en) Song interaction method and device, terminal and storage medium
CN111061369B (en) Interaction method, device, equipment and storage medium
CN110837557B (en) Abstract generation method, device, equipment and medium
CN112069350A (en) Song recommendation method, device, equipment and computer storage medium
CN112559795A (en) Song playing method, song recommending method, device and system
CN111367492B (en) Webpage display method and device and storage medium
CN114827651B (en) Information processing method, information processing device, electronic equipment and storage medium
CN111988664B (en) Video processing method, video processing device, computer equipment and computer-readable storage medium
CN114595019A (en) Theme setting method, device and equipment of application program and storage medium
CN112132472A (en) Resource management method and device, electronic equipment and computer readable storage medium
CN112487162A (en) Method, device and equipment for determining text semantic information and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant