CN110597395B - Object interaction control method and device, storage medium and electronic device - Google Patents

Object interaction control method and device, storage medium and electronic device Download PDF

Info

Publication number
CN110597395B
Authority
CN
China
Prior art keywords
virtual object
target virtual
human
audio
computer interaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910889018.4A
Other languages
Chinese (zh)
Other versions
CN110597395A (en)
Inventor
俞一鹏
唐海玉
孙子荀
朱城伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910889018.4A priority Critical patent/CN110597395B/en
Publication of CN110597395A publication Critical patent/CN110597395A/en
Application granted granted Critical
Publication of CN110597395B publication Critical patent/CN110597395B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 - Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 - Sound input; Sound output
    • G06F 3/167 - Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses an object interaction control method and device, a storage medium and an electronic device. The method comprises the following steps: running a local human-computer interaction task in a human-computer interaction application client, wherein a virtual character controlled by the human-computer interaction application client interacts with at least one virtual object participating in the human-computer interaction task; acquiring an audio clip collected by the terminal device where the human-computer interaction application client is located; identifying a target virtual object from the audio clip, wherein the target virtual object is included in the at least one virtual object; marking and displaying the target virtual object; and controlling the virtual character to perform an interactive action with the target virtual object. The invention solves the technical problem of poor interaction control accuracy caused by requiring the user to determine the interaction object through manual operation.

Description

Object interaction control method and device, storage medium and electronic device
Technical Field
The invention relates to the field of computers, in particular to an object interaction control method and device, a storage medium and an electronic device.
Background
In a terminal application in which multiple users participate, a user is often required to select a specific interaction object from multiple virtual objects through manual operation, for example, by manually clicking the icon corresponding to the object or by manually adjusting the direction of an operation dial. The virtual character currently controlled by the application client then completes the interaction with the selected object.
That is, in the object interaction control process provided in the related art, a manual user operation is generally required to determine the interaction object. However, when the interaction object is selected from a plurality of virtual objects, the recognition sensitivity of the manual operation often makes it impossible to ensure that the object the user actually intends is selected, resulting in poor accuracy of object interaction control.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides an object interaction control method and device, a storage medium and an electronic device, and at least solves the technical problem of poor interaction control accuracy caused by the fact that a user manually determines an interaction object.
According to an aspect of an embodiment of the present invention, there is provided an object interaction control method, including: running a local human-computer interaction task in a human-computer interaction application client, wherein a virtual character controlled by the human-computer interaction application client interacts with at least one virtual object participating in the human-computer interaction task; acquiring an audio clip collected by the terminal device where the human-computer interaction application client is located; identifying a target virtual object from the audio clip, wherein the target virtual object is included in the at least one virtual object; marking and displaying the target virtual object; and controlling the virtual character to perform an interactive action with the target virtual object.
According to another aspect of the embodiments of the present invention, there is also provided an object interaction control apparatus, including: an operation unit, configured to run a local human-computer interaction task in a human-computer interaction application client, wherein a virtual character controlled by the human-computer interaction application client interacts with at least one virtual object participating in the human-computer interaction task; a first acquisition unit, configured to acquire the audio clip collected by the terminal device where the human-computer interaction application client is located; an identifying unit, configured to identify a target virtual object from the audio clip, where the target virtual object is included in the at least one virtual object; a marking unit, configured to mark and display the target virtual object; and a control unit, configured to control the virtual character to perform an interactive action with the target virtual object.
As an alternative embodiment, the marking unit includes at least one of: the first display module is used for highlighting and marking the object icon matched with the target virtual object; the second display module is used for highlighting the target virtual object; and the third display module is used for displaying the highlighted mark in the display area where the target virtual object is located.
As an alternative embodiment, the apparatus further comprises: a fourth obtaining module, configured to obtain first sample audio clips before the audio features in the audio clip are extracted, where the first sample audio clips are different audio clips collected in different scenes; a first training module, configured to train an initial speech recognition classification model with the first sample audio clips to obtain a candidate speech recognition classification model; a fifth obtaining module, configured to obtain a second sample audio clip associated with a target user using the human-computer interaction application client; and a second training module, configured to train the candidate speech recognition classification model with the second sample audio clip to obtain the speech recognition classification model.
As an alternative embodiment, the control unit includes: an adjusting module, configured to adjust an operating state of the target virtual object displayed by the mark to a locked state, where the target virtual object in the locked state is a virtual object to be interacted with by the virtual character; and the control module is used for controlling the virtual role to directly perform interactive action with the target virtual object in the locked state.
According to still another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to perform the above object interaction control method when running.
According to another aspect of the embodiments of the present invention, there is also provided an electronic apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor performs the object interaction control method through the computer program.
In the embodiment of the invention, in the process of running a game task in the human-computer interaction application client, an audio clip acquired by the terminal equipment where the human-computer interaction application client is located is obtained, and a target virtual object is identified from the audio clip. The virtual character controlled by the human-computer interaction application client interacts with at least one virtual object participating in the human-computer interaction task, wherein the at least one virtual object comprises an identified target virtual object. And then, marking and displaying the target virtual object in the human-computer interaction application client, and controlling the virtual character to directly perform interaction action with the target virtual object. That is to say, in the process of executing the human-computer interaction task, an audio clip acquired by the terminal device can be acquired, and a target virtual object to be interacted with by the virtual character controlled by the human-computer interaction application client can be accurately identified from the audio clip. The target virtual object is locked and marked and displayed, so that the virtual character can finish interactive action with the accurately identified target virtual object, and a user does not need to manually execute selection or steering operation through a touch screen or a key to determine the interactive object, so that the target virtual object to be interacted is accurately determined through audio control, and the accuracy of object interactive control is ensured. And the technical problem of poor interaction control accuracy caused by the fact that the user manually operates and determines the interaction object is solved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of a network environment for an alternative method of object interaction control, according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating an alternative method for controlling object interaction according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a configuration interface in an alternative object interaction control method according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating effects of an alternative object interaction control method according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a training process in an alternative object interaction control method according to an embodiment of the present invention;
FIG. 6 is a flow chart illustrating an alternative method for controlling object interaction according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating an alternative method of object interaction control, in accordance with an embodiment of the present invention;
FIG. 8 is a schematic diagram of an alternative object interaction control method according to an embodiment of the invention;
FIG. 9 is a schematic diagram of yet another alternative object interaction control method in accordance with an embodiment of the present invention;
FIG. 10 is a diagram illustrating effects of an alternative object interaction control method according to an embodiment of the present invention;
FIG. 11 is a diagram illustrating effects of another alternative object interaction control method according to an embodiment of the present invention;
FIG. 12 is a flow chart illustrating an alternative method for controlling object interaction according to an embodiment of the present invention;
FIG. 13 is a diagram illustrating a training process in an alternative object interaction control method according to an embodiment of the present invention;
FIG. 14 is a schematic diagram of yet another alternative object interaction control method in accordance with an embodiment of the present invention;
FIG. 15 is a schematic structural diagram of an alternative object interaction control apparatus according to an embodiment of the present invention;
fig. 16 is a schematic structural diagram of an alternative electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of the embodiments of the present invention, an object interaction control method is provided, optionally, as an optional implementation manner, the object interaction control method may be but is not limited to be applied to an object interaction control system in a network environment as shown in fig. 1, where the object interaction control system may include but is not limited to: user equipment 102, network 110, and server 112. The user equipment 102 has a client installed therein for a human-computer interaction application, and the user equipment 102 includes a human-computer interaction screen 104, a processor 106 and a memory 108. Further, the human-computer interaction screen 104 is configured to detect a human-computer interaction operation through a human-computer interaction interface corresponding to the client; the processor 106 is used for generating a corresponding operation instruction according to the human-computer interaction operation and responding to the operation instruction to control the virtual character to execute a corresponding action; the memory 108 is used for storing the operation command and the attribute information related to the virtual character. The server 112 includes functional modules for providing services for the human-computer interaction application, such as a database 114 for storing data and a processing engine 116 for executing data processing. Specifically, the process of implementing the object interaction control method by the object interaction control system may include the following steps:
in the process of running a local human-computer interaction task in the human-computer interaction application client, a human-computer interaction interface is presented in the human-computer interaction screen 104, and an audio clip acquired by the terminal device where the human-computer interaction application client is located is obtained as in steps S102-S104. Then, step S106 is executed to transmit the audio clip to the server 112 via the network 110. Further as shown in steps S108-S110, the server 112 will call the speech recognition classification model in the database 114 to recognize the audio segment through the processing engine 116, so as to recognize the target virtual object. And sends the object identification of the target virtual object to the user device 102 via the network 110.
Then, the user equipment 102 executes step S112, and displays a mark on the target virtual object in the human-computer interaction screen 104, and controls the virtual character controlled by the human-computer interaction application client to perform an interaction with the target virtual object. For example, assume that a human-machine-interaction application client in user device 102 is running a local human-machine-interaction task, as shown in FIG. 1, where the human-machine-interaction application client controls avatar A. And then, identifying a target virtual object B participating in the human-computer interaction task from an audio clip acquired by the user equipment 102 where the human-computer interaction application client is located, and locking the target virtual object B so as to enable the virtual character to directly perform interaction action with the target virtual object.
It should be noted that, in this embodiment, in the process of running a game task in the human-computer interaction application client, an audio clip acquired by the terminal device where the human-computer interaction application client is located is obtained, and a target virtual object is identified from the audio clip. The virtual character controlled by the human-computer interaction application client interacts with at least one virtual object participating in the human-computer interaction task, wherein the at least one virtual object comprises an identified target virtual object. And then, marking and displaying the target virtual object in the human-computer interaction application client, and controlling the virtual character to directly perform interaction action with the target virtual object. That is to say, in the process of executing the human-computer interaction task, an audio clip acquired by the terminal device can be acquired, and a target virtual object to be interacted with by the virtual character controlled by the human-computer interaction application client can be accurately identified from the audio clip. The target virtual object is locked and marked and displayed, so that the virtual character can finish interactive action with the accurately identified target virtual object, and a user does not need to manually execute selection or steering operation through a touch screen or a key to determine the interactive object, so that the target virtual object to be interacted is accurately determined through audio control, and the accuracy of object interactive control is ensured. Furthermore, under the condition that the target virtual object to be interacted is accurately determined through audio control, a user does not need to repeatedly execute selection operation, the time for determining the target virtual object is shortened, and the efficiency of object interaction control is improved.
Optionally, in this embodiment, the user equipment may be, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a PC, and other computer equipment that supports running an application client. The server and the user equipment may implement data interaction through a network, which may include but is not limited to a wireless network or a wired network. Wherein, this wireless network includes: bluetooth, WIFI, and other networks that enable wireless communication. Such wired networks may include, but are not limited to: wide area networks, metropolitan area networks, and local area networks. The above is merely an example, and this is not limited in this embodiment.
Optionally, as an optional implementation manner, as shown in fig. 2, the object interaction control method includes:
s202, running a local human-computer interaction task in the human-computer interaction application client, wherein a virtual character controlled by the human-computer interaction application client interacts with at least one virtual object participating in the human-computer interaction task;
s204, acquiring an audio clip acquired by a terminal device where the human-computer interaction application client is located;
s206, identifying a target virtual object from the audio clip, wherein at least one virtual object comprises the target virtual object;
s208, marking and displaying the target virtual object;
and S210, controlling the virtual character to perform interactive action with the target virtual object.
Optionally, in this embodiment, the object interaction control method may be, but is not limited to, applied to a human-computer interaction application that needs to control the interaction process between virtual objects to complete a human-computer interaction task. Such human-computer interaction applications may include, but are not limited to, virtual simulation interaction applications such as game applications and shopping applications, where the human-computer interaction tasks are a series of preset stages designed to achieve a set goal within the application. Further, game applications may include, but are not limited to: a Multiplayer Online Battle Arena (MOBA) game or a Single-Player Game (SPG). The game application may include, but is not limited to, at least one of: two-dimensional (2D) game applications, three-dimensional (3D) game applications, Virtual Reality (VR) game applications, Augmented Reality (AR) game applications, and Mixed Reality (MR) game applications. Correspondingly, the human-computer interaction task may be, but is not limited to, a game task set in a game application. The virtual object may be, but is not limited to, an object that allows interaction in the human-computer interaction task, for example, a virtual character controlled by another client in the game task (also referred to as a player character), a Non-Player Character (NPC) set in the game task, a building attacked in the game task, and the like. The above are merely examples, and the present embodiment is not limited thereto.
For example, take an MOBA game, in which a game task is set as a confrontation between a friendly team and an enemy team, with one side winning. Assume that the virtual character A controlled by the human-computer interaction application client belongs to the friendly side, and that the target virtual object identified from the audio clip is the enemy virtual character B. The virtual character B can be marked and displayed in the human-computer interaction interface corresponding to the human-computer interaction application client and locked in the background, so that when an attack skill is triggered and released, the virtual character A acts directly and accurately on the virtual character B, completing the interactive action between the virtual character A and the virtual character B. The accuracy of object interaction control is thus ensured, and the virtual object to be interacted with does not need to be selected through repeated operations.
It should be noted that, in this embodiment, the object interaction control method may be independently completed in a user equipment installed with a human-computer interaction application client (not shown in the figure), or may be jointly completed through interaction between the user equipment installed with the human-computer interaction application client and a remote server (also referred to as a cloud) (as shown in fig. 1).
Optionally, in this embodiment, before running a local human-computer interaction task in the human-computer interaction application client, an object locking mode may be configured in the human-computer interaction application client, where the object locking mode is used to indicate a mode in which a virtual character controlled by the human-computer interaction application client locks a virtual object to be interacted during execution of the human-computer interaction task. For example, the object locking mode may include, but is not limited to, at least one of: a voice lock mode, a click lock mode, and a pointing lock mode. For example, as shown in fig. 3, if a "sound locking mode" is selected in the configuration interface, the target virtual object to be interacted is locked according to the collected audio clip during the process of running the human-computer interaction task.
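As a purely illustrative sketch (not part of the patent), the configured object locking mode could be stored on the client roughly as follows; the enum values and class names are assumptions made only for this example.

```python
# Minimal sketch of storing the configured object locking mode (illustrative names only).
from enum import Enum

class LockMode(Enum):
    VOICE = "sound locking mode"     # lock the target from collected audio clips
    CLICK = "click locking mode"     # lock the target by tapping its icon
    POINT = "pointing locking mode"  # lock the target via the operation dial

class InteractionClient:
    def __init__(self, lock_mode: LockMode = LockMode.CLICK):
        self.lock_mode = lock_mode

client = InteractionClient()
client.lock_mode = LockMode.VOICE   # "sound locking mode" selected in the configuration interface (fig. 3)
print(client.lock_mode)             # -> LockMode.VOICE
```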
Optionally, in this embodiment, after the audio clip acquired by the terminal device is acquired, but not limited to, a speech recognition classification model is called to recognize an object identifier of the target virtual object indicated in the audio clip. The identification process can be directly completed in the terminal equipment, and the audio clip can also be sent to a server in the cloud so as to complete the identification process in the server. In other words, the speech recognition classification model may be applied to a terminal device or a server. The above is merely an example, and this is not limited in this embodiment.
Furthermore, in the present embodiment, after the target virtual object is identified, the display position corresponding to the target virtual object may be, but is not limited to, determined. When the target virtual object is displayed in the human-computer interaction interface currently presented by the terminal device, the display area where the target virtual object is located may be, but is not limited to, displayed with a highlight mark. When the target virtual object is not displayed in the human-computer interaction interface currently presented by the terminal device, the target virtual object may be, but is not limited to, highlighted in a map matched with the human-computer interaction task. The policy for highlighting the target virtual object may include, but is not limited to, at least one of the following: highlighting the object icon of the target virtual object; highlighting the target virtual object itself; and highlighting the display area where the target virtual object is located. Further, the highlight mark may take forms including, but not limited to: highlighting, box-selection marking, zooming in, and other ways of making the target virtual object stand out. For example, as shown in fig. 4, in the case where the target virtual object identified from the audio clip is the virtual character B, its object icon may be box-selected (as shown by the dashed box in fig. 4) and the target virtual object displayed with a highlight mark.
Optionally, in this embodiment, the speech recognition classification model used for recognizing the target virtual object from the audio segment may be, but is not limited to, a classification model obtained after training with a plurality of sample audio segments and used for recognizing a virtual object indicated in the audio segment.
For example, in the process of training the speech recognition classification model, the user of the human-computer interaction application client is prompted to input different audio clips in different scenes according to the prompts of the training interface. Fig. 5(a) is a schematic diagram of such a training interface. The "attack enemy character B" shown in fig. 5(a) indicates that the target to be locked by voice is "enemy character B". When the user clicks the button to start recording, speaks "attack enemy character B" or a similar sound sequence, and clicks "save", the audio clip is automatically recorded by the training module. Further, the training module may learn from the different audio clips input in different scenes collected in this manner, for example by learning the audio features in the different audio clips to facilitate recognition and classification based on those features.
For example, the effect of each round of training can be, but is not limited to, checked through the process shown in fig. 5(b): after the user clicks the button to start recording, the audio clip "attack enemy character B" is spoken according to the prompt "please say the target you want to lock". The target virtual object may then be identified from the audio clip by the speech recognition classification model. Assuming that the identified target virtual object is "character B", further learning may be performed based on the user's feedback (e.g., clicking the "correct" or "wrong" feedback button).
If the user feedback on the output of the trained speech recognition classification model is "correct" for N consecutive times, it is determined that the speech recognition classification model has finished training, and the model can then be applied to an actual scene. The above is merely an example, and this is not limited in this embodiment.
The following examples are specifically given:
Firstly, a plurality of sample audio clips are obtained, and an initial speech recognition classification model is trained to associate audio clips with target virtual objects. For example: when an audio clip includes expressions such as "attack enemy character B", "dry enemy character B", or "enemy character B", the locking target is "enemy character B"; when an audio clip includes expressions such as "hit the NPC" or "clean the NPC", the locking target is the "NPC". By analogy, the correspondences for the different virtual objects and virtual characters in the human-computer interaction task are established.
Then, the trained speech recognition classification model is used to execute the object interaction control method, as shown in steps S602-S608 of fig. 6: after the audio clip input by the user and collected by the terminal device is acquired, the audio clip is recognized and classified by the speech recognition classification model, and the object identifier of the target virtual object (such as the virtual character B) is obtained. The interactive action is then determined by combining information in the audio clip (such as verb expressions like "hit" and "attack") or a manual operation (such as clicking to trigger a skill release). The virtual character A and the virtual character B are then controlled to perform the determined interactive action. In addition, once the target virtual object is determined to be the virtual character B, the determination result can be fed back to the user in real time, for example by marking and displaying the target virtual object in the human-computer interaction interface of the terminal device where the human-computer interaction client is located.
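The flow of steps S602-S608 can be illustrated with a small sketch. The keyword lookup below merely stands in for the trained speech recognition classification model, and all identifiers (classify, TARGET_KEYWORDS, etc.) are assumptions made for this example only.

```python
# Minimal, runnable sketch of the recognize -> lock -> interact flow (S602-S608).
TARGET_KEYWORDS = {
    "enemy character b": "virtual character B",  # "attack enemy character B", "dry enemy character B", ...
    "npc": "NPC",                                # "hit the NPC", "clean the NPC", ...
}
ACTION_KEYWORDS = {"attack": "attack", "hit": "attack", "clean": "attack"}

def classify(transcript: str):
    """Return (object_id, action) indicated by a recognized transcript."""
    text = transcript.lower()
    object_id = next((oid for kw, oid in TARGET_KEYWORDS.items() if kw in text), None)
    action = next((act for kw, act in ACTION_KEYWORDS.items() if kw in text), None)
    return object_id, action

# S602-S604: recognize and classify the collected clip; S606-S608: the client would then
# mark/lock the returned object and control virtual character A to perform the action.
print(classify("attack enemy character B"))   # -> ('virtual character B', 'attack')
```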
It should be noted that, in this embodiment, there is no limitation on the execution order of marking and displaying the target virtual object at the front end and locking the target virtual object at the back end.
Further, in this embodiment, the execution process and the execution result of the object interaction control method may be, but are not limited to, applied to the data sharing system shown in fig. 7. The data sharing system 700 refers to a system for performing data sharing between nodes, the data sharing system may include a plurality of nodes 701, and the plurality of nodes 701 may refer to respective clients in the data sharing system. Each node 701 may receive input information during normal operation and maintain shared data within the data sharing system based on the received input information. In order to ensure information intercommunication in the data sharing system, information connection can exist between each node in the data sharing system, and information transmission can be carried out between the nodes through the information connection. For example, when an arbitrary node in the data sharing system receives input information, other nodes in the data sharing system acquire the input information according to a consensus algorithm, and store the input information as data in shared data, so that the data stored on all the nodes in the data sharing system are consistent.
Each node in the data sharing system has a node identifier corresponding thereto, and each node in the data sharing system may store a node identifier of another node in the data sharing system, so that the generated block is broadcast to the other node in the data sharing system according to the node identifier of the other node in the following. Each node may maintain a node identifier list as shown in the following table, and store the node name and the node identifier in the node identifier list correspondingly. The node identifier may be an Internet Protocol (IP) address and any other information that can be used to identify the node, and table 1 only illustrates the IP address as an example.
TABLE 1
Node name    Node identification
Node 1       117.114.151.174
Node 2       117.116.189.145
Node N       119.123.789.258
Each node in the data sharing system stores one identical blockchain. As shown in fig. 8, the blockchain is composed of a plurality of blocks. The starting block includes a block header and a block body; the block header stores an input-information characteristic value, a version number, a timestamp and a difficulty value, and the block body stores the input information. The next block takes the starting block as its parent block and likewise includes a block header and a block body; its block header stores the input-information characteristic value of the current block, the block-header characteristic value of the parent block, the version number, the timestamp and the difficulty value, and so on. In this way, the block data stored in each block in the blockchain is associated with the block data stored in its parent block, which ensures the security of the input information in the blocks.
When each block in the blockchain is generated, referring to fig. 9, the node where the blockchain is located verifies the input information when it receives it; after the verification is completed, the input information is stored in the memory pool and the hash tree recording the input information is updated. The update timestamp is then set to the time when the input information was received, and different random numbers are tried while the characteristic value is calculated repeatedly, so that the calculated characteristic value satisfies the following formula:
SHA256(SHA256(version + prev_hash + merkle_root + ntime + nbits + x)) < TARGET
where SHA256 is the characteristic-value algorithm used to calculate the characteristic value; version is the version information of the relevant block protocol in the blockchain; prev_hash is the block-header characteristic value of the parent block of the current block; merkle_root is the characteristic value of the input information; ntime is the update time of the update timestamp; nbits is the current difficulty, which remains fixed for a period of time and is re-determined after that period; x is a random number; and TARGET is the characteristic-value threshold, which can be determined from nbits.
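The nonce search implied by this inequality can be sketched as below. This is a simplified illustration only: a real block header uses a fixed binary layout rather than the string concatenation assumed here, and the field values are made up for the example.

```python
# Simplified sketch of searching for x such that
# SHA256(SHA256(version + prev_hash + merkle_root + ntime + nbits + x)) < TARGET.
import hashlib
import time

def find_nonce(version, prev_hash, merkle_root, nbits, target, max_tries=1_000_000):
    ntime = int(time.time())                 # update timestamp when the search starts
    for x in range(max_tries):               # "try different random numbers" (here: a counter)
        payload = f"{version}{prev_hash}{merkle_root}{ntime}{nbits}{x}".encode()
        digest = hashlib.sha256(hashlib.sha256(payload).digest()).hexdigest()
        if int(digest, 16) < target:         # characteristic value meets the threshold
            return x, digest
    return None, None

# Deliberately easy TARGET so the loop finishes quickly in this illustration.
nonce, digest = find_nonce(version=1, prev_hash="00" * 32, merkle_root="ab" * 32,
                           nbits=0x1d00ffff, target=1 << 252)
print(nonce)
```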
Therefore, when a random number satisfying the above formula is calculated, the information can be stored correspondingly, and the block header and block body are generated to obtain the current block. The node where the blockchain is located then sends the newly generated block to the other nodes in its data sharing system according to their node identifiers; the other nodes verify the newly generated block and, after the verification is completed, add it to the blockchains they store.
By the embodiment provided by the application, in the process of executing the human-computer interaction task, the audio clip collected by the terminal equipment can be obtained, and the target virtual object to be interacted with by the virtual character controlled by the human-computer interaction application client side can be accurately identified from the audio clip. The target virtual object is locked and marked and displayed, so that the virtual character can finish interactive action with the accurately identified target virtual object, and a user does not need to manually execute selection or steering operation through a touch screen or a key to determine the interactive object, so that the target virtual object to be interacted is accurately determined through audio control, and the accuracy of object interactive control is ensured.
As an optional scheme, the displaying the mark of the target virtual object includes:
s1, determining the target position of the target virtual object;
s2, indicating the target virtual object at the target position to be displayed in the human-computer interaction interface currently presented by the terminal equipment, and acquiring the display area of the target virtual object in the human-computer interaction interface;
s3, highlighting the target virtual object in the display area.
It should be noted that, in order to visually distinguish the target virtual object determined to be locked from the other virtual objects displayed on the human-computer interaction interface, when it is determined that the target position of the target virtual object lies within the human-computer interaction interface currently presented by the terminal device, the display area where the target virtual object is located is obtained, and the target virtual object in that display area is highlighted.
Optionally, in this embodiment, when it is determined that the target position where the target virtual object is located is not located in the human-computer interaction interface currently presented by the terminal device, a display position of the target virtual object in a map matched with the human-computer interaction task is obtained; and highlighting the target virtual object on the display position.
The following description is made with specific reference to the examples shown in fig. 4 and 10:
Assume the human-computer interaction application is an MOBA game application, and the human-computer interaction task is a game task in which the friendly and enemy sides fight each other and try to occupy each other's buildings. As shown in fig. 4, assume that the game application client currently controls virtual character A. After the terminal device collects the audio clip, the audio clip is recognized, and the target virtual object to be interacted with can be determined to be the virtual character B.
Further, if it is determined that the target position of the target virtual object is in the game interface currently presented by the terminal device, as shown in fig. 4, the virtual character B is located beside the virtual character C. The display area where the virtual character B is located can be obtained, and the virtual character B in the display area is highlighted and displayed, as shown in fig. 4, the object icon of the virtual character B is marked (as shown, the dashed box corresponding to the virtual character B is shown).
If it is determined that the target position of the target virtual object is not in the game interface currently presented by the terminal device, as shown in fig. 10, and the virtual character B has exceeded the display range of the game interface, the display position of the virtual character B is determined in the map corresponding to the game task, and the target virtual object at the display position is highlighted (as shown in the dashed box corresponding to the virtual character B).
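A minimal sketch of this branching (in-view highlight versus map highlight) is given below; the Rect type, the coordinates and the returned strings are assumptions made only for illustration.

```python
# Decide where the highlight mark for the locked target should be drawn.
from dataclasses import dataclass

@dataclass
class Rect:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom

def mark_target(target_pos, viewport: Rect) -> str:
    x, y = target_pos
    if viewport.contains(x, y):
        return "highlight in the current interface"   # e.g. box-select its icon (fig. 4)
    return "highlight on the task map"                # e.g. dashed box on the map (fig. 10)

viewport = Rect(0, 0, 1920, 1080)
print(mark_target((120.0, 80.0), viewport))    # target on screen
print(mark_target((2500.0, 80.0), viewport))   # target outside the current interface
```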
According to the embodiment provided by the application, the locked target virtual object is marked and displayed in the terminal equipment where the human-computer interaction application client side is located, so that convenience is brought to a user to visually determine the target virtual object to be interacted in real time.
As an optional scheme, the displaying the mark of the target virtual object includes at least one of the following:
1) highlighting the object icon matched with the target virtual object;
2) highlighting the target virtual object;
3) and highlighting the display area where the target virtual object is located.
Specifically, as described with reference to fig. 11, assuming that the target virtual object is determined to be the virtual character B, the object icon of the virtual character B may be highlighted, and as shown in fig. 11, the object icon is highlighted by a dashed box; the virtual character B can be highlighted, and the virtual character B is highlighted in a bold mode as shown in FIG. 11; in addition, a highlight mark may be displayed in the display area where the virtual character B is located, and the display area may be highlighted as shown in fig. 11 (as shown, a hatched area).
It should be noted that the above means for highlighting may also include other means, such as using a guide line with a relevant text description. The above is merely an example, and this is not limited in this embodiment.
Through the embodiment provided by the application, the target virtual object identified from the audio clip is highlighted and displayed, so that the aim of feeding back the locked target virtual object to the user in real time is fulfilled, the user can visually see the locked target virtual object, and the selected target virtual object is flexibly switched according to the displayed content.
As an alternative, identifying the target virtual object from the audio clip includes:
s1, extracting audio features in the audio clips;
s2, inputting the audio features into a speech recognition classification model, wherein the speech recognition classification model is a classification model which is obtained by training a plurality of sample audio clips and is used for recognizing virtual objects indicated in the audio clips;
and S3, obtaining an output result of the speech recognition classification model, wherein the output result carries the object identification of the recognized target virtual object.
Optionally, in this embodiment, before extracting the audio features in the audio segment, the method may further include, but is not limited to, performing preprocessing on the audio segment, where the preprocessing process may include, but is not limited to: denoising, filtering, smoothing and the like.
In addition, in this embodiment, the audio features extracted from the audio clip may include, but are not limited to, at least one of the following: sound-signal features (such as Mel Frequency Cepstral Coefficients, MFCC for short), time-domain features (such as the Root Mean Square Error (RMSE) of the waveform), frequency-domain features (such as the threshold of each frequency band), window features (such as the Hamming distance of the time window), features extracted by a deep learning network, and the like. The Hamming distance is the number of bit positions in which the corresponding bits of two legal codes differ, also called the code distance. The above is merely an example, and this is not limited in this embodiment.
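As an illustration only, some of the listed features could be extracted with an off-the-shelf audio library; librosa is an assumed choice here (the patent names no library), and the aggregation of per-frame features into a single vector is likewise an assumption.

```python
# Sketch: extract MFCC and RMS time-domain features from a collected clip.
import numpy as np
import librosa

def extract_features(path: str) -> np.ndarray:
    y, sr = librosa.load(path, sr=16000)                  # the collected audio clip
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # sound-signal features (MFCC)
    rms = librosa.feature.rms(y=y)                        # time-domain energy of the waveform
    # Aggregate per-frame features into one fixed-length vector for the classifier.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1), rms.mean(axis=1)])

# features = extract_features("clip.wav")   # vector of length 27 in this configuration
```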
Optionally, in this embodiment, the initial model adopted by the speech recognition classification model may include, but is not limited to, at least one of the following: Support Vector Machines (SVM), deep learning neural networks (e.g., Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and their variants and combinations), decision trees, and the like.
Specifically, the following steps S1202 to S1210 shown in fig. 12 are described:
the voice of the user is collected through an audio collecting component (such as a microphone) in the terminal device, such as collecting the audio clip S. Preprocessing such as denoising, filtering and smoothing is performed on the audio segment S, and feature extraction is performed on the preprocessed audio segment S. Assuming that the audio features { T1, T2 … } are extracted, the audio features are input into a speech recognition classification model so as to recognize the object identification of the target virtual object carried in the audio piece S by using the audio features.
Through the embodiment provided by the application, the input audio clip is recognized through the trained voice recognition classification model, so that the voice recognition classification model can accurately recognize the object identification of the target virtual object carried in the audio clip by utilizing the specific audio characteristics, and the accuracy of sound recognition is improved.
As an optional scheme, before extracting the audio features in the audio segment, the method further includes:
s1, acquiring a first sample audio clip, wherein the first sample audio clip is different audio clips acquired under different scenes;
s2, training the initial speech recognition classification model by using the first sample audio frequency fragment to obtain a candidate speech recognition classification model;
s3, acquiring a second sample audio clip associated with a target user using the human-computer interaction application client;
and S4, training the candidate speech recognition classification model by using the second sample audio clip to obtain the speech recognition classification model.
It should be noted that the training process may include, but is not limited to, two stages, where the first stage is training in a general scene, where the first sample audio segment used in the training is a sound segment of many users in multiple usage scenes, and the trained general model is M1. The second stage is training (transfer learning) in a user scene, wherein the second sample audio segment used for training is a sound segment collected by a specific user currently using the human-computer interaction application client.
Further, the training process in the second stage may include, but is not limited to, one of:
1) as shown in fig. 13(a), feature extraction is performed by using a general model M1, and then a light-weight classifier is selected for further classification learning;
2) as shown in fig. 13(b), the parameters of the model layers conv1 to fc1 can be fixed (frozen), and the remaining layers are then trained;
3) the generic model M1 is retrained end-to-end. The specific training method can be selected according to the use scene of the user (mobile device, Personal Computer (PC), or cloud).
In fig. 13, conv represents a convolutional layer in the neural network model, softmax represents a classification layer, and fc represents a fully-connected layer.
It should be noted that the training process of the second stage may be performed in the terminal device used by the user, or may be performed in the cloud. This is not limited in this embodiment.
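Option 2 above (fixing the parameters of conv1 to fc1 and training only the classification head on the user's own clips) could be sketched as follows. Keras is an assumed framework, and the layer handling assumes the general model M1 is a functional model whose last layer is the softmax classifier; none of this is prescribed by the patent.

```python
# Sketch of second-stage adaptation: freeze conv1 ... fc1, retrain only the classifier.
from tensorflow import keras

def build_user_model(general_model: keras.Model, num_targets: int) -> keras.Model:
    for layer in general_model.layers[:-1]:        # fix parameters of conv1 ... fc1
        layer.trainable = False
    features = general_model.layers[-2].output     # output of fc1
    outputs = keras.layers.Dense(num_targets, activation="softmax",
                                 name="user_softmax")(features)
    return keras.Model(inputs=general_model.input, outputs=outputs)

# user_model = build_user_model(M1, num_targets=5)
# user_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# user_model.fit(user_audio_features, user_labels, epochs=10)   # second sample audio clips
```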
According to the embodiment provided by the application, the initial speech recognition classification model is trained by adopting the first sample audio segment to obtain a candidate speech recognition classification model (also called as a universal model) for recognition, and the candidate speech recognition classification model is further trained by adopting the second sample audio segment to obtain a speech recognition classification model adaptive to the current user. The classification model is trained through the two stages, so that the object identification of the target virtual object carried in the audio clip provided by the user can be more accurately identified by the voice recognition classification model provided in the embodiment, the accuracy of object interaction control is improved, repeated operation is avoided to determine the interactive object again, and the efficiency of object interaction control is improved.
As an optional scheme, after acquiring an audio clip acquired by a terminal device where a human-computer interaction application client is located, the method further includes:
s1, acquiring audio amplitude information of the sound signal in the audio clip;
s2, determining the slope of the amplitude of each sound signal according to the audio amplitude information;
s3, sequentially obtaining the variation of the slope of the amplitudes of two adjacent sound signals, wherein the two adjacent sound signals include: a sound signal in a first audio frame and a sound signal in a second audio frame located after the first audio frame;
s4, determining the position of the second audio frame as the initial recognition position of the audio clip under the condition that the variation is larger than the target threshold;
s5, in case of detecting the start recognition position, triggering the recognition of the audio clip.
The amplitude of the sound signal indicates the strength of the sound signal and is related to the volume. In addition, in the present embodiment, the slope is determined from the amplitude of the sound signal as a function of time. The amount of change in the slope may be, but is not limited to, the difference between the slope for the sound signal of the second audio frame and the slope for the sound signal of the first audio frame. The above is merely an example, and this is not limited in this embodiment.
The description is made with specific reference to the example shown in fig. 14: assume that the duration of the acquired audio piece is 4 seconds. The abscissa shown in the figure is the collection time point of the sound signal in the audio piece in units of s, and the ordinate is the amplitude of the sound signal in units of v.
In order to ensure the accuracy of recognition, in this embodiment the start recognition position may be determined according to the slope of the sound signal in the audio clip. As shown in fig. 14, although the sound amplitude is small, there is a significant change in slope from slope 1 to slope 2, i.e., the amount of change is greater than the target threshold. In this example, the point in time corresponding to slope 2 is therefore determined as the start recognition position.
It should be noted that the time lengths of the audio clips are merely examples, and in this embodiment, the audio clips with different time lengths may be determined according to actual scene requirements.
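The slope-based triggering in steps S1-S5 can be illustrated numerically; the frame length, the threshold and the synthetic signal below are assumptions for the example, not values given in the patent.

```python
# Sketch: find the frame where the change in amplitude slope exceeds the target threshold.
import numpy as np

def find_start_frame(amplitudes, frame_len=160, slope_threshold=0.05):
    n_frames = len(amplitudes) // frame_len
    frames = np.asarray(amplitudes[: n_frames * frame_len]).reshape(n_frames, frame_len)
    slopes = (frames[:, -1] - frames[:, 0]) / frame_len      # one slope per audio frame
    for i in range(1, n_frames):
        if abs(slopes[i] - slopes[i - 1]) > slope_threshold:  # variation > target threshold
            return i                                          # the second of the two adjacent frames
    return None

# Silence followed by a ramp: the start recognition position is where the slope jumps.
signal = np.concatenate([np.zeros(800), np.linspace(0.0, 80.0, 800)])
print(find_start_frame(signal))   # -> 5 (first frame of the ramp)
```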
Through the embodiment provided by the application, the recognition of the control sound is triggered through the change of the slope, and the response is faster and more accurate.
As an optional solution, the controlling the virtual character to interact with the target virtual object includes:
s1, adjusting the running state of the target virtual object displayed by the mark to be a locked state, wherein the target virtual object in the locked state is a virtual object to be interacted by the virtual character;
and S2, controlling the virtual character to directly interact with the target virtual object in the locked state.
It should be noted that, in this embodiment, the running state of the target virtual object displayed by the mark is adjusted to be in a locked state in the background, so that the virtual character can directly apply the skill to be started on the target virtual object. Therefore, the target virtual object to be interacted is prevented from being selected again through manual operation, the accuracy of determining the target virtual object is improved, and the accuracy and the efficiency of object interaction control are improved.
In addition, in this embodiment, the execution sequence between the adjustment of the running state of the target virtual object to the locked state and the highlighting of the target virtual object is not limited in this embodiment.
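A minimal sketch of the locked-state behaviour (once the target's running state is set to locked, a released skill acts on it directly) is shown below; the class and method names are invented for this illustration.

```python
# Sketch: lock the recognized target so a skill release needs no further selection step.
from enum import Enum

class RunState(Enum):
    NORMAL = 0
    LOCKED = 1

class VirtualObject:
    def __init__(self, name):
        self.name = name
        self.state = RunState.NORMAL

class VirtualCharacter:
    def __init__(self):
        self.locked_target = None

    def lock(self, target):
        target.state = RunState.LOCKED       # back-end lock of the marked target
        self.locked_target = target

    def release_skill(self, skill):
        if self.locked_target is None:
            return "no target locked; a manual selection would still be required"
        return f"{skill} acts directly on {self.locked_target.name}"

character_a, character_b = VirtualCharacter(), VirtualObject("virtual character B")
character_a.lock(character_b)
print(character_a.release_skill("attack"))   # -> attack acts directly on virtual character B
```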
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
According to another aspect of the embodiment of the present invention, there is also provided an object interaction control apparatus for implementing the object interaction control method. As shown in fig. 15, the apparatus includes:
1) an operation unit 1502, configured to operate a local human-computer interaction task in a human-computer interaction application client, where a virtual character controlled by the human-computer interaction application client interacts with at least one virtual object participating in the human-computer interaction task;
2) a first obtaining unit 1504, configured to obtain an audio clip collected by a terminal device where a client is located;
3) the recognition unit 1506 is configured to recognize a target virtual object from the audio clip, where at least one virtual object includes the target virtual object;
4) a marking unit 1508, configured to mark and display the target virtual object;
5) a control unit 1510, configured to control the virtual character to interact with the target virtual object.
For a specific embodiment, reference may be made to the example shown in the object interaction control method, which is not described herein again in this example.
As an alternative, the marking unit 1508 includes:
1) the first determining module is used for determining the target position of the target virtual object;
2) the first acquisition module is used for indicating a target virtual object at a target position to be displayed in a human-computer interaction interface currently presented by the terminal equipment and acquiring a display area of the target virtual object in the human-computer interaction interface;
3) and the first marking module is used for highlighting and marking the target virtual object in the display area.
For a specific embodiment, reference may be made to the example shown in the object interaction control method, which is not described herein again in this example.
As an optional scheme, the apparatus further includes:
1) the second acquisition module is used for acquiring, after the target position where the target virtual object is located is determined and when the target position indicates that the target virtual object is not displayed in the human-computer interaction interface currently presented by the terminal device, the display position of the target virtual object in a map matched with the human-computer interaction task;
2) and the second marking module is used for highlighting and marking the target virtual object at the display position.
For a specific embodiment, reference may be made to the example shown in the object interaction control method, which is not described herein again in this example.
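As an illustrative aid, the sketch below shows how the marking modules described above might branch between highlighting an on-screen target in its display area and marking an off-screen target on the map matched with the task. The viewport rectangle, sprite size, and linear world-to-map scaling are assumptions made only for this sketch; the embodiment does not prescribe them.

```python
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]          # (x, y, width, height)

def display_area_in_view(target_pos: Tuple[float, float],
                         viewport: Rect,
                         sprite_size: Tuple[float, float] = (64.0, 64.0)) -> Optional[Rect]:
    """Return the target's display area if the target position lies inside the
    currently presented human-computer interaction interface, else None."""
    x, y = target_pos
    vx, vy, vw, vh = viewport
    if vx <= x <= vx + vw and vy <= y <= vy + vh:
        w, h = sprite_size
        return (x - w / 2, y - h / 2, w, h)
    return None

def world_to_map(target_pos: Tuple[float, float],
                 world_size: Tuple[float, float],
                 map_size: Tuple[float, float]) -> Tuple[float, float]:
    """Assumed linear scaling of a world position onto the task map."""
    return (target_pos[0] / world_size[0] * map_size[0],
            target_pos[1] / world_size[1] * map_size[1])

def mark_target(target_pos, viewport,
                world_size=(10000.0, 10000.0), map_size=(256.0, 256.0)):
    area = display_area_in_view(target_pos, viewport)
    if area is not None:
        # the target is displayed in the current interface: highlight its display area
        return {"highlight_area": area, "map_mark": None}
    # the target is not displayed: mark its display position on the map instead
    return {"highlight_area": None,
            "map_mark": world_to_map(target_pos, world_size, map_size)}

print(mark_target((1200.0, 800.0), viewport=(1000.0, 600.0, 1920.0, 1080.0)))
print(mark_target((9000.0, 9500.0), viewport=(1000.0, 600.0, 1920.0, 1080.0)))
```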
As an alternative, the marking unit 1508 includes at least one of:
1) the first display module is used for highlighting and marking the object icon matched with the target virtual object;
2) the second display module is used for highlighting the target virtual object;
3) and the third display module is used for displaying the highlighted mark in the display area where the target virtual object is located.
For a specific embodiment, reference may be made to the example shown in the object interaction control method, which is not described herein again in this example.
As an alternative, the recognition unit 1506 includes:
1) the extraction module is used for extracting audio features in the audio clips;
2) the input module is used for inputting the audio features into a speech recognition classification model, wherein the speech recognition classification model is a classification model obtained by training with a plurality of sample audio segments and used for recognizing virtual objects indicated in the audio segments;
3) and the third acquisition module is used for acquiring an output result of the voice recognition classification model, wherein the output result carries the object identification of the recognized target virtual object.
For a specific embodiment, reference may be made to the example shown in the object interaction control method, which is not described herein again in this example.
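For illustration only, the sketch below mirrors the extraction, input, and output steps of the recognition unit described above. The log-spectral features, frame sizes, and the stub classifier (which scores candidate objects randomly) are assumptions; the embodiment does not fix a particular feature type or model architecture.

```python
import numpy as np

def extract_audio_features(clip: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Frame the clip and take log-magnitude spectra as a simple feature matrix."""
    frames = [clip[i:i + frame_len]
              for i in range(0, len(clip) - frame_len + 1, hop)]
    window = np.hanning(frame_len)
    spectra = [np.abs(np.fft.rfft(f * window)) for f in frames]
    return np.log1p(np.array(spectra))        # shape: (num_frames, frame_len // 2 + 1)

class SpeechRecognitionClassifier:
    """Stand-in for the trained classification model: maps audio features to
    the object identifier of the virtual object named in the audio clip."""
    def __init__(self, object_ids):
        self.object_ids = object_ids

    def predict(self, features: np.ndarray) -> str:
        # a real model would run a trained network here; this stub scores the
        # candidate objects randomly just to keep the sketch runnable
        scores = np.random.rand(len(self.object_ids))
        return self.object_ids[int(np.argmax(scores))]

clip = np.random.randn(16000)                 # 1 s of dummy audio at 16 kHz
model = SpeechRecognitionClassifier(["tyrant", "overlord", "blue_buff"])
target_object_id = model.predict(extract_audio_features(clip))
print(target_object_id)                       # object identifier carried in the output result
```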
As an optional scheme, the apparatus further includes:
1) the fourth obtaining module is used for obtaining a first sample audio clip before the audio features in the audio clip are extracted, wherein the first sample audio clip comprises different audio clips collected in different scenes;
2) the first training module is used for training an initial speech recognition classification model by using the first sample audio clip to obtain a candidate speech recognition classification model;
3) a fifth obtaining module, configured to obtain a second sample audio clip associated with a target user using a human-computer interaction application client;
4) and the second training module is used for training the candidate speech recognition classification model by using the second sample audio clip to obtain the speech recognition classification model.
For a specific embodiment, reference may be made to the example shown in the object interaction control method, which is not described herein again in this example.
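The two-stage training described above (general pre-training followed by user-specific adaptation) can be summarized with the hedged sketch below. The fit-style interface, epoch counts, and sample format are hypothetical placeholders, not the claimed implementation.

```python
from typing import List, Tuple

Sample = Tuple[List[float], str]     # (audio features, labelled virtual object)

class SpeechRecognitionClassifier:
    """Placeholder model; a real implementation would hold trainable weights."""
    def __init__(self) -> None:
        self.update_count = 0

    def fit(self, samples: List[Sample], epochs: int) -> "SpeechRecognitionClassifier":
        # a real implementation would update the model weights here
        self.update_count += epochs * len(samples)
        return self

def build_speech_recognition_model(first_sample_clips: List[Sample],
                                   second_sample_clips: List[Sample]) -> SpeechRecognitionClassifier:
    # stage 1: train an initial model on clips collected in different scenes
    candidate = SpeechRecognitionClassifier().fit(first_sample_clips, epochs=10)
    # stage 2: adapt the candidate model with clips of the target user so that
    # it better matches that user's voice and wording
    return candidate.fit(second_sample_clips, epochs=3)

first_sample_clips = [([0.1, 0.2], "tyrant"), ([0.3, 0.1], "overlord")]   # different scenes
second_sample_clips = [([0.2, 0.2], "tyrant")]                            # target user's clips
model = build_speech_recognition_model(first_sample_clips, second_sample_clips)
print(model.update_count)
```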
As an optional scheme, the apparatus further includes:
1) the second acquisition unit is used for acquiring audio amplitude information of a sound signal in an audio clip after acquiring the audio clip acquired by the terminal equipment where the human-computer interaction application client is located;
2) a first determining unit for determining a slope of the amplitude of each sound signal according to the audio amplitude information;
3) a third obtaining unit, configured to sequentially obtain a variation amount of a slope of amplitudes of two adjacent sound signals, where the two adjacent sound signals include: a sound signal in a first audio frame and a sound signal in a second audio frame located after the first audio frame;
4) the second determining unit is used for determining the position of the second audio frame as the initial identification position of the audio clip under the condition that the variation is determined to be larger than the target threshold;
5) and the triggering unit is used for triggering recognition of the audio clip when the initial identification position is detected.
For a specific embodiment, reference may be made to the example shown in the object interaction control method, which is not described herein again in this example.
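The slope-based trigger handled by the second acquisition, determining, obtaining, and triggering units above can be illustrated with the following sketch. The frame length and the target threshold value are assumed parameters chosen only for the example; the embodiment leaves them as design choices.

```python
import numpy as np

def find_start_recognition_frame(signal: np.ndarray,
                                 frame_len: int = 160,
                                 target_threshold: float = 0.05) -> int:
    """Return the index of the audio frame at which recognition should start,
    i.e. the first frame whose amplitude-slope change versus the previous
    frame exceeds the threshold, or -1 if no such frame exists."""
    n_frames = len(signal) // frame_len
    amplitudes = np.array([np.abs(signal[i * frame_len:(i + 1) * frame_len]).mean()
                           for i in range(n_frames)])
    slopes = np.diff(amplitudes)                 # slope of the amplitude between adjacent frames
    for i in range(1, len(slopes)):
        variation = abs(slopes[i] - slopes[i - 1])
        if variation > target_threshold:
            return i + 1                         # the "second audio frame" of the pair
    return -1

# ten silent frames followed by a sudden voice onset (dummy data)
sig = np.concatenate([np.zeros(1600), 0.5 * np.random.randn(1600)])
start = find_start_recognition_frame(sig)
if start >= 0:
    print(f"trigger recognition from frame {start}")
```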
As an alternative, the control unit 1510 includes:
1) the adjusting module is used for adjusting the running state of the marked target virtual object to a locked state, wherein the target virtual object in the locked state is the virtual object with which the virtual character is to interact;
2) and the control module is used for controlling the virtual character to directly perform the interactive action with the target virtual object in the locked state.
For a specific embodiment, reference may be made to the example shown in the object interaction control method, which is not described herein again in this example.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device for implementing the object interaction control method, as shown in fig. 16, the electronic device includes a memory 1602 and a processor 1604, the memory 1602 stores therein a computer program, and the processor 1604 is configured to perform the steps in any one of the method embodiments by the computer program.
Optionally, in this embodiment, the electronic apparatus may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to perform the following steps by a computer program:
S1, running a local human-computer interaction task in the human-computer interaction application client, wherein the virtual character controlled by the human-computer interaction application client interacts with at least one virtual object participating in the human-computer interaction task;
S2, acquiring an audio clip acquired by the terminal equipment where the human-computer interaction application client is located;
S3, identifying a target virtual object from the audio clip, wherein at least one virtual object comprises the target virtual object;
S4, marking and displaying the target virtual object;
and S5, controlling the virtual character to interact with the target virtual object.
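The five steps S1-S5 executed by the processor can be viewed as one pipeline; the following sketch chains them with stub components. Every class and method name here (StubClient, collect_audio_clip, and so on) is a hypothetical stand-in used only to illustrate the flow, not the claimed implementation.

```python
class StubClient:
    """Minimal stand-in for the human-computer interaction application client."""

    def collect_audio_clip(self) -> bytes:
        return b"\x00" * 3200                              # S2: dummy captured audio

    def recognize_target(self, clip: bytes) -> str:
        return "tyrant"                                    # S3: recognized object identifier

    def mark_target(self, target_id: str) -> None:
        print(f"highlight {target_id} in the interface")   # S4: mark and display

    def lock_and_interact(self, character: str, target_id: str) -> None:
        print(f"{character} locks {target_id} and interacts directly")   # S5

def run_interaction_round(client: StubClient, character: str, participating_ids: set) -> None:
    # S1: the local human-computer interaction task is assumed to be running already
    clip = client.collect_audio_clip()
    target_id = client.recognize_target(clip)
    if target_id in participating_ids:                     # target belongs to the task
        client.mark_target(target_id)
        client.lock_and_interact(character, target_id)

run_interaction_round(StubClient(), "hero", {"tyrant", "overlord"})
```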
Alternatively, it can be understood by those skilled in the art that the structure shown in fig. 16 is only an illustration, and the electronic device may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 16 does not limit the structure of the electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.) than shown in fig. 16, or have a different configuration from that shown in fig. 16.
The memory 1602 may be used for storing software programs and modules, such as the program modules corresponding to the object interaction control method and apparatus in the embodiments of the present invention. The processor 1604 executes the software programs and modules stored in the memory 1602 to perform various functional applications and data processing, that is, to implement the object interaction control method. The memory 1602 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1602 may further include memory located remotely from the processor 1604, and the remote memory may be connected to the terminal through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1602 may be used for storing, but is not limited to, information such as the audio clip and attribute information of the target virtual object. As an example, as shown in fig. 16, the memory 1602 may include, but is not limited to, the operation unit 1502, the first obtaining unit 1504, the recognition unit 1506, the marking unit 1508, and the control unit 1510 in the object interaction control apparatus. In addition, the memory 1602 may further include, but is not limited to, other module units in the object interaction control apparatus, which are not described in detail in this example.
Optionally, the transmission device 1606 is configured to receive or transmit data via a network. Examples of the network may include a wired network and a wireless network. In one example, the transmission device 1606 includes a Network Interface Controller (NIC), which can be connected to a router via a network cable so as to communicate with the internet or a local area network. In one example, the transmission device 1606 is a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
In addition, the electronic device further includes: a display 1608 for displaying the human-computer interaction interface in the human-computer interaction application client, where the human-computer interaction interface may include the target virtual object and the virtual character controlled by the human-computer interaction application client; and a connection bus 1610 for connecting the respective module components in the above-described electronic device.
According to a further aspect of an embodiment of the present invention, there is also provided a computer-readable storage medium having a computer program stored thereon, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the above-mentioned computer-readable storage medium may be configured to store a computer program for performing the steps of:
S1, running a local human-computer interaction task in the human-computer interaction application client, wherein the virtual character controlled by the human-computer interaction application client interacts with at least one virtual object participating in the human-computer interaction task;
S2, acquiring an audio clip acquired by the terminal equipment where the human-computer interaction application client is located;
S3, identifying a target virtual object from the audio clip, wherein at least one virtual object comprises the target virtual object;
S4, marking and displaying the target virtual object;
and S5, controlling the virtual character to interact with the target virtual object.
Alternatively, in this embodiment, a person skilled in the art may understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing the relevant hardware of the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the part of the technical solution of the present invention that in essence contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or part of the steps of the method according to the embodiments of the present invention.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (15)

1. An object interaction control method is applied to a non-virtual reality interaction scene, and comprises the following steps:
running a local human-computer interaction task in a human-computer interaction application client, wherein a virtual character controlled by the human-computer interaction application client interacts with at least one virtual object participating in the human-computer interaction task;
acquiring an audio clip acquired by terminal equipment where the human-computer interaction application client is located;
identifying a target virtual object from the audio clip, wherein the target virtual object is included in the at least one virtual object;
performing mark display on the target virtual object;
when the object locking mode of the local human-computer interaction task is detected to be a sound locking mode, locking the target virtual object;
and controlling the virtual character to directly interact with the locked target virtual object according to the interaction action information between the virtual character and the target virtual object indicated in the acquired audio clip.
2. The method of claim 1, wherein the performing mark display on the target virtual object comprises:
determining a target position of the target virtual object;
when the target position indicates that the target virtual object is displayed in a human-computer interaction interface currently presented by the terminal equipment, acquiring a display area of the target virtual object in the human-computer interaction interface;
highlighting the target virtual object in the display area.
3. The method of claim 2, after determining the target position of the target virtual object, further comprising:
when the target position indicates that the target virtual object is not displayed in the human-computer interaction interface currently presented by the terminal equipment, acquiring the display position of the target virtual object in a map matched with the human-computer interaction task;
and highlighting the target virtual object on the display position.
4. The method of claim 1, wherein the performing mark display on the target virtual object comprises at least one of:
highlighting the object icon matched with the target virtual object;
highlighting the target virtual object;
and highlighting the display area where the target virtual object is located.
5. The method of claim 1, wherein identifying a target virtual object from the audio clip comprises:
extracting audio features in the audio segments;
inputting the audio features into a speech recognition classification model, wherein the speech recognition classification model is a classification model obtained by training with a plurality of sample audio segments and used for recognizing virtual objects indicated in the audio segments;
and acquiring an output result of the voice recognition classification model, wherein the output result carries the identified object identifier of the target virtual object.
6. The method of claim 5, further comprising, prior to said extracting audio features in the audio segment:
acquiring a first sample audio clip, wherein the first sample audio clip comprises different audio clips collected in different scenes;
training an initial speech recognition classification model by using the first sample audio clip to obtain a candidate speech recognition classification model;
obtaining a second sample audio clip associated with a target user using the human-computer interaction application client;
and training the candidate speech recognition classification model by using the second sample audio clip to obtain the speech recognition classification model.
7. The method according to claim 1, further comprising, after the obtaining of the audio clip collected by the terminal device where the human-computer interaction application client is located:
acquiring audio amplitude information of a sound signal in the audio clip;
determining the slope of the amplitude of each sound signal according to the audio amplitude information;
sequentially acquiring the variation of the slope of the amplitudes of two adjacent sound signals, wherein the two adjacent sound signals comprise: a sound signal in a first audio frame and a sound signal in a second audio frame following the first audio frame;
under the condition that the variation is determined to be larger than a target threshold value, determining the position of the second audio frame as the initial identification position of the audio clip;
and triggering recognition of the audio clip when the initial identification position is detected.
8. The method of any of claims 1-7, wherein the controlling the virtual character to interact with the target virtual object comprises:
adjusting the running state of the marked target virtual object to a locked state, wherein the target virtual object in the locked state is the virtual object with which the virtual character is to interact;
and controlling the virtual role to directly perform interactive action with the target virtual object in the locked state.
9. An object interaction control device is applied to a non-virtual reality interaction scene, and comprises the following components:
an operation unit, used for running a local human-computer interaction task in a human-computer interaction application client, wherein a virtual character controlled by the human-computer interaction application client interacts with at least one virtual object participating in the human-computer interaction task;
the first acquisition unit is used for acquiring an audio clip acquired by terminal equipment where the human-computer interaction application client is located;
the identification unit is used for identifying a target virtual object from the audio clip, wherein the target virtual object is included in the at least one virtual object;
the marking unit is used for marking and displaying the target virtual object;
the locking unit is used for locking the target virtual object when detecting that an object locking mode of the local human-computer interaction task is a sound locking mode;
and the control unit is used for controlling the virtual character to directly interact with the locked target virtual object according to the interaction action information between the virtual character and the target virtual object indicated in the acquired audio clip.
10. The apparatus of claim 9, wherein the marking unit comprises:
the first determination module is used for determining the target position of the target virtual object;
the first obtaining module is used for obtaining, when the target position indicates that the target virtual object is displayed in the human-computer interaction interface currently presented by the terminal device, a display area of the target virtual object in the human-computer interaction interface;
and the first marking module is used for highlighting and marking the target virtual object in the display area.
11. The apparatus of claim 10, further comprising:
a second obtaining module, configured to, after determining a target position where the target virtual object is located, obtain, when the target position indicates that the target virtual object is not displayed in the human-computer interaction interface currently presented by the terminal device, a display position of the target virtual object in a map that is matched with the human-computer interaction task;
and the second marking module is used for highlighting and marking the target virtual object on the display position.
12. The apparatus of claim 9, wherein the identification unit comprises:
the extraction module is used for extracting audio features in the audio clips;
the input module is used for inputting the audio features into a speech recognition classification model, wherein the speech recognition classification model is a classification model which is obtained by training a plurality of sample audio segments and is used for recognizing virtual objects indicated in the audio segments;
and the third obtaining module is used for obtaining an output result of the speech recognition classification model, wherein the output result carries the identified object identifier of the target virtual object.
13. The apparatus of claim 9, further comprising:
the second obtaining unit is used for obtaining audio amplitude information of a sound signal in an audio clip after the audio clip collected by the terminal equipment where the human-computer interaction application client is located is obtained;
a first determining unit for determining a slope of the amplitude of each sound signal according to the audio amplitude information;
a third obtaining unit, configured to sequentially obtain a variation amount of a slope of amplitudes of two adjacent sound signals, where the two adjacent sound signals include: a sound signal in a first audio frame and a sound signal in a second audio frame following the first audio frame;
the second determining unit is used for determining the position of the second audio frame as the initial identification position of the audio clip under the condition that the variation is determined to be larger than the target threshold;
and the triggering unit is used for triggering recognition of the audio clip when the initial identification position is detected.
14. A computer-readable storage medium comprising a stored program, wherein the program when executed performs the method of any of claims 1 to 8.
15. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to carry out the method of any one of claims 1 to 8 by means of the computer program.
CN201910889018.4A 2019-09-19 2019-09-19 Object interaction control method and device, storage medium and electronic device Active CN110597395B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910889018.4A CN110597395B (en) 2019-09-19 2019-09-19 Object interaction control method and device, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910889018.4A CN110597395B (en) 2019-09-19 2019-09-19 Object interaction control method and device, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN110597395A CN110597395A (en) 2019-12-20
CN110597395B true CN110597395B (en) 2021-02-12

Family

ID=68861374

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910889018.4A Active CN110597395B (en) 2019-09-19 2019-09-19 Object interaction control method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN110597395B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143605A (en) * 2019-12-30 2020-05-12 秒针信息技术有限公司 Voice separation method and device and storage medium
CN112791406B (en) * 2021-01-25 2024-01-23 网易(杭州)网络有限公司 Target locking method, device and terminal equipment
CN112884528A (en) * 2021-03-23 2021-06-01 腾讯科技(深圳)有限公司 Interactive processing method based on radio frequency identification and related device
CN115374268B (en) * 2022-10-25 2023-03-24 广州市明道文化产业发展有限公司 Multi-role decentralized collaborative interaction method and system
CN117193541B (en) * 2023-11-08 2024-03-15 安徽淘云科技股份有限公司 Virtual image interaction method, device, terminal and storage medium
CN117234341B (en) * 2023-11-15 2024-03-05 中影年年(北京)科技有限公司 Virtual reality man-machine interaction method and system based on artificial intelligence

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108986564A (en) * 2018-06-21 2018-12-11 广东小天才科技有限公司 It is a kind of that control method and electronic equipment are entered for based on intelligent interaction
CN110109607A (en) * 2019-05-10 2019-08-09 网易(杭州)网络有限公司 Information processing method and device, electronic equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8614676B2 (en) * 2007-04-24 2013-12-24 Kuo-Ching Chiang User motion detection mouse for electronic device
CN103869962B (en) * 2012-12-18 2016-12-28 联想(北京)有限公司 A kind of data processing method, device and electronic equipment
CN106200942B (en) * 2016-06-30 2022-04-22 联想(北京)有限公司 Information processing method and electronic equipment
CN106094598B (en) * 2016-08-10 2018-12-14 广州奥迪威传感应用科技有限公司 Audio-switch control method, system and audio-switch
CN107274891A (en) * 2017-05-23 2017-10-20 武汉秀宝软件有限公司 A kind of AR interface alternation method and system based on speech recognition engine

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108986564A (en) * 2018-06-21 2018-12-11 广东小天才科技有限公司 It is a kind of that control method and electronic equipment are entered for based on intelligent interaction
CN110109607A (en) * 2019-05-10 2019-08-09 网易(杭州)网络有限公司 Information processing method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Speech Recognition System Based on Artificial Intelligence and Its Application; Fang Aidong et al.; Journal of Suzhou University (宿州学院学报); 2019-08-15; Vol. 34, No. 8; pp. 62-65 *

Also Published As

Publication number Publication date
CN110597395A (en) 2019-12-20

Similar Documents

Publication Publication Date Title
CN110597395B (en) Object interaction control method and device, storage medium and electronic device
CN104992709B (en) A kind of the execution method and speech recognition apparatus of phonetic order
CN107115674B (en) The distribution method and device of virtual resource
CN106803987A (en) The acquisition methods of video data, device and system
CN108854069B (en) Sound source determination method and device, storage medium and electronic device
CN105431813A (en) Attributing user action based on biometric identity
CN110782004B (en) Model training method, model calling equipment and readable storage medium
CN109731330B (en) Method and device for displaying picture, storage medium and electronic device
WO2023065676A1 (en) Method and device for controlling air conditioner, and air conditioner
CN108096833B (en) Motion sensing game control method and device based on cascade neural network and computing equipment
CN109286822A (en) Interactive approach, device, equipment and storage medium based on live video identification
CN111428660B (en) Video editing method and device, storage medium and electronic device
CN108154197A (en) Realize the method and device that image labeling is verified in virtual scene
CN109499069B (en) Operation result checking method and device, storage medium and electronic device
CN111950670A (en) Virtual interaction task execution method and device, storage medium and electronic device
CN111375202A (en) Hosting method and device in multiplayer battle game and server
CN109529358B (en) Feature integration method and device and electronic device
CN109726808B (en) Neural network training method and device, storage medium and electronic device
CN111181839B (en) Data processing method, device and equipment in application sharing
KR101633400B1 (en) Method of providing battle service based hybrid app for mobile game, and computer-readable recording medium for the same
CN110975294A (en) Game fighting implementation method and terminal
CN106484395B (en) Event display method and device
US10702770B2 (en) System and method of configuring disparate physical objects to provide control signals for controlling a game
CN111151007B (en) Object selection method, device, terminal and storage medium
CN110008321B (en) Information interaction method and device, storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant