CN113311936B - AR-based voice commenting method, device, equipment and storage medium - Google Patents

AR-based voice commenting method, device, equipment and storage medium

Info

Publication number
CN113311936B
CN113311936B (application CN202010125581.7A)
Authority
CN
China
Prior art keywords
comment
user
client
identifier
voice
Prior art date
Legal status
Active
Application number
CN202010125581.7A
Other languages
Chinese (zh)
Other versions
CN113311936A
Inventor
范涛
唐健明
曾琦娟
李廷龙
张瑞
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Chengdu ICT Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Chengdu ICT Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Chengdu ICT Co Ltd
Priority to CN202010125581.7A
Publication of CN113311936A
Application granted
Publication of CN113311936B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00: Commerce
    • G06Q 30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q 30/0282: Rating or review of business operators or products
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 50/00: Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q 50/10: Services
    • G06Q 50/20: Education
    • G06Q 50/205: Education administration or guidance
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00: Manipulating 3D models or images for computer graphics
    • G06T 19/006: Mixed reality
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00: Network architectures or network communication protocols for network security
    • H04L 63/08: Network architectures or network communication protocols for network security for authentication of entities
    • H04L 63/0807: Network architectures or network communication protocols for network security for authentication of entities using tickets, e.g. Kerberos

Abstract

Embodiments of the invention provide an AR-based voice commenting method, device, equipment and storage medium. The method, applied to a client, comprises: recognizing received first voice information entered by a user to obtain a first recognition result, the first recognition result comprising a first keyword; when the first keyword is a preset comment keyword, sending authentication request information to a server, the authentication request information comprising a user identifier, a client identifier and a scene identifier; receiving an authentication token sent by the server; triggering a comment mode according to the first keyword and the authentication token, and receiving second voice information entered by the user; and sending the second voice information, the user identifier, the client identifier and comment scene data to the server, the comment scene data comprising a picture of the current real-object scene and one frame of the virtual image at the moment the user begins to comment. The method enables voice comments on AR content, meets users' commenting needs, and improves the user experience of AR applications.

Description

AR-based voice commenting method, device, equipment and storage medium
Technical Field
The invention relates to the field of digital services, in particular to an AR-based voice commenting method, device, equipment and storage medium.
Background
Augmented Reality (AR) is a technology that fuses virtual information with the real world: virtual models are projected into the real environment by computation, the real surroundings and virtual objects are superimposed in the same picture or space in real time, and users can interact with the virtual information. When AR is applied in real scenes, users often need to comment on or annotate the AR content being played; the prior art meets this need mainly through text input.
However, the prior art has two drawbacks. First, commenting or annotating by typing is only practical on mobile phones or other handheld devices, not on AR glasses or head-mounted devices. Second, manual text entry is slow, so it cannot satisfy users' needs well when commenting on AR content, which degrades the user experience of AR applications.
Disclosure of Invention
Embodiments of the invention provide an AR-based voice commenting method, device, equipment and storage medium, so as to meet users' commenting needs and improve the user experience of AR applications.
In a first aspect, an AR-based voice commenting method is provided, applied to a client, and comprising: recognizing received first voice information entered by a user to obtain a first recognition result, the first recognition result comprising a first keyword; when the first keyword is a preset comment keyword, sending authentication request information to a server, the authentication request information comprising a user identifier, a client identifier and a scene identifier and being used by the server to generate an authentication token from the user identifier, the client identifier and the scene identifier; receiving the authentication token sent by the server; triggering a comment mode according to the first keyword and the authentication token, and receiving second voice information entered by the user; and sending the second voice information, the user identifier, the client identifier and comment scene data to the server, where the comment scene data comprises a picture of the current real-object scene and one frame of the virtual image at the moment the user begins to comment, and the second voice information is the user's comment on the scene corresponding to the comment scene data.
In some implementations of the first aspect, before identifying the received first voice information entered by the user, further comprising: sending configuration request information to a server; and receiving initialization parameters sent by the server according to the configuration request information, wherein the initialization parameters comprise preset comment keywords.
In some implementations of the first aspect, the initialization parameters further include a preset end keyword, and the method further includes: recognizing received third voice information entered by the user to obtain a second recognition result, the second recognition result comprising a second keyword; and exiting the comment mode when the second keyword is the preset end keyword.
In some implementations of the first aspect, the initialization parameter further includes a preset cleaning keyword, and the method further includes: and when the second voice information contains the preset cleaning keyword, cleaning the second voice information.
In some implementations of the first aspect, recognizing the received first voice information entered by the user to obtain a first recognition result includes: obtaining a voice feature sequence according to the first voice information; and identifying the voice characteristic sequence to obtain a first identification result.
In some implementations of the first aspect, sending the second voice information, the user identifier, the client identifier, and the comment scene data to the server includes: performing noise reduction, echo cancellation and coding on the second voice information to obtain comment voice data; and sending the comment voice data, the user identification, the client identification and the comment scene data to a server.
In a second aspect, an AR-based voice commenting method is provided, which is used for a server, and includes: receiving authentication request information sent by a client, wherein the authentication request information comprises a user identifier, a client identifier and a scene identifier; authenticating the user, the client and the current scene according to the user identifier, the client identifier and the scene identifier to generate an authentication token; sending an authentication token to the client, wherein the authentication token is used for triggering the comment mode by the client; and receiving second voice information, a user identifier, a client identifier and comment scene data sent by the client, wherein the comment scene data comprise a current real object scene picture and one frame of a virtual image when the user starts to comment, and the second voice information is comment information of a scene corresponding to the comment scene data by the user.
In some implementations of the second aspect, before receiving the authentication request information sent by the client, the method further includes: receiving configuration request information sent by a client; and sending initialization parameters to the client according to the configuration request information, wherein the initialization parameters comprise preset comment keywords.
In some implementations of the second aspect, the initialization parameters further include a preset end keyword and a preset cleaning keyword.
In a third aspect, an AR-based voice commenting apparatus is provided for a client, the apparatus including: the recognition module is used for recognizing the received first voice information input by the user to obtain a first recognition result, and the first recognition result comprises a first keyword; the sending module is used for sending authentication request information to the server when the first keyword is a preset comment keyword, wherein the authentication request information comprises a user identifier, a client identifier and a scene identifier, and the authentication request information is used for the server to generate an authentication token according to the user identifier, the client identifier and the scene identifier; the receiving module is used for receiving the authentication token sent by the server; the triggering module is used for triggering the comment mode according to the first keyword and the authentication token and receiving second voice information input by the user; the sending module is further used for sending second voice information, a user identification, a client identification and comment scene data to the server, wherein the comment scene data comprise a current real object scene picture and one frame of a virtual image when the user starts to comment, and the second voice information is comment information of the user on a scene corresponding to the comment scene data.
In some implementations of the third aspect, the apparatus further includes a transceiver module configured to send configuration request information to the server before the received first voice information entered by the user is recognized, and to receive initialization parameters sent by the server according to the configuration request information, the initialization parameters including the preset comment keywords.
In some implementation manners of the third aspect, the initialization parameter further includes a preset ending keyword, and the recognition module is further configured to recognize the received third voice information entered by the user to obtain a second recognition result, where the second recognition result includes the second keyword; and when the second keyword is a preset ending keyword, exiting the comment mode.
In some implementation manners of the third aspect, the initialization parameter further includes a preset cleaning keyword, and the recognition module is further configured to clean the second voice message when the second voice message includes the preset cleaning keyword.
In some implementations of the third aspect, the recognition module is specifically configured to obtain a speech feature sequence according to the first speech information; and identifying the voice characteristic sequence to obtain a first identification result.
In some implementation manners of the third aspect, the sending module is specifically configured to perform noise reduction, echo cancellation, and encoding on the second voice information to obtain comment voice data; and sending the comment voice data, the user identification, the client identification and the comment scene data to a server.
In a fourth aspect, there is provided an AR-based voice commenting apparatus for a server, the apparatus including: the receiving module is used for receiving authentication request information sent by a client, wherein the authentication request information comprises a user identifier, a client identifier and a scene identifier; the authentication module is used for authenticating the user, the client and the current scene according to the user identifier, the client identifier and the scene identifier to generate an authentication token; the sending module is used for sending an authentication token to the client, and the authentication token is used for triggering the comment mode by the client; the receiving module is further used for receiving second voice information, a user identifier, a client identifier and comment scene data sent by the client, wherein the comment scene data comprise a current real object scene picture and a frame of a virtual image when the user starts to comment, and the second voice information is comment information of the user on a scene corresponding to the comment scene data.
In some implementations of the fourth aspect, the apparatus further includes a configuration module configured, before the authentication request information sent by the client is received, to receive configuration request information sent by the client, and to send initialization parameters to the client according to the configuration request information, the initialization parameters including the preset comment keywords.
In some implementations of the fourth aspect, the initialization parameters further include a preset end keyword and a preset cleaning keyword.
In a fifth aspect, an AR-based voice commenting device is provided, including a processor and a memory storing computer program instructions; when executing the computer program instructions, the processor implements the AR-based voice commenting method of the first or second aspect, or of some implementations of the first or second aspect.
In a sixth aspect, a computer-readable storage medium is provided, having computer program instructions stored thereon which, when executed by a processor, implement the AR-based voice commenting method of the first or second aspect, or of some implementations of the first or second aspect.
The invention relates to the field of digital services, and in particular to an AR-based voice commenting method, device, equipment and computer-readable storage medium. Received first voice information entered by a user is recognized to obtain a first recognition result comprising a first keyword. When the first keyword is a preset comment keyword, authentication request information is sent to a server, which the server uses to generate an authentication token from a user identifier, a client identifier and a scene identifier. The authentication token sent by the server is received; a comment mode is triggered according to the first keyword and the authentication token, and second voice information entered by the user is received. The second voice information, the user identifier, the client identifier and comment scene data are then sent to the server, where the comment scene data comprises a picture of the current real-object scene and one frame of the virtual image at the moment the user begins to comment, and the second voice information is the user's comment on the corresponding scene. Because the comment mode is triggered by specific preset comment keywords, the user can comment on AR content by voice alone, without typing; the voice input does not interrupt playback of the current AR content, so the user's commenting needs are met and the user experience of AR applications is improved.
Drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings needed in the embodiments are briefly described below; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an AR-based voice commenting method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an AR-based voice commenting apparatus according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of another AR-based voice commenting apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an AR-based voice commenting device according to an embodiment of the present invention.
Detailed Description
Features and exemplary embodiments of various aspects of the invention are described in detail below. To make the objects, technical solutions and advantages of the invention clearer, the invention is further described with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and are not intended to limit the invention. It will be apparent to those skilled in the art that the invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the invention through examples.
It is noted that relational terms such as "first" and "second" are used herein solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. The terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element preceded by "comprises a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus that comprises that element.
The term "and/or" herein merely describes an association between objects and indicates that three relationships are possible; for example, "A and/or B" can mean: A alone, both A and B, or B alone.
AR technology can be applied in many scenarios, such as classroom education. Traditional classrooms generally rely on one-way lecturing and lack supporting courseware or auxiliary tools, so students absorb material inefficiently in a dull learning process. Introducing AR technology allows related virtual scenes to be triggered and displayed in combination with textbook content, presenting complex and abstract concepts vividly. This helps students strengthen their grasp of the material, provides a means of interaction, mobilizes students' enthusiasm, and improves learning efficiency.
While watching AR content, users often need to comment on or annotate it in real time in order to save additional information or share it; in classroom education, for example, students can record notes on the AR content to consolidate their learning.
In the prior art, such comment or annotation needs are generally met by text input, similar to the "bullet comments" that viewers type over playing content on traditional video websites. But text input is only practical on handheld devices such as mobile phones, not on AR glasses or head-mounted devices; moreover, manual text entry is slow when commenting on AR content, so it cannot satisfy users' needs well and degrades the user experience of AR applications.
In order to solve the problems that the existing text commenting mode is not suitable for AR glasses and the commenting efficiency is low, the embodiment of the invention provides an AR-based voice commenting method, device, equipment and medium. The technical solutions of the embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of an AR-based voice commenting method according to an embodiment of the present invention, and as shown in fig. 1, the AR-based voice commenting method may include the following steps:
s101, the client identifies the received first voice information input by the user to obtain a first identification result, and the first identification result comprises a first keyword.
Specifically, the user can enter first voice information through a microphone. The client receives the user's first voice information by monitoring the microphone, extracts a time-varying voice feature sequence from the waveform data of the first voice information, and then recognizes the feature sequence to identify the first keyword in the first voice information.
The first voice information entered by the user can be an instruction such as "I want to comment", "start commenting", "note", "remark" or "record my idea"; the client recognizes the voice information to obtain a first recognition result containing a word such as "comment", "note", "remark", "record" or "idea".
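The keyword check described above can be sketched as a simple match of the recognized text against the preset keyword list. The function name and keyword values below are illustrative assumptions; the patent does not prescribe a concrete API or fix the keyword set.

```python
from typing import Optional

# Hypothetical preset comment keywords, as would be received from the
# server in the initialization parameters (values are illustrative).
COMMENT_KEYWORDS = ["comment", "note", "remark", "record", "idea"]

def extract_keyword(recognized_text: str, keywords) -> Optional[str]:
    """Return the first preset keyword found in the recognition result, or None.
    Sketches the keyword check of step S101; real systems would match
    against acoustic models rather than plain substrings."""
    text = recognized_text.lower()
    for kw in keywords:
        if kw in text:
            return kw
    return None

# A phrase such as "I want to comment" would trigger the comment flow.
assert extract_keyword("I want to comment", COMMENT_KEYWORDS) == "comment"
```

A real client would run this check on the output of the speech recognizer and only then proceed to the authentication request of S102.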
Before identifying the received first voice information input by the user, the client acquires configuration from the server, and specifically includes: the client sends configuration request information to the server, and receives initialization parameters sent by the server according to the configuration request information, wherein the initialization parameters comprise preset comment keywords.
Here, the comment keyword may be preset as a word related to the comment instruction, such as a related word or phrase, for example, "comment", "note", "remark", "record", "idea", and the like.
Specifically, in the AR application, the invocation of the client may be realized by a Software Development Kit (SDK).
S102, when the first keyword is a preset comment keyword, the client sends authentication request information to the server, wherein the authentication request information comprises a user identifier, a client identifier and a scene identifier.
First, the client obtains the preset comment keyword from the initialization parameters, compares the first keyword against the acoustic model of the preset keyword, and judges whether the first keyword matches the preset comment keyword.
And then, when the first keyword is consistent with the preset comment keyword, the client sends authentication request information to the server by calling the server-side interface, and requests the server to perform authentication and comment function authorization on the client.
S103, the server authenticates the user, the client and the current scene according to the user identifier, the client identifier and the scene identifier, and generates an authentication token.
The server receives the authentication request information sent by the client, checks the current user, client and scene according to the user identifier, client identifier and scene identifier in the request, judges whether the current user and client may make voice comments in the current scene, and generates an authentication token (token) once the check passes.
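One plausible way for the server to derive such a token from the three identifiers is an HMAC over their concatenation. The secret key, the whitelist check, and all names below are illustrative assumptions; the patent only requires that a token be generated after the check passes, not any particular scheme.

```python
import hashlib
import hmac

# Hypothetical server-held signing key and authorization table
# (illustrative; not prescribed by the patent).
SERVER_SECRET = b"demo-secret"
AUTHORIZED = {("user-1", "client-1", "scene-1")}

def generate_token(user_id, client_id, scene_id):
    """Check the (user, client, scene) triple and return an HMAC token,
    or None if the check fails (sketch of step S103)."""
    if (user_id, client_id, scene_id) not in AUTHORIZED:
        return None  # authentication failed: no comment authorization
    msg = f"{user_id}|{client_id}|{scene_id}".encode()
    return hmac.new(SERVER_SECRET, msg, hashlib.sha256).hexdigest()
```

A token derived this way lets the server later verify that a comment upload carries the same identifier triple it authorized, without storing per-session state.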
Before receiving the authentication request information sent by the client, the server further includes: receiving configuration request information sent by a client; and sending initialization parameters to the client according to the configuration request information, wherein the initialization parameters comprise preset comment keywords.
S104, the server sends the authentication token to the client.
The server sends the token to the client, authorizing the client's comment function.
S105, the client triggers a comment mode according to the first keyword and the authentication token, and receives second voice information entered by the user.
On receiving the token sent by the server, the client has passed the server's verification and the comment function is authorized. The client then triggers the comment mode according to the first keyword and the token, can prompt the user that the comment mode has been entered successfully, and receives the second voice information entered by the user.
The second voice information entered by the user may be information entered by the user through voice to comment or remark on the current AR scene or AR content.
As a specific example, in an AR application for classroom education, a student wears AR glasses to view a chemical molecular structure in an immersive teaching scene. The student says "take notes"; the client obtains authorization for the comment function from the server according to the recognized keyword "note", then triggers the recording (comment) mode according to the token returned by the server and the keyword "note", and prompts the student that "recording mode has been entered successfully". After hearing the prompt tone, the student can start recording voice notes on the chemical molecular structure, and the client stores the voice information the student enters.
S106, the client sends the second voice information, the user identifier, the client identifier and the comment scene data to the server.
The client caches the received second voice information and preprocesses it (noise reduction, echo cancellation and encoding) to obtain comment voice data; it then captures a picture of the current real-object scene and one frame of the virtual image at the moment the user begins to comment as the comment scene data, the second voice information being the user's comment on the corresponding scene; finally, it sends the comment voice data, the user identifier, the client identifier and the comment scene data to the server.
Optionally, in an embodiment, before the client sends the comment voice data, the user identifier, the client identifier, and the comment scene data to the server, the client may encapsulate the comment voice data, the user identifier, the client identifier, and the comment scene data, and send an obtained data encapsulation packet to the server through a Hypertext Transfer Protocol (HTTP).
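The encapsulation step might look like the following, with the comment voice data base64-encoded into a JSON body for the HTTP POST. The field names and wire format are assumptions for illustration; the patent does not fix them.

```python
import base64
import json

def build_comment_packet(voice_bytes, user_id, client_id, scene_data):
    """Package comment voice data and identifiers into one JSON body
    (sketch of the encapsulation in S106; field names are illustrative)."""
    return json.dumps({
        "user_id": user_id,
        "client_id": client_id,
        # Preprocessed, encoded comment audio, made text-safe for JSON:
        "voice": base64.b64encode(voice_bytes).decode("ascii"),
        # Real-object scene picture plus one virtual-image frame,
        # e.g. as URLs or base64 strings:
        "scene": scene_data,
    })

packet = build_comment_packet(b"\x00\x01", "user-1", "client-1", {"frame": "f0"})
```

The resulting string would be the request body of an HTTP POST to the server; the scene identifier and authentication token could be added as extra fields in the same dictionary, matching the optional embodiment above.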
Optionally, in one embodiment, the client may package and send the scene identifier and the authentication token to the server together with the comment voice data, the user identifier, the client identifier, and the comment scene data.
S107, the server stores the second voice information, the user identification, the client identification and the comment scene data.
When the server receives the second voice information, user identifier, client identifier and comment scene data sent by the client, it stores them: the second voice information and comment scene data are saved to a file system or distributed system to obtain a data storage path, and the storage path, user identifier and client identifier are then saved together in a database.
Specifically, the server may store the received comment voice data, the user identifier, the client identifier, the scene identifier, the authentication token, and the comment scene data sent by the client, store the comment voice data and the comment scene data in a file system or a distributed system to obtain a data storage path, and then store the data storage path, the user identifier, the client identifier, the scene identifier, and the authentication token together in the database.
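The store-blob-then-index pattern described here can be sketched as follows: the audio goes to the file system, and only the resulting path plus the identifiers go into the database. The table schema, file naming, and use of SQLite are illustrative assumptions.

```python
import os
import sqlite3
import tempfile

# Hypothetical storage locations (illustrative; a production server would
# use a distributed file store and a shared database).
storage_dir = tempfile.mkdtemp()
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE comments (user_id TEXT, client_id TEXT, path TEXT)")

def store_comment(user_id, client_id, voice_bytes):
    """Persist the comment audio to the file system and index the
    storage path by (user, client) in the database (sketch of S107)."""
    path = os.path.join(storage_dir, f"{user_id}-{client_id}.pcm")
    with open(path, "wb") as f:
        f.write(voice_bytes)  # blob goes to the file/distributed system
    # Only the path and identifiers are stored in the database:
    db.execute("INSERT INTO comments VALUES (?, ?, ?)", (user_id, client_id, path))
    return path

stored_path = store_comment("user-1", "client-1", b"\x00\x01")
```

Keeping large blobs out of the database and indexing them by path is what makes the visual management mentioned below cheap: listing a user's comments is a single table query.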
The server can also provide visual management for the stored comment voice data.
Optionally, in an embodiment, the server may further parse the received data encapsulation packet sent by the client, and store the obtained parsed data.
In some embodiments, the initialization parameters may further include a preset cleaning keyword. The user may enter second voice information through the microphone; the client receives it by monitoring the microphone, extracts a time-varying voice feature sequence from its waveform data, recognizes the feature sequence to identify keywords in the second voice information, and cleans the second voice information when it contains the preset cleaning keyword.
The cleaning keyword may be preset to any word or phrase that indicates the comment should be discarded, such as "clear" or "invalid". For example, the second voice information entered by the user may be a piece of comment voice such as "clear the comment, the recorded information is invalid"; the client recognizes this voice information and obtains words such as "clear" and "invalid".
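A minimal sketch of the cleaning-keyword check, assuming the recognizer has already produced text and using illustrative keywords (the real list would come from the initialization parameters):

```python
CLEAN_KEYWORDS = {"clear", "invalid"}  # illustrative; supplied by init parameters

def should_clean(recognized_text: str, clean_keywords=CLEAN_KEYWORDS) -> bool:
    """Return True when the recognized comment text contains any preset
    cleaning keyword, signalling that the recorded comment be discarded."""
    words = recognized_text.lower().split()
    return any(kw in words for kw in clean_keywords)
```

So `should_clean("clear the comment, the recorded information is invalid")` triggers cleaning, while ordinary comment text does not.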
In some embodiments, the initialization parameters may further include a preset end keyword. The user may enter third voice information through a microphone; the client receives the user's third voice information by monitoring the microphone, extracts a time-varying voice feature sequence from the voice waveform data of the third voice information, recognizes the sequence to identify a second keyword in the third voice information, and exits the comment mode when the second keyword is the preset end keyword or when the user produces no voice within a preset time (e.g., 5 seconds).
Here, the end keyword may be preset to any word or phrase that indicates the comment is finished, such as "complete" or "end". The third voice information entered by the user may be an instruction such as "I want to end the comment" or "I have completed the comment"; the client recognizes this voice information and obtains a second recognition result containing words such as "end" or "complete".
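The two exit conditions — an end keyword in the recognized text, or silence beyond the preset time — can be sketched as follows (keywords, timeout value, and the timestamp convention are assumptions for illustration):

```python
END_KEYWORDS = {"end", "complete"}  # illustrative; supplied by init parameters
SILENCE_TIMEOUT_S = 5.0             # "no voice within a preset time (e.g., 5 seconds)"

def should_exit(recognized_text: str, last_voice_ts: float, now: float,
                end_keywords=END_KEYWORDS, timeout=SILENCE_TIMEOUT_S) -> bool:
    """Exit the comment mode when the second keyword is a preset end
    keyword, or when the user has been silent longer than `timeout`."""
    if recognized_text:
        words = recognized_text.lower().split()
        if any(kw in words for kw in end_keywords):
            return True
    return (now - last_voice_ts) > timeout
```

Passing `now` explicitly keeps the check deterministic; a real client would use a monotonic clock updated on each microphone frame.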
In some embodiments, the initialization parameters may further include: the maximum supported voice duration, the maximum number of voice comments, the voice data format, the image format, the data transmission timeout, and the like.
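Gathering the initialization parameters mentioned above into one structure might look like this — all field names and default values are illustrative, not taken from the patent:

```python
from dataclasses import dataclass, field

@dataclass
class InitParams:
    """Initialization parameters the server returns for a configuration request."""
    comment_keywords: list = field(default_factory=lambda: ["comment"])
    end_keywords: list = field(default_factory=lambda: ["end", "complete"])
    cleaning_keywords: list = field(default_factory=lambda: ["clear", "invalid"])
    max_voice_seconds: int = 60        # maximum supported voice duration
    max_voice_comments: int = 100      # maximum number of voice comments
    voice_format: str = "opus"         # voice data format
    image_format: str = "jpeg"         # image format
    transfer_timeout_s: float = 10.0   # data transmission timeout
```

The client would receive such a structure in response to its configuration request and keep it for the lifetime of the session.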
According to the AR-based voice commenting method described above, the first voice information entered by the user is recognized to obtain a first recognition result containing a first keyword. When the first keyword is a preset comment keyword, the comment function is authorized by sending authentication request information and the comment mode is triggered, enabling the user to comment on the current AR scene by voice in real time. The voice information does not interrupt the playback of the current AR content, which improves commenting efficiency, meets the user's commenting needs, and improves the user's experience of the AR application.
Fig. 2 is a schematic structural diagram of an AR-based voice commenting apparatus according to an embodiment of the present invention, which is used for a client. As shown in fig. 2, the AR-based voice commenting apparatus 200 may include a recognition module 201, a sending module 202, a receiving module 203, and a triggering module 204.
The recognition module 201 is configured to recognize received first voice information input by a user to obtain a first recognition result, where the first recognition result includes a first keyword.
The sending module 202 is configured to send authentication request information to the server when the first keyword is a preset comment keyword, where the authentication request information includes a user identifier, a client identifier, and a scene identifier, and the authentication request information is used by the server to generate an authentication token according to the user identifier, the client identifier, and the scene identifier.
The receiving module 203 is configured to receive the authentication token sent by the server.
And the triggering module 204 is used for triggering the comment mode according to the first keyword and the authentication token and receiving second voice information input by the user.
The sending module 202 is further configured to send the second voice information, the user identifier, the client identifier, and comment scene data to the server, where the comment scene data includes the current real-object scene picture and a frame of the virtual image at the moment the user starts to comment, and the second voice information is the user's comment on the scene corresponding to the comment scene data.
In some embodiments, the apparatus further includes a transceiver module configured to, before the received first voice information entered by the user is recognized, send configuration request information to the server and receive initialization parameters sent by the server according to the configuration request information, where the initialization parameters include the preset comment keyword.
In some embodiments, the initialization parameter further includes a preset ending keyword, and the recognition module 201 is further configured to recognize the received third voice information input by the user to obtain a second recognition result, where the second recognition result includes the second keyword; and when the second keyword is a preset ending keyword, exiting the comment mode.
In some embodiments, the initialization parameters further include a preset cleaning keyword, and the recognition module 201 is further configured to clean the second voice information when it includes the preset cleaning keyword.
In some embodiments, the recognition module 201 is specifically configured to obtain a speech feature sequence according to the first speech information; and identifying the voice characteristic sequence to obtain a first identification result.
In some embodiments, the sending module 202 is specifically configured to perform noise reduction, echo cancellation, and encoding on the second voice information to obtain comment voice data; and sending the comment voice data, the user identification, the client identification and the comment scene data to a server.
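Once the sending module has produced the comment voice data (noise reduction, echo cancellation, and encoding are elided here), the payload sent to the server can be sketched as a single packet — JSON, base64, and every field name are assumptions, since the patent does not fix a wire format:

```python
import base64
import json

def build_comment_payload(comment_voice: bytes, user_id: str, client_id: str,
                          scene_id: str, token: str, scene_data: dict) -> str:
    """Package the already-encoded comment voice data with the identifiers
    and comment scene data into one packet for the server."""
    return json.dumps({
        "voice": base64.b64encode(comment_voice).decode("ascii"),
        "user_id": user_id,
        "client_id": client_id,
        "scene_id": scene_id,
        "token": token,
        # current real-object scene picture plus one frame of the virtual image
        "scene_data": scene_data,
    })

packet = build_comment_payload(b"\x01\x02", "user-1", "client-1",
                               "scene-1", "tok-abc",
                               {"real_picture": "cap.jpg", "virtual_frame": 7})
```

This matches the optional embodiment in which the scene identifier and authentication token are packaged together with the comment voice data and identifiers.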
The AR-based voice commenting apparatus described above, used for a client, recognizes received first voice information entered by the user to obtain a first recognition result containing a first keyword; sends authentication request information to the server when the first keyword is a preset comment keyword; receives the authentication token sent by the server; triggers the comment mode according to the first keyword and the authentication token and receives second voice information entered by the user; and sends the second voice information, the user identifier, the client identifier, and the comment scene data to the server. The comment mode can thus be triggered by a specific preset comment keyword: the user does not need to type, and a voice comment on the AR content is made simply by speaking. The voice information does not interrupt the playback of the current AR content, which meets the user's commenting needs and improves the user's experience of the AR application.
It can be understood that the AR-based voice commenting apparatus 200 according to the embodiment of the present invention may correspond to a client executing the embodiment shown in fig. 1, and specific details of operations and/or functions of each module/unit of the AR-based voice commenting apparatus 200 may refer to descriptions of corresponding parts in the AR-based voice commenting method in the embodiment shown in fig. 1, and are not described herein again for brevity.
Fig. 3 is a schematic structural diagram of another AR-based voice commenting apparatus according to an embodiment of the present invention, which is used for a server, and as shown in fig. 3, the AR-based voice commenting apparatus 300 may include: a receiving module 301, an authentication module 302, and a transmitting module 303.
The receiving module 301 is configured to receive authentication request information sent by a client, where the authentication request information includes a user identifier, a client identifier, and a scene identifier.
And the authentication module 302 is configured to authenticate the user, the client and the current scene according to the user identifier, the client identifier and the scene identifier, and generate an authentication token.
A sending module 303, configured to send an authentication token to the client, where the authentication token is used for the client to trigger the comment mode.
The receiving module 301 is further configured to receive the second voice information, the user identifier, the client identifier, and the comment scene data sent by the client, where the comment scene data includes the current real-object scene picture and a frame of the virtual image at the moment the user starts to comment, and the second voice information is the user's comment on the scene corresponding to the comment scene data.
In some embodiments, the apparatus further includes a configuration module configured to, before the authentication request information sent by the client is received, receive configuration request information sent by the client and send initialization parameters to the client according to the configuration request information, where the initialization parameters include the preset comment keyword.
In some embodiments, the initialization parameters further include a preset end keyword and a preset cleaning keyword.
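The authentication module's token generation (module 302 above) derives the token from the user, client, and scene identifiers; one hedged way to sketch that is an HMAC over the three identifiers — the HMAC construction, the secret, and the separator are assumptions, as the patent does not specify how the token is computed:

```python
import hashlib
import hmac

SERVER_SECRET = b"demo-secret"  # illustrative; real key management is out of scope

def issue_token(user_id: str, client_id: str, scene_id: str,
                secret: bytes = SERVER_SECRET) -> str:
    """Generate an authentication token bound to the user identifier,
    client identifier, and scene identifier, after authentication passes."""
    msg = "|".join((user_id, client_id, scene_id)).encode("utf-8")
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()

token = issue_token("user-1", "client-1", "scene-1")
```

Because the token is a deterministic function of the three identifiers and the server secret, the server can later verify that a comment packet's token matches its identifiers without storing per-session state.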
The AR-based voice commenting apparatus described above, used for a server, receives authentication request information sent by the client, where the authentication request information includes a user identifier, a client identifier, and a scene identifier; authenticates the user, the client, and the current scene according to those identifiers to generate an authentication token; sends the authentication token to the client, where it is used to trigger the comment mode; and receives the second voice information, the user identifier, the client identifier, and the comment scene data sent by the client. Authorization of the client's comment function can thus be completed, so that the user can enter the voice comment mode and comment on the current AR scene, which meets the user's commenting needs and improves the user's experience of the AR application.
It can be understood that the AR-based voice commenting apparatus 300 according to the embodiment of the present invention may correspond to a server executing the embodiment shown in fig. 1, and specific details of the operation and/or function of each module/unit of the AR-based voice commenting apparatus 300 may refer to the description of corresponding parts in the AR-based voice commenting method in the embodiment shown in fig. 1, and are not repeated herein for brevity.
Fig. 4 is a schematic diagram of a hardware structure of an AR-based voice commenting device according to an embodiment of the present invention.
As shown in fig. 4, the AR-based voice commenting device 400 in the present embodiment includes an input device 401, an input interface 402, a central processor 403, a memory 404, an output interface 405, and an output device 406. The input interface 402, the central processor 403, the memory 404, and the output interface 405 are connected to one another through a bus 410, and the input device 401 and the output device 406 are connected to the bus 410 through the input interface 402 and the output interface 405, respectively, and thereby to the other components of the AR-based voice commenting device 400.
Specifically, the input device 401 receives input information from the outside and transmits the input information to the central processor 403 through the input interface 402; the central processor 403 processes the input information based on computer-executable instructions stored in the memory 404 to generate output information, stores the output information temporarily or permanently in the memory 404, and then transmits the output information to the output device 406 through the output interface 405; the output device 406 outputs the output information to the outside of the AR-based voice commenting device 400 for use by the user.
That is, the AR-based voice commenting apparatus shown in fig. 4 may also be implemented to include: a memory storing computer-executable instructions; and a processor which, when executing the computer executable instructions, may implement the AR-based voice commenting method described in connection with the example shown in fig. 1.
In one embodiment, the AR-based voice commenting device 400 illustrated in fig. 4 includes: a memory 404 for storing a program; and a central processor 403 configured to execute the program stored in the memory to perform the method of the embodiment shown in fig. 1 according to the embodiment of the present invention.
An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium has computer program instructions stored thereon; which when executed by a processor implement the method of the embodiment shown in fig. 1 provided by an embodiment of the present invention.
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions or change the order between the steps after comprehending the spirit of the present invention.
The functional blocks shown in the above structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, they may be, for example, an electronic circuit, an application-specific integrated circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted over a transmission medium or communication link by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, read-only memories (ROMs), flash memories, erasable ROMs (EROMs), floppy disks, CD-ROMs, optical disks, hard disks, fiber-optic media, radio-frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the Internet or an intranet.
It should also be noted that the exemplary embodiments noted in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
As described above, only the specific embodiments of the present invention are provided, and it can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present invention is not limited thereto, and any equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the present invention.

Claims (13)

1. An AR-based voice commenting method, which is used for a client, and comprises the following steps:
identifying received first voice information input by a user to obtain a first identification result, wherein the first identification result comprises a first keyword;
when the first keyword is a preset comment keyword, sending authentication request information to a server, wherein the authentication request information comprises a user identifier, a client identifier and a scene identifier, and the authentication request information is used for the server to generate an authentication token according to the user identifier, the client identifier and the scene identifier;
receiving the authentication token sent by the server;
triggering a comment mode according to the first keyword and the authentication token, and receiving second voice information input by the user;
and sending the second voice information, the user identifier, the client identifier and comment scene data to the server, wherein the comment scene data comprises a current real object scene picture and a frame of a virtual image when the user starts to comment, and the second voice information is comment information of the user on a scene corresponding to the comment scene data.
2. The method of claim 1, wherein prior to the identifying the received first voice information entered by the user, the method further comprises:
sending configuration request information to the server;
and receiving initialization parameters sent by the server according to the configuration request information, wherein the initialization parameters comprise the preset comment keywords.
3. The method of claim 2, wherein the initialization parameters further include a preset end keyword, the method further comprising:
identifying the received third voice information input by the user to obtain a second identification result, wherein the second identification result comprises a second keyword;
and when the second keyword is a preset ending keyword, exiting the commenting mode.
4. The method of claim 2, wherein the initialization parameters further include a preset cleaning keyword, the method further comprising:
and when the second voice information contains the preset cleaning keywords, cleaning the second voice information.
5. The method according to claim 1, wherein the recognizing the received first voice information entered by the user to obtain a first recognition result comprises:
obtaining a voice feature sequence according to the first voice information;
and identifying the voice feature sequence to obtain the first identification result.
6. The method of claim 1, wherein the sending the second voice information, the user identifier, the client identifier, and comment context data to the server comprises:
performing noise reduction, echo cancellation and coding on the second voice information to obtain comment voice data;
and sending the comment voice data, the user identification, the client identification and the comment scene data to the server.
7. An AR-based voice commenting method, wherein the method is used for a server, and the method comprises the following steps:
receiving authentication request information sent by a client, wherein the authentication request information comprises a user identifier, a client identifier and a scene identifier;
authenticating the user, the client and the current scene according to the user identifier, the client identifier and the scene identifier to generate an authentication token;
sending the authentication token to the client, wherein the authentication token is used for triggering a comment mode by the client;
and receiving second voice information, the user identifier, the client identifier and comment scene data sent by the client, wherein the comment scene data comprises a current real object scene picture and a frame of a virtual image when the user starts to comment, and the second voice information is comment information of the user on a scene corresponding to the comment scene data.
8. The method according to claim 7, wherein before the receiving the authentication request information sent by the client, the method further comprises:
receiving configuration request information sent by the client;
and sending initialization parameters to the client according to the configuration request information, wherein the initialization parameters comprise preset comment keywords.
9. The method of claim 8, wherein the initializing parameters further comprises: and presetting an end keyword and a cleaning keyword.
10. An AR-based voice commenting device, wherein the device is used for a client, the device comprising:
the recognition module is used for recognizing received first voice information input by a user to obtain a first recognition result, and the first recognition result comprises a first keyword;
the sending module is used for sending authentication request information to a server when the first keyword is a preset comment keyword, wherein the authentication request information comprises a user identifier, a client identifier and a scene identifier, and the authentication request information is used for the server to generate an authentication token according to the user identifier, the client identifier and the scene identifier;
a receiving module, configured to receive the authentication token sent by the server;
the triggering module is used for triggering a comment mode according to the first keyword and the authentication token and receiving second voice information input by the user;
the sending module is further configured to send the second voice information, the user identifier, the client identifier and comment scene data to the server, where the comment scene data includes a current real object scene picture and a frame of a virtual image when the user starts to comment, and the second voice information is comment information of a scene corresponding to the comment scene data by the user.
11. An AR-based voice commenting device, wherein the device is used for a server, the device comprising:
the system comprises a receiving module, a judging module and a sending module, wherein the receiving module is used for receiving authentication request information sent by a client, and the authentication request information comprises a user identifier, a client identifier and a scene identifier;
the authentication module is used for authenticating the user, the client and the current scene according to the user identifier, the client identifier and the scene identifier to generate an authentication token;
the sending module is used for sending the authentication token to the client, and the authentication token is used for triggering the comment mode by the client;
the receiving module is further configured to receive second voice information, the user identifier, the client identifier and comment scene data sent by the client, where the comment scene data includes a current real object scene picture and a frame of a virtual image when the user starts to comment, and the second voice information is comment information of a scene corresponding to the comment scene data by the user.
12. An AR-based voice commenting device, comprising: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements the AR-based voice commenting method according to any one of claims 1 to 9.
13. A computer-readable storage medium, wherein computer program instructions are stored thereon, which when executed by a processor, implement the AR-based voice commenting method according to any one of claims 1 to 9.
CN202010125581.7A 2020-02-27 2020-02-27 AR-based voice commenting method, device, equipment and storage medium Active CN113311936B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010125581.7A CN113311936B (en) 2020-02-27 2020-02-27 AR-based voice commenting method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN113311936A CN113311936A (en) 2021-08-27
CN113311936B true CN113311936B (en) 2022-12-02

Family

ID=77370465


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103186555A (en) * 2011-12-28 2013-07-03 腾讯科技(深圳)有限公司 Evaluation information generation method and system
CN103440603A (en) * 2013-08-30 2013-12-11 苏州跨界软件科技有限公司 Order system based on augmented reality
CN105448292A (en) * 2014-08-19 2016-03-30 北京羽扇智信息科技有限公司 Scene-based real-time voice recognition system and method
JP2017037212A (en) * 2015-08-11 2017-02-16 セイコーエプソン株式会社 Voice recognizer, control method and computer program
CN107038361A (en) * 2016-10-13 2017-08-11 阿里巴巴集团控股有限公司 Service implementation method and device based on virtual reality scenario
CN109087639A (en) * 2018-08-02 2018-12-25 泰康保险集团股份有限公司 Method for voice recognition, device, electronic equipment and computer-readable medium
CN109191180A (en) * 2018-08-06 2019-01-11 百度在线网络技术(北京)有限公司 The acquisition methods and device of evaluation
CN110139164A (en) * 2019-06-17 2019-08-16 北京小桨搏浪科技有限公司 A kind of voice remark playback method, device, terminal device and storage medium
CN110472099A (en) * 2018-05-10 2019-11-19 腾讯科技(深圳)有限公司 Interdynamic video generation method and device, storage medium
CN110686354A (en) * 2019-10-12 2020-01-14 宁波奥克斯电气股份有限公司 Voice air conditioner control method, voice air conditioner control device and air conditioner

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3489943A4 (en) * 2016-07-19 2019-07-24 FUJIFILM Corporation Image display system, head-mounted-display control device, and method and program for actuating same
US10685386B2 (en) * 2016-11-30 2020-06-16 Bank Of America Corporation Virtual assessments using augmented reality user devices


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"增强现实技术的汉语教学研究";曾丹 等;《语文学刊》;20171025;第37卷(第5期);第163-166页 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant