CN112380871A - Semantic recognition method, apparatus, and medium - Google Patents

Semantic recognition method, apparatus, and medium

Info

Publication number
CN112380871A
Authority
CN
China
Prior art keywords
text information
information
character string
semantic recognition
page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011306856.3A
Other languages
Chinese (zh)
Inventor
程俊 (Cheng Jun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen TCL New Technology Co Ltd
Original Assignee
Shenzhen TCL New Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen TCL New Technology Co Ltd filed Critical Shenzhen TCL New Technology Co Ltd
Priority to CN202011306856.3A
Publication of CN112380871A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/451 Execution arrangements for user interfaces

Abstract

The invention discloses a semantic recognition method, which comprises the following steps: when voice information is received, identifying the text information corresponding to the currently displayed page, wherein the text information comprises the text information of the controls on the display page and the text information displayed on the page; acquiring the character information corresponding to the voice information; acquiring, from the text information, a character string that matches the character information; and executing the control instruction corresponding to the matched character string. The method solves the technical problem in the prior art that keywords corresponding to speech are recognized inefficiently and inaccurately across different display pages, enables the keywords of the spoken input to be located quickly in the text information of the display page, and improves the accuracy of semantic recognition.

Description

Semantic recognition method, apparatus, and medium
Technical Field
The invention relates to the technical field of intelligent large-screen devices, and in particular to a semantic recognition method, device, and medium.
Background
Smart televisions provide a fully open platform and can host various operating systems. With the continuous development of voice interaction technology, voice interaction built into smart home devices has become increasingly popular. While enjoying traditional television content, users can install and uninstall application software themselves and continuously expand and upgrade functions, which brings a rich, personalized experience. Voice control of smart televisions is now very common and can be achieved with only a remote controller or a multi-screen interaction application on a mobile phone. Voice control requires strong processing power from the television: speech is first captured through the remote controller or mobile phone, the user's voice command is converted into text and matched against a speech knowledge base, and the system then automatically executes the corresponding task, such as adjusting the volume, changing the channel, or switching modes.
The key to natural semantic interaction is that semantic recognition must be accurate and fast, which in turn depends on the type of page displayed by the foreground scene of the large-screen system. The foreground scene of a large-screen system displays the following page types: activity pages (Activity), browser views (WebView), and dialog boxes (Dialog). In the prior art, however, the keywords corresponding to speech are recognized inefficiently and inaccurately across these different display pages.
Disclosure of Invention
The embodiments of the invention provide a semantic recognition method, device, and medium, aiming to solve the technical problem in the prior art that keywords corresponding to speech are recognized inefficiently and inaccurately across different display pages.
To achieve the above object, an embodiment of the present invention provides a semantic recognition method, which includes the following steps:
when voice information is received, identifying the text information corresponding to the currently displayed page, wherein the text information comprises the text information of the controls on the display page and the text information displayed on the page;
acquiring the character information corresponding to the voice information;
acquiring, from the text information, a character string that matches the character information;
and executing the control instruction corresponding to the matched character string.
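The four steps above can be sketched as the following minimal, hypothetical Java outline. It is not the patent's implementation; all helper names (collectPageText, speechToText, matchCharacterString, executeCommand, promptForReinput) are assumptions introduced only for illustration.

    import java.util.ArrayList;
    import java.util.List;

    public final class SemanticRecognizer {

        public void onVoiceReceived(byte[] voiceData) {
            // Step 1: identify the text information of the currently displayed page
            // (control texts plus texts shown on the page).
            List<String> pageTexts = collectPageText();

            // Step 2: acquire the character information corresponding to the voice.
            String characterInfo = speechToText(voiceData);

            // Step 3: acquire, from the page text information, a character string
            // that matches the character information.
            String matched = matchCharacterString(pageTexts, characterInfo);

            // Step 4: execute the control instruction bound to the matched string,
            // or prompt the user to speak again if nothing matched.
            if (matched != null) {
                executeCommand(matched);
            } else {
                promptForReinput();
            }
        }

        // Placeholder helpers; the patent leaves their concrete details open.
        private List<String> collectPageText() { return new ArrayList<>(); }
        private String speechToText(byte[] voiceData) { return ""; }
        private String matchCharacterString(List<String> pageTexts, String spoken) {
            for (String candidate : pageTexts) {
                if (spoken.contains(candidate)) {
                    return candidate;
                }
            }
            return null;
        }
        private void executeCommand(String matched) { }
        private void promptForReinput() { }
    }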
Optionally, before the identifying of the text information corresponding to the currently displayed page, the method further includes:
acquiring the type of a current display page, wherein the type of the display page comprises an activity page, a browser view and a dialog box;
and identifying text information corresponding to the display page according to the type of the display page.
Optionally, the acquiring, from the text information, of the character string that matches the character information includes:
acquiring a keyword from the character information corresponding to the voice information;
acquiring character strings in text information corresponding to the display page;
and determining a character string matched with the keyword.
Optionally, the identifying the text information corresponding to the currently displayed page includes:
acquiring text information and a control list of the display page;
packaging the text information and the control list into a preset data format;
and sending the text information and the control list, packaged in the preset data format, to a server for recognition, and receiving the character string corresponding to the text information returned by the server.
Optionally, the acquiring the text information of the display page and the control list includes:
and when the browser view is displayed embedded in the activity page, the activity page acquires the text information and the control list of the browser view through an internal interface.
Optionally, the step of executing the control instruction corresponding to the matched character string includes:
acquiring a control instruction corresponding to the matched character string;
when there are multiple control instructions, outputting a selection interface for the control instructions;
and executing the control instruction selected through the selection interface.
Optionally, after the step of acquiring, from the text information, the character string that matches the character information, the method further includes:
when a character string matching the character information is acquired, executing the control instruction corresponding to the matched character string;
and when no character string matching the character information is acquired, outputting prompt information for re-inputting the voice information.
To achieve the above object, an embodiment of the present invention provides a semantic recognition device, including:
the identification unit is used for identifying the text information corresponding to the currently displayed page when voice information is received, wherein the text information comprises the text information of the controls on the display page and the text information displayed on the page;
the first acquisition unit is used for acquiring the character information corresponding to the voice information;
the second acquisition unit is used for acquiring, from the text information, a character string that matches the character information;
and the execution unit is used for executing the control instruction corresponding to the matched character string.
To achieve the above object, an embodiment of the present invention provides a semantic recognition apparatus, which includes a memory, a processor, and a semantic recognition program stored in the memory and executable on the processor; when executed by the processor, the semantic recognition program implements the steps of the semantic recognition method described above.
To achieve the above object, an embodiment of the present invention provides a computer-readable storage medium, which stores a semantic recognition program; when the semantic recognition program is executed by a processor, it implements the steps of the semantic recognition method described above.
According to the semantic recognition method, device, and computer-readable storage medium, when the intelligent large-screen system receives voice information, it obtains the type of the currently displayed page, determines the corresponding acquisition strategy according to the page type to obtain the corresponding text information, and sends the text information to the server for recognition. The character information corresponding to the voice information is recognized and divided into at least one keyword, the character string in the text information of the display page that matches the keyword is determined, the content corresponding to the matched character string is displayed in the large-screen display area, and the corresponding control event is executed according to the control instruction and the component type of the currently displayed page, thereby completing control of the page. In this way, the intelligent large-screen system can accurately acquire the text information of different page types, and can obtain the character string corresponding to the control instruction by matching the text information of the currently displayed page against the keywords of the voice information, so that the keywords are located quickly in the text information of the display page and the accuracy of semantic recognition is improved.
Drawings
FIG. 1 is a schematic diagram of an apparatus in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a semantic recognition method according to a first embodiment of the present invention;
FIG. 3 is a flowchart illustrating a semantic recognition method according to a second embodiment of the present invention;
FIG. 4 is a functional block diagram of the semantic recognition device according to the present invention.
Detailed Description
For a better understanding of the above technical solutions, exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As an implementation, the semantic recognition device may be as shown in fig. 1.
The semantic recognition device according to the embodiment of the invention includes: a processor 101 (for example, a CPU), a memory 102, and a communication bus 103. The communication bus 103 is configured to implement connection and communication between these components.
The memory 102 may be a high-speed RAM memory or a non-volatile memory (e.g., a disk memory). As shown in fig. 1, the memory 102, as a kind of computer storage medium, may include a semantic recognition program; and the processor 101 may be configured to call the semantic recognition program stored in the memory 102 and perform the following operations:
when voice information is received, identifying the text information corresponding to the currently displayed page, wherein the text information comprises the text information of the controls on the display page and the text information displayed on the page;
acquiring the character information corresponding to the voice information;
acquiring, from the text information, a character string that matches the character information;
and executing the control instruction corresponding to the matched character string.
Further, the processor 101 may be configured to call the semantic recognition program stored in the memory 102, and perform the following operations:
acquiring the type of a current display page, wherein the type of the display page comprises an activity page, a browser view and a dialog box;
and identifying text information corresponding to the display page according to the type of the display page.
Further, the processor 101 may be configured to call the semantic recognition program stored in the memory 102, and perform the following operations:
acquiring a keyword from the character information corresponding to the voice information;
acquiring character strings in text information corresponding to the display page;
and determining a character string matched with the keyword.
Further, the processor 101 may be configured to call the semantic recognition program stored in the memory 102, and perform the following operations:
acquiring text information and a control list of the display page;
packaging the text information and the control list into a preset data format;
and sending the text information and the control list, packaged in the preset data format, to a server for recognition, and receiving the character string corresponding to the text information returned by the server.
Further, the processor 101 may be configured to call the semantic recognition program stored in the memory 102, and perform the following operations:
and when the browser view is displayed embedded in the activity page, the activity page acquires the text information and the control list of the browser view through an internal interface.
Further, the processor 101 may be configured to call the semantic recognition program stored in the memory 102, and perform the following operations:
acquiring a control instruction corresponding to the matched character string;
when there are multiple control instructions, outputting a selection interface for the control instructions;
and executing the control instruction selected through the selection interface.
Further, the processor 101 may be configured to call the semantic recognition program stored in the memory 102, and perform the following operations:
when a character string matching the character information is acquired, executing the control instruction corresponding to the matched character string;
and when no character string matching the character information is acquired, outputting prompt information for re-inputting the voice information.
According to this solution, when the intelligent large-screen system receives voice information, it obtains the type of the currently displayed page, determines the corresponding acquisition strategy according to the page type to obtain the corresponding text information, and sends the text information to the server for recognition. The character information corresponding to the voice information is recognized and divided into at least one keyword, the character string in the text information of the display page that matches the keyword is determined, the content corresponding to the matched character string is displayed in the large-screen display area, and the corresponding control event is executed according to the control instruction and the component type of the currently displayed page, thereby completing control of the page. In this way, the intelligent large-screen system can accurately acquire the text information of different page types, and can obtain the character string corresponding to the control instruction by matching the text information of the currently displayed page against the keywords of the voice information, so that the keywords are located quickly in the text information of the display page and the accuracy of semantic recognition is improved.
Based on the above hardware architecture of the semantic recognition device, embodiments of the semantic recognition method are provided.
Referring to fig. 2, fig. 2 shows a first embodiment of the semantic recognition method of the present invention, and the semantic recognition method includes the following steps:
Step S10, when voice information is received, identifying the text information corresponding to the currently displayed page, wherein the text information comprises the text information of the controls on the display page and the text information displayed on the page;
the intelligent home voice recognition technology focuses on accurate and rapid semantic recognition, and the efficiency and accuracy of recognition of keywords corresponding to voices in different display pages in the prior art are low. The execution main body of the embodiment may be an intelligent large-screen system, and the voice information may be received by a remote controller corresponding to the intelligent large-screen system or received by a voice receiving device of the intelligent large-screen system. When voice information is received, a semantic recognition server sends a page report broadcast to start a text information collection function of a current display page, the server can be a cloud server or a server of an intelligent large-screen system, the intelligent large-screen system obtains the content of the current display page and extracts the text information of the current display page, and the text information comprises the text information of a controllable file list corresponding to the display interface and the text information displayed in the display interface.
Step S20, acquiring character information corresponding to the voice information;
When voice information is received, the intelligent large-screen system converts it into character information according to the mapping relationship between voiceprints and characters (a voiceprint is the spectrum of a sound wave, displayed by electro-acoustic instruments, that carries the speech information).
Step S30, acquiring, from the text information, a character string that matches the character information;
Optionally, step S30 includes: acquiring a keyword from the character information corresponding to the voice information;
acquiring the character strings in the text information corresponding to the display page;
and determining the character string that matches the keyword.
Further, when a character string matching the character information is acquired, step S40 is executed;
and when no character string matching the character information is acquired, prompt information for re-inputting the voice information is output.
And step S40, executing the control instruction corresponding to the matched character string.
For example, when the voice information "play the first and second episodes of the TV series Mr. Good" is received, the character information corresponding to the voice information can be divided into at least one keyword and the keywords cached, for example "play", "TV series", "Mr. Good", "episode one", and "episode two". After the text information of the display page has been identified, it is compared with the keywords from the voice information, and the character strings matching the keywords are searched for in the text information. When the text information of the display page contains a character string matching a keyword, the content corresponding to that character string is displayed in the large-screen display area and the control instruction corresponding to the character string is executed, for example playing the second episode of Mr. Good; alternatively, the interface corresponding to the character string is brought to the top activity page and the user is asked whether to play the content of the current page, the content being played when confirmation information is received, where the confirmation may be a click input via the remote controller or a voice input. When the text information of the displayed page contains no character string matching the keywords, prompt information for re-inputting the voice information is output. The prompt information may be a picture and/or text and/or audio, for example the dialog message "voice recognition failed, please input again" or "the content you requested cannot be retrieved, please input the voice instruction again", and the message in the dialog may also be played as audio.
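A minimal matching sketch consistent with the example above, assuming the spoken text has already been segmented into keywords such as "play", "Mr. Good", and "episode two". The plain substring comparison is an illustrative assumption; the patent does not prescribe the matching rule.

    import java.util.ArrayList;
    import java.util.List;

    final class KeywordMatcher {

        // Illustrative substring matching; returns the page strings that contain
        // at least one of the spoken keywords. An empty result means the user
        // should be prompted to speak again.
        static List<String> match(List<String> pageStrings, List<String> keywords) {
            List<String> hits = new ArrayList<>();
            for (String candidate : pageStrings) {
                for (String keyword : keywords) {
                    if (candidate.contains(keyword)) {
                        hits.add(candidate);
                        break; // one hit per page string is enough
                    }
                }
            }
            return hits;
        }
    }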
According to the technical solution provided by this embodiment, when the intelligent large-screen system receives voice information, the content of the currently displayed page is obtained, and the page text information is extracted and sent to the server for recognition. The character information corresponding to the voice information is recognized and divided into at least one keyword, the character string in the text information of the display page that matches the keyword is determined, the content corresponding to the matched character string is displayed in the large-screen display area, and the control instruction corresponding to the character string is executed. In this way, the intelligent large-screen system can obtain the character string corresponding to the control instruction by matching the text information of the currently displayed page against the keywords of the voice information, so that the keywords are located quickly in the text information of the display page.
Referring to fig. 3, fig. 3 shows a second embodiment of the semantic recognition method according to the present invention; based on the first embodiment, step S10 includes:
Step S11, acquiring the type of the currently displayed page, wherein the type of the display page comprises an activity page, a browser view and a dialog box;
and identifying the text information corresponding to the display page according to the type of the display page.
When the voice information is received, the intelligent large-screen system acquires the type of the currently displayed page, where the page type may be an activity page, a browser view, or a dialog box, and identifies the text information corresponding to the page according to its type. An activity page (Activity) is one of the four major components of the Android system; it is the visual interface operated by the user and provides the window in which operation instructions are completed. After an Activity is created, its setContentView() method must be called with the content layout to display the interface and provide the user with an interaction entry point. Almost everything visible in an Android application depends on an Activity, which makes it the most frequently used component in development. The browser view (WebView) is a web page view that can be embedded in a mobile terminal to enable hybrid front-end development; most hybrid development frameworks are built on top of WebView. A dialog box (Dialog) is a special window in a graphical user interface used to display information to the user or to obtain an input response when needed. They are called "dialog boxes" because they form a dialog between the computer and the user: notifying the user of some information, requesting input, or both. Different interactions use different dialog boxes; the simplest is a warning that displays a message and only requires acknowledgement that the message has been read (usually by clicking an "OK" button or a "Close" command).
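For reference, a minimal Activity illustrating the setContentView() call mentioned above; the layout resource R.layout.activity_main is a placeholder, not something defined by the patent.

    import android.app.Activity;
    import android.os.Bundle;

    public class MainActivity extends Activity {
        @Override
        protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            // Binds the visual interface so the user has an interaction entry point.
            // R.layout.activity_main is a placeholder layout resource.
            setContentView(R.layout.activity_main);
        }
    }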
Optionally, the step S10 includes:
step S12, acquiring the text information and the control list of the display page;
step S13, packaging the text information and the control list into a preset data format;
and step S14, sending the text information and the control list, packaged in the preset data format, to a server for recognition, and receiving the character string corresponding to the text information returned by the server.
Optionally, when the voice information is received, the intelligent large-screen system starts its scene processing function to obtain the type of the currently displayed page. When the page is an activity page, the activity page currently in the foreground is obtained, and a voice-control broadcast receiver is registered to enable connection and interaction between the remote controller and the server. The list of all components in the current activity page is then acquired, including the text information in the views and sub-views and the text information of the display page. For example:
ListView(x1) { "haoxiansheng1", "haoxiansheng2" } // Mr. Good 1, Mr. Good 2.
The collected text information is then encapsulated into a preset data format, such as the JSON data format (JSON stands for JavaScript Object Notation, a text-based, language-independent, lightweight data exchange format), to satisfy the communication protocol with the server. For example:
Json_ListView(x1) { "name1": "haoxiansheng1", "name2": "haoxiansheng2" }
When the broadcast for uploading scene information is received, the acquired text information of the current page is sent to the server for semantic recognition of the voice information.
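A sketch of the JSON encapsulation step under stated assumptions: the field names "pageType", "texts", and "controls" are illustrative and not defined by the patent, and org.json is used only because it ships with Android.

    import org.json.JSONArray;
    import org.json.JSONException;
    import org.json.JSONObject;

    import java.util.List;

    final class PagePayloadBuilder {

        // Packages the collected page texts and control labels for upload.
        // Field names below are illustrative, not defined by the patent.
        static JSONObject build(String pageType, List<String> texts, List<String> controls)
                throws JSONException {
            JSONObject payload = new JSONObject();
            payload.put("pageType", pageType);                // "activity" | "webview" | "dialog"
            payload.put("texts", new JSONArray(texts));       // strings shown on the page
            payload.put("controls", new JSONArray(controls)); // controllable component labels
            return payload;
        }
    }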
Optionally, when the browser view is displayed embedded in the activity page, the activity page acquires the text information and the control list of the browser view through an internal interface. When the displayed page is a browser view, the page information collection mechanism of the WebView differs from that of the Activity and the Dialog, because the WebView is itself rendered as a page of views, and it therefore needs to be adapted separately. In the whole-process voice control module, the WebView is treated as a special sub-view of the Activity: it is responsible for uploading its page information to the Activity and provides an interface through which the Activity completes the corresponding control commands. Therefore, when the browser view is displayed embedded in the activity page, the activity page acquires the text information and the control list of the browser view through an internal interface. The text information of the current browser view is then encapsulated into the preset data format, such as JSON, to satisfy the communication protocol with the server. When the broadcast for uploading scene information is received, the acquired text information of the current browser view is sent to the server for semantic recognition of the voice information.
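One way the activity page could pull text out of an embedded browser view is shown below. evaluateJavascript() is a real WebView API (API level 19+), but the injected script and the callback handling are simplified assumptions rather than the patent's internal interface.

    import android.webkit.ValueCallback;
    import android.webkit.WebView;

    final class WebViewTextCollector {

        // Asks the embedded page for its visible text; the result arrives
        // asynchronously in the callback as a JSON-encoded string.
        // The injected script is a simplified placeholder.
        static void collectVisibleText(WebView webView, ValueCallback<String> callback) {
            webView.evaluateJavascript(
                    "(function(){ return document.body.innerText; })();",
                    callback);
        }
    }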
Optionally, when the displayed page is a dialog box view (Dialog), the Dialog currently in the foreground is obtained, and a voice-control broadcast receiver is registered to enable interaction between the remote controller and the cloud. The page is then analyzed to obtain the controllable components arranged in the dialog object: the text information and the list of controllable components are extracted, and the analyzed controls can be separated into text and controls. The text information of the current dialog box view is then encapsulated into the preset data format, such as JSON, to satisfy the communication protocol with the server. When the broadcast for uploading scene information is received, the acquired text information of the current dialog box view is sent to the server for semantic recognition of the voice information.
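A hypothetical traversal of a dialog's view tree that separates displayed text from controllable components, in the spirit of the page analysis described above; the patent does not prescribe this exact walk.

    import android.app.Dialog;
    import android.view.View;
    import android.view.ViewGroup;
    import android.widget.Button;
    import android.widget.TextView;

    import java.util.List;

    final class DialogTextExtractor {

        // Hypothetical traversal; not the patent's prescribed mechanism.
        static void extract(Dialog dialog, List<String> texts, List<String> controls) {
            walk(dialog.getWindow().getDecorView(), texts, controls);
        }

        private static void walk(View view, List<String> texts, List<String> controls) {
            // Button extends TextView, so check the controllable type first.
            if (view instanceof Button) {
                controls.add(((Button) view).getText().toString());
            } else if (view instanceof TextView) {
                texts.add(((TextView) view).getText().toString());
            }
            if (view instanceof ViewGroup) {
                ViewGroup group = (ViewGroup) view;
                for (int i = 0; i < group.getChildCount(); i++) {
                    walk(group.getChildAt(i), texts, controls);
                }
            }
        }
    }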
After the server has recognized the text information of the displayed page, the recognition result, namely the character string corresponding to the text information, is sent to the intelligent large-screen end. When the intelligent large-screen end receives this character string, a control instruction is resolved from the keywords corresponding to the voice information and the character string corresponding to the text information; the control instruction is the one corresponding to the character string that matches the keywords. The corresponding control event is then executed according to the component type of the currently displayed page to complete control of the page. For example, when the currently displayed page is an activity page:
activity_Cmd(x1) { "play", "haoxiansheng01" } // activity page key event: play Mr. Good 01.
When the currently displayed page is a browser view, the following is executed:
WebViewCmd(x1) { "return", "previous page" } // return to the previous page.
When the current display page is the dialog view:
DialogViewCmd(x1) { "pause", "haoxiansheng01" } // pause Mr. Good 01.
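Once the matched character string and the page type are known, executing the control instruction on an activity or dialog page can amount to clicking the control whose text matches. performClick() is a real View API, but this lookup strategy is only an illustrative assumption and not the patent's prescribed mechanism.

    import android.view.View;
    import android.view.ViewGroup;
    import android.widget.TextView;

    final class ControlClicker {

        // Illustrative lookup: finds the first descendant whose text equals the
        // matched string and clicks it; returns true when such a control was found.
        static boolean clickMatched(ViewGroup root, String matched) {
            for (int i = 0; i < root.getChildCount(); i++) {
                View child = root.getChildAt(i);
                if (child instanceof TextView
                        && matched.contentEquals(((TextView) child).getText())) {
                    child.performClick();
                    return true;
                }
                if (child instanceof ViewGroup && clickMatched((ViewGroup) child, matched)) {
                    return true;
                }
            }
            return false;
        }
    }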
In the technical solution provided by this embodiment, when the intelligent large-screen system receives voice information, it obtains the type of the currently displayed page, determines the corresponding acquisition strategy according to the page type to obtain the corresponding text information, and sends the text information to the server for recognition. After recognition, the recognition result of the text information is obtained, and the corresponding control event is executed according to the control instruction and the component type of the currently displayed page to complete control of the page. In this way, the intelligent large-screen system can accurately acquire the text information of different page types, and can obtain the character string corresponding to the control instruction by matching the text information of the currently displayed page against the keywords of the voice information, so that the keywords are located quickly in the text information of the display page and the accuracy of semantic recognition is improved.
Referring to fig. 4, fig. 4 is a functional module diagram of the semantic recognition device of the present invention.
As shown in fig. 4, the semantic recognition apparatus includes:
an identification unit 201, configured to identify the text information corresponding to the currently displayed page when voice information is received, wherein the text information comprises the text information of the controls on the display page and the text information displayed on the page;
a first obtaining unit 202, configured to obtain the character information corresponding to the voice information;
a second obtaining unit 203, configured to obtain, from the text information, a character string that matches the character information;
and an execution unit 204, configured to execute the control instruction corresponding to the matched character string.
The function implementation of each module in the semantic recognition device corresponds to each step in the embodiment of the semantic recognition method, and the functions and implementation processes are not described in detail here.
Embodiments of the present invention further provide a computer-readable storage medium, on which a semantic recognition program is stored, where the semantic recognition program, when executed by a processor, implements the steps of the semantic recognition method described above.
In this way, the intelligent large-screen system can obtain the character string corresponding to the control instruction by matching the text information of the currently displayed page against the keywords of the voice information, so that the keywords are located quickly in the text information of the display page.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A semantic recognition method, characterized in that the semantic recognition method comprises the steps of:
when voice information is received, identifying the text information corresponding to the currently displayed page, wherein the text information comprises the text information of the controls on the display page and the text information displayed on the page;
acquiring the character information corresponding to the voice information;
acquiring, from the text information, a character string that matches the character information;
and executing the control instruction corresponding to the matched character string.
2. The semantic recognition method of claim 1, wherein, prior to the identifying of the text information corresponding to the currently displayed page, the method further comprises:
acquiring the type of a current display page, wherein the type of the display page comprises an activity page, a browser view and a dialog box;
and identifying text information corresponding to the display page according to the type of the display page.
3. The semantic recognition method of claim 1, wherein the acquiring, from the text information, of the character string that matches the character information comprises:
acquiring a keyword from the character information corresponding to the voice information;
acquiring character strings in text information corresponding to the display page;
and determining a character string matched with the keyword.
4. The semantic recognition method of claim 2, wherein the recognizing the text information corresponding to the currently displayed page comprises:
acquiring text information and a control list of the display page;
packaging the text information and the control list into a preset data format;
and sending the text information and the control list, packaged in the preset data format, to a server for recognition, and receiving the character string corresponding to the text information returned by the server.
5. The semantic recognition method of claim 2 or 4, wherein the obtaining of the text information of the display page and the control list comprises:
and when the browser view is displayed embedded in the activity page, the activity page acquires the text information and the control list of the browser view through an internal interface.
6. The semantic recognition method according to claim 1, wherein the step of executing the control instruction corresponding to the matched character string comprises:
acquiring a control instruction corresponding to the matched character string;
when there are multiple control instructions, outputting a selection interface for the control instructions;
and executing the control instruction selected through the selection interface.
7. The semantic recognition method according to claim 1, wherein, after the step of acquiring, from the text information, the character string that matches the character information, the method further comprises:
when a character string matching the character information is acquired, executing the control instruction corresponding to the matched character string;
and when no character string matching the character information is acquired, outputting prompt information for re-inputting the voice information.
8. A semantic recognition device, comprising:
the identification unit is used for identifying the text information corresponding to the currently displayed page when voice information is received, wherein the text information comprises the text information of the controls on the display page and the text information displayed on the page;
the first acquisition unit is used for acquiring the character information corresponding to the voice information;
the second acquisition unit is used for acquiring, from the text information, a character string that matches the character information;
and the execution unit is used for executing the control instruction corresponding to the matched character string.
9. A semantic recognition device comprising a memory, a processor and a semantic recognition program stored in the memory and executable on the processor, the semantic recognition program when executed by the processor implementing the steps of the semantic recognition method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a semantic recognition program, which when executed by a processor implements the steps of the semantic recognition method according to any one of claims 1 to 7.
CN202011306856.3A 2020-11-19 2020-11-19 Semantic recognition method, apparatus, and medium Pending CN112380871A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011306856.3A CN112380871A (en) 2020-11-19 2020-11-19 Semantic recognition method, apparatus, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011306856.3A CN112380871A (en) 2020-11-19 2020-11-19 Semantic recognition method, apparatus, and medium

Publications (1)

Publication Number Publication Date
CN112380871A true CN112380871A (en) 2021-02-19

Family

ID=74585322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011306856.3A Pending CN112380871A (en) 2020-11-19 2020-11-19 Semantic recognition method, apparatus, and medium

Country Status (1)

Country Link
CN (1) CN112380871A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115937855A (en) * 2023-03-10 2023-04-07 四川语璐科技有限公司 Intelligent device control method and system based on big data



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination