CN117807178A - Display device and adaptation method of semantic engine

Display device and adaptation method of semantic engine

Info

Publication number
CN117807178A
Authority
CN
China
Prior art keywords
semantic
target
display device
semantic engine
engine
Prior art date
Legal status
Pending
Application number
CN202311264673.3A
Other languages
Chinese (zh)
Inventor
徐侃
朱飞
Current Assignee
Vidaa Netherlands International Holdings BV
Original Assignee
Vidaa Netherlands International Holdings BV
Priority date
Filing date
Publication date
Application filed by Vidaa Netherlands International Holdings BV filed Critical Vidaa Netherlands International Holdings BV
Priority to CN202311264673.3A priority Critical patent/CN117807178A/en
Publication of CN117807178A publication Critical patent/CN117807178A/en
Pending legal-status Critical Current


Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

The application provides a display device and a method for adapting a semantic engine. The method includes: receiving a voice instruction and converting it into text information; sending the text information to a plurality of semantic engines in the display device and acquiring the semantic recognition results generated by the semantic engines after recognizing the text information; calculating the matching degree of each semantic recognition result based on a machine learning classification algorithm and determining a target semantic engine according to the matching degree, the target semantic engine being the semantic engine corresponding to the semantic recognition result with the highest matching degree; and outputting the target semantic engine's target recognition result for the text information. Through a machine learning classification algorithm, the display device can determine a target semantic engine from the semantic recognition results output by each semantic engine, and the target semantic engine outputs the target recognition result with the highest matching degree, the one that best matches the user's voice instruction. This solves the problems that the matching degree of speech recognition is low and user semantics cannot be accurately recognized.

Description

Display device and adaptation method of semantic engine
Technical Field
The application relates to the technical field of display devices, and in particular to a display device and an adaptation method of a semantic engine.
Background
Natural language processing is a method for interacting and communicating with a machine using the natural language people use to communicate. For example, the sentences, syntactic structures, and semantics of natural language are learned by means of machine learning, so that the language becomes machine-readable and machine-understandable.
To realize the function of understanding natural language, for example understanding the user's voice content, a voice assistant can be provided in the display device, and a semantic engine can be provided in the voice assistant, so that the user's voice content can be recognized through the semantic engine. Some display devices include three voice assistants, one integrated in the cloud and the other two integrated in the terminal of the display device. Other display devices also include three voice assistants, but with all three integrated in the terminal of the display device.
When a voice assistant is integrated in the terminal of the display device, it consumes the terminal's memory and occupies terminal resources. In addition, only one voice assistant can be selected when the voice function is used, so the same speech engine performs semantic recognition in different usage scenarios, which causes the problems that the matching degree of speech recognition is low and user semantics cannot be accurately recognized.
Disclosure of Invention
Some embodiments of the present application provide a display device and a method for adapting a semantic engine, so as to solve the problems that the matching degree of speech recognition is low and user semantics cannot be accurately recognized.
In a first aspect, some embodiments of the present application provide a display device, including:
a display configured to display a user interface;
a controller configured to:
receiving a voice instruction and converting the voice instruction into text information;
sending the text information to a plurality of semantic engines in the display device, and acquiring the semantic recognition results generated by the semantic engines after recognizing the text information;
calculating the matching degree of the semantic recognition result based on a machine learning classification algorithm, and determining a target semantic engine according to the matching degree, wherein the target semantic engine is the semantic engine corresponding to the semantic recognition result with the highest matching degree;
and outputting a target recognition result of the target semantic engine on the text information.
In some possible implementations, the controller performs the step of converting the voice instruction into text information, further configured to:
extracting a voice signal in the voice command;
converting the voice signal into a digital signal;
performing ambient noise elimination and signal enhancement on the digital signal to obtain a preprocessed signal;
and converting the preprocessing signal into text information through a text recognition model.
In some possible implementations, the controller is further configured to:
acquiring terminal parameters of display equipment;
setting adaptation information of a semantic engine according to the terminal parameters;
detecting the supporting state of the semantic engine on the text information according to the adaptation information; the supporting states comprise a supporting text state and a non-supporting text state;
if the supporting state is a supporting text state, generating a semantic recognition result of the semantic engine on the text information;
and if the supporting state is the non-supporting text state, generating a prompting message for prompting the non-supporting text state.
In some possible implementations, the controller is further configured to:
acquiring feature categories in the classification model; the classification model is used for performing feature analysis on the voice instruction and outputting, through a semantic engine, a semantic recognition result corresponding to the voice instruction;
extracting feature information from the voice instruction;
acquiring the number of semantic engines;
and performing feature stitching on the feature category, the feature information and the engine number to obtain feature vectors.
In some possible implementations, the controller performs the step of calculating a degree of matching of the semantic recognition results based on a machine learning classification algorithm, and determining a target semantic engine from the degree of matching, further configured to:
inputting the feature vector into the classification model;
performing offline learning on the classifier based on the classification model to obtain an offline learning result;
calculating the semantic recognition result and matching degree output by the semantic engine according to the offline learning result and the feature vector;
and determining a target semantic engine according to the semantic recognition result and the matching degree.
In some possible implementations, the controller is further configured to:
acquiring a user log about voice instructions;
invoking the semantic engine offline, and recognizing the voice command through the semantic engine to obtain a semantic recognition result of the voice command;
marking the semantic recognition result based on the feature class, and obtaining a marking label related to the semantic recognition result;
and screening training data according to the marking labels and a preset data proportion.
In some possible implementations, the classification model is a distributed gradient boosting library model, and the controller is further configured to:
acquiring a plurality of decision trees in the distributed gradient boosting library model;
obtaining a prediction result of a target decision tree on the training data, wherein the target decision tree is any decision tree other than the last decision tree; when the target decision tree is not the first decision tree, the prediction result includes the prediction bias of the decision tree preceding the target decision tree; the prediction bias is calculated according to an objective function;
inputting the prediction result into a next decision tree;
and outputting the prediction result of each decision tree on the training data.
In some possible implementations, the controller is further configured to:
acquiring new sample data to be predicted;
inputting the new sample data into each decision tree in turn, and generating a prediction result by each decision tree;
and accumulating the prediction results, taking the accumulated result as the final prediction result of the new sample data.
In some possible implementations, the controller is further configured to:
transmitting the text information and the target recognition result to a message queue of the display device;
controlling the display to display the target recognition result, and enabling the semantic engine to asynchronously subscribe to the target recognition result in the message queue;
and training each semantic engine by taking the text information as the input value of the semantic engine and the target recognition result as the expected output value of each semantic engine.
In a second aspect, some embodiments of the present application provide a method for adapting a semantic engine, which may be applied to the display device of the first aspect, where the method for adapting a semantic engine includes:
receiving a voice instruction and converting the voice instruction into text information;
sending the text information to a plurality of semantic engines in the display device, and acquiring the semantic recognition results generated by the semantic engines after recognizing the text information;
calculating the matching degree of the semantic recognition result based on a machine learning classification algorithm, and determining a target semantic engine according to the matching degree, wherein the target semantic engine is the semantic engine corresponding to the semantic recognition result with the highest matching degree;
and outputting a target recognition result of the target semantic engine on the text information.
As can be seen from the above technical solutions, some embodiments of the present application provide a display device and a method for adapting a semantic engine, where the method includes: receiving a voice instruction and converting it into text information; sending the text information to a plurality of semantic engines in the display device and acquiring the semantic recognition results generated by the semantic engines after recognizing the text information; calculating the matching degree of each semantic recognition result based on a machine learning classification algorithm and determining a target semantic engine according to the matching degree, the target semantic engine being the semantic engine corresponding to the semantic recognition result with the highest matching degree; and outputting the target semantic engine's target recognition result for the text information. Through the machine learning classification algorithm, the display device can determine a target semantic engine from the semantic recognition results output by each semantic engine, and the target semantic engine outputs the target recognition result with the highest matching degree, the one that best matches the user's voice instruction. This solves the problems that the matching degree of speech recognition is low and user semantics cannot be accurately recognized.
Drawings
In order to more clearly illustrate some embodiments of the present application or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present application; other drawings may be obtained from these drawings by a person skilled in the art without inventive effort.
Fig. 1 is a schematic diagram of an operation scenario between a display device and a control device provided in some embodiments of the present application;
FIG. 2 is a block diagram of a hardware configuration of a display device provided in some embodiments of the present application;
FIG. 3 is a block diagram of a hardware configuration of a control device provided in some embodiments of the present application;
fig. 4 is a schematic diagram of software configuration in a display device according to some embodiments of the present application;
FIG. 5 is a flowchart illustrating an adaptation method for a display device to execute a semantic engine according to some embodiments of the present application;
fig. 6 is a schematic flow chart of converting a voice command into text information by a display device according to some embodiments of the present application;
FIG. 7 is a schematic flow chart of a semantic engine supporting text information status detection according to some embodiments of the present application;
FIG. 8 is a schematic diagram of a hint that the semantic engine does not support text status according to some embodiments of the present application;
fig. 9 is a schematic flow chart of generating feature vectors according to a voice command by a display device according to some embodiments of the present application;
FIG. 10 is a schematic flowchart of a display device provided in some embodiments of the present application calculating the matching degree of semantic recognition results based on a machine learning classification algorithm and determining the target semantic engine according to the matching degree, i.e., the semantic engine corresponding to the semantic recognition result with the highest matching degree;
FIG. 11 is a schematic diagram of a scenario in which a display device according to some embodiments of the present application performs decisions based on a machine learning classification algorithm;
FIG. 12 is a flow chart of a display device training semantic engine provided in some embodiments of the present application;
FIG. 13 is a flowchart of a decision tree based prediction of training data according to some embodiments of the present application;
FIG. 14 is a flowchart of a display device according to some embodiments of the present application calculating a prediction result for new sample data through a decision tree;
fig. 15 is a flowchart of an adaptation method of a semantic engine according to some embodiments of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of some embodiments of the present application more clear, the technical solutions of some embodiments of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application.
It should be noted that the brief description of the terms in some embodiments of the present application is only for convenience in understanding the embodiments described below, and is not intended to limit the implementation of some embodiments of the present application. Unless otherwise indicated, these terms should be construed in their ordinary and customary meaning.
The terms "first", "second", "third" and the like in the description, in the claims, and in the above drawings are used for distinguishing between similar objects or entities, and not necessarily for describing a particular sequential or chronological order, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements explicitly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware or/and software code that is capable of performing the function associated with that element.
Fig. 1 is a schematic diagram of an operation scenario between a display device and a control device according to some embodiments of the present application. As shown in fig. 1, a user may operate the display device 200 through the mobile terminal 300 and the control device 100.
In some embodiments, the control device 100 may be a remote control, and communication between the remote control and the display device may include infrared protocol communication, Bluetooth protocol communication, or other short-range communication modes, controlling the display device 200 wirelessly or by other wired modes. The user may control the display device 200 by inputting user instructions through keys on the remote control, voice input, control panel input, and the like.
In some embodiments, the mobile terminal 300 and the display device 200 may install software applications, implementing connection and communication through a network communication protocol and achieving one-to-one control operation and data communication. The audio/video content displayed on the mobile terminal 300 can also be transmitted to the display device 200 to realize a synchronous display function.
As also shown in fig. 1, the display device 200 is also in data communication with the server 400 via a variety of communication means. The display device 200 may be allowed to establish communication connections via a local area network (LAN), a wireless local area network (WLAN), and other networks.
In addition to the broadcast receiving television function, the display device 200 may additionally provide smart network television functions with computer support, including but not limited to network television, smart television, Internet Protocol television (IPTV), and the like.
Fig. 2 is a block diagram of a hardware configuration of a display device according to some embodiments of the present application.
In some embodiments, display apparatus 200 includes at least one of a modem 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, memory, a power supply, a user interface.
In some embodiments, the detector 230 is used to collect signals of the external environment or interaction with the outside.
In some embodiments, the display 260 includes a display screen component for presenting pictures and a driving component for driving image display, and is used for receiving image signals output by the controller and displaying video content, image content, menu manipulation interface components, a user manipulation UI interface, and the like.
In some embodiments, communicator 220 is a component for communicating with external devices or servers according to various communication protocol types.
In some embodiments, the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored on the memory. The controller 250 controls the overall operation of the display apparatus 200.
In some embodiments, a user may input a user command through a graphical user interface (GUI) displayed on the display 260, and the user input interface receives the command through the GUI.
In some embodiments, user interface 280 is an interface that may be used to receive control inputs.
Fig. 3 is a block diagram of a hardware configuration of a control device according to some embodiments of the present application. As shown in fig. 3, the control device 100 includes a controller 110, a communication interface 130, a user input/output interface, a memory, and a power supply.
The control device 100 is configured to control the display device 200: it can receive a user's input operation instructions and convert them into instructions that the display device 200 can recognize and respond to, acting as an intermediary between the user and the display device 200.
In some embodiments, the control device 100 may be a smart device. Such as: the control apparatus 100 may install various applications for controlling the display apparatus 200 according to user's needs.
In some embodiments, as shown in fig. 1, a mobile terminal 300 or other intelligent electronic device may function similarly to the control device 100 after installing an application that manipulates the display device 200.
The controller 110 includes a processor 112, RAM 113, ROM 114, a communication interface 130, and a communication bus. The controller 110 is used to control the running and operation of the control device 100, the communication cooperation among the internal components, and the external and internal data processing functions.
The communication interface 130 enables communication of control signals and data signals with the display device 200 under the control of the controller 110. The communication interface 130 may include at least one of a WiFi chip 131, a bluetooth module 132, an NFC module 133, and other near field communication modules.
A user input/output interface 140, wherein the input interface includes at least one of a microphone 141, a touchpad 142, a sensor 143, keys 144, and other input interfaces.
In some embodiments, the control device 100 includes at least one of a communication interface 130 and an input/output interface 140. The control device 100 is provided with a communication interface 130, such as a WiFi, Bluetooth, or NFC module, which may encode a user input instruction according to the WiFi protocol, the Bluetooth protocol, or the NFC protocol and send it to the display device 200.
A memory 190 for storing various operation programs, data and applications for driving and controlling the control device 100 under the control of the controller. The memory 190 may store various control signal instructions input by a user.
A power supply 180 for providing operating power support for the various elements of the control device 100 under the control of the controller.
Fig. 4 is a schematic diagram of the software configuration in a display device according to some embodiments of the present application. In some embodiments, the system is divided into four layers, from top to bottom: an application layer (Application layer), an application framework layer (Application Framework layer), an Android Runtime and system library layer (the system runtime layer), and a kernel layer.
In some embodiments, at least one application program is running in the application program layer, and these application programs may be a Window (Window) program of an operating system, a system setting program, a clock program, a camera application, and the like; or may be an application developed by a third party developer.
The framework layer provides an application programming interface (API) and a programming framework for the application programs of the application layer. The application framework layer includes a number of predefined functions and acts as a processing center that decides how the applications in the application layer should act.
As shown in fig. 4, the application framework layer in the embodiment of the present application includes a manager (manager), a Content Provider (Content Provider), a View System (View System), and the like.
In some embodiments, the activity manager is used to manage the lifecycle of the individual applications, as well as the usual navigation and rollback functions.
In some embodiments, a window manager is used to manage all window programs.
In some embodiments, the system runtime layer provides support for the upper framework layer: when the framework layer is accessed, the Android operating system runs the C/C++ libraries contained in the system runtime layer to implement the functions required by the framework layer.
In some embodiments, the kernel layer is a layer between hardware and software. As shown in fig. 4, the kernel layer contains at least one of the following drivers: audio drive, display drive, bluetooth drive, camera drive, WIFI drive, USB drive, HDMI drive, sensor drive (e.g., fingerprint sensor, temperature sensor, touch sensor, pressure sensor, etc.), and the like.
In some embodiments, the kernel layer further includes a power driver module for power management.
In some embodiments, the software programs and/or modules corresponding to the software architecture in fig. 4 are stored in the first memory or the second memory shown in fig. 2 or fig. 3.
The above embodiments show the hardware/software architecture and functional implementation of the display device 200. Based on the display device 200 described above, natural language processing can be realized. Natural language processing is a method for interacting and communicating with a machine using the natural language people use to communicate. For example, the sentences, syntactic structures, and semantics of natural language are learned by means of machine learning, so that the language becomes machine-readable and machine-understandable.
To implement the function of understanding natural language, for example understanding the user's voice content, a voice assistant may be provided in the display device 200, and a semantic engine may be provided in the voice assistant, through which the user's voice content may be recognized. Some display devices 200 include three voice assistants, one integrated in the cloud and the other two integrated in the terminal of the display device 200. Other display devices 200 also include three voice assistants, but with all three integrated in the terminal of the display device 200.
When a voice assistant is integrated in the terminal of the display device, it consumes the terminal's memory and occupies terminal resources. In addition, only one voice assistant can be selected when the voice function is used, so the same speech engine performs semantic recognition in different usage scenarios, which causes the problems that the matching degree of speech recognition is low and user semantics cannot be accurately recognized.
In order to solve the problems that the matching degree of speech recognition is low and user semantics cannot be accurately recognized, some embodiments of the present application provide a display device 200 that includes a display 260 and a controller 250, where the display 260 is configured to display a user interface and the controller 250 is configured to execute an adaptation method of a semantic engine. Through a machine learning classification algorithm, the display device 200 can determine a target semantic engine from the semantic recognition results output by each semantic engine, and the target semantic engine outputs the target recognition result with the highest matching degree, the one that best matches the user's voice instruction. This solves the problems that the matching degree of speech recognition is low and user semantics cannot be accurately recognized. Moreover, in the present application, a voice assistant does not need to be integrated in the terminal, so terminal resources of the display device 200 can be saved.
In order to facilitate understanding of the technical solutions in some embodiments of the present application, each step is detailed below with reference to some specific embodiments and the accompanying drawings. Fig. 5 is a flowchart of a display device executing the adaptation method of the semantic engine according to some embodiments of the present application. As shown in Fig. 5, in some embodiments, when the display device 200 executes the adaptation method of the semantic engine, the method may include the following steps S1-S4:
Step S1: the display device 200 receives the voice instruction and converts the voice instruction into text information.
In some embodiments, the three voice assistants in the display device 200 may be, respectively, a VOICE voice assistant, an Alexa voice assistant, and an Assistant voice assistant. See Table 1, a comparative example of the three voice assistants' media search, device control, and response speed:
table 1: three voice-aided media searching, equipment control and response speed comparison example table
As shown in Table 1, each voice assistant performs differently along the dimensions of media search, device control content, and command response speed, which leads to different semantic recognition results. If the speech engine in a single voice assistant recognizes the user's voice instructions in every usage scenario, the matching degree of speech recognition may be low and user semantics may not be accurately recognized.
Based on this, in the embodiments of the present application, the display device 200 may determine, among the results of a plurality of semantic engines, the target recognition result with the highest matching degree that best meets the user's request, return it to the display device 200, and train each semantic engine according to the target recognition result, thereby improving the engines' semantic understanding capability.
In some embodiments, the voice instruction input by the user may be received by a voice assistant in the display device 200 and converted into text information upon receipt. Fig. 6 is a schematic flowchart of a display device converting a voice instruction into text information according to some embodiments of the present application. As shown in Fig. 6, when the display device 200 converts the voice instruction into text information, it can first extract the speech signal in the voice instruction, then convert the speech signal into a digital signal, then perform ambient noise elimination and signal enhancement on the digital signal to obtain a preprocessed signal, and finally convert the preprocessed signal into text information through a text recognition model.
For example, the display device 200 first extracts the speech signal from the voice instruction and then converts it into a digital signal using digital signal processing techniques. Digital signal processing converts analog information such as sound, video, and pictures into digital information; it studies signal-processing algorithms and their implementations on the basis of digital signal processing theory, hardware technology, and software technology. After conversion to a digital signal, preprocessing operations may be performed on it, for example ambient noise elimination and signal enhancement; other operations may also be included, and this application does not specifically limit them. The preprocessed signal is obtained when preprocessing finishes, and preprocessing improves the accuracy of voice instruction recognition. Finally, the preprocessed signal may be converted into text information using a deep-learning-based recurrent recognition model. After step S1 is completed, the following step S2 may be executed.
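As a minimal illustration of this four-stage pipeline (a sketch under assumptions, not the patent's implementation; the noise-reduction step and the recognition model are stand-in placeholders):

```python
import numpy as np

def speech_to_text(raw_audio: bytes, sample_rate: int = 16000) -> str:
    # Stages 1-2: extract the speech signal and convert it to a digital signal
    # (here: 16-bit PCM bytes to a normalized float array).
    signal = np.frombuffer(raw_audio, dtype=np.int16).astype(np.float32) / 32768.0

    # Stage 3: preprocessing -- ambient noise elimination and signal enhancement.
    # A real system would use spectral subtraction or a learned denoiser; this
    # stand-in removes the DC offset and normalizes the amplitude.
    signal = signal - signal.mean()
    peak = np.abs(signal).max()
    preprocessed = signal / peak if peak > 0 else signal

    # Stage 4: text recognition model (the patent describes a deep-learning-based
    # recurrent recognition model; its details are not disclosed, so this is a stub).
    return recognize(preprocessed, sample_rate)

def recognize(signal: np.ndarray, sample_rate: int) -> str:
    # Placeholder for the recurrent speech-recognition model.
    return "<recognized text>"
```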
Step S2: the display device 200 transmits the text information to a plurality of semantic engines in the display device, and acquires a plurality of semantic recognition results of the semantic engines after the text information is recognized.
In order to obtain recognition results for the voice instruction input by the user, in some embodiments, after the display device 200 converts the voice instruction into text information, the text information may be sent to a plurality of semantic engines in the display device 200, and the semantic recognition results generated by the semantic engines after recognizing the text information may be acquired.
In some embodiments, by performing semantic annotation on resource objects and semantic processing on the user's voice instructions or query expressions, the semantic engine gives natural language a semantic logical relationship, so that extensive and effective semantic reasoning can be performed on natural language and the retrieval and analysis of user needs are realized more accurately and comprehensively.
After the text information is transmitted to the plurality of semantic engines in the display apparatus 200, the semantic engines may recognize the input text information and output corresponding semantic recognition results. For example, if there are three semantic engines, after text information is input to the three semantic engines, each semantic engine corresponds to one semantic recognition result, and the three semantic engines correspond to three semantic recognition results in total.
In some embodiments, some semantic engines may not support the text state, in which case they cannot recognize the semantics of the text information. Therefore, to determine whether a semantic engine supports the text state, the display device 200 may perform the following procedure. Fig. 7 is a schematic flowchart of detecting a semantic engine's supporting state for text information according to some embodiments of the present application. As shown in Fig. 7, the terminal parameters of the display device 200 are first acquired, then the adaptation information of the semantic engine is set according to the terminal parameters, and the semantic engine's supporting state for the text information is detected according to the adaptation information, where the supporting state may be a supporting text state or a non-supporting text state. If the supporting state is the supporting text state, the display device 200 may generate the semantic engine's semantic recognition result for the text information; if it is the non-supporting text state, the display device 200 may generate a prompt message indicating the non-supporting text state.
For example, after the display device 200 sends the text information to the plurality of semantic engines simultaneously, whether a semantic engine supports the text information can be determined through terminal parameters set on a parameter management platform: the platform sets the semantic engine's adaptation information, such as supported and unsupported types, and the engine returns a result corresponding to its supporting state, from which it can be determined whether the engine supports the text state. If it does, the text information can be recognized by the semantic engine. If it does not, referring to Fig. 8, which is a schematic diagram of a prompt message indicating that the semantic engine does not support the text state according to some embodiments of the present application, the display device 200 displays the prompt message shown in Fig. 8 to remind the user that the semantic engine does not support recognizing text information. After step S2 is completed, the following step S3 may be executed.
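The dispatch-and-check flow of step S2 and the support-state detection could be sketched as below; the `SemanticEngine` class, its fields, and the shape of the adaptation information are illustrative assumptions, not the patent's API:

```python
from dataclasses import dataclass

@dataclass
class SemanticEngine:
    name: str
    supported_types: set  # adaptation info derived from the terminal parameters

    def supports(self, text: str) -> bool:
        # Support-state detection: the engine reports whether it supports text;
        # real adaptation information set via the parameter platform is richer.
        return "text" in self.supported_types

    def recognize(self, text: str) -> dict:
        # Placeholder semantic recognition; a real engine returns intents/slots.
        return {"engine": self.name, "intent": "unknown", "query": text}

def dispatch(text: str, engines: list) -> list:
    results = []
    for engine in engines:
        if engine.supports(text):
            results.append(engine.recognize(text))
        else:
            # Non-supporting text state: surface a prompt message instead.
            print(f"{engine.name} does not support recognizing text information")
    return results

engines = [SemanticEngine("engine1", {"text"}), SemanticEngine("engine2", set())]
print(dispatch("play some jazz", engines))
```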
Step S3: the display device 200 calculates the matching degree of the semantic recognition result based on a machine learning classification algorithm, and determines a target semantic engine according to the matching degree, wherein the target semantic engine is the semantic engine corresponding to the semantic recognition result with the highest matching degree.
In order to screen out the target recognition results corresponding to the target semantic engine that best meets the user's needs from the plurality of semantic recognition results, in some embodiments, the display device 200 may determine the target semantic engine from the plurality of semantic recognition results based on a machine learning classification algorithm.
Before determining the target semantic engine, in some embodiments, the display device 200 may perform the following procedure. Fig. 9 is a schematic flowchart of a display device generating feature vectors from a voice instruction according to some embodiments of the present application. As shown in Fig. 9, in order to analyze the feature information contained in the user's voice instruction, in some embodiments, the display device 200 may acquire the feature categories in the classification model, where the classification model is used to perform feature analysis on the voice instruction and output, through a semantic engine, the semantic recognition result corresponding to the voice instruction. The display device 200 then extracts the feature information in the voice instruction, acquires the number of semantic engines, and performs feature stitching on the feature categories, the feature information, and the engine count to obtain a feature vector.
Table 2 is a schematic table of the feature categories provided in some embodiments of the present application. In some embodiments, the feature categories in the classification model may be preset, as shown in Table 2:
table 2: schematic table of feature classes
Table 2 defines 7 types of features; in the name column, i denotes the i-th semantic engine, where i is a positive integer. If there are N semantic engines, each piece of training data input to the classification model has 7N-dimensional features, where N is a positive integer; if N = 3, each piece of training data has 21-dimensional features. It should be noted that Table 2 is only an exemplary illustration; in a specific usage scenario, the number of feature categories, the calculation methods, the descriptions of the features, and the like may be set according to actual requirements, which this application does not limit.
When the voice instruction is input into the classification model, the model can acquire the feature information in the instruction and perform feature stitching on the feature information, the number of semantic engines, and the feature categories to obtain the feature vector. After the feature vector is obtained, the target semantic engine can be determined from the semantic recognition results through the classification model, based on the machine learning classification algorithm.
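The stitching itself reduces to concatenation; below is a sketch under the assumption of N = 3 engines and the 7 per-engine feature types of Table 2 (the extractor is a zero-valued placeholder, since the table's definitions are not reproduced here):

```python
import numpy as np

NUM_FEATURE_TYPES = 7  # Table 2 defines 7 feature types per semantic engine

def extract_engine_features(text: str, engine_index: int) -> np.ndarray:
    # Placeholder: compute the 7 feature values of the i-th engine for this
    # voice instruction, following the (undisclosed) definitions of Table 2.
    return np.zeros(NUM_FEATURE_TYPES, dtype=np.float32)

def build_feature_vector(text: str, num_engines: int) -> np.ndarray:
    # Feature stitching: with N engines the stitched vector is 7N-dimensional,
    # e.g. 21 dimensions for N = 3.
    parts = [extract_engine_features(text, i) for i in range(num_engines)]
    return np.concatenate(parts)

vec = build_feature_vector("play some jazz", num_engines=3)
assert vec.shape == (21,)
```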
FIG. 10 is a schematic flowchart of determining the semantic engine corresponding to the semantic recognition result with the highest matching degree. As shown in FIG. 10, in some embodiments, in order to determine the target semantic engine, the display device 200 may first input the feature vector into the classification model, then perform offline learning on the classifier based on the classification model to obtain an offline learning result, calculate the semantic recognition result and matching degree output by each semantic engine according to the offline learning result and the feature vector, and finally determine the target semantic engine according to the semantic recognition results and matching degrees. The classification model is trained offline on training data; through this offline learning of the classifier, it can assist online classification decisions. For example, the machine learning classification algorithm may assist the classification model in making analysis decisions, where the input may be the feature vector and the output may be the number of each semantic engine and its corresponding matching degree.
Exemplarily, FIG. 11 is a schematic view of a scenario in which a display device provided in some embodiments of the present application performs a decision based on a machine learning classification algorithm. As shown in FIG. 11, after feature information is extracted from the user's voice instruction, the feature information and feature categories may be stitched together in combination with the number of semantic engines to form the feature vector shown in FIG. 11. The feature vector is then input into the classification model, which calculates each semantic engine's matching degree for the recognized user need. For example, with three semantic engines, the decision of the classification model may give semantic engine 1 a matching degree of 0.85, semantic engine 2 a matching degree of 0.11, and semantic engine 3 a matching degree of 0.04. Semantic engine 1 has the highest matching degree, meaning that the semantic recognition result it outputs best matches the user's need, so semantic engine 1 can be selected as the target semantic engine.
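The selection step is then a simple argmax over the matching degrees; a minimal sketch (the engine names are illustrative):

```python
def choose_target_engine(matching_degrees: dict) -> str:
    # Select the semantic engine whose recognition result has the highest
    # matching degree.
    return max(matching_degrees, key=matching_degrees.get)

degrees = {"engine1": 0.85, "engine2": 0.11, "engine3": 0.04}
print(choose_target_engine(degrees))  # -> engine1
```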
It should be noted that, in the process of determining the target speech engine, the display device 200 can determine the target semantic engine without switching voice assistants, which is convenient for the user. That is, after the plurality of semantic engines output their semantic recognition results, the classification model calculates, through decision analysis, each semantic engine's recognition result for the user's voice instruction and judges the target semantic engine according to the matching degree; the semantic recognition result corresponding to the target semantic engine is the result that best meets the user's need. After step S3 is completed, the following step S4 may be executed.
Step S4: the display device 200 outputs a target recognition result of the text information by the target semantic engine.
After the display device 200 determines the target semantic engine, it can output the target recognition result of the target semantic engine for the text information, which is the result that best meets the user's needs.
To obtain training data for training the classification model, in some embodiments, the display device 200 may perform the following procedure. First, the display device 200 acquires a user log about voice instructions; it then invokes the semantic engines offline and recognizes the voice instructions through the semantic engines to obtain their semantic recognition results; it marks the semantic recognition results based on the feature categories to obtain marking labels for them; and finally it screens training data according to the marking labels and a preset data proportion.
For example, the classification model may be trained in a supervised manner. To obtain training data, a user log about voice instructions is first obtained from the online user logs, and then the N semantic engines are invoked offline, each of which recognizes the voice instructions in the user log. There may be multiple voice instructions; the speech engines recognize them one by one and obtain a corresponding semantic recognition result for each. During recognition, a marking label for the semantic recognition result can be obtained in combination with preset voice categories, and a corresponding label can be marked; this application does not limit the content of the marking label, which may for example include the tag of a semantic engine. After labeling is finished, the voice instructions in the user logs recognized by each semantic engine can be counted by screening the labels, and training data can be screened out in a certain proportion. In this way, the amount of training data under each semantic engine can be obtained, and by adjusting the preset proportion the training data of the semantic engines can be kept roughly balanced, avoiding inaccurate recognition results caused by unbalanced training data.
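A sketch of this log-mining and proportional screening, assuming simple log entries and reusing the illustrative `recognize` interface from the earlier sketch; the per-engine quota stands in for the preset data proportion:

```python
import random
from collections import defaultdict

def build_training_data(user_log: list, engines: list, per_engine_quota: int) -> list:
    # Invoke every semantic engine offline on each logged voice instruction,
    # mark each semantic recognition result with a label (here: the engine tag),
    # and group the labeled results.
    by_label = defaultdict(list)
    for instruction in user_log:
        for engine in engines:
            result = engine.recognize(instruction)
            by_label[result["engine"]].append((instruction, result))

    # Screen the training data by label with a preset proportion (a fixed quota
    # per engine) so that the data of every engine stays roughly balanced.
    training_data = []
    for label, items in by_label.items():
        k = min(per_engine_quota, len(items))
        training_data.extend(random.sample(items, k))
    return training_data
```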
Fig. 12 is a schematic flowchart of a display device training the semantic engines according to some embodiments of the present application. As shown in Fig. 12, in some embodiments, the display device 200 may send the text information and the target recognition result to a message queue of the display device 200 and control the display 260 to display the target recognition result; the semantic engines asynchronously subscribe to the target recognition result in the message queue, and each semantic engine is trained with the text information as its input value and the target recognition result as its expected output value.
For example, after the target semantic engine outputs the target recognition result of the text information, the result may be returned to the terminal of the display device 200. At the same time, the text information converted from the voice instruction, together with the corresponding output such as the target recognition result, may be used as training sentences for all the semantic engines, so as to train each engine and improve its semantic understanding capability. After the voice assistant receives and displays the target recognition result, the text information and the target recognition result are sent to the message queue of the display device 200; each semantic engine can asynchronously subscribe to the messages in the queue and be trained with the text information as input and the target recognition result as the expected output. Through this training, the semantic engines effectively learn from each other, improving the semantic recognition capability of every engine.
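An in-process analogue of this publish/subscribe training loop (a production system would use a real message queue; the fan-out bus and the `train` hook are assumptions):

```python
import queue
import threading

class MessageBus:
    # Fan-out bus: every subscribing engine gets its own queue, so each
    # (text information, target recognition result) message reaches all engines.
    def __init__(self):
        self.queues = []

    def subscribe(self) -> queue.Queue:
        q = queue.Queue()
        self.queues.append(q)
        return q

    def publish(self, text: str, target_result: dict) -> None:
        for q in self.queues:
            q.put((text, target_result))

def engine_worker(engine, q: queue.Queue, stop: threading.Event) -> None:
    # Asynchronous subscription: train on each message with the text information
    # as input and the target recognition result as the expected output.
    while not stop.is_set():
        try:
            text, target = q.get(timeout=0.5)
        except queue.Empty:
            continue
        engine.train(text, target)  # hypothetical training hook
```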
To improve the recognition efficiency of the classification model, in some embodiments, the classification model may be a distributed gradient boosting library model. For example, the classification model may be XGBoost. XGBoost is an improvement on the Gradient Boosting Decision Tree (GBDT); the X in XGBoost stands for eXtreme, and it can be a faster, more efficient training model.
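For reference, a minimal offline-training sketch with the real xgboost Python package; the random features and labels are stand-ins, and the hyperparameters are assumptions, since the patent discloses neither:

```python
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 21))          # 7N-dimensional feature vectors, N = 3 engines
y = rng.integers(0, 3, size=200)   # label: index of the best-matching engine

model = XGBClassifier(n_estimators=50, max_depth=4, learning_rate=0.1)
model.fit(X, y)

probs = model.predict_proba(X[:1])[0]  # per-engine matching degrees
print(probs, "-> target engine index:", int(probs.argmax()))
```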
In some embodiments, the basic constituent elements of XGBoost are decision trees, which can be understood as learners that together constitute XGBoost. Fig. 13 is a schematic flowchart of predicting training data based on decision trees according to some embodiments of the present application. As shown in Fig. 13, the display device 200 may first acquire the plurality of decision trees in the distributed gradient boosting library model, and then obtain the prediction result of a target decision tree on the training data, where the target decision tree is any decision tree other than the last decision tree. When the target decision tree is not the first decision tree, the prediction result includes the prediction bias of the decision tree preceding the target decision tree, where the prediction bias is calculated according to the objective function. The prediction result is input into the next decision tree, and finally the prediction result of each decision tree on the training data is output.
Illustratively, the decision trees that make up XGBoost are sequential. In some embodiments, the generation of a later decision tree takes the prediction result of the preceding decision tree into account; that is, the later tree may take into account the prediction bias of the preceding tree, where the prediction bias may be calculated from the objective function.
Fig. 14 is a flowchart of a display device calculating a prediction result for new sample data through the decision trees according to some embodiments of the present application. As shown in Fig. 14, in some embodiments, when there is new sample data to be predicted, the display device 200 may acquire the new sample data, input it into each decision tree in turn so that each tree generates a prediction result, and then accumulate the prediction results, taking the accumulated result as the final prediction result of the new sample data.
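The additive prediction itself is just a running sum over the trees; a toy sketch with constant-valued stand-in trees:

```python
import numpy as np

class Stump:
    # Stand-in "decision tree" that predicts a constant value.
    def __init__(self, value: float):
        self.value = value

    def predict(self, sample: np.ndarray) -> float:
        return self.value

def boosted_predict(trees: list, sample: np.ndarray) -> float:
    # The new sample enters each decision tree in turn; each tree contributes a
    # predicted value, and the accumulated sum is the final prediction result.
    return sum(tree.predict(sample) for tree in trees)

trees = [Stump(0.5), Stump(0.2), Stump(-0.1)]  # later trees correct earlier bias
print(boosted_predict(trees, np.zeros(21)))    # -> 0.6
```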
For example, when predicting the prediction result for new sample data through the decision trees, the new sample data may enter each decision tree of XGBoost in turn. The first decision tree gives a predicted value, the second decision tree gives another, and so on, until all decision trees, i.e., all learners, have been traversed. Finally, the values from each decision tree are added to obtain the final prediction result. The prediction bias can be calculated through the objective function as follows. For example, the objective function of the t-th decision tree may be:
$$\mathcal{L}^{(t)} = \sum_{i} l\left(y_i,\; \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$

where t denotes a non-first decision tree, i.e., not the first decision tree (t can be regarded as any decision tree; when it is the first, there is no prediction result from a previous tree); i denotes a sample of the new data to be recognized; $l$ denotes the loss between the actual value and the accumulated prediction; $\hat{y}_i^{(t-1)}$ denotes the predicted value for sample i from the previous t-1 decision trees, obtained by summing the predicted value of each of those trees; $y_i$ denotes the actual value of sample i; $f_t(x_i)$ denotes the predicted value of the t-th decision tree for sample i; and $\Omega(f_t)$ denotes the model complexity of the t-th tree.
As can be seen from the above technical solutions, the above embodiments provide a display device 200. The display device 200 receives a voice instruction and converts it into text information; sends the text information to a plurality of semantic engines in the display device and acquires the semantic recognition results generated by the semantic engines after recognizing the text information; calculates the matching degree of each semantic recognition result based on a machine learning classification algorithm and determines a target semantic engine according to the matching degree, the target semantic engine being the semantic engine corresponding to the semantic recognition result with the highest matching degree; and outputs the target semantic engine's target recognition result for the text information. Through the machine learning classification algorithm, the display device can determine a target semantic engine from the semantic recognition results output by each semantic engine, and the target semantic engine outputs the target recognition result with the highest matching degree, the one that best matches the user's voice instruction. This solves the problems that the matching degree of speech recognition is low and user semantics cannot be accurately recognized.
Based on the display device 200 described above, some embodiments of the present application further provide a method for adapting a semantic engine, which may be applied to the display device 200 of the above embodiments, where the display device 200 includes a display 260 for displaying a user interface and a controller 250 for executing the adaptation method. Fig. 15 is a flowchart of an adaptation method of a semantic engine according to some embodiments of the present application. As shown in Fig. 15, in some embodiments, the adaptation method of the semantic engine may include the following steps:
step S1: the display device 200 receives the voice instruction and converts the voice instruction into text information.
In this embodiment of the present application, the display device 200 may determine, among the results of a plurality of semantic engines, the target recognition result with the highest matching degree that best meets the user's request, return it to the display device 200, and train each semantic engine according to the target recognition result, thereby improving the engines' semantic understanding capability.
In some embodiments, the voice instruction input by the user may be received by a voice assistant in the display device 200 and converted into text information upon receipt. When the display device 200 converts the voice instruction into text information, it can first extract the speech signal in the voice instruction, then convert the speech signal into a digital signal, then perform ambient noise elimination and signal enhancement on the digital signal to obtain a preprocessed signal, and finally convert the preprocessed signal into text information through a text recognition model.
Step S2: the display device 200 transmits the text information to a plurality of semantic engines in the display device, and acquires a plurality of semantic recognition results of the semantic engines after the text information is recognized.
In order to obtain recognition results for the voice instruction input by the user, in some embodiments, after the display device 200 converts the voice instruction into text information, the text information may be sent to a plurality of semantic engines in the display device 200, and the semantic recognition results generated by the semantic engines after recognizing the text information may be acquired.
In some embodiments, some semantic engines may not support the text state, in which case they cannot recognize the semantics of the text information. Therefore, to determine whether a semantic engine supports the text state, the display device 200 may perform the following procedure. First, the terminal parameters of the display device 200 are acquired; then the adaptation information of the semantic engine is set according to the terminal parameters, and the semantic engine's supporting state for the text information is detected according to the adaptation information, where the supporting state may be a supporting text state or a non-supporting text state. If the supporting state is the supporting text state, the display device 200 may generate the semantic engine's semantic recognition result for the text information; if it is the non-supporting text state, the display device 200 may generate a prompt message indicating the non-supporting text state.
Step S3: the display device 200 calculates the matching degree of the semantic recognition result based on a machine learning classification algorithm, and determines a target semantic engine according to the matching degree, wherein the target semantic engine is the semantic engine corresponding to the semantic recognition result with the highest matching degree.
In order to screen out, from the plurality of semantic recognition results, the target recognition result corresponding to the target semantic engine that best meets the user's needs, in some embodiments, the display device 200 may determine the target semantic engine from the plurality of semantic recognition results based on a machine learning classification algorithm.
Before determining the target semantic engine, in some embodiments, the display device 200 may perform the following procedure to analyze the feature information contained in the user's voice instruction. The display device 200 may obtain the feature categories in a classification model, where the classification model is used to perform feature analysis on the voice instruction and to output, through a semantic engine, the semantic recognition result corresponding to the voice instruction; extract the feature information in the voice instruction; obtain the number of engines among the semantic engines; and perform feature stitching on the feature categories, the feature information, and the engine number to obtain a feature vector.
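Feature stitching here amounts to concatenation. The sketch below assumes the feature categories and instruction features are already numeric vectors; the dimensions and encodings are illustrative only.

```python
import numpy as np

def build_feature_vector(feature_categories: np.ndarray,
                         feature_info: np.ndarray,
                         engine_count: int) -> np.ndarray:
    # Feature stitching: concatenate the feature categories of the
    # classification model, the feature information extracted from the
    # voice instruction, and the number of semantic engines.
    return np.concatenate([feature_categories,
                           feature_info,
                           np.array([engine_count], dtype=np.float32)])

# Example: 3 category features, 4 instruction features, 5 engines.
vec = build_feature_vector(np.array([1.0, 0.0, 0.0]),
                           np.array([0.2, 0.7, 0.1, 0.5]),
                           5)
print(vec)  # [1.  0.  0.  0.2 0.7 0.1 0.5 5. ]
```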
In some embodiments, to determine the target semantic engine, the display device 200 may first input the feature vector into the classification model, then perform offline learning on the classifier based on the classification model to obtain an offline learning result, calculate the matching degree of the semantic recognition result output by each semantic engine according to the offline learning result and the feature vector, and finally determine the target semantic engine according to the semantic recognition results and the matching degrees. The classification model is trained offline on training data; through the offline learning of a binary classifier, it can assist online classification decisions. For example, the machine learning classification algorithm may assist the classification model in making an analysis decision, where the input is the feature vector and the output is the number of each semantic engine together with its corresponding matching degree.
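Claim 7 below describes the classification model as a "distributed gradient enhancement library model", which reads like a literal rendering of a distributed gradient boosting library such as XGBoost. The sketch below assumes that reading; the training data, feature dimensions, and hyperparameters are placeholders, not values fixed by the embodiments.

```python
import numpy as np
from xgboost import XGBClassifier  # a distributed gradient boosting library

# Offline learning on hypothetical training data: each row is a stitched
# feature vector, each label the index of the engine whose recognition
# result matched the user's intent.
rng = np.random.default_rng(0)
X_train = rng.random((200, 8))
y_train = rng.integers(0, 3, size=200)        # 3 semantic engines

clf = XGBClassifier(n_estimators=50, max_depth=4)
clf.fit(X_train, y_train)

def pick_target_engine(feature_vector: np.ndarray) -> tuple[int, float]:
    # Online decision: the per-engine probabilities serve as matching
    # degrees; the target semantic engine is the one with the highest.
    degrees = clf.predict_proba(feature_vector.reshape(1, -1))[0]
    target = int(np.argmax(degrees))
    return target, float(degrees[target])

engine_number, matching_degree = pick_target_engine(rng.random(8))
print(engine_number, matching_degree)
```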
It should be noted that, in the process of determining the target semantic engine, the display device 200 does not need to switch voice assistants, which is convenient for the user. That is, after the plurality of semantic engines output their semantic recognition results, the classification model can evaluate, through decision analysis, the semantic recognition result of each semantic engine for the voice instruction input by the user, and determine the target semantic engine according to the matching degree, where the semantic recognition result corresponding to the target semantic engine is the result that best meets the user's needs.
After the display device 200 determines the target semantic engine, the target recognition result of the target semantic engine for the text information can be output, where the target recognition result is the result that best meets the user's needs.
Step S4: the display device 200 outputs the target recognition result of the target semantic engine for the text information.
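Claim 9 below additionally describes feeding the target recognition result back to the engines through a message queue of the display device, so that each semantic engine can asynchronously subscribe to it and train on it. A minimal in-process sketch follows; the queue type, the train hook, and the engine stubs are assumptions, since the embodiments do not fix a concrete messaging system.

```python
import queue
import threading

class SemanticEngine:
    # Minimal stub; a real engine would update its model in train().
    def __init__(self, name: str):
        self.name = name
        self.inbox: queue.Queue = queue.Queue()

    def train(self, input_text: str, expected_output: str) -> None:
        print(f"{self.name} trained on {input_text!r} -> {expected_output!r}")

ENGINES = [SemanticEngine("engine_a"), SemanticEngine("engine_b")]

def publish_result(text: str, target_result: str) -> None:
    # After the display shows the target recognition result, broadcast the
    # (text information, target recognition result) pair to every engine's
    # subscription queue.
    for engine in ENGINES:
        engine.inbox.put((text, target_result))

def subscribe(engine: SemanticEngine) -> None:
    # Each engine asynchronously consumes its queue, training with the text
    # information as input and the target result as expected output.
    while True:
        text, target = engine.inbox.get()
        engine.train(text, target)
        engine.inbox.task_done()

for engine in ENGINES:
    threading.Thread(target=subscribe, args=(engine,), daemon=True).start()

publish_result("play the latest episode", "intent=play_media")
for engine in ENGINES:
    engine.inbox.join()
```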
As can be seen from the above technical solutions, the above embodiments provide an adaptation method of a semantic engine. The method includes: receiving a voice instruction and converting the voice instruction into text information; sending the text information to a plurality of semantic engines in the display device, and acquiring the semantic recognition results produced by the plurality of semantic engines after recognizing the text information; calculating the matching degree of each semantic recognition result based on a machine learning classification algorithm, and determining a target semantic engine according to the matching degree, where the target semantic engine is the semantic engine corresponding to the semantic recognition result with the highest matching degree; and outputting the target recognition result of the target semantic engine for the text information. With this method, the target semantic engine can be determined, through the machine learning classification algorithm, according to the semantic recognition results output by each semantic engine, and the target semantic engine outputs the target recognition result that has the highest matching degree and best matches the user's voice instruction, thereby solving the problems that the matching degree of voice recognition is low and the user's semantics cannot be accurately recognized. Moreover, with this method, no additional voice assistant needs to be integrated in the terminal, so the terminal resources of the display device can be saved.
For identical or similar parts among the embodiments in this specification, reference may be made to one another; such parts are not described again here.
It will be apparent to those skilled in the art that the techniques in the embodiments of the present invention may be implemented by means of software plus a necessary general-purpose hardware platform. Based on such an understanding, the technical solutions in the embodiments of the present invention, or the parts thereof contributing to the prior art, may essentially be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods of the embodiments, or of some parts of the embodiments, of the present invention.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solutions of the present application. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications and replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (10)

1. A display device, characterized by comprising:
a display configured to display a user interface;
a controller configured to:
receiving a voice instruction and converting the voice instruction into text information;
sending the text information to a plurality of semantic engines in the display device, and acquiring a plurality of semantic recognition results of the semantic engines after the text information is recognized;
calculating the matching degree of the semantic recognition result based on a machine learning classification algorithm, and determining a target semantic engine according to the matching degree, wherein the target semantic engine is the semantic engine corresponding to the semantic recognition result with the highest matching degree;
and outputting a target recognition result of the target semantic engine for the text information.
2. The display device of claim 1, wherein the controller performs the step of converting the voice instruction into text information, further configured to:
extracting a voice signal in the voice command;
converting the voice signal into a digital signal;
performing ambient noise elimination and signal enhancement on the digital signal to obtain a preprocessed signal;
and converting the preprocessing signal into text information through a text recognition model.
3. The display device of claim 1, wherein the controller is further configured to:
acquiring terminal parameters of the display device;
setting adaptation information of a semantic engine according to the terminal parameters;
detecting the supporting state of the semantic engine on the text information according to the adaptation information; the supporting states comprise a supporting text state and a non-supporting text state;
if the supporting state is a supporting text state, generating a semantic recognition result of the semantic engine on the text information;
and if the supporting state is the non-supporting text state, generating a prompting message for prompting the non-supporting text state.
4. The display device of claim 1, wherein the controller is further configured to:
acquiring characteristic categories in the classification model; the classification model is used for executing feature analysis on the voice command and outputting a semantic recognition result corresponding to the voice command through a semantic engine;
extracting characteristic information in the voice instruction;
acquiring the number of engines of the semantic engine;
and performing feature stitching on the feature category, the feature information and the engine number to obtain feature vectors.
5. The display device of claim 4, wherein the controller performs the steps of calculating a degree of matching of the semantic recognition results based on a machine learning classification algorithm, and determining a target semantic engine based on the degree of matching, further configured to:
inputting the feature vector into the classification model;
performing offline learning on the classifier based on the classification model to obtain an offline learning result;
calculating the semantic recognition result and the matching degree output by the semantic engine according to the offline learning result and the feature vector;
and determining a target semantic engine according to the semantic recognition result and the matching degree.
6. The display device of claim 4, wherein the controller is further configured to:
acquiring a user log about voice instructions;
invoking the semantic engine offline, and recognizing the voice command through the semantic engine to obtain a semantic recognition result of the voice command;
marking the semantic recognition result based on the feature class, and obtaining a marking label related to the semantic recognition result;
and screening training data according to the marking labels and a preset data proportion.
7. The display device of claim 6, wherein the classification model is a distributed gradient enhancement library model, the controller further configured to:
acquiring a plurality of decision trees in the distributed gradient enhancement library model;
obtaining a prediction result of a target decision tree on the training data, wherein the target decision tree is any decision tree except the last decision tree; when the target decision tree is not the first decision tree, the prediction result comprises the prediction deviation of the decision trees before the target decision tree; the prediction deviation is calculated according to an objective function;
inputting the prediction result into a next decision tree;
and outputting the prediction result of each decision tree on the training data.
8. The display device of claim 7, wherein the controller is further configured to:
acquiring new sample data to be predicted;
inputting the new sample data into each decision tree in turn, and generating a prediction result by each decision tree;
and accumulating the prediction results, and taking the accumulated prediction result as the final prediction result of the new sample data.
9. The display device of claim 1, wherein the controller is further configured to:
transmitting the text information and the target recognition result to a message queue of the display device;
controlling the display to display the target recognition result, and enabling the semantic engine to asynchronously subscribe to the target recognition result in the message queue;
and training the semantic engines by taking the text information as an input value of each semantic engine and the target recognition result as an output value of each semantic engine.
10. An adaptation method of a semantic engine, applied to the display device according to any one of claims 1-9, the display device comprising a display and a controller, characterized in that the adaptation method of the semantic engine comprises:
receiving a voice instruction and converting the voice instruction into text information;
sending the text information to a plurality of semantic engines in the display device, and acquiring a plurality of semantic recognition results of the semantic engines after the text information is recognized;
calculating the matching degree of the semantic recognition result based on a machine learning classification algorithm, and determining a target semantic engine according to the matching degree, wherein the target semantic engine is the semantic engine corresponding to the semantic recognition result with the highest matching degree;
and outputting a target recognition result of the target semantic engine for the text information.