CN112820314A

CN112820314A - Intelligent voice control large screen display method, system and related components thereof

Info

Publication number: CN112820314A
Application number: CN202110031145.8A
Authority: CN
Inventors: 冯杰; 倪萌; 杜俊磊
Original assignee: Runlian Software System Shenzhen Co Ltd
Current assignee: Runlian Software System Shenzhen Co Ltd
Priority date: 2021-01-11
Filing date: 2021-01-11
Publication date: 2021-05-18

Abstract

The invention discloses an intelligent voice control large-screen display method, a system and related components thereof. The method comprises the following steps: collecting voice audio input by a user, extracting the human voice audio from it, and intercepting the human voice audio into a designated audio and a required audio; training and decoding the required audio to obtain target characters; splitting the target characters to obtain a special word, and judging the validity of the special word; if the special word is valid, judging whether the special word belongs to a permission index, and if it does, calling the designated audio to judge whether the user has the viewing permission; if the user has the viewing permission, pushing the special word to a large-screen display module for display, and if the user does not have the viewing permission, stopping the access. By judging whether the special word belongs to a permission index and whether the user has the viewing permission, the method defines how content is displayed when the input voice audio concerns a permission index, so that the large-screen display is more intelligent, the application scenarios are wider, and the user experience is better.

Description

Intelligent voice control large screen display method, system and related components thereof

Technical Field

The invention relates to the technical field of intelligent voice, in particular to an intelligent voice control large-screen display method, an intelligent voice control large-screen display system and related components thereof.

Background

With the rapid development of informatization and the arrival of the big data era, industries place ever higher demands on the visualization of indexes. An electronic large screen is expected not only to display pictures, videos and the like for users to watch, but also to mine and analyze the value behind massive data, helping managers discover the relationships and patterns behind the data and thereby providing a basis for decision making.

At present, the large-screen display control systems used by most enterprises still rely on mouse clicks and frequent manual operation to display index data; the steps are cumbersome and time-consuming. The voice-controlled large-screen display systems used by some enterprises set no permissions for the content displayed on the large screen, so the electronic large screen can only display indexes that require no permission and cannot meet the diversified requirements of users. Existing voice-controlled large-screen display systems therefore have two limitations: 1. the permission of the speaker is not set, so the electronic large screen can only display part of the indexes; 2. only the content of indexes that require no permission can be displayed.

Disclosure of Invention

The embodiment of the invention provides an intelligent voice control large-screen display method, an intelligent voice control large-screen display system and related components thereof, and aims to solve the problems that in the prior art, the voice control large-screen does not set the authority of a speaker, so that index display is incomplete and contents needing authority indexes cannot be displayed.

In a first aspect, an embodiment of the present invention provides an intelligent voice-controlled large-screen display method, which includes:

collecting voice audio input by a user, extracting human voice audio in the voice audio, intercepting the human voice audio, acquiring specified audio and required audio, and storing the specified audio;

training and decoding the required audio to obtain target characters;

splitting the target characters to obtain a special word, and judging the validity of the special word;

if the special word is valid, judging whether the special word belongs to an authority index, and if the special word is the authority index, calling the specified audio to judge whether the user has a viewing authority;

and if the user has the viewing permission, pushing the special word to a large-screen display module for display, and if the user does not have the viewing permission, stopping access.

In a second aspect, an embodiment of the present invention provides an intelligent voice-controlled large-screen display system, which includes:

the audio acquisition unit is used for acquiring voice audio input by a user, extracting human voice audio in the voice audio, intercepting the human voice audio, acquiring specified audio and required audio, and storing the specified audio;

the target character acquisition unit is used for training and decoding the required audio to obtain target characters;

the target character splitting unit is used for splitting the target characters to obtain a special word and judging the validity of the special word;

the user permission confirming unit is used for judging whether the special word belongs to a permission index if the special word is valid, and calling the specified audio to judge whether the user has viewing permission if the special word is a permission index;

and the large-screen display unit is used for pushing the special word to a large-screen display module for display if the user has the viewing permission, and stopping access if the user does not have the viewing permission.

In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the intelligent voice-controlled large-screen display method according to the first aspect.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to execute the method for intelligent voice-controlled large-screen display according to the first aspect.

The embodiment of the invention provides an intelligent voice control large-screen display method, an intelligent voice control large-screen display system and related components thereof. The method comprises the steps of collecting voice audio input by a user, extracting human voice audio in the voice audio, intercepting the human voice audio, acquiring designated audio and required audio, and storing the designated audio; training and decoding the required audio to obtain target characters; splitting the target characters to obtain a special word, and judging the validity of the special word; if the special word is valid, judging whether the special word belongs to an authority index, and if the special word is the authority index, calling the designated audio to judge whether the user has a viewing authority; and if the user has the viewing permission, pushing the special word to a large-screen display module for display, and if the user does not have the viewing permission, stopping access. The embodiment of the invention judges whether the special word in the voice audio input by the user belongs to an authority index and, if so, whether the user has the corresponding viewing authority, thereby determining whether the special word is displayed on the large screen. This defines how information is displayed when the information input by the user is an authority index, so that the large-screen display is more intelligent, the application scenarios are wider, and the user obtains a better experience.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

Fig. 1 is a schematic flow chart of an intelligent voice-controlled large-screen display method according to an embodiment of the present invention;

fig. 2 is a schematic block diagram of an intelligent voice-controlled large-screen display system according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

Referring to fig. 1, fig. 1 is a schematic flow chart of an intelligent voice-controlled large-screen display method according to an embodiment of the present invention, where the method includes steps S101 to S105.

S101, collecting voice audio input by a user, extracting human voice audio in the voice audio, intercepting the human voice audio, acquiring designated audio and required audio, and storing the designated audio;

In this step, after the voice audio is collected, the human voice audio it contains is extracted, and the human voice audio is then intercepted into a segment of designated audio and a segment of required audio. When a user inputs voice audio to the large screen, the large screen must first be woken up: the large screen is named in advance, and the name must be spoken each time voice audio is input; the audio of this name is the designated audio. The designated audio is stored in a voice storage unit inside the large screen so that it can be called at any time. The required audio is the actual content of the voice input. For example, if the large screen is named "smart", the user first says "smart" to the large screen to activate the voice module in the large screen, and then continues with the content to be input to the large screen.

In one embodiment, the acquiring voice audio input by a user and extracting human voice audio in the voice audio includes:

and performing model matching on the voice audio based on the voice activity detection of the Gaussian mixture model so as to distinguish the human voice audio and the noise audio in the voice audio and extract the human voice audio.

In this embodiment, because environmental factors may introduce some noise while the user inputs voice audio, voice activity detection based on a Gaussian mixture model is used to perform model matching on the voice audio, so as to distinguish the human voice audio from the noise audio. In this approach, Gaussian mixture models of environmental noise and of speech are established in a specific spectral space, and a model matching method is then applied to the voice audio to decide which parts are noise audio and which are human voice audio. If the voice audio input by the user contains only noise audio, the voice audio is directly judged to be invalid, and a prompt to "re-input the audio" is given.
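The idea of separating human voice from noise with a Gaussian mixture model can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in this specification: the only feature is the log energy of each frame, the mixture has exactly two one-dimensional components (noise and voice) fitted by EM, and the names `detect_voice_frames`, `frame_len` and `min_gap` are hypothetical. If the two fitted components are closer than `min_gap`, the input is treated as noise only, which corresponds to the case in which the voice audio is judged invalid.

```python
import math

def frame_log_energy(samples, frame_len):
    """Split samples into non-overlapping frames and return the log energy of each."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        energies.append(math.log(energy + 1e-10))
    return energies

def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_two_component_gmm(values, iterations=50):
    """Fit a 2-component 1-D GMM with EM, initialised at the min and max value."""
    lo, hi = min(values), max(values)
    means = [lo, hi]
    spread = max((hi - lo) ** 2 / 4, 1e-3)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            p = [weights[k] * gaussian_pdf(v, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances (variance is floored).
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-8:
                continue
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            variances[k] = max(var, 1e-3)
    return weights, means, variances

def detect_voice_frames(samples, frame_len=160, min_gap=1.0):
    """Return one boolean per frame: True if the frame is classified as voice."""
    energies = frame_log_energy(samples, frame_len)
    if not energies:
        return []
    weights, means, variances = fit_two_component_gmm(energies)
    voice = 0 if means[0] > means[1] else 1  # higher-energy component = voice (assumption)
    noise = 1 - voice
    if abs(means[voice] - means[noise]) < min_gap:
        return [False] * len(energies)  # only one cluster: treat as noise only
    flags = []
    for e in energies:
        p_voice = weights[voice] * gaussian_pdf(e, means[voice], variances[voice])
        p_noise = weights[noise] * gaussian_pdf(e, means[noise], variances[noise])
        flags.append(p_voice > p_noise)
    return flags
```

A caller could prompt the user to re-input the audio when `any(detect_voice_frames(samples))` is false, and otherwise keep only the frames flagged as voice.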

S102, training and decoding the required audio to obtain target characters;

in this step, after the required audio is obtained, two stages of training and decoding are required to be performed on the required audio, so that the required audio is analyzed to obtain the target characters.

In one embodiment, the step S102 includes:

carrying out silence removal and framing preprocessing on the required audio, and extracting Mel cepstrum coefficient features from the preprocessed voice data;

and inputting the Mel cepstrum coefficient features into a pre-trained acoustic model and a language model for decoding to obtain the target characters.

In this embodiment, the required audio is first preprocessed, including silence removal and framing; Mel cepstrum coefficient features are then extracted from the preprocessed voice data and decoded to obtain the target characters. Specifically, silence is removed from the beginning and the end of the required audio to reduce interference, and the required audio is then divided into frames so that the voice signal within each frame is short-time stationary. Mel cepstrum coefficient features (MFCC features) are then extracted from the framed voice data; that is, the waveform of each frame is converted, according to the physiological characteristics of the human ear, into a multi-dimensional vector, which can be simply understood as containing the content information of that frame of speech. After the preprocessing of the required audio is finished, the extracted Mel cepstrum coefficient features are input into a pre-trained acoustic model and a language model for decoding to obtain the target characters.
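The two preprocessing operations described above, silence removal at both ends and framing, can be sketched as follows. This is only an illustration under assumptions: silence is detected with a simple amplitude threshold, frames overlap, and the function names and default values (`threshold`, `frame_len`, `hop_len`) are hypothetical rather than taken from this specification. Extraction of the Mel cepstrum coefficient features from each frame is not shown.

```python
def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples whose magnitude is below the threshold."""
    start = 0
    end = len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]

def split_into_frames(samples, frame_len=400, hop_len=160):
    """Cut the signal into overlapping frames; a short trailing remainder is dropped."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames
```

With audio sampled at 16 kHz, the assumed defaults correspond to 25 ms frames taken every 10 ms, a common choice that keeps each frame short enough to be treated as stationary.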

In a specific embodiment, the inputting the mel-frequency cepstrum coefficient characteristics into a pre-trained acoustic model and a language model for decoding to obtain the target text includes:

inputting the Mel cepstrum coefficient characteristics into a pre-trained acoustic model for characteristic decoding to obtain phoneme information;

searching a pre-established dictionary for the character or word corresponding to the phoneme information;

and judging the probability of the phoneme information belonging to the corresponding character or word through a pre-trained language model, and selecting and outputting the target character through the probability.

In this embodiment, the Mel cepstrum coefficient features are input into a pre-trained acoustic model and decoded to obtain phoneme information; the characters or words corresponding to the phoneme information are then looked up in the dictionary; finally, the pre-trained language model judges the probability that the phoneme information belongs to each corresponding character or word, and the target characters are selected accordingly. In the pre-established dictionary, pinyin corresponds to Chinese characters for Chinese, and phonetic symbols correspond to words for English.
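The dictionary lookup and the language-model selection can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the tiny pinyin dictionary, the bigram probabilities and the function name `decode_phonemes` are hypothetical, the acoustic model is assumed to have already produced the phoneme sequence, and candidates are enumerated exhaustively, whereas a practical decoder would use a search such as Viterbi or beam search.

```python
import itertools

# Hypothetical pronunciation dictionary: pinyin syllable -> candidate characters.
PRONUNCIATION_DICT = {
    "ying": ["营", "赢"],
    "shou": ["收", "手"],
}

# Hypothetical bigram language model: P(current | previous); "<s>" marks the start.
BIGRAM_PROB = {
    ("<s>", "营"): 0.6,
    ("<s>", "赢"): 0.4,
    ("营", "收"): 0.9,
    ("营", "手"): 0.1,
    ("赢", "收"): 0.2,
    ("赢", "手"): 0.8,
}

def decode_phonemes(phonemes, dictionary=PRONUNCIATION_DICT, bigram=BIGRAM_PROB, floor=1e-6):
    """Return the most probable character sequence for the phonemes and its probability."""
    # Look up the candidate characters of every phoneme in the dictionary.
    candidates = [dictionary.get(p, []) for p in phonemes]
    best_text, best_prob = "", 0.0
    # Score every combination of candidates with the bigram language model.
    for combo in itertools.product(*candidates):
        prob = 1.0
        previous = "<s>"
        for ch in combo:
            prob *= bigram.get((previous, ch), floor)  # unseen pairs get a small floor
            previous = ch
        if prob > best_prob:
            best_text, best_prob = "".join(combo), prob
    return best_text, best_prob
```

For the phoneme sequence `["ying", "shou"]`, the candidate "营收" scores 0.6 × 0.9 and is selected over the other three combinations.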

S103, splitting the target character to obtain a special character, and judging the effectiveness of the special character;

In this step, the target characters are split to obtain special words comprising one or more of atomic indexes, dimensions and intentions, and the validity of the special words is then judged. Splitting the target characters is the process of converting them into a structured language that a machine can understand. The special words comprise an atomic index, a dimension and an intention. The atomic index is a measure based on a certain business event; it is an index that cannot be split further in the business definition and has a name with a clear business meaning. The dimension is the environment of the measure and reflects a class of attributes of the business; a set of such attributes forms a dimension, which can also be called an entity object. Dimensions belong to a data domain, such as the geographic dimension (including country, region, province, etc.) and the time dimension (including year, quarter, month, week and day levels). The intention is the user's wish as understood by the computer. Take the instruction "Shenzhen Nanshan earnings this year" as an example: the atomic index of the instruction is "earnings", the dimensions are "Shenzhen Nanshan" and "this year", and the intention is "view earnings".
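The splitting of the target characters into an atomic index, dimensions and an intention can be sketched as a lookup against known vocabularies. This is a simplified illustration, not the method of this specification: the vocabularies, the `SpecialWords` structure, the function name and the rule that the intention defaults to "view" plus the atomic index are all assumptions, and the example instruction is used in an assumed English wording.

```python
from dataclasses import dataclass, field

# Hypothetical vocabularies; a real system would load these from its index catalogue.
ATOMIC_INDEXES = {"earnings", "headcount"}
DIMENSIONS = {"shenzhen nanshan", "this year", "last month"}

@dataclass
class SpecialWords:
    atomic_indexes: list = field(default_factory=list)
    dimensions: list = field(default_factory=list)
    intention: str = ""

def split_target_text(text):
    """Extract atomic indexes, dimensions and an intention from recognised text."""
    lowered = text.lower()
    result = SpecialWords()
    result.atomic_indexes = sorted(w for w in ATOMIC_INDEXES if w in lowered)
    result.dimensions = sorted(w for w in DIMENSIONS if w in lowered)
    if result.atomic_indexes:
        # Assumption: the only supported intention is viewing the first atomic index.
        result.intention = "view " + result.atomic_indexes[0]
    return result
```

Calling `split_target_text("Shenzhen Nanshan earnings this year")` yields the atomic index "earnings", the dimensions "shenzhen nanshan" and "this year", and the intention "view earnings".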

In one embodiment, the step S103 includes:

acquiring all special characters in the target characters, and judging whether atomic indexes exist in the special characters or not;

if the special word has the atomic index, judging that the special word is valid;

and if the special word does not have the atomic index, judging that the special word is invalid.

In this embodiment, all types of special words in the target characters are obtained, and it is then determined whether an atomic index exists among them. If an atomic index exists, the special words split from the target characters are valid; otherwise they are judged invalid, and a prompt such as "no atomic index was input, please re-input" is given.
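The validity rule of this embodiment reduces to a single check, sketched below. The dictionary layout of `special_words`, the function name and the exact wording of the prompt are assumptions made for illustration.

```python
def check_validity(special_words):
    """Return (is_valid, prompt); special words are valid only if an atomic index exists.

    `special_words` is assumed to be a dict that may contain the keys
    "atomic_indexes", "dimensions" and "intention".
    """
    if special_words.get("atomic_indexes"):
        return True, ""
    # Paraphrased prompt; the exact wording shown to the user is an assumption.
    return False, "No atomic index was input; please re-input."
```

A request that contains only dimensions, for example `{"dimensions": ["this year"]}`, is therefore rejected with the prompt, while any request that names at least one atomic index proceeds to the permission check.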

S104, if the special word is valid, judging whether the special word belongs to an authority index, and if the special word is the authority index, calling the specified audio to judge whether the user has a viewing authority;

in this step, if an atomic index exists in the special word, whether the special word belongs to the permission index is further determined, and if the special word belongs to the permission index, whether the user has the viewing permission is determined. The purpose of this step is to confirm whether the user has the permission to view, before the determination, it is necessary to confirm whether the special word belongs to the permission index, and if the special word belongs to the permission index, it indicates that the user needs to have the permission to view the special word.

In one embodiment, the step S104 includes:

judging whether the atomic index in the special word belongs to the authority index or not;

if the atomic index does not belong to the authority index, directly pushing the special word to the large-screen display module;

and if the atomic index is an authority index, calling the specified audio and carrying out identity comparison so as to judge whether the user has the checking authority.

In this embodiment, whether the special word belongs to the authority index is determined by the atomic index in the special word: if the atomic index belongs to the authority index, the special word also belongs to the authority index, and if the atomic index does not belong to the authority index, neither does the special word. When the atomic index does not belong to the authority index, the requested content can be viewed without permission, so the special word can be pushed directly to the large-screen display module for display without checking the user's permission. When the atomic index is an authority index, the requested content can be viewed only if the user has the viewing permission; in this case the stored designated audio is called for identity comparison in order to check the viewing permission of the user.
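The branching described in this embodiment can be sketched as a small routing function. The set `PERMISSION_INDEXES`, the function name `route_request` and the callback `verify_speaker` are hypothetical; the callback stands in for the identity comparison on the stored designated audio and is invoked only when the atomic index requires permission.

```python
# Hypothetical set of atomic indexes that require viewing permission.
PERMISSION_INDEXES = {"earnings"}

def route_request(atomic_index, verify_speaker):
    """Decide whether to display the special word or stop access.

    `verify_speaker` is a zero-argument callable returning True when the
    identity comparison on the stored designated audio succeeds.
    """
    if atomic_index not in PERMISSION_INDEXES:
        return "display"  # no permission needed: push directly to the display module
    return "display" if verify_speaker() else "deny"
```

Passing the identity check as a callable keeps the comparison lazy, so the stored designated audio is only retrieved and analysed for requests that actually involve a permission index.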

In a specific embodiment, the calling the designated audio and performing identity comparison to determine whether the user has a viewing right includes:

and carrying out voiceprint recognition on the specified audio through a voiceprint recognition technology, matching a voiceprint recognition result with a prestored voiceprint feature, judging that the user has the viewing permission if the voiceprint recognition result passes the matching, and judging that the user does not have the viewing permission if the voiceprint recognition result does not pass the matching.

In this embodiment, the specified audio is subjected to voiceprint recognition, and a voiceprint recognition result is matched with a voiceprint feature stored in advance, so that the viewing permission of the user is obtained according to the matching result. The voiceprint recognition technology is one of biological recognition technologies, is also called speaker recognition, and is a technology for judging the identity of a speaker through voice.
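This specification does not state how voiceprint features are extracted or compared. As one possible illustration, the sketch below assumes that a hypothetical upstream extractor has already turned the specified audio into a fixed-length feature vector, and that matching means the cosine similarity with at least one pre-stored voiceprint reaches an assumed threshold of 0.8. The function names and the threshold are not from the source.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def has_viewing_permission(voiceprint, enrolled_voiceprints, threshold=0.8):
    """True if the voiceprint matches any pre-stored voiceprint of an authorised user.

    `enrolled_voiceprints` maps a user name to that user's stored feature vector.
    """
    return any(
        cosine_similarity(voiceprint, stored) >= threshold
        for stored in enrolled_voiceprints.values()
    )
```

In practice the threshold trades false acceptance against false rejection and would need to be tuned on recordings of the enrolled speakers.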

S105, if the user has the viewing permission, pushing the special word to a large-screen display module for display, and if the user does not have the viewing permission, stopping access.

In this step, when the user has the viewing right, the special word is displayed on the large-screen display module, and if the user does not have the viewing right, the access is stopped. The large screen display module is mainly used for displaying the special words on a large screen in a specific form, such as a static graph, a dynamic graph or a simple index value.

Referring to fig. 2, fig. 2 is a schematic block diagram of an intelligent voice-controlled large-screen display system according to an embodiment of the present invention, where the intelligent voice-controlled large-screen display system 200 includes:

the audio acquisition unit 201 is configured to acquire voice audio input by a user, extract human voice audio in the voice audio, intercept the human voice audio, acquire a designated audio and a required audio, and store the designated audio;

a target character obtaining unit 202, configured to train and decode the required audio to obtain target characters;

a target character splitting unit 203, configured to split the target character to obtain a special character, and determine validity of the special character;

a user permission confirming unit 204, configured to determine whether the special word belongs to a permission index if the special word is valid, and call the specified audio to determine whether the user has a viewing permission if the special word is the permission index;

and the large-screen display unit 205 is configured to push the special word to a large-screen display module for display if the user has a viewing permission, and stop accessing if the user does not have the viewing permission.

In one embodiment, the audio acquisition unit 201 includes:

and the human voice audio extracting unit is used for carrying out model matching on the voice audio based on voice activity detection of a Gaussian mixture model so as to distinguish the human voice audio and noise audio in the voice audio and extracting the human voice audio.

In one embodiment, the target character obtaining unit 202 includes:

the preprocessing unit is used for carrying out silence removal and framing preprocessing on the required audio and extracting Mel cepstrum coefficient characteristics from the preprocessed voice data;

and the decoding unit is used for inputting the Mel cepstrum coefficient characteristics into a pre-trained acoustic model and a language model for decoding to obtain the target characters.

In an embodiment, the decoding unit comprises:

a phoneme information obtaining unit, configured to input the mel cepstrum coefficient features into a pre-trained acoustic model for feature decoding to obtain phoneme information;

the dictionary searching unit is used for searching words or phrases corresponding to the phoneme information in a pre-established dictionary;

and the target character judging unit is used for judging the probability that the phoneme information belongs to the corresponding character or word through a pre-trained language model, and selecting and outputting the target character through the probability.

In one embodiment, the target character splitting unit 203 includes:

the special character acquisition unit is used for acquiring all special characters in the target characters and judging whether the special characters have atomic indexes or not;

the special word validity judging unit is used for judging that the special word is valid if an atomic index exists in the special word;

and the special word invalidation judging unit is used for judging that the special word is invalid if no atomic index exists in the special word.

In an embodiment, the user permission confirming unit 204 includes:

the authority index judging unit is used for judging whether the atomic index in the special word belongs to the authority index or not;

the special word pushing unit is used for directly pushing the special word to the large-screen display module if the atomic index does not belong to the authority index;

and the identity comparison unit is used for calling the designated audio and comparing the identities if the atomic index is an authority index so as to judge whether the user has the checking authority.

In one embodiment, the identity comparison unit includes:

and the voiceprint recognition unit is used for carrying out voiceprint recognition on the specified audio through a voiceprint recognition technology, matching a voiceprint recognition result with a prestored voiceprint feature, judging that the user has the viewing permission if the voiceprint recognition result passes the matching, and judging that the user does not have the viewing permission if the voiceprint recognition result does not pass the matching.

The embodiment of the invention also provides computer equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the intelligent voice control large-screen display method when executing the computer program.

An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the method for displaying an intelligent voice-controlled large screen is implemented.

The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims

1. An intelligent voice control large screen display method is characterized by comprising the following steps:

training and decoding the required audio to obtain target characters;

2. The method for intelligently controlling the large-screen display through voice according to claim 1, wherein the collecting voice audio input by a user and extracting human voice audio in the voice audio comprises:

3. The method for intelligently controlling the large-screen display through the voice according to claim 1, wherein the training and decoding the required audio to obtain the target characters comprises the following steps:

4. The method as claimed in claim 3, wherein the step of inputting the Mel cepstrum coefficient features into a pre-trained acoustic model and a language model for decoding to obtain target characters comprises:

5. The method for intelligently controlling the large-screen display through the voice according to claim 1, wherein the splitting of the target characters to obtain the special characters and the judgment of the effectiveness of the special characters comprise:

if the special word has the atomic index, judging that the special word is valid;

6. The method as claimed in claim 1, wherein the determining whether the special word belongs to a permission index if the special word is valid, and calling the designated audio to determine whether the user has a permission to view if the special word is a permission index comprises:

7. The method according to claim 6, wherein the calling the designated audio and performing identity comparison to determine whether the user has a viewing right comprises:

8. An intelligent voice control large screen display system, characterized by comprising:

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the intelligent voice-controlled large-screen display method according to any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to execute the intelligent voice-controlled large-screen display method according to any one of claims 1 to 7.