CN112820314A - Intelligent voice control large screen display method, system and related components thereof - Google Patents


Info

Publication number
CN112820314A
Authority
CN
China
Prior art keywords
audio
voice
user
special
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110031145.8A
Other languages
Chinese (zh)
Inventor
冯杰
倪萌
杜俊磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Runlian Software System Shenzhen Co Ltd
Original Assignee
Runlian Software System Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Runlian Software System Shenzhen Co Ltd

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272Voice signal separating
    • G10L21/028Voice signal separating using properties of sound source
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • G06F21/32User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/04Segmentation; Word boundary detection
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L2015/088Word spotting

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses an intelligent voice control large-screen display method, a system and related components thereof. The method comprises the following steps: collecting voice audio input by a user, extracting the human voice audio from the voice audio, and splitting the human voice audio into designated audio and required audio; training and decoding the required audio to obtain target characters; splitting the target characters to obtain a special word, and judging the validity of the special word; if the special word is valid, judging whether the special word belongs to a permission index, and if so, calling the designated audio to judge whether the user has the viewing permission; if the user has the viewing permission, pushing the special word to a large-screen display module for display, and if not, stopping the access. By judging whether the special word belongs to a permission index and whether the user has the viewing permission, the method defines how input voice audio that refers to a permission index is displayed, which makes large-screen display more intelligent, widens its application scenarios, and gives the user a better experience.

Description

Intelligent voice control large screen display method, system and related components thereof
Technical Field
The invention relates to the technical field of intelligent voice, in particular to an intelligent voice control large-screen display method, an intelligent voice control large-screen display system and related components thereof.
Background
With the rapid development of informatization and the arrival of the big data era, industries place ever higher demands on index visualization. An electronic large screen must not only display pictures, videos and the like for users to watch, but also support mining and analysis of the value behind massive data, helping managers find the relationships and rules behind the data so as to provide a basis for decision making.
At present, the large-screen display control systems used by most enterprises still rely on mouse clicks and frequent manual operation to display index data, which is cumbersome and time-consuming. The voice-controlled large-screen display systems used by some enterprises have no permission settings for the displayed content, so the electronic large screen can only display indexes that require no permission and cannot meet the diversified needs of users. Existing voice-controlled large-screen display systems therefore have two limitations: 1. no permission is set for the speaker, so the electronic large screen can display only part of the indexes; 2. only content that does not involve permission indexes can be displayed.
Disclosure of Invention
The embodiment of the invention provides an intelligent voice control large-screen display method, an intelligent voice control large-screen display system and related components thereof, and aims to solve the problems that in the prior art, the voice control large-screen does not set the authority of a speaker, so that index display is incomplete and contents needing authority indexes cannot be displayed.
In a first aspect, an embodiment of the present invention provides an intelligent voice-controlled large-screen display method, which includes:
collecting voice audio input by a user, extracting the human voice audio from the voice audio, splitting the human voice audio to obtain designated audio and required audio, and storing the designated audio;
training and decoding the required audio to obtain target characters;
splitting the target characters to obtain a special word, and judging the validity of the special word;
if the special word is valid, judging whether the special word belongs to an authority index, and if the special word belongs to the authority index, calling the designated audio to judge whether the user has a viewing permission;
and if the user has the viewing permission, pushing the special word to a large-screen display module for display, and if the user does not have the viewing permission, stopping access.
In a second aspect, an embodiment of the present invention provides an intelligent voice-controlled large-screen display system, which includes:
the audio acquisition unit is used for collecting voice audio input by a user, extracting the human voice audio from the voice audio, splitting the human voice audio to obtain designated audio and required audio, and storing the designated audio;
the target character acquisition unit is used for training and decoding the required audio to obtain target characters;
the target character splitting unit is used for splitting the target characters to obtain a special word and judging the validity of the special word;
the user permission confirming unit is used for judging whether the special word belongs to an authority index if the special word is valid, and calling the designated audio to judge whether the user has a viewing permission if the special word belongs to the authority index;
and the large-screen display unit is used for pushing the special word to a large-screen display module for display if the user has the viewing permission, and stopping access if the user does not have the viewing permission.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the intelligent voice-controlled large-screen display method according to the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to execute the method for intelligent voice-controlled large-screen display according to the first aspect.
The embodiment of the invention provides an intelligent voice control large-screen display method, an intelligent voice control large-screen display system and related components thereof. The method comprises: collecting voice audio input by a user, extracting the human voice audio from the voice audio, splitting the human voice audio to obtain designated audio and required audio, and storing the designated audio; training and decoding the required audio to obtain target characters; splitting the target characters to obtain a special word, and judging the validity of the special word; if the special word is valid, judging whether the special word belongs to an authority index, and if so, calling the designated audio to judge whether the user has a viewing permission; and if the user has the viewing permission, pushing the special word to a large-screen display module for display, and otherwise stopping access. By judging whether the special word in the voice audio input by the user belongs to an authority index, and then whether the user has the corresponding viewing permission, the embodiment determines whether the special word is displayed on the large screen and defines the display process for the case where the user's input refers to an authority index. This makes the large-screen display more intelligent, widens its application scenarios, and gives the user a better experience.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on them without creative effort.
Fig. 1 is a schematic flow chart of an intelligent voice-controlled large-screen display method according to an embodiment of the present invention;
fig. 2 is a schematic block diagram of an intelligent voice-controlled large-screen display system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1, fig. 1 is a schematic flow chart of an intelligent voice-controlled large-screen display method according to an embodiment of the present invention, where the method includes steps S101 to S105.
S101, collecting voice audio input by a user, extracting the human voice audio from the voice audio, splitting the human voice audio to obtain designated audio and required audio, and storing the designated audio;
in this step, after the voice audio is collected, the human voice audio it contains is extracted, and the human voice audio is then split into a segment of designated audio and a segment of required audio. Before a user can input voice audio to the large screen, the large screen must be woken up: the large screen is named in advance, and the user must speak that name each time voice audio is input. The spoken name is the designated audio. The designated audio is stored in a voice storage unit inside the large screen so that it can be called at any time. The required audio is the content the user actually wants to input. For example, if the large screen is named "smart", the user first says "smart" to the large screen to activate its voice module, and then continues with the content to be input to the large screen.
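The split described in step S101 can be sketched as follows. This is a minimal illustration only: the `VoiceStore` class, the `split_utterance` function and the `wake_end` parameter are hypothetical names, and the position where the wake word ends is assumed to come from a separate wake-word detector that is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceStore:
    """Hypothetical stand-in for the large screen's internal voice storage unit."""
    clips: dict = field(default_factory=dict)

    def save(self, session_id, samples):
        self.clips[session_id] = list(samples)

    def load(self, session_id):
        return self.clips[session_id]


def split_utterance(samples, wake_end):
    """Cut the human voice audio at the sample index where the wake word ends.

    wake_end is assumed to be supplied by a wake-word detector (not shown).
    """
    if not 0 < wake_end <= len(samples):
        raise ValueError("wake_end must fall inside the utterance")
    designated = samples[:wake_end]  # the spoken screen name
    required = samples[wake_end:]    # the content the user actually wants
    return designated, required


store = VoiceStore()
voice = [0.1, 0.4, -0.3, 0.2, 0.05, -0.1]
designated, required = split_utterance(voice, wake_end=2)
store.save("session-1", designated)  # kept for the later identity check
```

Storing the designated audio under a session key is one possible way to make it retrievable when the permission check in step S104 needs it.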
In one embodiment, the acquiring voice audio input by a user and extracting human voice audio in the voice audio includes:
and performing model matching on the voice audio using voice activity detection based on a Gaussian mixture model, so as to distinguish the human voice audio from the noise audio in the voice audio and extract the human voice audio.
In this embodiment, the voice audio input by the user may contain noise caused by environmental factors, so voice activity detection based on a Gaussian mixture model is used to perform model matching on the voice audio and distinguish the human voice audio from the noise audio. In this voice activity detection, Gaussian mixture models of the environmental noise and of the human voice are established in a specific spectral space, and model matching is then applied to the voice audio to decide which parts are noise audio and which are human voice audio. If the voice audio input by the user contains only noise audio, it is directly judged to be invalid and a prompt to re-input the audio is given.
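As a rough illustration of Gaussian-mixture voice activity detection, the sketch below fits a two-component one-dimensional mixture to per-frame log-energies with expectation-maximization and labels each frame by the more likely component. It is a simplification under stated assumptions: log-energy stands in for the spectral features the text mentions, and the function names, the frame length of 160 samples and the `min_gap` guard for noise-only input are all hypothetical choices, not taken from the patent.

```python
import math
import random


def frame_log_energies(samples, frame_len=160):
    """Log-energy of each non-overlapping frame (a stand-in for spectral features)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.log(sum(s * s for s in frame) / frame_len + 1e-10))
    return energies


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_noise_voice_gmm(values, iterations=30, var_floor=1e-3):
    """EM for a 1-D mixture of two Gaussians: index 0 = noise, index 1 = voice."""
    means = [min(values), max(values)]  # start at the quietest and loudest frames
    spread = max((means[1] - means[0]) ** 2 / 4, var_floor)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each frame.
        resp = []
        for x in values:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in (0, 1)]
            total = sum(p)
            resp.append([pk / total for pk in p] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weight, mean and variance of each component.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            if nk < 1e-8:
                continue
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var, var_floor)
    return weights, means, variances


def detect_voice(samples, frame_len=160, min_gap=3.0):
    """Return one flag per frame: True where the voice component is more likely."""
    energies = frame_log_energies(samples, frame_len)
    if not energies:
        return []
    weights, means, variances = fit_noise_voice_gmm(energies)
    # Assumed guard: if the two components barely differ, treat everything as noise.
    if means[1] - means[0] < min_gap:
        return [False] * len(energies)
    flags = []
    for x in energies:
        noise_p, voice_p = (weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in (0, 1))
        flags.append(voice_p > noise_p)
    return flags


random.seed(0)
noise = lambda n: [random.gauss(0, 0.01) for _ in range(n)]
tone = [0.5 * math.sin(2 * math.pi * 200 * t / 16000) for t in range(1600)]
signal = noise(1600) + [a + b for a, b in zip(tone, noise(1600))] + noise(1600)
flags = detect_voice(signal)
if not any(flags):
    print("please re-input the audio")  # noise-only input is judged invalid
```

A production system would more likely fit the mixtures on spectral features and train the noise and voice models separately; the single-feature version here only shows the model-matching idea.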
S102, training and decoding the required audio to obtain target characters;
in this step, after the required audio is obtained, two stages of training and decoding are required to be performed on the required audio, so that the required audio is analyzed to obtain the target characters.
In one embodiment, the step S102 includes:
performing silence removal and framing preprocessing on the required audio, and extracting Mel cepstrum coefficient features from the preprocessed voice data;
and inputting the Mel cepstrum coefficient features into a pre-trained acoustic model and a language model for decoding to obtain the target characters.
In this embodiment, the required audio is first preprocessed by silence removal and framing, Mel-frequency cepstral coefficient (MFCC) features are then extracted from the preprocessed voice data, and the features are finally decoded to obtain the target characters. Specifically, silence at the beginning and end of the required audio is removed to reduce interference, and the trimmed audio is then divided into frames so that the voice signal within each frame is short-time stationary. MFCC features are then extracted from the framed voice data; that is, each frame's waveform is converted into a multi-dimensional vector according to the physiological characteristics of the human ear, and this vector can be understood simply as containing the content information of that frame of voice. After the required audio has been preprocessed, the extracted MFCC features are input into a pre-trained acoustic model and language model for decoding to obtain the target characters.
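The two preprocessing operations named above, silence removal at both ends and framing, can be sketched as below. The amplitude threshold and the frame and hop lengths (400 and 160 samples, which correspond to 25 ms and 10 ms at an assumed 16 kHz sampling rate) are illustrative assumptions; the MFCC computation itself is not shown.

```python
def trim_silence(samples, threshold=0.02):
    """Drop leading and trailing samples whose magnitude is below the threshold."""
    start, end = 0, len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


def split_into_frames(samples, frame_len=400, hop_len=160):
    """Cut the signal into overlapping fixed-length frames.

    A trailing remainder shorter than frame_len is dropped.
    """
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames


audio = [0.0] * 100 + [0.5] * 1000 + [0.0] * 50
trimmed = trim_silence(audio)
frames = split_into_frames(trimmed)
```

Overlapping frames are a common choice because each short frame can then be treated as approximately stationary, which is the property the text relies on before feature extraction.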
In a specific embodiment, the inputting the mel-frequency cepstrum coefficient characteristics into a pre-trained acoustic model and a language model for decoding to obtain the target text includes:
inputting the Mel cepstrum coefficient characteristics into a pre-trained acoustic model for characteristic decoding to obtain phoneme information;
searching a pre-established dictionary for the character or word corresponding to the phoneme information;
and judging the probability of the phoneme information belonging to the corresponding character or word through a pre-trained language model, and selecting and outputting the target character through the probability.
In this embodiment, the mel cepstrum coefficient features are input into a pre-trained acoustic model, and are decoded to obtain phoneme information, then words or phrases corresponding to the phoneme information are found out from the dictionary, and finally the probability that the phoneme information belongs to the corresponding words or phrases is judged through the pre-trained language model, so as to select the corresponding target characters. In the pre-created dictionary, for Chinese, pinyin and Chinese characters are corresponding, and for English, phonetic symbols and words are corresponding.
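The dictionary lookup and language-model selection can be illustrated with a toy example. Everything in it is an assumption made for illustration: the two-entry pinyin lexicon, the bigram probabilities and the function names are invented, and candidates are enumerated exhaustively, whereas a practical decoder would use a search such as Viterbi or beam search.

```python
import itertools
import math

# Hypothetical pronunciation dictionary: pinyin syllable -> candidate characters.
LEXICON = {
    "ying": ["营", "赢"],
    "shou": ["收", "手"],
}

# Hypothetical bigram language model: P(current character | previous character).
BIGRAM_PROB = {
    ("<s>", "营"): 0.6, ("<s>", "赢"): 0.4,
    ("营", "收"): 0.9, ("营", "手"): 0.1,
    ("赢", "收"): 0.2, ("赢", "手"): 0.8,
}
UNSEEN_PROB = 1e-6  # assumed fallback for character pairs the model has not seen


def sequence_log_prob(chars):
    """Log-probability of a character sequence under the bigram model."""
    prev = "<s>"
    total = 0.0
    for ch in chars:
        total += math.log(BIGRAM_PROB.get((prev, ch), UNSEEN_PROB))
        prev = ch
    return total


def decode(phonemes):
    """Look up candidates for each phoneme and keep the most probable sequence."""
    candidates = [LEXICON[p] for p in phonemes]
    best = max(itertools.product(*candidates), key=sequence_log_prob)
    return "".join(best)


target = decode(["ying", "shou"])
```

With these made-up probabilities the sequence scored highest is the one whose characters most often occur together, which is how the language model resolves homophones that the dictionary alone cannot separate.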
S103, splitting the target character to obtain a special character, and judging the effectiveness of the special character;
in this step, the target characters are split to obtain special words comprising one or more of atomic indexes, dimensions and intentions, and the validity of the special words is then judged. Splitting the target characters is the process of converting them into a structured language that a machine can understand. The special words comprise an atomic index, a dimension and an intention. The atomic index is a measurement based on a certain business event or behavior; it is an index that cannot be split further in the business definition and has a name with a clear business meaning. The dimension is the environment of a measurement and reflects a class of attributes of the business; a set of such attributes forms a dimension, which can also be called an entity object. Dimensions belong to a data domain, such as a geographic dimension (including country, region, province, etc.) or a time dimension (including year, quarter, month, week and day levels). The intention is the user's wish as understood by the computer. Take the instruction "Shenzhen Nanshan earnings this year" as an example: the atomic index of the instruction is "earnings", the dimensions are "Shenzhen Nanshan" and "this year", and the intention is "view earnings".
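A minimal sketch of this splitting step, using the example instruction rendered in English as "Shenzhen Nanshan earnings this year", might look as follows. The vocabularies, the `SpecialWords` structure and the simple substring matching are assumptions for illustration; the patent does not specify how the split is implemented.

```python
from dataclasses import dataclass, field

# Hypothetical vocabularies; a real system would load these from business metadata.
ATOMIC_INDEXES = ("earnings", "headcount")
DIMENSIONS = {
    "shenzhen nanshan": "geography",
    "this year": "time",
    "last month": "time",
}


@dataclass
class SpecialWords:
    atomic_indexes: list = field(default_factory=list)
    dimensions: dict = field(default_factory=dict)
    intent: str = ""


def split_target_text(text):
    """Convert recognized text into a structured record by vocabulary matching."""
    lowered = text.lower()
    result = SpecialWords()
    for name in ATOMIC_INDEXES:
        if name in lowered:
            result.atomic_indexes.append(name)
    for phrase, kind in DIMENSIONS.items():
        if phrase in lowered:
            result.dimensions[phrase] = kind
    if result.atomic_indexes:
        # Assumed default: a request that names an index is a request to view it.
        result.intent = "view " + result.atomic_indexes[0]
    return result


parsed = split_target_text("Shenzhen Nanshan earnings this year")
```

The structured record is what later steps consume: the atomic index drives the validity and permission checks, while the dimensions narrow what is shown.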
In one embodiment, the step S103 includes:
acquiring all special characters in the target characters, and judging whether atomic indexes exist in the special characters or not;
if an atomic index exists in the special words, judging that the special words are valid;
and if no atomic index exists in the special words, judging that the special words are invalid.
In this embodiment, all types of special words in the target characters are obtained, and it is then determined whether an atomic index exists among them. If an atomic index exists, the special words split from the target characters are valid; otherwise they are judged invalid, and a prompt such as "no atomic index was input, please re-input" is given.
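The validity rule reduces to a single check, sketched here under the assumption that the special words are held in a dictionary with an `atomic_indexes` list; the function name and the dictionary layout are hypothetical.

```python
def judge_validity(special_words):
    """Return (is_valid, prompt); the special words are valid only if an atomic index exists."""
    if special_words.get("atomic_indexes"):
        return True, None
    return False, "No atomic index was input, please re-input"


valid, prompt = judge_validity({"atomic_indexes": [], "dimensions": ["this year"]})
```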
S104, if the special word is valid, judging whether the special word belongs to an authority index, and if the special word is the authority index, calling the specified audio to judge whether the user has a viewing authority;
in this step, if an atomic index exists in the special word, whether the special word belongs to the permission index is further determined, and if the special word belongs to the permission index, whether the user has the viewing permission is determined. The purpose of this step is to confirm whether the user has the permission to view, before the determination, it is necessary to confirm whether the special word belongs to the permission index, and if the special word belongs to the permission index, it indicates that the user needs to have the permission to view the special word.
In one embodiment, the step S104 includes:
judging whether the atomic index in the special word belongs to the authority index;
if the atomic index does not belong to the authority index, directly pushing the special word to the large-screen display module;
and if the atomic index is an authority index, calling the designated audio and performing identity comparison to judge whether the user has the viewing permission.
In this embodiment, whether the special word belongs to the authority index is determined by the atomic index in the special word: if the atomic index belongs to the authority index, the special word also belongs to the authority index, and if it does not, neither does the special word. When the atomic index does not belong to the authority index, the requested content can be viewed without any permission, so the special word is pushed directly to the large-screen display module for display without checking the user's permission. When the atomic index is an authority index, the requested content can be viewed only by a user who has the viewing permission; in that case the stored designated audio is called for identity comparison in order to check the user's viewing permission.
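The branching in this embodiment can be expressed as a small routing function. The set of permission-protected indexes and the names `Route` and `route_request` are assumptions for illustration; how an enterprise actually marks an index as requiring permission is not specified in the text.

```python
from enum import Enum


class Route(Enum):
    PUSH_TO_DISPLAY = "push"    # no permission needed, display directly
    VERIFY_SPEAKER = "verify"   # permission needed, compare identity first


# Hypothetical set of atomic indexes that require a viewing permission.
PERMISSION_INDEXES = {"earnings", "profit"}


def route_request(atomic_indexes):
    """Decide the next step from the atomic indexes found in the special words."""
    if any(name in PERMISSION_INDEXES for name in atomic_indexes):
        return Route.VERIFY_SPEAKER
    return Route.PUSH_TO_DISPLAY


open_route = route_request(["headcount"])
protected_route = route_request(["earnings"])
```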
In a specific embodiment, the calling the designated audio and performing identity comparison to determine whether the user has the viewing permission includes:
performing voiceprint recognition on the designated audio through a voiceprint recognition technology, and matching the voiceprint recognition result with pre-stored voiceprint features; if the match succeeds, judging that the user has the viewing permission, and if the match fails, judging that the user does not have the viewing permission.
In this embodiment, voiceprint recognition is performed on the designated audio and the result is matched with pre-stored voiceprint features, so that the user's viewing permission is determined from the matching result. Voiceprint recognition, also called speaker recognition, is a biometric recognition technology that determines the identity of a speaker from the voice.
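One common way to match a voiceprint against stored features is to compare fixed-length speaker embeddings by cosine similarity. The sketch below assumes this approach; the patent does not name a matching method, the embedding extraction from the designated audio is not shown, and the threshold of 0.8 and all names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def has_viewing_permission(embedding, enrolled, threshold=0.8):
    """True if the speaker embedding matches any pre-stored authorized voiceprint."""
    return any(cosine_similarity(embedding, ref) >= threshold for ref in enrolled.values())


# Hypothetical pre-stored voiceprint features of authorized users.
enrolled = {"manager": [0.9, 0.1, 0.3]}

granted = has_viewing_permission([0.88, 0.12, 0.31], enrolled)
denied = has_viewing_permission([-0.2, 0.9, 0.1], enrolled)
```

The threshold trades false acceptances against false rejections and would need to be tuned on real enrollment data.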
S105, if the user has the viewing permission, pushing the special word to a large-screen display module for display, and if the user does not have the viewing permission, stopping access.
In this step, when the user has the viewing right, the special word is displayed on the large-screen display module, and if the user does not have the viewing right, the access is stopped. The large screen display module is mainly used for displaying the special words on a large screen in a specific form, such as a static graph, a dynamic graph or a simple index value.
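Steps S103 to S105 can be tied together in one control-flow sketch. The three callables passed in (`is_permission_index`, `verify_speaker`, `display`) are hypothetical hooks standing in for the permission lookup, the voiceprint comparison and the large-screen display module; only the order of the decisions is taken from the text.

```python
def handle_request(special_words, designated_audio, *, is_permission_index, verify_speaker, display):
    """Validity check, permission check, then display or stop access."""
    atomic_indexes = special_words.get("atomic_indexes", [])
    if not atomic_indexes:
        return "invalid: no atomic index, please re-input"
    needs_permission = any(is_permission_index(name) for name in atomic_indexes)
    if needs_permission and not verify_speaker(designated_audio):
        return "denied: access stopped"
    display(special_words)  # e.g. rendered as a static graph, dynamic graph or index value
    return "displayed"


shown = []
request = {"atomic_indexes": ["earnings"], "dimensions": ["this year"]}
outcome_denied = handle_request(
    request, b"wake-word-audio",
    is_permission_index=lambda name: name == "earnings",
    verify_speaker=lambda audio: False,
    display=shown.append,
)
outcome_shown = handle_request(
    request, b"wake-word-audio",
    is_permission_index=lambda name: name == "earnings",
    verify_speaker=lambda audio: True,
    display=shown.append,
)
```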
Referring to fig. 2, fig. 2 is a schematic block diagram of an intelligent voice-controlled large-screen display system according to an embodiment of the present invention, where the intelligent voice-controlled large-screen display system 200 includes:
the audio acquisition unit 201 is configured to collect voice audio input by a user, extract the human voice audio from the voice audio, split the human voice audio to obtain designated audio and required audio, and store the designated audio;
a target character obtaining unit 202, configured to train and decode the required audio to obtain target characters;
a target character splitting unit 203, configured to split the target character to obtain a special character, and determine validity of the special character;
a user permission confirming unit 204, configured to determine whether the special word belongs to a permission index if the special word is valid, and call the specified audio to determine whether the user has a viewing permission if the special word is the permission index;
and the large-screen display unit 205 is configured to push the special word to a large-screen display module for display if the user has a viewing permission, and stop accessing if the user does not have the viewing permission.
In one embodiment, the audio obtaining unit 201 includes:
and the human voice audio extracting unit is used for performing model matching on the voice audio using voice activity detection based on a Gaussian mixture model, so as to distinguish the human voice audio from the noise audio in the voice audio and extract the human voice audio.
In one embodiment, the target text acquiring unit 202 includes:
the preprocessing unit is used for carrying out silence removal and framing preprocessing on the required audio and extracting Mel cepstrum coefficient characteristics from the preprocessed voice data;
and the decoding unit is used for inputting the Mel cepstrum coefficient characteristics into a pre-trained acoustic model and a language model for decoding to obtain the target characters.
In an embodiment, the decoding unit comprises:
a phoneme information obtaining unit, configured to input the mel cepstrum coefficient features into a pre-trained acoustic model for feature decoding to obtain phoneme information;
the dictionary searching unit is used for searching words or phrases corresponding to the phoneme information in a pre-established dictionary;
and the target character judging unit is used for judging the probability that the phoneme information belongs to the corresponding character or word through a pre-trained language model, and selecting and outputting the target character through the probability.
In one embodiment, the target text splitting unit 203 includes:
the special character acquisition unit is used for acquiring all special characters in the target characters and judging whether the special characters have atomic indexes or not;
the special word validity judging unit is used for judging that the special word is valid if an atomic index exists in the special word;
and the special word invalidation judging unit is used for judging that the special word is invalid if no atomic index exists in the special word.
In an embodiment, the user right confirming unit 204 includes:
the authority index judging unit is used for judging whether the atomic index in the special word belongs to the authority index or not;
the special word pushing unit is used for directly pushing the special word to the large-screen display module if the atomic index does not belong to the authority index;
and the identity comparison unit is used for calling the designated audio and comparing the identities if the atomic index is an authority index so as to judge whether the user has the checking authority.
In one embodiment, the identity comparison unit includes:
and the voiceprint recognition unit is used for performing voiceprint recognition on the designated audio through a voiceprint recognition technology, matching the voiceprint recognition result with pre-stored voiceprint features, judging that the user has the viewing permission if the match succeeds, and judging that the user does not have the viewing permission if the match fails.
The embodiment of the invention also provides computer equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the intelligent voice control large-screen display method when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the method for displaying an intelligent voice-controlled large screen is implemented.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. An intelligent voice control large screen display method, characterized by comprising the following steps:
collecting voice audio input by a user, extracting human voice audio from the voice audio, intercepting the human voice audio to acquire specified audio and required audio, and storing the specified audio;
training and decoding the required audio to obtain target characters;
splitting the target characters to obtain a special word, and judging the validity of the special word;
if the special word is valid, judging whether the special word belongs to an authority index, and if the special word belongs to the authority index, calling the specified audio to judge whether the user has a viewing permission;
and if the user has the viewing permission, pushing the special word to a large-screen display module for display, and if the user does not have the viewing permission, stopping access.
2. The intelligent voice control large screen display method according to claim 1, wherein the collecting voice audio input by a user and extracting human voice audio from the voice audio comprises:
performing model matching on the voice audio by means of voice activity detection based on a Gaussian mixture model, so as to distinguish the human voice audio from the noise audio in the voice audio and extract the human voice audio.
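As a non-limiting illustration of Gaussian-mixture-based voice activity detection, the following sketch fits a two-component one-dimensional Gaussian mixture to per-frame log energy with expectation-maximization and labels the frames belonging to the higher-energy component as human voice. The choice of log energy as the feature, the frame length of 160 samples, the variance floor, and all function names are assumptions made for this example; the claim does not specify them.

```python
import math

def frame_log_energy(samples, frame_len=160):
    """Split samples into non-overlapping frames and return each frame's log energy."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        energies.append(math.log(energy + 1e-10))
    return energies

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian; underflows to 0.0 far from the mean."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_two_component_gmm(values, iterations=50):
    """Fit a 1-D two-component GMM with EM (assumed feature: log energy)."""
    lo, hi = min(values), max(values)
    means = [lo, hi]
    spread = max((hi - lo) ** 2 / 4, 1e-3)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            p = [weights[k] * gaussian_pdf(v, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-8:
                continue
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            variances[k] = max(var, 1e-3)  # variance floor keeps the density finite
            weights[k] = nk / len(values)
    return weights, means, variances

def detect_voice(samples, frame_len=160):
    """Return one boolean per frame: True where the frame is classified as voice."""
    energies = frame_log_energy(samples, frame_len)
    if not energies:
        return []
    weights, means, variances = fit_two_component_gmm(energies)
    speech = 1 if means[1] >= means[0] else 0
    flags = []
    for e in energies:
        p = [weights[k] * gaussian_pdf(e, means[k], variances[k]) for k in range(2)]
        flags.append(p[speech] > p[1 - speech])
    return flags
```

A practical detector would typically use richer features (for example sub-band energies) and models trained in advance rather than fitted per utterance; the sketch only shows the model-matching idea.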
3. The intelligent voice control large screen display method according to claim 1, wherein the training and decoding the required audio to obtain target characters comprises:
performing silence removal and framing preprocessing on the required audio, and extracting Mel cepstrum coefficient features from the preprocessed voice data;
and inputting the Mel cepstrum coefficient features into a pre-trained acoustic model and a pre-trained language model for decoding to obtain the target characters.
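As a non-limiting illustration of the preprocessing and feature extraction steps, the following sketch frames the audio, drops silent frames by an energy threshold, and computes Mel cepstrum coefficients through a Hamming window, a power spectrum, a triangular Mel filterbank, a logarithm, and a DCT-II. The sample rate of 8000 Hz, the frame and hop sizes, the silence threshold, the filter and coefficient counts, and all function names are assumptions for this example. Pre-emphasis and liftering, which many implementations add, are omitted.

```python
import cmath
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def split_frames(samples, frame_len, hop):
    """Cut the signal into overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def drop_silent_frames(frames, threshold=1e-4):
    """Silence removal: keep frames whose mean energy exceeds the threshold."""
    return [f for f in frames if sum(x * x for x in f) / len(f) > threshold]

def power_spectrum(frame):
    """Hamming-window the frame and compute its power spectrum with a naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale."""
    top = hz_to_mel(sample_rate / 2)
    points = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bins = [int((n_fft + 1) * hz / sample_rate) for hz in points]
    bank = []
    for m in range(1, num_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):
            row[k] = (k - left) / (center - left)
        for k in range(center, right):
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank

def mfcc(samples, sample_rate=8000, frame_len=256, hop=128,
         num_filters=20, num_coeffs=13):
    """Return one list of num_coeffs cepstral coefficients per non-silent frame."""
    frames = drop_silent_frames(split_frames(samples, frame_len, hop))
    bank = mel_filterbank(num_filters, frame_len, sample_rate)
    features = []
    for frame in frames:
        spec = power_spectrum(frame)
        log_energies = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                        for row in bank]
        # DCT-II decorrelates the log filterbank energies.
        coeffs = [sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                      for j, e in enumerate(log_energies))
                  for c in range(num_coeffs)]
        features.append(coeffs)
    return features
```

The naive DFT keeps the example free of third-party dependencies; a real system would use an FFT routine instead.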
4. The intelligent voice control large screen display method according to claim 3, wherein the inputting the Mel cepstrum coefficient features into a pre-trained acoustic model and a pre-trained language model for decoding to obtain the target characters comprises:
inputting the Mel cepstrum coefficient features into the pre-trained acoustic model for feature decoding to obtain phoneme information;
searching a pre-established dictionary for the character or word corresponding to the phoneme information;
and judging, through the pre-trained language model, the probability that the phoneme information belongs to the corresponding character or word, and selecting and outputting the target characters according to the probability.
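As a non-limiting illustration of the dictionary lookup and language-model selection, the following sketch assumes the acoustic model has already produced phoneme groups, looks up the candidate words for each group in a pronunciation dictionary, and uses a bigram language model with a Viterbi search to select the most probable word sequence. The dictionary entries, the bigram probabilities, the fallback probability for unseen pairs, and the function name `decode` are all hypothetical values invented for this example.

```python
import math

# Hypothetical pronunciation dictionary: phoneme string -> candidate words.
LEXICON = {
    "sh ow": ["show"],
    "s ey l z": ["sales", "sails"],
    "d ey t ah": ["data"],
}

# Hypothetical bigram language model: P(word | previous word).
BIGRAM = {
    ("<s>", "show"): 0.9,
    ("show", "sales"): 0.7,
    ("show", "sails"): 0.1,
    ("sales", "data"): 0.8,
    ("sails", "data"): 0.05,
}
UNSEEN = 1e-4  # assumed probability for word pairs missing from the model

def decode(phoneme_groups):
    """Viterbi search over dictionary candidates scored by the bigram model."""
    # best maps the last chosen word to (log probability, word sequence so far).
    best = {"<s>": (0.0, [])}
    for group in phoneme_groups:
        candidates = LEXICON.get(group, [])
        if not candidates:
            raise KeyError(f"no dictionary entry for phonemes: {group}")
        step = {}
        for word in candidates:
            # Keep only the best-scoring path that ends in this candidate word.
            step[word] = max(
                (score + math.log(BIGRAM.get((prev, word), UNSEEN)), path + [word])
                for prev, (score, path) in best.items()
            )
        best = step
    return max(best.values())[1]
```

For the phoneme groups `["sh ow", "s ey l z", "d ey t ah"]` the homophones "sales" and "sails" share one dictionary entry, and the language model resolves the ambiguity in favor of "show sales data".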
5. The intelligent voice control large screen display method according to claim 1, wherein the splitting the target characters to obtain a special word and judging the validity of the special word comprises:
acquiring all special words in the target characters, and judging whether an atomic index exists in the special words;
if the atomic index exists in the special words, judging that the special words are valid;
and if the atomic index does not exist in the special words, judging that the special words are invalid.
6. The intelligent voice control large screen display method according to claim 1, wherein the judging whether the special word belongs to an authority index if the special word is valid, and calling the specified audio to judge whether the user has a viewing permission if the special word belongs to the authority index, comprises:
judging whether the atomic index in the special word belongs to the authority index;
if the atomic index does not belong to the authority index, directly pushing the special word to the large-screen display module;
and if the atomic index belongs to the authority index, calling the specified audio and performing identity comparison so as to judge whether the user has the viewing permission.
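As a non-limiting illustration of the validity judgment and the authority index branching, the following sketch treats the special words as a list of strings, considers them valid when at least one atomic index is present, and then decides between direct display and identity verification. The example index names, the two sets, the `Decision` enumeration, and the function name `route_special_words` are hypothetical; the claims do not define how the index sets are stored.

```python
from enum import Enum

# Hypothetical index tables; a deployment would load these from configuration.
ATOMIC_INDEXES = {"sales", "inventory", "profit", "salary"}
AUTHORITY_INDEXES = {"profit", "salary"}  # atomic indexes that need verification

class Decision(Enum):
    INVALID = "invalid"                   # no atomic index found
    DISPLAY = "display"                   # push directly to the large screen
    VERIFY_IDENTITY = "verify_identity"   # call the specified audio first

def route_special_words(special_words):
    """Decide how to handle the special words split from the target characters."""
    atomic = [word for word in special_words if word in ATOMIC_INDEXES]
    if not atomic:
        return Decision.INVALID
    if any(word in AUTHORITY_INDEXES for word in atomic):
        return Decision.VERIFY_IDENTITY
    return Decision.DISPLAY
```

In this sketch a single restricted atomic index is enough to require identity verification for the whole request, which is the more conservative of the possible readings.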
7. The intelligent voice control large screen display method according to claim 6, wherein the calling the specified audio and performing identity comparison so as to judge whether the user has the viewing permission comprises:
performing voiceprint recognition on the specified audio through a voiceprint recognition technology, and matching the voiceprint recognition result against pre-stored voiceprint features; if the matching succeeds, judging that the user has the viewing permission, and if the matching fails, judging that the user does not have the viewing permission.
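As a non-limiting illustration of matching a voiceprint recognition result against pre-stored voiceprint features, the following sketch assumes that each voiceprint has already been reduced to a fixed-length feature vector and compares vectors by cosine similarity against a threshold. The vector representation, the threshold of 0.8, and the function names are assumptions for this example; the claim does not name a specific voiceprint recognition technique or matching rule.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("voiceprint vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def has_viewing_permission(voiceprint, enrolled, threshold=0.8):
    """Return True if the voiceprint matches any pre-stored voiceprint.

    voiceprint: feature vector extracted from the specified audio (assumed given).
    enrolled:   mapping from user name to that user's pre-stored feature vector.
    """
    return any(cosine_similarity(voiceprint, reference) >= threshold
               for reference in enrolled.values())
```

A usage example with made-up vectors: with `enrolled = {"alice": [0.9, 0.1, 0.4]}`, the probe `[0.88, 0.12, 0.41]` passes the match, while `[-0.2, 0.9, -0.1]` does not.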
8. An intelligent voice control large screen display system, characterized by comprising:
an audio acquisition unit, used for collecting voice audio input by a user, extracting human voice audio from the voice audio, intercepting the human voice audio to acquire specified audio and required audio, and storing the specified audio;
a target character acquisition unit, used for training and decoding the required audio to obtain target characters;
a target character splitting unit, used for splitting the target characters to obtain a special word and judging the validity of the special word;
a user permission confirming unit, used for judging whether the special word belongs to an authority index if the special word is valid, and calling the specified audio to judge whether the user has a viewing permission if the special word belongs to the authority index;
and a large-screen display unit, used for pushing the special word to a large-screen display module for display if the user has the viewing permission, and stopping access if the user does not have the viewing permission.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the intelligent voice-controlled large-screen display method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to execute the intelligent voice-controlled large-screen display method according to any one of claims 1 to 7.
CN202110031145.8A 2021-01-11 2021-01-11 Intelligent voice control large screen display method, system and related components thereof Pending CN112820314A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110031145.8A CN112820314A (en) 2021-01-11 2021-01-11 Intelligent voice control large screen display method, system and related components thereof

Publications (1)

Publication Number Publication Date
CN112820314A true CN112820314A (en) 2021-05-18

Family

ID=75869874

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110031145.8A Pending CN112820314A (en) 2021-01-11 2021-01-11 Intelligent voice control large screen display method, system and related components thereof

Country Status (1)

Country Link
CN (1) CN112820314A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005189846A (en) * 2003-12-05 2005-07-14 Ihm:Kk Audio control screen system
CN106814997A (en) * 2015-11-27 2017-06-09 阿里巴巴集团控股有限公司 Database manipulation language script optimization method, apparatus and system
CN107193391A (en) * 2017-04-25 2017-09-22 北京百度网讯科技有限公司 The method and apparatus that a kind of upper screen shows text message
CN108319864A (en) * 2018-01-17 2018-07-24 链家网(北京)科技有限公司 A kind of information inspection control method and device
CN111273879A (en) * 2020-01-10 2020-06-12 杭州勇电照明有限公司 Large-screen display method and device for user interactive display
WO2020147380A1 (en) * 2019-01-14 2020-07-23 深圳前海达闼云端智能科技有限公司 Human-computer interaction method and apparatus, computing device, and computer-readable storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113870763A (en) * 2021-09-15 2021-12-31 肇庆德庆冠旭电子有限公司 Light control method and device and intelligent sound box
CN116564280A (en) * 2023-07-05 2023-08-08 深圳市彤兴电子有限公司 Display control method and device based on voice recognition and computer equipment
CN116564280B (en) * 2023-07-05 2023-09-08 深圳市彤兴电子有限公司 Display control method and device based on voice recognition and computer equipment
CN117037790A (en) * 2023-10-10 2023-11-10 朗朗教育科技股份有限公司 AI interaction intelligent screen control system and method
CN117037790B (en) * 2023-10-10 2024-01-09 朗朗教育科技股份有限公司 AI interaction intelligent screen control system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination