CN109949803B - Building service facility control method and system based on semantic instruction intelligent identification - Google Patents

Publication number: CN109949803B
Application number: CN201910110334.7A
Other versions: CN109949803A (Chinese)
Inventor: 王闺臣
Current assignee: Terminus Beijing Technology Co Ltd
Application filed by Terminus Beijing Technology Co Ltd
Legal status: Active (granted)

Abstract

The invention discloses a building service facility control method based on intelligent recognition of semantic instructions, which comprises the following steps: first, an instruction voice signal is collected; the instruction voice signal is then converted into a primary character string through voice recognition; low-confidence characters are determined from the primary character string; the characteristic parameters of the low-confidence characters are substituted into a Hopfield neural network to obtain a regular character string; finally, a corresponding control signal is generated according to the regular character string, and the building service facility is controlled to execute the corresponding function according to the control signal. The method enables a user to control the operation of the various service devices in a building service facility through instruction voice, making the control mode more intelligent and convenient. At the same time, the correct instruction can be recovered even when the user's pronunciation is not standard, so that speech problems do not prevent the user from controlling the building service facilities normally.

Description

Building service facility control method and system based on semantic instruction intelligent identification
Technical Field
The invention relates to the technical field of intelligent control, in particular to a building service facility control method based on semantic instruction intelligent identification and a building service facility control system based on semantic instruction intelligent identification.
Background
Building service facilities are the various in-building facilities that serve the daily operation of cities and the daily life of people, including public service facilities and living service facilities. Their operation cannot be separated from the operation of the service equipment inside them, such as elevators in public buildings, access-control systems at the entrances of residential buildings, and air conditioners and lighting in the bedrooms and living rooms of private homes. A building service facility operates normally only when all of this service equipment operates normally, and with the development of cities and the improvement of living standards, building service facilities tend to become more intelligent, convenient and multifunctional.
Service equipment in existing building service facilities is provided with a control panel for starting and controlling it, but the control panel is generally operated manually through its buttons and keys. For example, the user must press a floor key in an elevator, a number button on a residential access-control system, a button on an air-conditioner panel, or a light panel on an indoor wall by hand before the equipment receives the control command, in order to realize elevator travel, room-number input, temperature adjustment or light adjustment.
Voice control has therefore been introduced for some service equipment. For example, the user may speak instructions such as "please call XXX room", "please open the door" and "please lock the door" beside the access-control system, or speak instructions such as "please turn on the air conditioner", "please turn off the air conditioner", "please raise the temperature by X degrees" and "please lower the temperature by X degrees" near the air-conditioner panel, or speak instructions such as "please turn on the ceiling lamp" and "please dim the bedside lamp" near the light panel, to make the access-control system, air conditioner or light fitting perform the corresponding operation.
However, current methods of controlling equipment operation by voice adopt relatively simple voice recognizers and cannot handle speaking users with different accents, non-standard pronunciation or unclear pronunciation well, so the instruction voice may not be recognized correctly and the facility cannot be controlled.
Disclosure of Invention
(I) Object of the invention
In order to overcome at least one of the defects of the prior art, so that a user can control the operation of the various service equipment in a building service facility through instruction voice, and a semantically correct instruction can be obtained even when the pronunciation of the user's instruction voice is not standard, the invention discloses the following technical scheme.
(II) technical scheme
As a first aspect of the invention, the invention discloses a building service facility control method based on semantic instruction intelligent recognition, which comprises the following steps:
collecting instruction voice signals;
converting the instruction voice signal into a primary character string through voice recognition;
determining low-confidence characters from the primary character string;
substituting the characteristic parameters of the low-confidence characters into a neural network to obtain regular character strings;
and generating a corresponding control signal according to the regular character string, and controlling the building service facility to execute a corresponding function according to the control signal.
In one possible implementation, before converting the instruction voice signal into the primary character string through voice recognition, the method further includes:
and carrying out noise reduction and/or echo elimination processing on the collected instruction voice signal.
In one possible implementation, before converting the instruction voice signal into the primary character string through voice recognition, the method further includes:
recognizing the voiceprint of the instruction voice signal through a voiceprint recognition technology;
and judging whether the recognized voiceprint has the control authority or not, and stopping executing the subsequent steps under the condition that the recognized voiceprint does not have the control authority.
In one possible implementation, before determining the low-confidence characters from the primary character string, the method further includes:
identifying whether the instruction voice signal or the primary character string contains the voice signal or character of an identification word, and stopping execution of the subsequent steps if no voice signal or character of the identification word is identified.
In one possible implementation, determining the low-confidence characters from the primary character string includes:
counting, for each character in the primary character string, the number of sample character strings in an instruction voice sample library that contain that character;
comparing the counted number for each character with a minimum confidence threshold, and determining the characters whose number is lower than the minimum confidence threshold as low-confidence characters.
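The two counting-and-thresholding steps above can be sketched as follows. This is an illustrative sketch, not code from the patent: the names (`low_confidence_chars`, `sample_library`, `min_confidence`) are assumptions, and Latin letters stand in for the Chinese characters the patent operates on.

```python
from typing import List, Set

def low_confidence_chars(primary: str, sample_library: List[str],
                         min_confidence: int) -> Set[str]:
    """Return the characters of `primary` that appear in fewer than
    `min_confidence` sample strings of the instruction voice sample library."""
    low = set()
    for ch in set(primary):
        # number of sample character strings that contain this character
        count = sum(1 for s in sample_library if ch in s)
        if count < min_confidence:
            low.add(ch)
    return low

samples = ["please turn off the air conditioner",
           "please turn on the air conditioner",
           "please dim the light"]
# "q" never occurs in the sample library, so it is flagged
flagged = low_confidence_chars("qlease turn off the air conditioner", samples, 1)
```

With a real sample library, the minimum confidence threshold would be tuned so that accent-induced misrecognitions fall below it while legitimately rare characters do not.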
In one possible embodiment, before substituting the characteristic parameters of the low-confidence characters into the neural network to obtain the regular character string, the method further includes:
determining high-confidence characters from the primary character string; and
determining a predecessor high-confidence character and/or a successor high-confidence character of each low-confidence character;
and the step of substituting the characteristic parameters of the low-confidence characters into a neural network to obtain a regular character string includes:
determining a successor character set of the predecessor high-confidence character and/or a predecessor character set of the successor high-confidence character from an instruction voice sample library;
training a neural network according to the sound vectors of some or all of the characters in the successor character set and/or the predecessor character set;
and substituting the sound vector corresponding to the low-confidence character into the neural network for regularized recognition, so as to determine a replacement character from the corresponding successor character set and/or predecessor character set and replace the low-confidence character, obtaining a regular character string.
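As a rough sketch of this pipeline, the candidate set can be gathered from the sample library and the replacement chosen by comparing sound vectors. The patent performs the matching with a trained neural network (the abstract names a Hopfield network); a nearest-neighbour lookup stands in for it here, and all names and the toy sound vectors are assumptions.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple

def successor_set(pred_char: str, sample_library: List[str]) -> Set[str]:
    """All characters that immediately follow `pred_char` in any sample string."""
    succ = set()
    for s in sample_library:
        for i in range(len(s) - 1):
            if s[i] == pred_char:
                succ.add(s[i + 1])
    return succ

def pick_replacement(low_vec: Tuple[float, ...], candidates: Iterable[str],
                     sound_vectors: Dict[str, Tuple[float, ...]]) -> str:
    """Candidate whose stored sound vector is nearest to the low-confidence
    character's sound vector (stand-in for the network's recall step)."""
    return min(candidates, key=lambda c: math.dist(sound_vectors[c], low_vec))

candidates = successor_set("a", ["ab", "ac", "ad"])
vectors = {"b": (0.0, 0.0), "c": (1.0, 0.0), "d": (5.0, 5.0)}
replacement = pick_replacement((0.9, 0.1), candidates, vectors)
```

Restricting the candidates to the successor/predecessor sets is what keeps the matching fast: the network only has to distinguish among characters that can plausibly occur at that position.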
In one possible implementation, determining the high-confidence characters from the primary character string includes:
determining all characters in the primary character string that are not determined to be low-confidence characters as high-confidence characters; or
comparing the counted number of sample character strings containing each character with a maximum confidence threshold, and determining the characters whose number is higher than the maximum confidence threshold as high-confidence characters.
In one possible implementation, determining the successor character set of the predecessor high-confidence character and/or the predecessor character set of the successor high-confidence character from the instruction voice sample library includes:
determining, in the instruction voice sample library, all successor characters of the predecessor high-confidence character and/or all predecessor characters of the successor high-confidence character;
counting the frequency with which these successor characters and/or predecessor characters appear in all the sample character strings;
and determining the several successor characters and/or predecessor characters with the highest frequency according to the frequency ranking or a preset frequency threshold, to form the successor character set and/or the predecessor character set respectively.
In one possible embodiment, before determining the several successor characters and predecessor characters with the highest frequency according to the frequency ranking or the preset frequency threshold, the occurrence frequencies of any character that appears among both the successor characters and the predecessor characters are summed, and the summed frequency is taken as the occurrence frequency of that character.
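A minimal sketch of this frequency-merging step, assuming the successor and predecessor frequencies have already been counted into plain dictionaries (the names and the `k` parameter are illustrative):

```python
from collections import Counter
from typing import Dict, List

def top_candidates(succ_freq: Dict[str, int],
                   pred_freq: Dict[str, int], k: int) -> List[str]:
    """Merge the two frequency tables, summing the counts of any character
    that appears in both roles, and keep the k most frequent characters."""
    merged = Counter(succ_freq) + Counter(pred_freq)  # Counter addition sums counts
    return [ch for ch, _ in merged.most_common(k)]

# "a" occurs as a successor (3 times) and as a predecessor (2 times),
# so its merged frequency is 5 and it ranks first
best = top_candidates({"a": 3, "b": 1}, {"a": 2, "c": 4}, 2)
```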
In one possible embodiment, when training the neural network according to the sound vectors of some or all of the characters in the successor character set and/or the predecessor character set, only some of the sound vectors of the corresponding characters in the instruction voice sample library are selected to train the neural network.
In one possible embodiment, after generating the corresponding control signal according to the regular character string, the method further includes:
storing the collected instruction voice signal in the instruction voice sample library.
In one possible implementation, generating the corresponding control signal according to the regular character string includes:
converting the regular character string into a semantic instruction through synonym mapping;
and generating a control signal according to the semantic instruction, and controlling the building service facility to execute a corresponding function according to the control signal.
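A minimal sketch of the synonym-mapping step, assuming a hand-built mapping table; the table contents and the `AC_OFF`/`AC_ON` instruction names are invented for illustration:

```python
# Many regular character strings collapse onto a small set of canonical
# semantic instructions, from which the control signal is then generated.
SYNONYM_MAP = {
    "please turn off the air conditioner": "AC_OFF",
    "please switch off the air conditioner": "AC_OFF",
    "please turn on the air conditioner": "AC_ON",
}

def to_semantic_instruction(regular_string: str) -> str:
    instruction = SYNONYM_MAP.get(regular_string)
    if instruction is None:
        raise ValueError("no semantic instruction for: " + regular_string)
    return instruction  # a real system would now emit the control signal
```

Because several synonymous strings map onto one instruction, the controlled device only ever has to understand the small instruction set rather than every possible phrasing.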
As a second aspect of the invention, the invention also discloses a building service facility control system based on semantic instruction intelligent identification, which comprises:
the signal acquisition module is used for acquiring instruction voice signals;
the primary character generation module is used for converting the instruction voice signal into a primary character string through voice recognition;
the low-confidence character determining module is used for determining low-confidence characters from the primary character string;
the regular character generation module is used for substituting the characteristic parameters of the low-confidence characters into a neural network to obtain regular character strings;
and the control signal generation module is used for generating corresponding control signals according to the regular character strings and controlling the building service facilities to execute corresponding functions according to the control signals.
In one possible implementation, the control system further includes:
and the noise processing module is used for carrying out noise reduction and/or echo elimination processing on the collected instruction voice signal before converting the instruction voice signal into a primary character string through voice recognition.
In one possible implementation, the control system further includes:
and the voiceprint judging module is used for identifying the voiceprint of the instruction voice signal through a voiceprint identification technology before converting the instruction voice signal into the primary character string through voice identification, judging whether the identified voiceprint has the control authority or not, and stopping executing the subsequent steps under the condition that the identified voiceprint does not have the control authority.
In one possible implementation, the control system further includes:
and the identification recognition module is used for recognizing whether the instruction voice signal or the primary character string contains the voice signal or the character of the identification word or not before the low-confidence character is determined from the primary character string, and stopping executing the subsequent steps under the condition that the voice signal or the character of the identification word is not recognized.
In one possible implementation, the low-confidence character determination module includes:
an occurrence counting unit, configured to count, for each character in the primary character string, the number of sample character strings in the instruction voice sample library that contain that character;
and a low-confidence character determination unit, configured to compare the counted number for each character with a minimum confidence threshold and determine the characters whose number is lower than the minimum confidence threshold as low-confidence characters.
In one possible implementation, the control system further includes:
a high-confidence character determination module, configured to determine high-confidence characters from the primary character string and to determine a predecessor high-confidence character and/or a successor high-confidence character of each low-confidence character, before the characteristic parameters of the low-confidence characters are substituted into the neural network to obtain the regular character string; and,
the regular character generation module includes:
a character set determination unit, configured to determine a successor character set of the predecessor high-confidence character and/or a predecessor character set of the successor high-confidence character from an instruction voice sample library;
the neural network training unit is used for training a neural network according to the sound vectors of part or all of the characters in the successor character set and/or the predecessor character set;
and a regular character generation unit, configured to substitute the sound vector corresponding to the low-confidence character into the neural network for regularized recognition, so as to determine a replacement character from the corresponding successor character set and/or predecessor character set and replace the low-confidence character, obtaining a regular character string.
In one possible implementation, the high-confidence character determination module includes:
a first character determination unit, configured to determine all characters in the primary character string that are not determined to be low-confidence characters as high-confidence characters; and/or
a second character determination unit, configured to compare the counted number of sample character strings containing each character with a maximum confidence threshold and determine the characters whose number is higher than the maximum confidence threshold as high-confidence characters.
In one possible implementation, the character set determination unit includes:
a third character determination subunit, configured to determine, in the instruction voice sample library, all successor characters of the predecessor high-confidence character and/or all predecessor characters of the successor high-confidence character;
a frequency counting subunit, configured to count the frequency with which these successor characters and/or predecessor characters appear in all the sample character strings;
and a character set forming subunit, configured to determine the several successor characters and/or predecessor characters with the highest frequency according to the frequency ranking or a preset frequency threshold, forming the successor character set and/or the predecessor character set respectively.
In one possible embodiment, the frequency counting subunit is further configured to, before the several successor characters and predecessor characters with the highest frequency are determined according to the frequency ranking or the preset frequency threshold, sum the occurrence frequencies of any character that appears among both the successor characters and the predecessor characters, and take the summed frequency as the occurrence frequency of that character.
In one possible embodiment, the neural network training unit selects only some of the sound vectors of the corresponding characters in the instruction voice sample library to train the neural network.
In one possible implementation, the control system further includes:
and the sample storage module is used for storing the acquired instruction voice signal in the instruction voice sample library after the corresponding control signal is generated according to the regular character string.
In one possible implementation, the control signal generation module includes:
the semantic mapping unit is used for converting the regular character string into a semantic instruction through synonym mapping;
and the signal generating unit is used for generating a control signal according to the semantic instruction and controlling the building service facility to execute a corresponding function according to the control signal.
(III) Advantageous effects
The building service facility control method and system based on semantic instruction intelligent identification disclosed by the invention have the following beneficial effects:
1. The user can control the operation of the various service devices in the building service facility through instruction voice, without manually operating a control panel, making the control mode more intelligent and convenient. At the same time, even when the user's instruction voice carries an accent, non-standard pronunciation or unclear pronunciation, it can be correctly associated and recognized to recover the correct instruction, so speech problems do not prevent the user from controlling the building service facilities normally.
2. Whether the speaking user has control authority is judged by voiceprint recognition before the instruction voice signal is converted into a primary character string. This prevents sound from, for example, a television from being collected by the control system and causing misoperation, and also prevents a user without authority from being stopped only at a later stage of the control system's operation, avoiding the waste of computing resources.
3. Before the low-confidence characters are determined from the primary character string, or before the instruction voice signal is converted into the primary character string through voice recognition, identification-word recognition is performed on the instruction voice signal or on the converted primary character string, confirming that the signal is a control command directed at the control system rather than other human conversation. Recognition of non-instruction speech is thereby avoided, saving the control system's computing resources.
4. The low-confidence characters are determined by acquiring the occurrence frequency of the characters in the sample library and setting a threshold value, and then the characters with wrong semantics are positioned, so that the positioning accuracy is high, and the positioning speed is high.
5. The candidate characters for a low-confidence character are obtained from the high-confidence characters before and after it, and the semantically correct character is selected from the candidates by the neural network. Characters recognized with wrong semantics because of accents and similar problems can thus be replaced accurately and quickly, which facilitates control of the building service facilities by users from different regions, and the replacement accuracy is high.
6. High-frequency successor characters and predecessor characters are screened out and only these are used to train the neural network, and only some of the sound vectors are selected and substituted into the neural network for training, which improves operation efficiency and shortens the system response time.
7. By storing the instruction voice signal which can be correctly recognized in the instruction voice sample library, the recognition process when the same voice instruction is sent by a user later can be facilitated.
8. The control system converts the many possible regular character strings into a relatively small set of semantic instructions in advance, and only then generates control signals and sends them to the controlled equipment. This makes the communication between the control system and the controlled equipment simpler and lowers the demands on the controlled equipment's data processing capability.
Drawings
The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining and illustrating the present invention and should not be construed as limiting the scope of the present invention.
FIG. 1 is a flow chart of an embodiment of the building service facility control method based on semantic instruction intelligent recognition disclosed by the invention.
FIG. 2 is a block diagram illustrating an embodiment of the building service control system based on intelligent recognition of semantic instructions.
Detailed Description
In order to make the implementation objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be described in more detail below with reference to the accompanying drawings in the embodiments of the present invention.
It is to be noted that, throughout the appended drawings, identical or similar reference signs denote identical or similar elements or elements having identical or similar functions; that the described embodiments are some, but not all, of the embodiments of the invention; that the embodiments and the features of the embodiments may be combined with each other as long as they do not conflict; and that all other embodiments obtainable by one of ordinary skill in the art based on the embodiments herein fall within the scope of the present invention.
In this document, "first", "second", etc. are used only to distinguish one element from another and do not indicate importance or order.
The division into modules, units or components herein is merely a division of logical functions; in an actual implementation there may be other divisions, for example a plurality of modules and/or units may be combined or integrated into another system.
The building service facility control method based on semantic instruction intelligent recognition disclosed by the present invention is described in detail below with reference to fig. 1. This embodiment is mainly applied to building service facilities, so that a user can control the operation of the various service devices in the facility by instruction voice without manually operating a control panel, making the control mode more intelligent and convenient. At the same time, even when the user's instruction voice carries an accent, non-standard pronunciation or unclear pronunciation, the method can correctly associate and recognize it to recover the correct instruction, so that speech problems do not prevent the user from controlling the building service facility normally.
As shown in fig. 1, the building service facility control method disclosed in the present embodiment includes the following steps:
step 100, collecting instruction voice signals.
When a user needs to control the starting, stopping and running conditions of the building service facilities, the user can realize control by sending instruction voice. For the building service facilities and the service equipment therein, firstly, the instruction voice sent by the user needs to be collected, and an instruction voice signal is obtained.
It will be understood that the devices for collecting the instruction voice signal are installed in advance in the various spaces, places, passageways and entrances/exits of the building service facility, in environments such as elevator cars and bedrooms. Typically, the user utters the instruction voice by speaking. In the following, the instruction voice "please turn off the air conditioner", spoken by the user in a bedroom to control the on/off state of the air conditioner, is taken as the example.
Step 200: converting the instruction voice signal into a primary character string through voice recognition.
After the instruction voice signal sent by the user is obtained, it is first recognized by a basic voice recognition technology, converting it into text form to obtain a text character string. Basic speech recognition is fast, but because the user's instruction voice may carry an accent or be slurred, the semantics of the recognized text string may not be exactly the semantics the user actually intended. For example, the accented voice "please turn off the air conditioner" may be recognized as the text string "please manage the upper control tone".
Step 300: determining low-confidence characters from the primary character string.
A low-confidence character is a character that appears with low frequency in the voice samples of the instruction voice sample library. After the primary character string is obtained, it is necessary to determine which of its characters were given wrong semantics during the voice recognition of step 200 because of accents, slurring and similar problems; these semantically wrong characters are usually low-confidence characters.
The instruction voice sample library contains, in advance, a certain number (not necessarily all) of instruction voice samples together with the accurately recognized sample character string corresponding to each sample. Semantically wrong characters can be identified by judging the confidence of each character of the primary character string: if a character of the primary character string appears with extremely low frequency among all the sample character strings of the library, or does not appear in any of them (in which case its confidence is zero), it is very likely a semantically wrong character, that is, a low-confidence character.
Specifically, in the character string "please manage the upper control tone", "manage" and "control tone" appear with very low frequency in the sample character strings of the instruction voice sample library, so they are low-confidence characters and can be regarded as semantically wrong. Similarly, if the primary character string after voice recognition were "celebrate the upper air conditioner" and "celebrate" appeared in no sample character string of the library, the confidence of "celebrate" would be zero; it would belong to the low-confidence characters and be determined to be semantically wrong.
Step 400: substituting the characteristic parameters of the low-confidence characters into a neural network to obtain a regular character string.
It should be understood that the regular character string need not be one of the accurately recognized sample character strings in the instruction voice sample library; it only needs to be a correct character string that the building service facility can recognize.
The characteristic parameter of a low-confidence character may be a feature vector obtained after converting the text character into an image, for example a 256 × 256 matrix in which the values of the stroke portion of the character and the blank portion differ; the characteristic parameter may also be the feature vector corresponding to the low-confidence character in the previously collected instruction voice signal.
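The image-based variant of the characteristic parameter can be sketched as below. The patent mentions a 256 × 256 matrix; an 8 × 8 stand-in bitmap is used here to keep the example short, and the rendering of an actual glyph is mocked rather than rasterised from a font:

```python
from typing import List

def flatten_bitmap(bitmap: List[List[int]]) -> List[int]:
    """Flatten a 2-D 0/1 character bitmap into a 1-D feature vector,
    so that stroke cells (1) and blank cells (0) carry different values."""
    return [px for row in bitmap for px in row]

glyph = [[0, 1] * 4 for _ in range(8)]  # stand-in 8x8 rendering of a character
features = flatten_bitmap(glyph)        # 64-element feature vector
```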
A neural network (NN) is a complex network system formed by widely interconnecting a large number of simple processing units called neurons. It has a self-learning function, an associative storage function and the ability to search for optimized solutions at high speed. After the neural network has been trained, a primary character string that may contain wrongly recognized semantics (for example "please manage the upper control tone") can be input into it and the semantically normalized regular character string (for example "please turn off the air conditioner") obtained directly, correcting the recognition errors of the voice recognition and conversion of step 200 so that the obtained character string is consistent with the command the user originally issued.
Specifically, a plurality of correct templates corresponding to command voice signals that the user may send can be used as input of the neural network, with corresponding expected outputs given. When an input is applied to the neural network, the actual output of the neural network is compared with the expected output; if they do not match, the weights and bias values of the neural network are adjusted. The process is repeated until the actual output of the neural network is close to the expected output, at which point the neural network can be considered trained. The low-confidence character is then substituted into the trained neural network to obtain the correct output.
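The compare-and-adjust loop described above can be sketched as follows. This is a minimal, hypothetical illustration using a single linear neuron on toy data, not the patent's actual network; the function name, learning rate and samples are assumptions for illustration only.

```python
# Sketch of the template-based training loop: compare actual output with
# the expected output and adjust the weight and bias until they agree.
def train(samples, lr=0.1, epochs=100):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        converged = True
        for x, expected in samples:
            actual = 1 if w * x + b > 0 else 0
            if actual != expected:           # output does not match expectation
                w += lr * (expected - actual) * x   # adjust weight
                b += lr * (expected - actual)       # adjust bias
                converged = False
        if converged:                        # actual output matches expected output
            break
    return w, b

w, b = train([(1.0, 1), (-1.0, 0)])
```

After training, the learned weight and bias reproduce the expected outputs for both templates.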
For example, when image recognition is performed, the network will slowly learn to recognize similar images through a self-learning function only by inputting a plurality of different image templates and corresponding recognition results into the neural network.
And 500, generating a corresponding control signal according to the regular character string, and controlling the building service facility to execute a corresponding function according to the control signal.
After the regular character string is obtained, the building service facility can be ensured to smoothly identify and correctly execute the function which the user instructs the voice to intend to execute, and the service which the user wants to obtain is realized. For example, after the user speaks the voice of "please turn off the air conditioner", the control system automatically obtains the regular text character string of "please turn off the air conditioner" through the above steps, and sends an instruction to the air conditioner according to the character string to turn off the air conditioner.
In some embodiments, before converting the instruction voice signal into the primary character string through voice recognition in step 200, the method further includes:
Step 010, performing noise reduction and/or echo cancellation processing on the acquired instruction voice signal.
While the user sends the instruction voice and the control system collects the instruction voice signal, other sound sources may generate noise interference, such as the sound of a television or a washing machine. Noise reduction improves the signal-to-noise ratio of the instruction voice signal collected by the control system, thereby improving the recognition accuracy of the voice signal.
In an indoor environment, the command voice sent by the user may be reflected by indoor walls to generate echoes; the echoes reach the acquisition point of the control system and interfere with the acquisition of the voice signal. Echo cancellation improves the signal-to-noise ratio of the command voice signal acquired by the control system, thereby improving the recognition accuracy of the voice signal.
In some embodiments, before converting the instruction voice signal into the primary character string through voice recognition in step 200, the method further includes:
step 020, identifying the voiceprint of the instruction voice signal through a voiceprint identification technology.
Step 030, judging whether the identified voiceprint has the control authority, and stopping execution of the subsequent steps if it does not.
For example, operations such as switching an indoor air conditioner on and off or regulating its temperature may be limited to the three members of the owner's household; guests, parrots, televisions and other persons, organisms or devices that can produce human voice have no authority. Otherwise, if the television plays the words "please turn off the air conditioner", the control system would collect and recognize them and switch the air conditioner off, producing a misoperation.
A voiceprint is a sound wave spectrum, displayed by an electro-acoustic instrument, that carries speech information; voiceprint recognition is a technology for judging the identity of a speaker by voice.
It should be noted that in some situations, for example, in a building, all people can select a destination elevator floor by voice, and at this time, there is no need to set a step of voiceprint recognition.
In some embodiments, before determining the low-confidence character from the primary string in step 300, the method further includes:
Step 030, identifying whether the instruction voice signal or the primary character string contains the voice signal or the characters of the identification word, and stopping execution of the subsequent steps if no identification word is identified.
The identification words enable the control system to confirm whether the collected voice signal was sent for control of the building services.
If the speech is sent to control the building service facility, the user needs to include the identification word when speaking, so that the control system knows that speech containing the identification word is sent to control the building service facility. For example, if the user says "please turn off the air conditioner", the word "please" can serve as the identification word, and the control system continues executing the subsequent steps after recognizing "please". If the control system finds that the collected voice does not contain an identification word, for example the phrase "strategic partnership" from a conversation between two people, it stops executing the subsequent steps and terminates the control method.
There are various ways for the control system to determine whether the speech signal contains the identification word.
First, before the command voice signal is converted into the primary character string in step 200, identification-word recognition is performed directly on the command voice signal; that is, whether the command voice signal contains the voice signal of the identification word is determined by recognizing the audio signal.
Secondly, after the command speech signal is converted into the primary character string in step 200 and before the low-confidence character is determined from the primary character string in step 300, identification-word recognition is performed directly on the primary character string converted from the command speech signal; that is, whether the identification word is included in the primary character string is determined by recognizing the text characters, which in turn determines whether the command speech signal contains the voice signal of the identification word.
It can be understood that, for the above judgment modes, there may be more than one identification word; for example, "trouble" can also serve as an identification word. The audio signals and character forms of the identification words are recorded into the system in advance and are directly extracted for recognition of the voice signal or text character string to be tested when needed.
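The text-side judgment mode can be sketched as follows. The identification words ("please", "trouble") come from the examples above; the helper name and the exact matching rule are assumptions for illustration.

```python
# Hypothetical sketch of the identification-word check on the primary
# character string: subsequent steps run only if some pre-recorded
# identification word appears in the string.
IDENTIFICATION_WORDS = ("please", "trouble")

def contains_identification_word(primary_string: str) -> bool:
    """Return True if any pre-recorded identification word appears in the string."""
    return any(word in primary_string for word in IDENTIFICATION_WORDS)

print(contains_identification_word("please turn off the air conditioner"))  # True
print(contains_identification_word("strategic partnership"))                # False
```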
In some embodiments, determining the low-confidence character from the primary string in step 300 includes:
In step 310, for each character in the primary character string, the number of sample character strings, among all sample character strings contained in the command voice sample library, that contain the character is calculated.
The command speech sample library includes a predetermined number of command speech samples, for example 10000 accurately recognized speech sample strings, each corresponding to one command speech sample. Taking the primary string mentioned in the above embodiments as an example, the primary string includes five characters: "please", "pipe", "up", "control" and "tone". Among the 10000 command speech samples in the library, the number of speech sample strings containing the "please" character is 5000, the number containing the "pipe" character is 500, the number containing the "up" character is 2500, the number containing the "control" character is 800, and the number containing the "tone" character is 2000.
Step 320, comparing the number of the sample character strings calculated to contain each character with a minimum confidence threshold value, and determining the characters with the number lower than the minimum confidence threshold value as low confidence characters.
The preset minimum confidence threshold is 1000, and the occurrence numbers of the characters "pipe" and "control" are 500 and 800, both smaller than the minimum confidence threshold, so "pipe" and "control" are determined as low-confidence characters.
The low-confidence characters are determined by counting the occurrence frequency of the characters in the sample library and applying a threshold, thereby locating the characters with wrong semantics; the locating accuracy is high and the locating speed is fast.
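Steps 310 and 320 can be sketched with the counts quoted above (library of 10000 samples, threshold 1000). The character names follow the translation's English placeholders and the per-character counts are the illustrative figures from the text, not real data.

```python
# Sketch of low-confidence character determination: each primary-string
# character whose sample-string count falls below the minimum confidence
# threshold is flagged as low-confidence.
sample_counts = {"please": 5000, "pipe": 500, "up": 2500, "control": 800, "tone": 2000}
MIN_CONFIDENCE_THRESHOLD = 1000

primary_string = ["please", "pipe", "up", "control", "tone"]
low_confidence = [ch for ch in primary_string
                  if sample_counts.get(ch, 0) < MIN_CONFIDENCE_THRESHOLD]
print(low_confidence)  # ['pipe', 'control']
```

A character absent from the library (count 0, like "celebrating" in the earlier example) would also be flagged by the same rule.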
In some embodiments, before substituting the characteristic parameters of the low-confidence character into the neural network to obtain the regular character string in step 400, the method further includes:
at step 330, high confidence characters are determined from the primary string.
"Higher frequency of occurrence" may be relative: for example, among the characters of the primary string, all characters not determined as low-confidence characters may be determined as high-confidence characters, i.e., every character that is not low-frequency is treated as high-frequency; in that case the three characters "please", "up" and "tone" are all high-confidence characters. It may also be absolute: for example, a maximum confidence threshold for judging whether a character is a high-confidence character may be preset, and characters whose count in the command voice sample library exceeds the maximum confidence threshold are determined as high-confidence characters, similarly to the way low-confidence characters are determined.
It is understood that the high-confidence characters and the low-confidence characters may be determined simultaneously or sequentially in separate steps.
At step 340, predecessor high confidence characters and/or successor high confidence characters of the low confidence characters are determined.
For the low-confidence characters "pipe" and "control": the predecessor high-confidence character of "pipe" is "please" and its successor high-confidence character is "up"; the predecessor high-confidence character of "control" is "up" and its successor high-confidence character is "tone".
It is to be understood that a low-confidence character may have only a corresponding predecessor high-confidence character, e.g., the last character of the primary string has no successor character, only a predecessor character; or it may have only a corresponding successor high-confidence character, e.g., the first character of the primary string has no predecessor character, only a successor character.
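Step 340 can be sketched as a scan over the primary string. This assumes the first determination way (every non-low-confidence character is high-confidence) and that low-confidence characters are not adjacent to one another; the function name and data are illustrative assumptions.

```python
# Sketch of finding the predecessor and successor high-confidence
# characters for each low-confidence position (None at string boundaries).
def neighbor_high_confidence(primary, low_set):
    result = {}
    for i, ch in enumerate(primary):
        if ch in low_set:
            pred = primary[i - 1] if i > 0 and primary[i - 1] not in low_set else None
            succ = (primary[i + 1]
                    if i + 1 < len(primary) and primary[i + 1] not in low_set else None)
            result[i] = (pred, succ)
    return result

primary = ["please", "pipe", "up", "control", "tone"]
neighbors = neighbor_high_confidence(primary, {"pipe", "control"})
print(neighbors)  # {1: ('please', 'up'), 3: ('up', 'tone')}
```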
In some embodiments, step 400 of substituting the characteristic parameters of the low-confidence character into the neural network to obtain the regular character string includes:
at step 410, a successor character set of predecessor high-confidence characters and/or a successor character set of successor high-confidence characters are determined from the command speech sample library.
The successor character set refers to a set of characters adjacent to and following the predecessor high-confidence character in the instruction voice sample containing the predecessor high-confidence character in the instruction voice sample library.
For example, the predecessor high-confidence character of "pipe" is "please", and there are 5000 voice sample strings containing "please" in the command voice sample library, such as "please turn on the ceiling lamp" and "please turn off the television"; "on" and "off" in these two strings are the characters next to "please", and the set of all such characters in the 5000 voice sample strings containing "please", { "on", "off", "on", "up", "down" … }, is the successor character set of the predecessor high-confidence character "please".
Similarly, the successor high-confidence character of "pipe" is "up", and the command voice sample library contains voice sample strings containing "up", such as "close the tv after ten minutes" and "please pull the curtain"; "close" and "pull" in these two strings are the characters adjacent to and preceding "up", and the set of all such characters in the voice sample strings containing "up", { "close", "raise", "close" … }, is the predecessor character set of the successor high-confidence character "up".
In the successor character set of the predecessor high-confidence character "please" and the predecessor character set of the successor high-confidence character "up", the character that correctly expresses the meaning intended by the low-confidence character "pipe" must appear.
It is understood that the successor character set of the predecessor high-confidence character and the predecessor character set of the successor high-confidence character of the low-confidence character "control" are determined in the same manner as for "pipe". When there are a plurality of low-confidence characters, either the corresponding successor and predecessor character sets of all low-confidence characters are determined first and the neural network training is then performed for each low-confidence character, or the successor and predecessor character sets of one low-confidence character are determined, the neural network is trained and the replacement character of that low-confidence character is obtained, and only then are the corresponding sets of the next low-confidence character determined.
It should be noted that if the low-confidence character has no corresponding predecessor high-confidence character, there is no corresponding successor character set; the same applies to the predecessor character set.
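Building the two candidate sets from the sample library can be sketched as follows, with samples represented as token lists because the translation renders each Chinese character as an English word. The library contents and function names are illustrative assumptions.

```python
# Sketch of step 410: collect the characters immediately after the
# predecessor high-confidence character (successor set) and immediately
# before the successor high-confidence character (predecessor set).
def successor_set(sample_library, anchor):
    out = set()
    for sample in sample_library:
        for i, ch in enumerate(sample[:-1]):
            if ch == anchor:
                out.add(sample[i + 1])   # character adjacent to and after anchor
    return out

def predecessor_set(sample_library, anchor):
    out = set()
    for sample in sample_library:
        for i, ch in enumerate(sample[1:], start=1):
            if ch == anchor:
                out.add(sample[i - 1])   # character adjacent to and before anchor
    return out

library = [["please", "open", "lamp"],
           ["please", "close", "tv"],
           ["close", "the", "curtain"]]
print(sorted(successor_set(library, "please")))   # ['close', 'open']
print(sorted(predecessor_set(library, "the")))    # ['close']
```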
And step 420, training a neural network according to the sound vectors of part or all of the characters in the subsequent character set and/or the precursor character set.
Training a neural network is itself a complex task and involves setting a large number of parameters, using for example iterative or adaptive algorithms.
For each character existing in the instruction voice sample library, the library contains the sound vector of the character, i.e., the characteristic parameter of the character. The predecessor character set and the successor character set both contain candidate characters that may replace the corresponding low-confidence character, so the union of the predecessor character set and the successor character set may also be called the candidate character set. Taking "pipe" as an example, the sound vectors of each character in the predecessor and successor character sets of "pipe" are extracted from the instruction voice sample library, and then substituted into the neural network to train it.
A discrete binary network, namely the Hopfield neural network, may be adopted; its training mode comprises the following steps:
1. First, the sound vectors are binary-coded and each is stored in a binary matrix. The binarization may borrow the binary halftone method used for binary images to binarize the digital audio signal. The continuous time-domain waveform of the audio is converted into a binary square wave, so the binarized audio contains only two audio levels, 0 and 1 or -1 and 1. If the sound vector stored in the command speech sample library is analog audio, it needs to be converted into digital audio first.
2. The Hopfield neural network model is a recurrent neural network with feedback connections from output to input; all neuron units are connected with each other. Each neuron receives, through the connection weights, the information fed back from the outputs of all other neurons, so the output of each neuron is controlled by the outputs of all other neurons and the neurons mutually restrict each other. The self-feedback weight of each neuron is set to 0, and then the Hamming distance is set.
3. The sample vector group is input into the neural network and learned in a synchronous working mode to obtain the trained weight matrix, at which point the neural network is trained. The synchronous working mode, also called the parallel working mode, means that at any moment the states of all or part of the neurons change simultaneously.
4. The vector to be tested is input into the trained neural network; after a certain number of iterations the network reaches an energy minimum point and outputs the corresponding sound vector result, which is then converted into character form to obtain the corresponding character.
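Steps 1 to 4 can be sketched with a minimal discrete Hopfield network. This is a generic textbook formulation under assumed details the patent does not specify (Hebbian outer-product weights, bipolar {-1, +1} coding, 8-dimensional toy patterns standing in for binarized sound vectors):

```python
import numpy as np

# Minimal discrete Hopfield sketch: store binarized "sound vectors",
# then recall the nearest stored pattern for a noisy test vector.
def train_hopfield(patterns):
    n = patterns.shape[1]
    w = np.zeros((n, n))
    for p in patterns:
        w += np.outer(p, p)          # Hebbian outer-product learning
    np.fill_diagonal(w, 0)           # no self-feedback: w_ii = 0
    return w / patterns.shape[0]

def recall(w, x, iterations=20):
    for _ in range(iterations):      # synchronous (parallel) working mode
        nxt = np.where(w @ x >= 0, 1, -1)
        if np.array_equal(nxt, x):   # state stable: energy minimum reached
            break
        x = nxt
    return x

patterns = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
                     [1, 1, 1, 1, -1, -1, -1, -1]])
w = train_hopfield(patterns)
noisy = np.array([1, -1, 1, -1, 1, -1, 1, 1])   # first pattern with one flipped bit
print(recall(w, noisy.copy()))                  # converges back to patterns[0]
```

The associative recall shown here is what lets a slightly mispronounced (i.e., corrupted) sound vector settle onto the stored vector of the correct candidate character.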
It can be understood that, when extracting the sound vectors of the characters, the sound vectors in all the successor character sets and the predecessor character sets can be extracted and substituted into the neural network, or only the sound vectors of a part of the characters can be extracted and substituted into the neural network, so as to reduce the operation amount and the operation time. It should be noted that training of the neural network is required to be performed separately for different low-confidence characters, so as to prevent confusion.
It should be noted that, if the low-confidence character has no corresponding successor character set, only the corresponding predecessor character set needs to be brought in, and the same applies to the case where there is no corresponding predecessor character set.
Step 430, substituting the sound vector corresponding to the low-confidence character into the neural network for regularized recognition, so as to determine a replacement character from the corresponding successor character set and/or predecessor character set and replace the low-confidence character, obtaining the regular character string.
After training of the neural network corresponding to the low-confidence character "pipe" is finished, the sound vector of "pipe" is extracted from the instruction voice sample library and substituted into the neural network, yielding the replacement character "off" from the corresponding candidate character set. In the same way, substituting the sound vector of "control" into its neural network yields the replacement character "empty" from the corresponding candidate character set. Replacing "pipe" and "control" in the primary character string with "off" and "empty" respectively converts the primary character string "please manage the upper control tone" into the regular character string "please turn off the air conditioner".
It is understood that the sound vector of "pipe" may be extracted after the neural network training is completed, or simultaneously with the extraction of the sound vectors of the characters in the successor and predecessor character sets.
The candidate characters of a low-confidence character are obtained using the high-confidence characters before and after it, and the character with correct semantics is selected from the candidates by the neural network. Characters recognized with wrong semantics due to accents and similar problems can thus be accurately and quickly replaced by characters with correct semantics, facilitating the control of building service facilities by users from different regions, with high replacement accuracy.
In some implementations, the determination of the high-confidence characters from the primary string in step 330 is made in one of the following ways:
In the first way, all characters in the primary string that are not determined to be low-confidence characters are determined to be high-confidence characters. In this way, a character of the primary string can only be a low-confidence character or a high-confidence character, the predecessor high-confidence character of every low-confidence character is the character immediately before it, and the successor high-confidence character is the character immediately after it.
In the second way, the characters in the primary string may be low-confidence characters, high-confidence characters, or intermediate characters that are neither low- nor high-confidence; a high-confidence character is then not necessarily adjacent to the low-confidence character and may be separated from it by intermediate characters.
In some embodiments, step 410 of determining the successor character set of the predecessor high-confidence character and the predecessor character set of the successor high-confidence character from the instruction speech sample library includes:
in step 411, all the successor characters of the predecessor high-confidence characters and/or all the predecessor characters of the successor high-confidence characters are determined in the instruction voice sample library.
For example, all successors to the preceding high-confidence character "please" of the low-confidence character "pipe" are: { "on", "off", "on", "up", "down", … … }, and all preceding characters that follow the high confidence character "up" of "tube" are: { "off", "up", "on", "off" … … }. The low confidence character "control" is obtained in the same way. It will be appreciated that the corresponding successor and predecessor character sets have been formed at this time, but that neither successor nor predecessor character set has been filtered at this time.
It should be noted that if the low-confidence character is adjacent to the predecessor high-confidence character/successor high-confidence character, with no intermediate character between them, then the successor characters of the predecessor high-confidence character are likewise selected from the characters adjacent to it, and the predecessor characters of the successor high-confidence character from the characters adjacent to it. If intermediate characters lie between the low-confidence character and the predecessor/successor high-confidence character, then when determining the successor characters of the predecessor high-confidence character or the predecessor characters of the successor high-confidence character, the number of characters separating them must be taken into account. For example, if a low-confidence character and its predecessor high-confidence character are separated by two intermediate characters, the determined successor characters must also be two characters away from the predecessor high-confidence character.
Step 412, counting the frequency of occurrence of the successor characters and/or predecessor characters in all sample strings.
The frequencies of occurrence of the successor characters "on", "off", "on", "up" and "down" in the sample character strings are 4100, 3900, 3400, 2600 and 2000 respectively, and the other successor characters likewise each have their own frequency; the frequencies of occurrence of the predecessor characters "off", "up", "on" and "off" in the sample character strings are 4500, 3700, 3500 and 2400 respectively, and the other predecessor characters likewise each have their own frequency.
If the low-confidence character only has corresponding successor characters, only the frequency of the successor characters is counted, and the predecessor characters are treated in the same way.
Step 413, determining the several successor characters and/or predecessor characters with the highest frequency, according to the frequency ranking or a preset frequency threshold, to form the successor character set and/or predecessor character set respectively.
In the first way, the occurrence frequencies of the successor characters and predecessor characters corresponding to the same low-confidence character are sorted, and a number of characters with the highest frequency are selected, for example the two characters with the highest frequency, as the high-frequency characters. In the second way, according to a preset frequency threshold, the characters whose occurrence frequency is higher than the threshold are selected from the successor characters and predecessor characters respectively and determined as the high-frequency characters.
If the low-confidence character has only corresponding successor characters, only the successor character set is formed; the predecessor character set is treated in the same way.
By screening out high-frequency successor characters and high-frequency predecessor characters and training the neural network only with the high-frequency successor characters and the high-frequency predecessor characters, the operation efficiency is improved, and the system response time is saved.
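The two selection rules of step 413 can be sketched as follows. The frequency figures are the illustrative successor-character counts from the text; the cut-off values (k = 2, threshold 3000) are assumptions for illustration.

```python
# Sketch of step 413: keep either the top-k characters by frequency,
# or every character whose frequency exceeds a preset threshold.
successor_freq = {"on": 4100, "off": 3900, "open": 3400, "up": 2600, "down": 2000}

def top_k(freq, k=2):
    return sorted(freq, key=freq.get, reverse=True)[:k]

def above_threshold(freq, threshold=3000):
    return [ch for ch, f in freq.items() if f > threshold]

print(top_k(successor_freq))                    # ['on', 'off']
print(sorted(above_threshold(successor_freq)))  # ['off', 'on', 'open']
```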
In some embodiments, after step 412 is performed and before step 413 is performed, the occurrence frequencies of a character that appears both among the successor characters and among the predecessor characters are summed, and the summed frequency is used as the occurrence frequency of that character.
If the low-confidence character has both corresponding successor characters and predecessor characters, then after counting their occurrence frequencies in all sample character strings, the frequencies of characters appearing in both sets are summed; for example, the successor and predecessor characters corresponding to "pipe" share the characters "off" and "up", so the occurrence frequencies of "off" and of "up" are summed. After summation, each such character has one total frequency as its occurrence frequency: for "off", whether as successor or predecessor character, the frequency is 3900 + 4500 = 8400, and for "up" it is 2600 + 3700 = 6300. When determining the high-frequency characters, the summed frequencies are likewise ranked and compared with the frequency threshold.
It should be noted that if, after the summation, all the high-frequency characters determined in step 413 turn out to be characters of the successor character set only, or of the predecessor character set only, then the subsequent step 420 corresponds to the case of training the neural network according to the sound vectors of part of the characters in the successor character set and the predecessor character set.
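The summation can be sketched with `collections.Counter`, whose addition sums the counts of shared keys. The "off" and "up" figures come from the text; the other character names and counts are placeholder assumptions.

```python
from collections import Counter

# Sketch of the frequency summation: a character appearing in both the
# successor and predecessor sets gets one combined frequency.
successor_freq = Counter({"on": 4100, "off": 3900, "open": 3400,
                          "up": 2600, "down": 2000})
predecessor_freq = Counter({"off": 4500, "up": 3700, "on": 3500, "close": 2400})

combined = successor_freq + predecessor_freq   # shared keys are summed
print(combined["off"], combined["up"])  # 8400 6300
```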
In some embodiments, when training the neural network according to the sound vectors of some or all of the characters in the successor character set and/or the predecessor character set in step 420, only part of the sound vectors of the corresponding characters in the instruction speech sample library are selected to train the neural network.
In the candidate character set composed of the successor character set and the predecessor character set, no matter how many characters are included, each character (i.e., candidate character) may have many sound vectors; for example, the frequencies of "off" and "up" are high, so correspondingly there are many sound vectors of "off" and "up" in the speech sample library. In this case, only part of the sound vectors can be selected and substituted into the neural network for training, to reduce the operation amount.
In some embodiments, step 500 of generating the corresponding control signal according to the regular string includes:
Step 510, converting the regular character string into a semantic instruction through synonym mapping.
After step 400 is executed and the regular character string is obtained, the regular character string is converted through synonym mapping into a semantic instruction with a more standard format. The semantic instruction is standardized instruction information expressing the semantics contained in the instruction voice; for example, the regular character strings "please turn off the air conditioner", "please switch off the air conditioner" and the like can all be converted into the standard term "please turn off the air conditioner", where "turn off", "switch off" and other semantically equivalent terms are mapped to the standard term "turn off" during synonym mapping.
The control system stores a semantic instruction set containing a number of words that can be synonym-mapped to the semantic instruction; for example, the above "turn off" and "switch off" are stored in the semantic instruction set and are both mapped to "turn off".
And step 520, generating a control signal according to the semantic instruction, and controlling the building service facility to execute a corresponding function according to the control signal.
The standard semantic instruction has a corresponding machine identification code, and taking the converted "please turn off the air conditioner" as an example, the machine identification code of "please turn off the air conditioner" can be sent to the air conditioner as a control signal, and the air conditioner can execute control according to the machine identification code.
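Steps 510 and 520 can be sketched as two table lookups. All dictionary contents, function names and the hexadecimal machine identification codes are illustrative assumptions, not values from the patent.

```python
# Sketch of synonym mapping followed by machine-identification-code lookup:
# many equivalent verbs collapse to one standard term, and the standard
# term selects the control signal sent to the device.
SYNONYM_MAP = {"turn off": "off", "switch off": "off", "close": "off",
               "turn on": "on", "open": "on"}
MACHINE_CODES = {("air conditioner", "off"): 0x21,
                 ("air conditioner", "on"): 0x20}

def to_semantic_instruction(verb: str) -> str:
    return SYNONYM_MAP.get(verb, verb)

def control_signal(device: str, verb: str) -> int:
    return MACHINE_CODES[(device, to_semantic_instruction(verb))]

print(hex(control_signal("air conditioner", "switch off")))  # 0x21
```

Because every synonym collapses to one standard term first, the controlled device only ever needs to recognize the small set of machine identification codes.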
By using the high computing and data-processing capacity of the control system, the various kinds of regular character strings are converted in advance into a relatively small total number of semantic instructions before the control signals are generated. Communication between the control system and the controlled device thus becomes simpler, and the requirement on the data-processing capacity of the controlled device is reduced: the controlled device only needs to recognize the control signals corresponding to the small number of semantic instructions to execute the corresponding functions, and does not need to keep in advance an instruction-signal mapping library with numerous mapping relations to handle every kind of control signal the control system might send.
Specifically, before step 510, the regular character string may be segmented: the character sequence is cut into individual words, and words such as verbs are then selected for conversion. For example, "please turn off the air conditioner" is segmented to obtain "turn off"; step 510 is executed to obtain the semantic instruction "turn off"; and the machine identification code corresponding to "turn off" is then sent to the controlled device, i.e., the air conditioner, to turn it off.
In some embodiments, after generating the corresponding control signal according to the regular character string in step 500, the method further includes:
step 600, storing the collected instruction voice signal in an instruction voice sample library.
The collected instruction voice signal not only undergoes voice recognition and conversion to realize the final control of the building service facility; after the corresponding real instruction is finally determined, it can also be added to the instruction voice sample library as an instruction voice sample, so that its sound vectors can be extracted when other voices are recognized later, or it can serve as a voice signal reference for voiceprint recognition. Alternatively, a pre-recognition step may be added: when the voiceprint is confirmed to meet the requirements and the voice signal contains an identification word, pre-recognition is first performed on the instruction voice signal, the comparison objects of the pre-recognition being the stored voices matching the user's usual pronunciation habits, i.e., voice signals that were previously correctly recognized and executed and then stored.
By storing instruction voice signals that were correctly recognized in the instruction voice sample library, recognition is made easier when the user later issues the same voice instruction.
The embodiment of the building service facility control system based on semantic instruction intelligent recognition disclosed by the invention is described in detail below with reference to fig. 2. This embodiment is a control system for implementing the above building service facility control method and is mainly applied to building service facilities. It enables a user to control the operation of various service devices in the building service facilities through instruction voices, without manually operating a control panel, making the control mode more intelligent and convenient. At the same time, when the instruction voice sent by the user has an accent, irregular pronunciation or unclear pronunciation, the instruction voice can still be correctly associated and recognized, avoiding the problem that the user cannot normally control the building service facilities due to voice issues.
As shown in fig. 2, the building service facility control system disclosed in the present embodiment includes: the device comprises a signal acquisition module, a primary character generation module, a low-confidence character determination module, a regular character generation module and a control signal generation module.
The signal acquisition module is used for collecting instruction voice signals. It may be a microphone, or a sound pickup composed of a microphone and an audio amplification circuit. The pickup is chosen according to the size of the environment space: a large space requires a pickup with a larger maximum pickup range. A camera may also be used.
The primary character generation module is connected with the signal acquisition module and used for converting the instruction voice signal into a primary character string through voice recognition.
The low-confidence character determining module is connected with the primary character generating module and is used for determining the low-confidence characters from the primary character string.
The regular character generation module is connected with the low-confidence character determination module and used for substituting the characteristic parameters of the low-confidence characters into the neural network to obtain the regular character strings.
The control signal generation module is connected with the regular character generation module and used for generating corresponding control signals according to the regular character strings and controlling the building service facilities to execute corresponding functions according to the control signals.
In one embodiment, the control system further comprises a noise processing module for performing noise reduction and/or echo cancellation processing on the collected instruction voice signal before it is converted into the primary character string through voice recognition.
In one embodiment, the control system further comprises a voiceprint determination module for identifying the voiceprint of the instruction voice signal by a voiceprint recognition technique before the signal is converted into the primary character string through voice recognition, determining whether the identified voiceprint has control authority, and stopping execution of the subsequent steps if it does not.
In one embodiment, the control system further includes an identification recognition module for recognizing whether the instruction voice signal or the primary character string contains the speech signal or characters of a recognition word before the low-confidence characters are determined from the primary character string, and stopping execution of the subsequent steps if no recognition word is found.
In one embodiment, the low-confidence character determination module includes an occurrence count statistics unit and a low-confidence character determination unit.
The occurrence count statistics unit is used for calculating, for each character in the primary character string, the number of sample character strings in the instruction voice sample library that contain that character.
The low-confidence character determination unit is connected with the occurrence count statistics unit and is used for comparing the calculated number of sample character strings containing each character with a minimum confidence threshold, and determining characters whose number is lower than the minimum confidence threshold as low-confidence characters.
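A minimal sketch of this low-confidence test follows, with a hypothetical three-string sample library and threshold. The patent operates on characters of recognized Chinese speech; this toy uses plain letters for readability.

```python
# Hypothetical sample library and threshold, for illustration only.
SAMPLE_LIBRARY = [
    "please turn off the air conditioner",
    "turn on the hallway light",
    "please close the curtain",
]
MIN_CONFIDENCE = 2  # a character must appear in at least this many sample strings

def low_confidence_chars(primary_string: str) -> list:
    """Return characters that occur in fewer sample strings than the threshold."""
    low = []
    for ch in primary_string:
        count = sum(1 for s in SAMPLE_LIBRARY if ch in s)
        if count < MIN_CONFIDENCE:
            low.append(ch)
    return low

print(low_confidence_chars("qe"))  # ['q'] — 'e' appears in all three samples
```

A character frequent across the sample library is taken as reliably recognized; a rare one is flagged for the later regularization step.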
In one embodiment, the control system further comprises a high-confidence character determination module for determining high-confidence characters from the primary character string and determining a predecessor high-confidence character and/or a successor high-confidence character of the low-confidence character before the characteristic parameters of the low-confidence character are substituted into the neural network to obtain the regular character string.
And, the regular character generation module includes: the device comprises a character set determining unit, a neural network training unit and a regular character generating unit.
The character set determining unit is used for determining a successor character set of a predecessor high-confidence character and/or a predecessor character set of a successor high-confidence character from the instruction voice sample library.
The neural network training unit is connected with the character set determination unit and is used for training the neural network according to the sound vectors of part or all of the characters in the successor character set and/or the predecessor character set.
The regular character generation unit is connected with the neural network training unit and is used for substituting the sound vector corresponding to the low-confidence character into the neural network for regularized recognition, so as to determine a replacement character from the corresponding successor character set and/or predecessor character set and replace the low-confidence character to obtain a regular character string.
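The abstract names a Hopfield neural network for this step. The following toy sketch, with invented 8-dimensional ±1 "sound vectors", shows how such a network can pull a distorted vector back to the closest stored candidate character's pattern; all data here is illustrative, not the patent's.

```python
import numpy as np

def train_hopfield(patterns: np.ndarray) -> np.ndarray:
    """Hebbian training: sum of outer products with zeroed diagonal."""
    n = patterns.shape[1]
    w = np.zeros((n, n))
    for p in patterns:
        w += np.outer(p, p)
    np.fill_diagonal(w, 0)
    return w / patterns.shape[0]

def recall(w: np.ndarray, vector: np.ndarray, steps: int = 10) -> np.ndarray:
    """Iterate the network until the state settles on a stored attractor."""
    v = vector.copy()
    for _ in range(steps):
        v = np.sign(w @ v)
        v[v == 0] = 1  # break ties consistently
    return v

candidates = {                      # candidate character -> toy sound vector
    "off": np.array([1, 1, 1, 1, -1, -1, -1, -1]),
    "on":  np.array([1, -1, 1, -1, 1, -1, 1, -1]),
}
w = train_hopfield(np.array(list(candidates.values())))

noisy = np.array([1, 1, 1, -1, -1, -1, -1, -1])  # distorted "off" vector
settled = recall(w, noisy)
best = max(candidates, key=lambda c: int(candidates[c] @ settled))
print(best)  # off
```

Training only on the successor/predecessor candidate sets keeps the attractor count small, which is what makes the recall reliable.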
In one embodiment, the high-confidence character determination module includes a first character determination unit and a second character determination unit.
The first character determination unit is used for determining all characters in the primary character string that are not determined to be low-confidence characters as high-confidence characters; and/or,
the second character determination unit, connected to the first character determination unit, is configured to compare the calculated number of sample character strings containing each character with a maximum confidence threshold, and determine characters whose number is higher than the maximum confidence threshold as high-confidence characters.
In one embodiment, the character set determination unit includes a third character determination subunit, a frequency statistics subunit, and a character set forming subunit.
The third character determination subunit is used for determining, in the instruction voice sample library, all successor characters of the predecessor high-confidence character and/or all predecessor characters of the successor high-confidence character.
The frequency statistics subunit is connected with the third character determination subunit and is used for counting the frequency with which the successor characters and/or predecessor characters appear in all the sample character strings.
The character set forming subunit is connected with the frequency statistics subunit and is used for determining the several most frequent successor characters and/or predecessor characters according to the frequency ranking or a preset frequency threshold, and forming a successor character set and/or a predecessor character set respectively.
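A compact sketch of the successor-set construction: the sample strings are hypothetical, and word-level tokens stand in for the patent's character-level processing, with a top-k cutoff standing in for the frequency ranking or threshold.

```python
from collections import Counter

# Hypothetical sample library; in the patent these would be character strings.
SAMPLE_LIBRARY = ["turn off the light", "turn on the fan", "turn off the fan"]

def successor_set(predecessor: str, top_k: int = 2) -> list:
    """Collect every token following `predecessor`, ranked by frequency."""
    freq = Counter()
    for s in SAMPLE_LIBRARY:
        words = s.split()
        for i, w in enumerate(words[:-1]):
            if w == predecessor:
                freq[words[i + 1]] += 1
    return [w for w, _ in freq.most_common(top_k)]

print(successor_set("turn"))  # ['off', 'on']
```

The predecessor set is built symmetrically by collecting the token before each occurrence of the successor high-confidence character.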
In one embodiment, the frequency statistics subunit is further configured, before the most frequent successor and predecessor characters are determined according to the frequency ranking or a preset frequency threshold, to sum the occurrence frequencies of any character that appears both as a successor character and as a predecessor character, and to use the summed frequency as that character's occurrence frequency.
In one embodiment, the neural network training unit selects only a portion of the sound vectors of the corresponding characters in the instruction voice sample library to train the neural network.
In one embodiment, the control system further comprises a sample storage module for storing the collected instruction voice signal in the instruction voice sample library after the corresponding control signal has been generated according to the regular character string.
In one embodiment, the control signal generation module includes a semantic mapping unit and a signal generation unit.
The semantic mapping unit is used for converting the regular character string into a semantic instruction through synonym mapping.
The signal generating unit is connected with the semantic mapping unit and used for generating a control signal according to the semantic instruction and controlling the building service facility to execute a corresponding function according to the control signal.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (8)

1. A building service facility control method based on semantic instruction intelligent recognition, characterized by comprising the following steps:
collecting instruction voice signals;
converting the instruction voice signal into a primary character string through voice recognition;
determining low-confidence characters from the primary character string;
substituting the characteristic parameters of the low-confidence characters into a neural network to obtain regular character strings;
generating a corresponding control signal according to the regular character string, and controlling a building service facility to execute a corresponding function according to the control signal;
in addition, before the step of substituting the characteristic parameters of the low-confidence character into the neural network to obtain the regular character string, the method further includes:
determining high confidence characters from the primary character string;
determining a predecessor high-confidence character and/or a successor high-confidence character of the low-confidence character; and,
the step of substituting the characteristic parameters of the low-confidence character into a neural network to obtain a regular character string comprises:
determining a successor character set of the predecessor high-confidence character and/or a predecessor character set of the successor high-confidence character from an instruction voice sample library;
training a neural network according to the sound vectors of part or all of the characters in the successor character set and/or the predecessor character set;
and substituting the sound vector corresponding to the low-confidence character into the neural network for regularized recognition, so as to determine a replacement character from the corresponding successor character set and/or predecessor character set, and replacing the low-confidence character to obtain a regular character string.
2. The control method of claim 1, wherein said determining low confidence characters from said primary string of characters comprises:
calculating, for each character in the primary character string, the number of sample character strings in the instruction voice sample library that contain that character;
comparing the number of the sample character strings calculated to include each of the characters with a minimum confidence threshold, and determining characters whose number is lower than the minimum confidence threshold as low confidence characters.
3. The control method according to claim 1, wherein the determining a successor character set of the predecessor high-confidence character and/or a predecessor character set of the successor high-confidence character from an instruction voice sample library comprises:
determining, in the instruction voice sample library, all successor characters of the predecessor high-confidence character and/or all predecessor characters of the successor high-confidence character;
counting the frequency with which the successor characters and/or predecessor characters appear in all the sample character strings;
and determining the several most frequent successor characters and/or predecessor characters according to the frequency ranking or a preset frequency threshold, and forming a successor character set and/or a predecessor character set respectively.
4. The control method of claim 1, wherein said generating a corresponding control signal from the regular string comprises:
converting the regular character string into a semantic instruction through synonym mapping;
and generating a control signal according to the semantic instruction, and controlling the building service facility to execute a corresponding function according to the control signal.
5. A building service facility control system based on semantic instruction intelligent recognition, characterized by comprising:
the signal acquisition module is used for acquiring instruction voice signals;
the primary character generation module is used for converting the instruction voice signal into a primary character string through voice recognition;
the low-confidence character determining module is used for determining low-confidence characters from the primary character string;
the regular character generation module is used for substituting the characteristic parameters of the low-confidence characters into a neural network to obtain regular character strings;
the control signal generation module is used for generating corresponding control signals according to the regular character strings and controlling the building service facilities to execute corresponding functions according to the control signals;
wherein the control system further comprises:
a high-confidence character determination module, configured to determine high-confidence characters from the primary character string and determine a predecessor high-confidence character and/or a successor high-confidence character of the low-confidence character before the characteristic parameters of the low-confidence character are substituted into a neural network to obtain a regular character string; and,
the regular character generation module includes:
a character set determination unit, used for determining a successor character set of the predecessor high-confidence character and/or a predecessor character set of the successor high-confidence character from an instruction voice sample library;
the neural network training unit is used for training a neural network according to the sound vectors of part or all of the characters in the successor character set and/or the predecessor character set;
and a regular character generation unit, used for substituting the sound vector corresponding to the low-confidence character into the neural network for regularized recognition, so as to determine a replacement character from the corresponding successor character set and/or predecessor character set and replace the low-confidence character to obtain a regular character string.
6. The control system of claim 5, wherein the low confidence character determination module comprises:
an occurrence count statistics unit, used for calculating, for each character in the primary character string, the number of sample character strings in the instruction voice sample library that contain that character;
and a low-confidence character determination unit configured to compare the number of the sample character strings calculated to include each of the characters with a minimum confidence threshold value, and determine a character whose number is lower than the minimum confidence threshold value as a low-confidence character.
7. The control system of claim 5, wherein the character set determination unit comprises:
a third character determination subunit, configured to determine, in the instruction voice sample library, all successor characters of the predecessor high-confidence character and/or all predecessor characters of the successor high-confidence character;
a frequency statistics subunit, used for counting the frequency with which the successor characters and/or predecessor characters appear in all the sample character strings;
and a character set forming subunit, used for determining the several most frequent successor characters and/or predecessor characters according to the frequency ranking or a preset frequency threshold, and forming a successor character set and/or a predecessor character set respectively.
8. The control system of claim 5, wherein the control signal generation module comprises:
the semantic mapping unit is used for converting the regular character string into a semantic instruction through synonym mapping;
and the signal generating unit is used for generating a control signal according to the semantic instruction and controlling the building service facility to execute a corresponding function according to the control signal.
CN201910110334.7A 2019-02-11 2019-02-11 Building service facility control method and system based on semantic instruction intelligent identification Active CN109949803B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910110334.7A CN109949803B (en) 2019-02-11 2019-02-11 Building service facility control method and system based on semantic instruction intelligent identification


Publications (2)

Publication Number Publication Date
CN109949803A CN109949803A (en) 2019-06-28
CN109949803B true CN109949803B (en) 2020-01-31

Family

ID=67007581





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant