CN109903754B - Method, device and memory device for speech recognition


Info

Publication number
CN109903754B
Authority
CN
China
Prior art keywords
semantic set
semantics
semantic
user
recognized
Prior art date
Legal status
Active
Application number
CN201711305005.5A
Other languages
Chinese (zh)
Other versions
CN109903754A (en)
Inventor
柳刘
雒根雄
李长春
冯松浩
任强
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201711305005.5A
Publication of CN109903754A
Application granted
Publication of CN109903754B
Status: Active

Landscapes

  • Machine Translation (AREA)

Abstract

The disclosed embodiments provide a method for speech recognition. The method comprises the following steps: receiving a voice input in natural-language form from a user; performing semantic word segmentation on the voice input to obtain a first semantic set of the voice input; acquiring a second semantic set related to the user's historical operations; matching the first semantic set against the second semantic set; and determining the historical operation in the second semantic set that matches the first semantic set as the operation instruction corresponding to the voice input. The disclosed embodiments also provide a device and a memory device for speech recognition.

Description

Method, device and memory device for speech recognition
Technical Field
The present disclosure relates to a method, device and memory device for speech recognition.
Background
Voice interaction (natural-language interaction) has become an increasingly important user entry point in various fields, such as the smart home field, where it is used to realize device control, scene realization, and interconnection, making human-machine interaction simple and convenient.
In the prior art, the common natural-language control function is generally realized by means of industry-specific speech recognition technology and dedicated language development tools. Device identification, device control instructions, device state analysis, and the like are pre-trained offline for various specific devices; to achieve accurate control, information must be collected in advance for each type of device for analysis and training, and the resulting fixed knowledge and strategies are solidified in software. However, different users may issue widely varying voice instructions for the same operation, so such techniques often cannot correctly recognize the operation targeted by the user's voice.
Therefore, a technique capable of recognizing voice input more effectively is required.
Disclosure of Invention
One aspect of the disclosed embodiments provides a method for speech recognition. The method comprises the following steps: receiving a voice input in natural-language form from a user; performing semantic word segmentation on the voice input to obtain a first semantic set of the voice input; acquiring a second semantic set related to the user's historical operations; matching the first semantic set against the second semantic set; and determining the historical operation in the second semantic set that matches the first semantic set as the operation instruction corresponding to the voice input.
Optionally, the method may further comprise: acquiring device information of devices related to the user; if there is no historical operation in the second semantic set that matches the first semantic set, recognizing the semantics in the first semantic set based on the device information; and if all the semantics in the first semantic set are recognized based on the device information, combining the recognized semantics in the first semantic set to form the operation instruction corresponding to the voice input.
Optionally, the method may further comprise: forming a third semantic set of semantics having the same or similar meaning as the semantics in the first semantic set, the third semantic set relating to the user's historical operations; if not all the semantics in the first semantic set are recognized based on the device information, matching the unrecognized semantics in the first semantic set against the semantics in the third semantic set; and if all the previously unrecognized semantics in the first semantic set are thereby recognized, combining the recognized semantics in the first semantic set to form the operation instruction corresponding to the voice input.
Optionally, the method may further comprise: if matching the unrecognized semantics in the first semantic set against the semantics in the third semantic set does not recognize all of the unrecognized semantics, and if speech recognition has previously been performed for the user, combining the first semantic set with the user's previous speech recognition to generate a new first semantic set; matching the new first semantic set against the second semantic set and the third semantic set respectively; and if all the previously unrecognized semantics in the first semantic set are thereby recognized, combining the recognized semantics in the first semantic set to form the operation instruction corresponding to the voice input.
Optionally, the method may further comprise: recording the result of each matching of the first semantic set or the new first semantic set, and updating the second semantic set and/or the third semantic set according to each matching result.
Another aspect of the embodiments of the present disclosure provides a device for speech recognition. The device comprises a voice input unit, an information processing and acquiring unit, a set matching unit, and an operation instruction determining unit. The voice input unit is configured to receive a voice input in natural-language form from a user. The information processing and acquiring unit is configured to perform semantic word segmentation on the voice input to obtain a first semantic set of the voice input, and to acquire a second semantic set related to the user's historical operations. The set matching unit is configured to match the first semantic set against the second semantic set. The operation instruction determining unit is configured to determine the historical operation in the second semantic set that matches the first semantic set as the operation instruction corresponding to the voice input.
Optionally, the information processing and acquiring unit is further configured to acquire device information of devices related to the user. The set matching unit is further configured to recognize the semantics in the first semantic set based on the device information when there is no historical operation in the second semantic set that matches the first semantic set. The operation instruction determining unit is further configured to, when all the semantics in the first semantic set are recognized based on the device information, combine the recognized semantics in the first semantic set to form the operation instruction corresponding to the voice input.
Optionally, the information processing and acquiring unit is further configured to form a third semantic set of semantics having the same or similar meaning as the semantics in the first semantic set, the third semantic set relating to the user's historical operations. The set matching unit is further configured to match the unrecognized semantics in the first semantic set against the semantics in the third semantic set when not all the semantics in the first semantic set are recognized based on the device information. The operation instruction determining unit is further configured to, when all the previously unrecognized semantics in the first semantic set are recognized, combine the recognized semantics in the first semantic set to form the operation instruction corresponding to the voice input.
Optionally, the information processing and acquiring unit is further configured to combine the first semantic set with the user's previous speech recognition to generate a new first semantic set when matching the unrecognized semantics in the first semantic set against the semantics in the third semantic set does not recognize all of the unrecognized semantics and speech recognition has previously been performed for the user. The set matching unit is further configured to match the new first semantic set against the second semantic set and the third semantic set respectively. The operation instruction determining unit is further configured to, when all the previously unrecognized semantics in the first semantic set are recognized, combine the recognized semantics in the first semantic set to form the operation instruction corresponding to the voice input.
Optionally, the device may further include a data recording and updating unit, configured to record the result of each matching of the first semantic set or the new first semantic set, and to update the second semantic set and/or the third semantic set according to each matching result.
Another aspect of the embodiments of the present disclosure provides a device for speech recognition. The device includes a memory and a processor. The memory is configured to store executable instructions. The processor is configured to execute the executable instructions stored in the memory to perform the above-described method.
Another aspect of embodiments of the present disclosure provides a memory device having a computer program embodied thereon, which, when executed by a processor, causes the processor to perform the above-described method.
Drawings
For a more complete understanding of the present invention, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
FIG. 1 schematically illustrates a simplified flow diagram of a method for speech recognition according to an embodiment of the present disclosure;
FIG. 2 shows a simplified block diagram of an apparatus for speech recognition according to an embodiment of the present disclosure;
FIG. 3 illustrates a flow diagram of one particular implementation of a method for speech recognition in accordance with an embodiment of the present disclosure;
FIG. 4 shows one illustration of a cosine similarity algorithm in accordance with an embodiment of the present disclosure; and
FIG. 5 schematically shows a schematic block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the disclosure. The singular forms "a", "an", and "the" as used herein are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, the terms "comprises", "comprising", and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Some block diagrams and/or flow diagrams are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, which execute via the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks.
Accordingly, the techniques of this disclosure may be implemented in hardware and/or software (including firmware, microcode, etc.). In addition, the techniques of this disclosure may take the form of a computer program product on a computer-readable medium having instructions stored thereon for use by or in connection with an instruction execution system. In the context of this disclosure, a computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the instructions. For example, the computer readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of the computer readable medium include: magnetic storage devices, such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and/or wired/wireless communication links.
The definition of the smart home device generally includes a product type name, an attribute value, a related description, and the like.
For example, the intelligent attributes (part) of an air conditioner are defined as follows:
device type name: ABC intelligent air conditioner
The device attribute is as follows:
[Table 1 — device attribute list for the air conditioner. The table is rendered only as images in the original document; from the examples later in this description, its attributes include "switch" (enumerated), "wind speed" (enumerated), and "temperature setting" (integer).]
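Since Table 1 survives only as images, the following sketch reconstructs an illustrative device definition as a Python data structure. The attribute names are taken from the worked examples later in this description; the field names, value ranges, and user-assigned name are assumptions for illustration only.

```python
# Illustrative reconstruction of a smart-device definition (Table 1 is an
# image in the original; field names and value ranges are assumptions).
air_conditioner = {
    "device_type_name": "ABC intelligent air conditioner",
    "user_names": ["living room air conditioner"],   # user-assigned name (see below)
    "name_tokens": {"air conditioner"},              # word-segmented device name
    "attributes": [
        {"name": "switch", "type": "enum", "values": ["on", "off"]},
        {"name": "wind speed", "type": "enum", "values": ["low", "medium", "high"]},
        {"name": "temperature setting", "type": "int", "range": (16, 30)},
    ],
}
```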
Sometimes the user may also give the device a name of their own, such as "living room air conditioner" or "bedroom air conditioner".
The inventors of the present invention found that, because the industry lacks a unified standard, devices of the same type do not share unified control commands, so no single command can be used across a device category. Taking the switch as an example, the command may be "switch" on one kind of lamp, "onoff" on one kind of socket, "switch" on one kind of air conditioner, and "power" on another kind of air conditioner. Furthermore, different users may describe the same action differently: a "switch" action may be expressed as "power", "activate", and so on, even though all users mean only "switch".
In the general control of a common smart home, unique control instructions are customized for each type of device and sent to control and read the device; for example, a <switch, off> instruction is sent to an air conditioner to turn off its power.
Existing voice control (natural-language control) performs device control by ultimately converting the voice into the general control described above. To achieve accurate control, information must be collected in advance for each type of device for analysis and training, forming fixed knowledge and strategies that are solidified in software. For example, for "turn on the air conditioner", the prior art trains on the air conditioner in advance and knows that its switch attribute is "switch", so the input is parsed as "set the switch of the air conditioner to on". However, suppose a new device, a soymilk maker, is encountered whose switch attribute is "power". For the instruction "turn on the intelligent soymilk maker", the conventional approach either fails to recognize a correct instruction or again parses it as "set the switch of the intelligent soymilk maker to on"; since the soymilk maker has no "switch" instruction, only a "power" instruction, the device cannot be controlled.
In order to solve the above problem, the embodiments of the present disclosure provide the following technical solutions.
FIG. 1 schematically shows a simplified flow diagram of a method for speech recognition according to an embodiment of the present disclosure.
As shown in fig. 1, the method includes an operation S110 of receiving a voice input in a natural language form from a user.
In operation S120, semantic word segmentation processing is performed on the voice input to obtain a first semantic set of the voice input.
In operation S130, a second semantic set related to the historical operation of the user is acquired.
In operation S140, the first semantic set is matched with the second semantic set.
In some examples, operation S140 may be performed using cosine similarity matching. A semantic in the first semantic set may be considered matched when its similarity to some semantic in the second semantic set is greater than a certain threshold (e.g., without limitation, 0.9). Of course, operation S140 may also be performed in any manner used in the art for semantic matching; the disclosed embodiments are not limited to a specific matching scheme.
In operation S150, a history operation in the second semantic set that matches the first semantic set is determined as an operation instruction corresponding to the voice input.
In some examples, the method may further comprise: acquiring device information of devices related to the user; if there is no historical operation in the second semantic set that matches the first semantic set, recognizing the semantics in the first semantic set based on the device information; and if all the semantics in the first semantic set are recognized based on the device information, combining the recognized semantics in the first semantic set to form the operation instruction corresponding to the voice input.
For example, the device information may include a device name or identifier, device attributes, attribute types, attribute values, and the like. When a device name or identifier identical to one contained in the first semantic set is found in the device information, matches for the remaining semantics in the first semantic set (those other than the device name or identifier) can be sought in the attribute-related information, such as the device attributes, attribute types, and attribute values, and an operation instruction is formed in this way.
In some examples, the method may further comprise: forming a third semantic set of semantics having the same or similar meaning as the semantics in the first semantic set, the third semantic set relating to the user's historical operations; if not all the semantics in the first semantic set are recognized based on the device information, matching the unrecognized semantics in the first semantic set against the semantics in the third semantic set; and if all the previously unrecognized semantics in the first semantic set are thereby recognized, combining the recognized semantics in the first semantic set to form the operation instruction corresponding to the voice input.
In some examples, the method may further comprise: if matching the unrecognized semantics in the first semantic set against the semantics in the third semantic set does not recognize all of the unrecognized semantics, and if speech recognition has previously been performed for the user, combining the first semantic set with the user's previous speech recognition to generate a new first semantic set; matching the new first semantic set against the second semantic set and the third semantic set respectively; and if all the previously unrecognized semantics in the first semantic set are thereby recognized, combining the recognized semantics in the first semantic set to form the operation instruction corresponding to the voice input.
The user's previous speech recognition, as described herein, may include the first semantic set and/or the device information used in the most recent speech recognition or recognitions.
In some examples, the method may further comprise: recording the result of each matching of the first semantic set or the new first semantic set, and updating the second semantic set and/or the third semantic set according to each matching result.
Whether or not the operation corresponding to the user's voice is finally recognized, the result of the speech recognition may be recorded, and the result or a part of it (for example, the recognized semantics) used as historical information to update the second semantic set and/or the third semantic set.
FIG. 2 shows a simplified block diagram of an apparatus for speech recognition according to an embodiment of the present disclosure.
As shown in fig. 2, the apparatus includes a voice input unit 210, an information processing and acquiring unit 220, a set matching unit 230, and an operation instruction determining unit 240.
The voice input unit 210 is used to receive a voice input in a natural language form from a user.
The information processing and acquiring unit 220 is configured to perform semantic word segmentation on the voice input to obtain a first semantic set of the voice input, and to acquire a second semantic set related to the user's historical operations.
The set matching unit 230 is configured to match the first semantic set with the second semantic set.
In some examples, the set matching unit 230 may perform the matching using cosine similarity. A semantic in the first semantic set may be considered matched when its similarity to some semantic in the second semantic set is greater than a certain threshold (e.g., without limitation, 0.9). Of course, the matching may also be performed in any manner used in the art for semantic matching; the disclosed embodiments are not limited to a specific matching scheme.
The operation instruction determining unit 240 is configured to determine the historical operation in the second semantic set that matches the first semantic set as the operation instruction corresponding to the voice input.
In some examples, the information processing and acquiring unit 220 is further configured to acquire device information of devices related to the user. The set matching unit 230 is further configured to recognize the semantics in the first semantic set based on the device information when there is no historical operation in the second semantic set that matches the first semantic set. The operation instruction determining unit 240 is further configured to, when all the semantics in the first semantic set are recognized based on the device information, combine the recognized semantics in the first semantic set to form the operation instruction corresponding to the voice input.
For example, the device information of the device may include a device name or identification, a device attribute, an attribute type, an attribute value, and the like. When a device name or identifier identical to the device name or identifier included in the first semantic set is found in the device information, for example, a match with other voices except the device name or identifier in the first semantic set can be found in attribute-related information such as device attributes, attribute types, attribute values, and the like, and an operation instruction is formed in this way.
In some examples, the information processing and acquiring unit 220 is further configured to form a third semantic set of semantics having the same or similar meaning as the semantics in the first semantic set, the third semantic set relating to the user's historical operations. The set matching unit 230 is further configured to match the unrecognized semantics in the first semantic set against the semantics in the third semantic set when not all the semantics in the first semantic set are recognized based on the device information. The operation instruction determining unit 240 is further configured to, when all the previously unrecognized semantics in the first semantic set are recognized, combine the recognized semantics in the first semantic set to form the operation instruction corresponding to the voice input.
In some examples, the information processing and acquiring unit 220 is further configured to combine the first semantic set with the user's previous speech recognition to generate a new first semantic set when matching the unrecognized semantics in the first semantic set against the semantics in the third semantic set does not recognize all of the unrecognized semantics and speech recognition has previously been performed for the user. The set matching unit 230 is further configured to match the new first semantic set against the second semantic set and the third semantic set respectively. The operation instruction determining unit 240 is further configured to, when all the previously unrecognized semantics in the first semantic set are recognized, combine the recognized semantics in the first semantic set to form the operation instruction corresponding to the voice input.
The user's previous speech recognition, as described herein, may include the first semantic set and/or the device information used in the most recent speech recognition or recognitions.
In some examples, the device may further include a data recording and updating unit 250 configured to record the result of each matching of the first semantic set or the new first semantic set, and to update the second semantic set and/or the third semantic set according to each matching result.
Whether or not the operation corresponding to the user's voice is finally recognized, the result of the speech recognition may be recorded, and the result or a part of it (for example, the recognized semantics) used as historical information to update the second semantic set and/or the third semantic set.
The technical solution of the embodiments of the present disclosure has been described above with reference to the method shown in fig. 1 and the apparatus shown in fig. 2. It will now be explained in detail through a specific example. Note that the technical solutions of the embodiments of the present disclosure are not limited to this specific example, but may also include various modifications of it within the scope of the present invention.
FIG. 3 illustrates a flow diagram of one particular implementation of a method for speech recognition, according to an embodiment of the present disclosure.
First, a user's natural-language input s is received. For this input, the preprocessing of operation S310 and at least one of the recognition processes S320-S350 may be performed.
In the following specific example, the technical solution of the disclosed embodiments is explained for a smart home (in particular, an air conditioner). However, it should be understood that the technical solution is equally applicable to any scenario in which control is performed using natural speech, and is not limited to the air-conditioning scenario.
Natural language preprocessing is performed in operation S310.
In this operation, semantic word segmentation may be performed on the original sentence s to form a basic semantic set A. The segmentation may be performed using the natural-language segmentation tool HanLP (an open-source natural-language segmentation and processing system), but is not limited thereto; any other semantic segmentation method/tool existing in the art or developed in the future may be used.
Synonyms and near-synonyms of the words in set A are looked up in a predefined smart-home synonym/near-synonym library P and, together with the original elements of A, form an expanded semantic set E1. Likewise, near-synonyms of the words in A are looked up in a semantic near-synonym library U, obtained by analyzing the user's historical control results, and, together with the original set A, form an expanded semantic set E2.
The expanded semantic sets E1 and E2 may be formed immediately after the semantic word segmentation of the original sentence s, or whenever they are needed; this step is not restricted to operation S310.
The user's commonly used historical manipulation records h, together with the semantic word-segmentation sets S of the historical manipulation sentences, can be obtained as a parent set ST = {<h1, S1>, <h2, S2>, …}.
Taking an air conditioner as an example, P: {{open, on, switch}, {switch, power supply}, …}
Note: in the first set element, "open", "on", and "switch" are synonyms; the other elements of the set are analogous.
U: {{air conditioner, air conditioner in living room}, {temperature, temperature setting}}
ST: {<turn on the air conditioner in the living room, {turn on, air conditioner in living room}>}
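As a rough sketch of this preprocessing, the fragment below builds the expanded sets E1 and E2 from a basic semantic set A using the example libraries P and U above. The data is shown in English (the patent's actual sets are Chinese), set A is given directly instead of being produced by HanLP, and all names and structures are illustrative assumptions rather than the patent's implementation.

```python
# Illustrative synonym libraries, mirroring the P and U examples above.
P = [{"open", "on", "switch"}, {"switch", "power supply"}]
U = [{"air conditioner", "air conditioner in living room"},
     {"temperature", "temperature setting"}]

def expand(A, library):
    """Union the semantic set A with every synonym group in the library
    that intersects it; applied to P this yields E1, applied to U, E2."""
    expanded = set(A)
    for group in library:
        if group & A:
            expanded |= group
    return expanded

# Basic semantic set A for statement 1 after word segmentation
# (the description uses HanLP; any segmenter would do).
A = {"open", "living room", "air conditioner"}
E1 = expand(A, P)   # adds "on" and "switch"
E2 = expand(A, U)   # adds "air conditioner in living room"
```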
Taking the air conditioner as an example, and assuming the user has named it "living room air conditioner", the user speaks the following four statements in sequence:
[Table 2 — the original sentences and their corresponding semantic sets; rendered as images in the original document. From the worked examples below, the four statements are approximately: 1. "Open the air conditioner in the living room"; 2. "Set the living room air conditioner wind speed to low"; 3. "Set the air conditioner temperature to 25 degrees"; 4. "Too cold, 28 degrees".]
After the semantic word segmentation of operation S310, user-habitual-language recognition is performed in operation S320.
A cosine similarity calculation may be employed in this operation. Of course, as described above, any similarity algorithm may be employed; it is not limited to cosine similarity.
To make the technical solution of the disclosed embodiments clearer, the cosine similarity algorithm is briefly described here. FIG. 4 shows an illustration of the cosine similarity algorithm according to an embodiment of the present disclosure. The algorithm judges the similarity of two vectors by computing the cosine of the angle between them.
As shown in fig. 4, for two vectors a and b, the cosine similarity between them is:
cos θ = (a · b) / (‖a‖ × ‖b‖) = Σᵢ aᵢbᵢ / (√(Σᵢ aᵢ²) × √(Σᵢ bᵢ²))
Take the following two sentences A and B as an example, counting how many times each word occurs:
Sentence A: I / like / watch / tv, not / like / watch / movie.
Sentence B: I / not / like / watch / tv, also / not / like / watch / movie.
The number of occurrences of each word is as follows:
Sentence A: I 1, like 2, watch 2, tv 1, movie 1, not 1, also 0.
Sentence B: I 1, like 2, watch 2, tv 1, movie 1, not 2, also 1.
The cosine similarity of the two sentences is:
cos θ = (1×1 + 2×2 + 2×2 + 1×1 + 1×1 + 1×2 + 0×1) / (√(1+4+4+1+1+1+0) × √(1+4+4+1+1+4+1)) = 13 / (√12 × √16) ≈ 0.938
Using this cosine similarity calculation, statement 1 and each historical statement h (with its word-segmentation set S) in ST are split into single characters (the original sentences being Chinese, the comparison is per character). Statement 1, "open the air conditioner in the living room", consists of seven characters, each occurring once; the historical statement "open living room air conditioner" contains six of them, lacking only the possessive particle. The cosine similarity between statement 1 and the historical statement is therefore:
(6 × (1×1) + 1×0) / (√7 × √6) = 6 / √42 ≈ 0.926
If 0.8 is taken as the threshold for judging whether two sentences are similar (a similarity greater than 0.8 means the sentences are similar), then in the above example the value exceeds 0.8, meaning statement 1 is very similar to the historical statement "turn on the air conditioner in the living room". The same operation as the historical statement can therefore be used, with its instruction <air conditioner in living room, switch, on> controlling the device.
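A minimal sketch of this matching step follows, assuming sentences are compared as bags of single characters against the historical records in ST, with 0.8 as the threshold, as in the example above; the function names and the shape of ST are illustrative.

```python
from collections import Counter
from math import sqrt

def cosine_similarity(s1, s2):
    """Bag-of-characters cosine similarity between two sentences."""
    c1, c2 = Counter(s1), Counter(s2)
    dot = sum(c1[ch] * c2[ch] for ch in set(c1) | set(c2))
    norm1 = sqrt(sum(v * v for v in c1.values()))
    norm2 = sqrt(sum(v * v for v in c2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

def match_history(sentence, ST, threshold=0.8):
    """Return the historical statement h in ST = [(h, S), ...] most similar
    to the input sentence, provided its similarity exceeds the threshold."""
    best_h, best_sim = None, threshold
    for h, _S in ST:
        sim = cosine_similarity(sentence, h)
        if sim > best_sim:
            best_h, best_sim = h, sim
    return best_h

# cosine_similarity("打开客厅的空调", "打开客厅空调") = 6/sqrt(42) ≈ 0.926 > 0.8,
# so statement 1 reuses the matched historical statement's operation instruction.
```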
It should be noted that the specific structures and contents of the above sets S, A, U, and ST and the lexicon P are only examples provided to illustrate the embodiments of the present disclosure; they should not be considered as limiting the scope of the present invention, nor as strictly equivalent to the first, second, and third semantic sets described in the schemes shown in figs. 1 and 2. Depending on the particular implementation (e.g., the particular segmentation technique), the sets S, A, U, and ST and the lexicon P may have different structures and forms.
Assume that for statements 2, 3, and 4 no historical statement satisfies the cosine similarity threshold.
Accurate recognition is performed in operation S330.
In this operation, the original sentence s or its basic semantic set A is matched against the device information of the user's devices. The device information may include the names of the devices owned by the user, device product names, device remarks, the attribute names of all devices, attribute types, attribute value ranges, and the like.
First, the noun set of A is exact-text-matched against the device name, device product name, device identifier, or device remark of each device owned by the user. If none matches, it is determined whether a device name, product name, identifier, or remark is contained within the sentence; if not, the recognition fails and the fuzzy recognition of operation S340 may be performed. If so, a text match is found.
After the device is matched successfully, the matched nouns are removed from the noun set of A and the remaining nouns are matched against the device attribute names. If no attribute matches, the verb set of A is matched against the device attributes; if there is still no matching attribute, the fuzzy recognition of operation S340 may be performed.
If a device attribute is matched successfully, the attribute type can further be checked, and an entry conforming to the attribute's value range is sought among the nouns or numerals of A. If this matching succeeds, accurate recognition has succeeded; otherwise the fuzzy recognition of operation S340 is performed.
For example, statement 2 includes "living room air conditioner", which identifies the device as the living room air conditioner; the definition of the air conditioner then identifies the specific attribute "wind speed" and its value "low". Accurate matching thus finds the specific operation instruction <air conditioner in living room, wind speed, low>, which can be executed by instruction assembly.
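The staged matching of operation S330 can be sketched as follows, reusing the illustrative device structure from the Table 1 reconstruction above; the noun/verb/numeral split and the simplified matching rules are assumptions, not the patent's actual implementation.

```python
def accurate_recognition(nouns, verbs, numerals, device):
    """Simplified sketch of operation S330: match a device by name, then an
    attribute name, then a value conforming to the attribute's type/range."""
    names = {device["device_type_name"]} | set(device.get("user_names", []))
    matched_name = next((n for n in nouns if n in names), None)
    if matched_name is None:
        return None                      # no device match -> try fuzzy recognition

    remaining = set(nouns) - names       # drop the matched device name
    # Match the remaining nouns, then the verbs, against attribute names.
    attr = next((a for a in device["attributes"] if a["name"] in remaining), None) \
        or next((a for a in device["attributes"] if a["name"] in verbs), None)
    if attr is None:
        return None                      # no attribute match -> try fuzzy recognition

    if attr["type"] == "enum":           # enumerated attribute: match a listed value
        value = next((v for v in attr["values"] if v in remaining), None)
    else:                                # numeric attribute: a number within range
        lo, hi = attr["range"]
        value = next((n for n in numerals if lo <= n <= hi), None)
    return (matched_name, attr["name"], value) if value is not None else None

# Statement 2 (illustrative sets):
# accurate_recognition({"living room air conditioner", "wind speed", "low"},
#                      {"set"}, set(), air_conditioner)
# -> ("living room air conditioner", "wind speed", "low")
```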
Suppose that for statements 3 and 4 no exact instruction can be found.
Fuzzy recognition is performed in operation S340.
This operation can use the result LR from accurate recognition, the expanded semantic set E2, and the device information of the user's devices di (0 ≤ i ≤ number of user devices). The device name, device product name, and device remark of each device may be word-segmented to form a set Di; all the sets Di together constitute DA = {Di} (0 ≤ i ≤ number of user devices).
If LR does not include device information, the following processing is performed: for each set Di in DA = {Di} (0 ≤ i ≤ number of user devices), near-synonyms are looked up in the semantic near-synonym library U of the user's historical control results and combined with the original Di to form a new set DDi (0 ≤ i ≤ number of user devices). If some DDi is a subset of E2, matching device di succeeds and the second step is performed; if all DDi are traversed without a match, processing goes to the next subsystem.
If LR already includes the device information, or a device di has been matched successfully, the elements of the matched device's set DDi are removed from the nouns of E2 to obtain EDDi. The name and enumerated-value descriptions of each device attribute pj (for non-enumerated attributes, only the name) constitute a set Fj. The lexicon P and the library U are searched and combined with Fj to obtain a set FFj (0 ≤ j ≤ number of device attributes). All the FFj that are subsets of EDDi are combined with their attributes into a new set SD = {<pj, FFj>} (0 ≤ j ≤ number of device attributes). If SD is empty, the context scene recognition of operation S350 may be performed. Otherwise, the element pair <pj, FFj> in SD whose FFj has the most elements is taken as the result of fuzzy recognition, and device control is performed. If the recognition result matches only the attribute name with no attribute-value description, the attribute's data type is not an enumeration; its data type (generally integer or floating point) is determined, a specific number is found, and that number is used as the attribute's value for control.
Statement 3 is "set the air conditioner temperature to 25 degrees"; assume that the device information was found in neither operation S320 nor S330.
Therefore, using the first step of processing, DDi = {air conditioner, air conditioner in living room} is obtained, and DDi can be determined to be a subset of E2; the device "air conditioner in living room" is thus identified. In the second step, EDDi = {temperature, 25 degrees, temperature setting} is obtained. For j = 3, the expanded set FFj corresponding to the attribute pj = "temperature setting" is {temperature setting, temperature}; FFj is a subset of EDDi, so the attribute "temperature setting" is matched, and it can also be determined that no other pj (j ≠ 3) satisfies the subset condition. The attribute "temperature setting" is therefore finally matched; since its type is integer, 25 is found to be the attribute value, yielding the specific control instruction <air conditioner in living room, temperature setting, 25>.
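A simplified sketch of this fuzzy-recognition step follows, reusing the expand() helper from the preprocessing sketch above and the illustrative device structure; the final numeric-value lookup for non-enumerated attributes is omitted for brevity, and all names and structures are assumptions.

```python
def fuzzy_recognition(E2, devices, P, U):
    """Simplified sketch of operation S340; expand() is the helper defined
    in the preprocessing sketch above."""
    for device in devices:
        # DDi: word-segmented device names expanded with the history library U.
        DDi = expand(set(device["name_tokens"]), U)
        if not DDi <= E2:
            continue                      # device di does not match
        EDDi = E2 - DDi                   # semantics left after removing the device
        SD = []                           # candidate <pj, FFj> pairs
        for attr in device["attributes"]:
            Fj = {attr["name"]} | set(attr.get("values", []))  # name + enum values
            FFj = expand(expand(Fj, P), U)
            if FFj <= EDDi:
                SD.append((attr, FFj))
        if SD:
            # The pair whose FFj has the most elements is the fuzzy result.
            attr, _FFj = max(SD, key=lambda pair: len(pair[1]))
            return device, attr
    return None                           # fall through to context scene recognition
```

For statement 3, E2 contains both device tokens and "temperature setting", so this sketch matches the device and that attribute, mirroring the worked example above.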
No device is found for statement 4 in the preceding operations.
In operation S350, context scene recognition is performed.
This operation may use the semantic set A of the current input; if the user's previous recognition succeeded, a set A1 composed of the successfully recognized device name and attribute name; and the semantic set AL of the user's previous input.
If A1 is not empty, the new set is AA = A + A1; otherwise the sets A and AL are combined to form the new semantic set AA.
The new set AA replaces A as the input, and the processing logic of accurate recognition, fuzzy recognition, and context scene recognition is applied in sequence; as soon as any one of them recognizes the control logic, recognition is complete and actual control is performed. If none of these subsystems recognizes the manipulation logic, the system has failed to correctly understand the user's intention and this is fed back to the user in operation S370.
For example, for statement 4: A1 = {air conditioner in living room, temperature setting}, and thus AA = {air conditioner in living room, temperature setting, degree, too, cold, 28}. Taking AA as the new A, after the second-step processing the fuzzy recognition logic recognizes the user's intent: <air conditioner in living room, temperature setting, 28>, so the manipulation can be completed.
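This context-combination step can be sketched minimally as follows, assuming the rest of the recognition pipeline is available as a callable; the names are illustrative.

```python
def context_scene_recognition(A, A1, AL, recognize):
    """Sketch of operation S350: extend the current semantic set A with the
    last successful recognition result A1 (device and attribute names), or
    with the previous input's semantic set AL if there was no success, then
    re-run the recognition pipeline (accurate -> fuzzy -> context) on AA."""
    AA = A | A1 if A1 else A | AL
    return recognize(AA)

# Statement 4: A1 = {"air conditioner in living room", "temperature setting"},
# so AA also contains "28"; re-running fuzzy recognition on AA yields
# <air conditioner in living room, temperature setting, 28>.
```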
Whether or not the intention is recognized in operations S320-S350, in operation S360, before actual manipulation or return, the analysis result is fed back to the system. The system then trains on the natural-language corpus through techniques such as machine learning, further optimizing the basic algorithms and the lexicons P and U.
It should be noted that although the above operations are numbered, this does not mean they must be performed in numerical order; for example, the accurate recognition of operation S330 may also be performed after operation S310 and before operation S320, or in any other feasible order. The operations described herein therefore need not be performed in the order of their numbers unless explicitly stated.
According to the embodiments of the present disclosure, natural-language control of a smart home can be completed dynamically and in real time from whatever information can be acquired. This solves the problem that existing natural-language control systems can accurately control devices only after analyzing device information and user corpora in advance, and makes it possible to recognize newly added devices and new user phrasings in real time.
By combining device information with user information and the user's natural-language habits, natural-language control of the smart home is realized, greatly improving its efficiency and accuracy.
Fig. 5 schematically shows a block diagram of an apparatus according to an embodiment of the present disclosure. The device shown in fig. 5 is only an example and should not bring any limitation to the function and use range of the embodiments of the present disclosure.
As shown in fig. 5, the device 500 according to this embodiment includes a Central Processing Unit (CPU) 501 that can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. The RAM 503 also stores various programs and data necessary for the operation of the device 500. The CPU 501, the ROM 502, and the RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The device 500 may also include one or more of the following components connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, or the like; an output section 507 including a display such as a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), and a speaker; a storage section 508 including a hard disk or the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as necessary, so that a computer program read therefrom can be installed into the storage section 508 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The above-described functions defined in the apparatus of the embodiment of the present disclosure are performed when the computer program is executed by the Central Processing Unit (CPU) 501.
It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, or RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Methods, apparatus, units and/or modules according to embodiments of the present disclosure may also be implemented using hardware or firmware, or in any suitable combination of software, hardware and firmware implementations, for example, Field Programmable Gate Arrays (FPGAs), Programmable Logic Arrays (PLAs), system on a chip, system on a substrate, system on a package, Application Specific Integrated Circuits (ASICs), or in any other reasonable manner for integrating or packaging circuits. The system may include a storage device to implement the storage described above. When implemented in these manners, the software, hardware, and/or firmware used is programmed or designed to perform the corresponding above-described methods, steps, and/or functions according to the present disclosure. One skilled in the art can implement one or more of these systems and modules, or one or more portions thereof, using different implementations as appropriate to the actual needs. All of these implementations fall within the scope of the present invention.
As will be understood by those skilled in the art, for any and all purposes, such as in terms of providing a written description, all ranges disclosed herein also encompass any and all possible subranges and combinations of subranges thereof. Any listed range can be easily identified as a sufficient description and enabling the same range to be at least broken down into equal halves, thirds, quarters, fifths, tenths, etc. As a non-limiting example, each range discussed in this application can be readily broken down into a lower third, a middle third, and an upper third, among others. As those skilled in the art will also appreciate, all language such as "up to," "at least," "greater than," "less than," or the like, includes the recited quantity and refers to a range that can be subsequently broken down into subranges as discussed above. Finally, as will be understood by those skilled in the art, a range includes each individual component. So, for example, a group having 1-3 cells refers to a group having 1, 2, or 3 cells. Similarly, a group having 1-5 cells refers to groups having 1, 2, 3, 4, or 5 cells, and so forth.
While the present invention has been shown and described with reference to certain exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims and their equivalents. Accordingly, the scope of the present invention should not be limited to the above-described embodiments, but should be defined not only by the appended claims, but also by equivalents thereof.

Claims (8)

1. A method for speech recognition, comprising:
receiving a speech input from a user in a natural language;
performing semantic word segmentation processing on the voice input to obtain a first semantic set of the voice input;
acquiring device information of a device related to the user;
identifying semantics in the first semantic set based on the device information; and
if all the semantics in the first semantic set are identified based on the device information, combining the identified semantics in the first semantic set to form an operation instruction corresponding to the voice input,
the method further comprising:
forming a third semantic set of semantics having the same or similar meaning as the semantics in the first semantic set, the third semantic set relating to historical operations of the user;
if not all the semantics in the first semantic set are identified based on the device information, matching the unrecognized semantics in the first semantic set against the semantics in the third semantic set; and
if all the previously unrecognized semantics in the first semantic set are thereby recognized, combining the recognized semantics in the first semantic set to form an operation instruction corresponding to the voice input.
2. The method of claim 1, further comprising:
if matching the unrecognized semantics in the first semantic set against the semantics in the third semantic set does not recognize all of the unrecognized semantics in the first semantic set, and if speech recognition has previously been performed for the user, combining the first semantic set with the user's previous speech recognition to generate a new first semantic set;
matching the new first semantic set against the third semantic set; and
if all the previously unrecognized semantics in the first semantic set are thereby recognized, combining the recognized semantics in the first semantic set to form an operation instruction corresponding to the voice input.
3. The method of claim 2, further comprising:
recording the result of each matching of the first semantic set or the new first semantic set, and updating the third semantic set according to each matching result.
4. An apparatus for speech recognition, comprising:
a voice input unit for receiving a voice input in a natural language form from a user;
an information processing and acquiring unit, configured to perform semantic word segmentation on the voice input to obtain a first semantic set of the voice input, and to acquire device information of devices related to the user;
a set matching unit, configured to identify semantics in the first semantic set based on the device information; and
an operation instruction determining unit, configured to, when all the semantics in the first semantic set are identified based on the device information, combine the identified semantics in the first semantic set to form an operation instruction corresponding to the voice input,
wherein the information processing and acquiring unit is further configured to form a third semantic set of semantics having the same or similar meaning as the semantics in the first semantic set, the third semantic set relating to historical operations of the user;
the set matching unit is further configured to, when not all the semantics in the first semantic set are identified based on the device information, match the unrecognized semantics in the first semantic set against the semantics in the third semantic set; and
the operation instruction determining unit is further configured to, when all the previously unrecognized semantics in the first semantic set are recognized, combine the recognized semantics in the first semantic set to form an operation instruction corresponding to the voice input.
5. The apparatus of claim 4, wherein:
the information processing and acquiring unit is further configured to, when matching the unrecognized semantics in the first semantic set against the semantics in the third semantic set does not recognize all of the unrecognized semantics in the first semantic set and speech recognition has previously been performed for the user, combine the first semantic set with the user's previous speech recognition to generate a new first semantic set;
the set matching unit is further configured to match the new first semantic set against the third semantic set; and
the operation instruction determining unit is further configured to, when all the previously unrecognized semantics in the first semantic set are recognized, combine the recognized semantics in the first semantic set to form an operation instruction corresponding to the voice input.
6. The apparatus of claim 5, further comprising:
a data recording and updating unit, configured to record the result of each matching of the first semantic set or the new first semantic set, and to update the third semantic set according to each matching result.
7. An apparatus for speech recognition, comprising:
a memory for storing executable instructions; and
a processor for executing executable instructions stored in the memory to perform the method of any one of claims 1-3.
8. A memory device having a computer program loaded thereon, which, when executed by a processor, causes the processor to carry out the method according to any one of claims 1-3.
CN201711305005.5A 2017-12-08 2017-12-08 Method, device and memory device for speech recognition Active CN109903754B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711305005.5A CN109903754B (en) 2017-12-08 2017-12-08 Method, device and memory device for speech recognition


Publications (2)

Publication Number Publication Date
CN109903754A CN109903754A (en) 2019-06-18
CN109903754B (en) 2022-04-26

Family

ID=66941737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711305005.5A Active CN109903754B (en) 2017-12-08 2017-12-08 Method, device and memory device for speech recognition

Country Status (1)

Country Link
CN (1) CN109903754B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110767219B (en) * 2019-09-17 2021-12-28 中国第一汽车股份有限公司 Semantic updating method, device, server and storage medium
CN110838292A (en) * 2019-09-29 2020-02-25 广东美的白色家电技术创新中心有限公司 Voice interaction method, electronic equipment and computer storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090076795A1 (en) * 2007-09-18 2009-03-19 Srinivas Bangalore System And Method Of Generating Responses To Text-Based Messages
US20130166280A1 (en) * 2011-12-21 2013-06-27 Nuance Communications, Inc. Concept Search and Semantic Annotation for Mobile Messaging
CN104290097A (en) * 2014-08-19 2015-01-21 白劲实 Learning type intelligent home social contact robot system and method
CN104951456A (en) * 2014-03-26 2015-09-30 上海智臻网络科技有限公司 Method, device and equipment used for obtaining answer information
CN105389400A (en) * 2015-12-24 2016-03-09 Tcl集团股份有限公司 Speech interaction method and device
CN106503239A (en) * 2016-11-07 2017-03-15 上海智臻智能网络科技股份有限公司 A kind of method and apparatus of legal information inquiry
US20170236510A1 (en) * 2016-02-17 2017-08-17 Honda Motor Co., Ltd. Voice processing device
CN107146622A (en) * 2017-06-16 2017-09-08 合肥美的智能科技有限公司 Refrigerator, voice interactive system, method, computer equipment, readable storage medium storing program for executing

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07334504A (en) * 1994-06-08 1995-12-22 Meidensha Corp Japanese processing system
JPH08146990A (en) * 1994-11-24 1996-06-07 Matsushita Electric Ind Co Ltd Natural language interaction device
US6952666B1 (en) * 2000-07-20 2005-10-04 Microsoft Corporation Ranking parser for a natural language processing system
US9214156B2 (en) * 2013-08-06 2015-12-15 Nuance Communications, Inc. Method and apparatus for a multi I/O modality language independent user-interaction platform
US20170140750A1 (en) * 2015-11-17 2017-05-18 Le Holdings (Beijing) Co., Ltd. Method and device for speech recognition
CN106933561A (en) * 2015-12-31 2017-07-07 北京搜狗科技发展有限公司 Pronunciation inputting method and terminal device
CN106921544A (en) * 2016-06-20 2017-07-04 广州零号软件科技有限公司 The Intelligent household voice control system of default interactive voice order dictionary
CN106445924A (en) * 2016-11-14 2017-02-22 Tcl集团股份有限公司 Method and system for controlling intelligent equipment on basis of semantic server
CN106792060A (en) * 2017-01-11 2017-05-31 微鲸科技有限公司 The control method and device of external equipment
CN106601248A (en) * 2017-01-20 2017-04-26 浙江小尤鱼智能技术有限公司 Smart home system based on distributed voice control
CN106847269A (en) * 2017-01-20 2017-06-13 浙江小尤鱼智能技术有限公司 The sound control method and device of a kind of intelligent domestic system
CN107170449A (en) * 2017-06-14 2017-09-15 上海雍敏信息科技有限公司 Intelligent domestic system and its control method

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090076795A1 (en) * 2007-09-18 2009-03-19 Srinivas Bangalore System And Method Of Generating Responses To Text-Based Messages
US20130166280A1 (en) * 2011-12-21 2013-06-27 Nuance Communications, Inc. Concept Search and Semantic Annotation for Mobile Messaging
CN104951456A (en) * 2014-03-26 2015-09-30 上海智臻网络科技有限公司 Method, device and equipment used for obtaining answer information
CN104290097A (en) * 2014-08-19 2015-01-21 白劲实 Learning type intelligent home social contact robot system and method
CN105389400A (en) * 2015-12-24 2016-03-09 Tcl集团股份有限公司 Speech interaction method and device
US20170236510A1 (en) * 2016-02-17 2017-08-17 Honda Motor Co., Ltd. Voice processing device
CN106503239A (en) * 2016-11-07 2017-03-15 上海智臻智能网络科技股份有限公司 A kind of method and apparatus of legal information inquiry
CN107146622A (en) * 2017-06-16 2017-09-08 合肥美的智能科技有限公司 Refrigerator, voice interactive system, method, computer equipment, readable storage medium storing program for executing

Also Published As

Publication number Publication date
CN109903754A (en) 2019-06-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant