CN114360515A - Information processing method, information processing apparatus, electronic device, information processing medium, and computer program product

Publication number
CN114360515A
Authority
CN
China
Prior art keywords
voice
elevator taking
taking instruction
recognition model
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111501494.8A
Other languages
Chinese (zh)
Inventor
黄丽莉
李良斌
陈孝良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing SoundAI Technology Co Ltd
Original Assignee
Beijing SoundAI Technology Co Ltd
Application filed by Beijing SoundAI Technology Co Ltd filed Critical Beijing SoundAI Technology Co Ltd
Priority to CN202111501494.8A priority Critical patent/CN114360515A/en
Publication of CN114360515A publication Critical patent/CN114360515A/en
Pending legal-status Critical Current

Landscapes

  • Indicating And Signalling Devices For Elevators (AREA)

Abstract

The application discloses an information processing method, an information processing apparatus, an electronic device, a medium, and a computer program product. Voice information at an elevator is acquired and input into a pre-trained voice recognition model, which recognizes the voice information and produces a corresponding recognition result. When first voice information in the recognition result contains an elevator taking instruction, the first voice information is saved. The elevator therefore saves only the first voice information that contains an elevator taking instruction rather than all voice information, which saves storage space. An annotator can obtain this first voice information in the background and annotate it; because voice information without an elevator taking instruction no longer has to be annotated, the annotation process is shortened and annotation efficiency is improved.

Description

Information processing method, information processing apparatus, electronic device, information processing medium, and computer program product
Technical Field
The application belongs to the field of intelligent elevators, and particularly relates to an information processing method, an information processing device, electronic equipment, a medium and a computer program product.
Background
Existing intelligent elevators can recognize an elevator taking instruction issued by a user through a voice recognition model and thereby meet the user's elevator taking needs. To improve voice recognition, the voice recognition model needs to be trained with training samples, and the data source of the training samples is all audio data collected at the elevator over a period of time.
To ensure the integrity of the data source, background maintenance personnel must extract all audio data from the intelligent elevator. This audio data includes a large amount of audio unrelated to elevator taking instructions, and when the extracted audio is stored, the unrelated audio is stored as well, which occupies excessive storage space. Annotators must also annotate all of the data, which lengthens annotation time and lowers annotation efficiency.
Disclosure of Invention
Embodiments of the present application provide an information processing method, an information processing apparatus, an electronic device, a medium, and a computer program product, so that an annotator can obtain the saved valid data in the background and then annotate it, thereby shortening the annotation process and improving annotation efficiency.
In a first aspect, an embodiment of the present application provides an information processing method, where the method includes:
acquiring voice information at an elevator;
inputting the voice information into a pre-trained voice recognition model, and recognizing the voice information through the voice recognition model to obtain a recognition result corresponding to the voice information, wherein the voice recognition model is obtained by training on training samples of the voice recognition model, and the training sample of the voice recognition model comprises a historical voice information sample and a label sample in which the elevator taking instruction in the historical voice information sample is labeled;
and when first voice information in the recognition result contains an elevator taking instruction, saving the first voice information, wherein the first voice information is used as a training sample of an elevator taking instruction recognition model.
In one possible implementation, prior to the obtaining voice information at an elevator, the method further comprises:
receiving a first input instruction of a user to a preset saving control;
and responding to the first input instruction, and starting a voice information storage function.
In one possible implementation, after the saving the first voice information, the method further includes:
and labeling the elevator taking instruction in the first voice information to obtain a training sample for training the elevator taking instruction recognition model.
In one possible implementation manner, the labeling the elevator taking instruction in the first speech information to obtain the training sample for training the elevator taking instruction recognition model includes:
labeling the elevator taking instruction as a correct elevator taking instruction, an incorrect elevator taking instruction, or an invalid elevator taking instruction, wherein the correct elevator taking instruction indicates that the elevator taking instruction in the first voice information is consistent with the labeled elevator taking instruction, the incorrect elevator taking instruction indicates that the elevator taking instruction in the first voice information is inconsistent with the labeled elevator taking instruction, and the invalid elevator taking instruction indicates that the elevator taking instruction in the first voice information is invalid;
the elevator taking instruction recognition model comprises a first sub-model and a second sub-model, wherein the first voice information corresponding to the correct elevator taking instruction and the incorrect elevator taking instruction is used as a training sample of the first sub-model, the first voice information corresponding to the invalid elevator taking instruction is used as a training sample of the second sub-model, the first sub-model is used for recognizing the elevator taking instruction, and the second sub-model is used for eliminating invalid elevator taking instructions from the elevator taking instructions.
In one possible implementation manner, the labeling the elevator taking instruction in the first voice message includes:
and responding to the marking operation of the user on the elevator taking instruction in the first voice message, and marking the elevator taking instruction in the first voice message.
In a possible implementation manner, after the saving the first voice information in a case that the first voice information in the recognition result contains an elevator taking instruction, the method further includes:
acquiring a response operation of the elevator in response to the elevator taking instruction in the first voice information, so that the elevator taking instruction recognition model is optimized based on the response operation.
In one possible implementation, prior to the obtaining voice information at an elevator, the method further comprises:
acquiring training samples of a plurality of voice recognition models, wherein the training sample of each voice recognition model comprises a historical voice information sample and a label sample in which the elevator taking instruction in the historical voice information sample is labeled;
and training the voice recognition model according to the training samples of the plurality of voice recognition models until a training stopping condition is met to obtain the trained voice recognition model.
In a possible implementation manner, the training the speech recognition model according to the training samples of the plurality of speech recognition models until a training stop condition is satisfied to obtain a trained speech recognition model includes:
respectively executing the following steps for the training sample of each speech recognition model:
inputting the historical voice information sample into a preset voice recognition model to obtain a prediction recognition result of a training sample of the voice recognition model;
determining a loss function value of the voice recognition model according to the predicted recognition result and a label sample corresponding to a training sample of the voice recognition model;
and under the condition that the loss function value does not meet the training stopping condition, adjusting the model parameters of the voice recognition model, training the voice recognition model after parameter adjustment by using the training sample of the voice recognition model until the training stopping condition is met, and obtaining the trained voice recognition model.
In a second aspect, an embodiment of the present application provides an information processing apparatus, including:
the acquisition module is used for acquiring voice information at the elevator;
the recognition module is used for inputting the voice information into a pre-trained voice recognition model and recognizing the voice information through the voice recognition model to obtain a recognition result corresponding to the voice information, wherein the voice recognition model is obtained by training on training samples of the voice recognition model, and the training sample of the voice recognition model comprises a historical voice information sample and a label sample in which the elevator taking instruction in the historical voice information sample is labeled;
and the storage module is used for saving the first voice information in a case that the first voice information in the recognition result contains an elevator taking instruction, wherein the first voice information is used as a training sample of an elevator taking instruction recognition model.
In a third aspect, an embodiment of the present application provides an electronic device, where the device includes:
a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements the information processing method as provided by the embodiments of the present application.
In a fourth aspect, the present application provides a computer storage medium, where computer program instructions are stored on the computer storage medium, and when the computer program instructions are executed by a processor, the computer program instructions implement the information processing method provided by the present application.
In a fifth aspect, the present application provides a computer program product, and when executed by a processor of an electronic device, the instructions of the computer program product cause the electronic device to execute the information processing method provided by the present application.
According to the information processing method, apparatus, electronic device, medium, and computer program product of the embodiments of the present application, voice information at an elevator is acquired and input into a pre-trained voice recognition model, which recognizes the voice information to obtain a corresponding recognition result. If first voice information in the recognition result contains an elevator taking instruction, the first voice information is saved. The elevator therefore saves only the first voice information that contains an elevator taking instruction rather than all voice information, which saves storage space. An annotator can obtain this first voice information in the background and then label it, so voice information that contains no elevator taking instruction no longer has to be labeled, which shortens the labeling process and improves labeling efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of an information processing method provided in an embodiment of the present application;
Fig. 2 is a schematic flowchart of another information processing method provided in an embodiment of the present application;
Fig. 3 is a schematic flowchart of another information processing method provided in an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an information processing apparatus according to still another embodiment of the present application;
Fig. 5 is a schematic structural diagram of an information processing electronic device according to still another embodiment of the present application.
Detailed Description
Features and exemplary embodiments of various aspects of the present application will be described in detail below, and in order to make objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are intended to be illustrative only and are not intended to be limiting. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by illustrating examples thereof.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
With existing intelligent elevators, background maintenance personnel must extract all audio data of the intelligent elevator. This audio data includes a large amount of audio unrelated to elevator taking instructions, which occupies excessive storage space, and annotators must label all of the data, which lengthens labeling time and lowers labeling efficiency. The information processing method of the present application is provided to solve these problems in the prior art. Voice information at an elevator is acquired and input into a pre-trained voice recognition model, which recognizes the voice information to obtain a corresponding recognition result. When first voice information in the recognition result contains an elevator taking instruction, the first voice information is saved. The elevator therefore saves only the first voice information that contains an elevator taking instruction rather than all voice information, which saves storage space. An annotator can obtain this first voice information in the background and label it, so voice information that contains no elevator taking instruction no longer has to be labeled, which shortens the labeling process and improves labeling efficiency.
The information processing method provided by the embodiment of the present application is described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
The following first describes a training process of a speech recognition model used in the information processing method provided in the embodiment of the present application in detail.
Fig. 1 is a schematic flow chart of an information processing method provided in the embodiment of the present application, and specifically may be a training method of a speech recognition model used in the information processing method provided in the embodiment of the present application.
As shown in Fig. 1, the method for training the speech recognition model used in the information processing method provided in an embodiment of the present application includes the following steps:
Step 110: acquire training samples of a plurality of speech recognition models, where the training sample of each speech recognition model includes a historical speech information sample and a label sample in which the elevator taking instruction in the historical speech information sample is labeled.
Step 120: train the speech recognition model according to the training samples of the plurality of speech recognition models until a training stop condition is met, to obtain the trained speech recognition model.
In some embodiments of the present application, training samples of a plurality of speech recognition models are acquired, where the training sample of each speech recognition model includes a historical speech information sample and a label sample in which the elevator taking instruction in the historical speech information sample is labeled. The speech recognition model is then trained according to these training samples until a training stop condition is met, so that the trained speech recognition model can accurately recognize whether speech information contains an elevator taking instruction.
Specific implementations of the above steps are described below.
In some embodiments, in step 110, training samples of a plurality of speech recognition models are acquired, where the training sample of each speech recognition model includes a historical speech information sample and a label sample in which the elevator taking instruction in the historical speech information sample is labeled.
In some embodiments, the historical speech information samples may be speech information generated at an elevator over a period of time and extracted manually.
In some embodiments, the elevator taking instruction may be an instruction that the user issues to the elevator to operate it. For example, the user may utter the voice information "I want to go to the 10th floor", in which case "10th floor" in the historical voice information is labeled as the elevator taking instruction; this is not limited herein.
In some embodiments, the label sample containing the elevator taking instruction may be the recognition result corresponding to a historical voice information sample. Specifically, if the historical voice information sample is the voice information "I want to go to the 10th floor", the "10th floor" in it is recognized as the elevator taking instruction, which corresponds to the "10th floor" elevator taking instruction in the label sample.
In some embodiments, the speech recognition model may be trained using a plurality of training samples including the historical speech information samples and the tag samples as training samples of the speech recognition model.
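As an illustration only, a training sample of this kind could be represented as a pair of a historical utterance and its label. The sketch below is in Python; the names `TrainingSample` and `build_samples` are hypothetical and do not come from the application, and text transcripts stand in for the audio of the historical voice information samples.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TrainingSample:
    """One training sample: a historical utterance plus its label sample."""
    utterance: str              # stand-in for the historical voice information sample
    instruction: Optional[str]  # labeled elevator taking instruction, or None if absent


def build_samples(pairs: List[Tuple[str, Optional[str]]]) -> List[TrainingSample]:
    """Turn (utterance, label) pairs into TrainingSample objects."""
    return [TrainingSample(utterance, label) for utterance, label in pairs]


samples = build_samples([
    ("I want to go to the 10th floor", "10th floor"),
    ("The weather is nice today", None),
])
```

Keeping the label optional lets the same structure hold utterances with and without an elevator taking instruction.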
In some embodiments, in step 120, the speech recognition model is trained according to training samples of a plurality of speech recognition models until a training stop condition is satisfied, resulting in a trained speech recognition model.
In some embodiments, the training stopping condition may be a preset condition for stopping training of the speech recognition model, and the specific training stopping condition may be selected according to a user requirement, which is not limited herein.
In some embodiments of the present application, step 120 may specifically include:
respectively executing the following steps for the training sample of each speech recognition model:
and inputting the historical voice information sample into a preset voice recognition model to obtain a prediction recognition result of a training sample of the voice recognition model.
And determining a loss function value of the voice recognition model according to the predicted recognition result and the label sample corresponding to the training sample of the voice recognition model.
And under the condition that the loss function value does not meet the training stopping condition, adjusting the model parameters of the voice recognition model, training the voice recognition model after parameter adjustment by using the training sample of the voice recognition model until the training stopping condition is met, and obtaining the trained voice recognition model.
In some embodiments, the predicted recognition result may be the recognition result that the speech recognition model predicts for the historical speech information after that information is input into the model. For example, the predicted recognition result may be the elevator taking instruction "10th floor" in the historical speech information.
In some embodiments, the loss function value of the speech recognition model may be determined according to the predicted recognition result and the label sample corresponding to the training sample of the speech recognition model. The loss function value may be the recognition rate of the elevator taking instructions in the historical speech information during training.
In some embodiments, in the case that the loss function value does not satisfy the training stop condition, the parameters of the speech recognition model may be adjusted, and training then continues with the adjusted speech recognition model until the training stop condition is satisfied, so as to obtain the trained speech recognition model.
In some embodiments of the present application, historical speech information is input into a preset speech recognition model to obtain a predicted recognition result for the training sample of the speech recognition model, and a loss function value of the speech recognition model is then determined according to the predicted recognition result and the label sample corresponding to the training sample. When the loss function value does not satisfy the training stop condition, the model parameters of the speech recognition model are adjusted, and the adjusted model is trained with the training samples until the training stop condition is satisfied. The trained speech recognition model obtained in this way is then used to recognize voice information and to obtain the voice information that contains an elevator taking instruction.
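The predict, compute loss, adjust parameters, repeat cycle can be sketched as follows. This is a toy Python illustration under stated assumptions, not the model of the application: a bag-of-words perceptron over text transcripts stands in for the speech recognition model, the loss is taken to be the error rate (one minus the recognition rate), and the stop condition is assumed to be "loss at or below `max_loss`, or `max_epochs` reached". All function and parameter names are hypothetical.

```python
def featurize(text):
    """Bag-of-words features: the set of lowercase tokens."""
    return set(text.lower().split())


def predict(weights, bias, text):
    """Return 1 if the utterance is predicted to contain an elevator taking instruction."""
    score = bias + sum(weights.get(tok, 0.0) for tok in featurize(text))
    return 1 if score > 0 else 0


def loss(weights, bias, samples):
    """Error rate over the samples (assumed here to be 1 - recognition rate)."""
    errors = sum(predict(weights, bias, text) != label for text, label in samples)
    return errors / len(samples)


def train(samples, max_loss=0.0, max_epochs=50, lr=1.0):
    """Adjust parameters until the assumed training stop condition is met."""
    weights, bias = {}, 0.0
    for _ in range(max_epochs):
        if loss(weights, bias, samples) <= max_loss:
            break  # training stop condition satisfied
        for text, label in samples:
            error = label - predict(weights, bias, text)
            if error:  # perceptron update only on misclassified samples
                for tok in featurize(text):
                    weights[tok] = weights.get(tok, 0.0) + lr * error
                bias += lr * error
    return weights, bias


# Hypothetical labeled transcripts: 1 = contains an elevator taking instruction.
SAMPLES = [
    ("i want to go to floor 10", 1),
    ("take me to floor 4", 1),
    ("the weather is nice today", 0),
    ("see you tomorrow", 0),
]
```

Calling `train(SAMPLES)` returns the learned weights and bias, which `predict` can then apply to new transcripts.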
The information processing method provided by the embodiment of the present application is described in detail below with reference to fig. 2.
Fig. 2 is a schematic flowchart of another information processing method provided in an embodiment of the present application. As shown in Fig. 2, the information processing method provided in the embodiment of the present application may include the following steps:
Step 210: acquire voice information at the elevator.
Step 220: input the voice information into a pre-trained voice recognition model, and recognize the voice information through the voice recognition model to obtain a recognition result corresponding to the voice information.
The voice recognition model is obtained by training on training samples of the voice recognition model, and the training sample of the voice recognition model comprises a historical voice information sample and a label sample in which the elevator taking instruction in the historical voice information sample is labeled.
Step 230: when first voice information in the recognition result contains an elevator taking instruction, save the first voice information.
The first voice information is used as a training sample of the elevator taking instruction recognition model.
In the embodiment of the present application, voice information at an elevator is acquired and input into a pre-trained voice recognition model, which recognizes the voice information to obtain a corresponding recognition result. When first voice information in the recognition result contains an elevator taking instruction, the first voice information is saved. The elevator therefore saves only the first voice information that contains an elevator taking instruction rather than all voice information, which saves storage space. An annotator can obtain this first voice information in the background and then annotate it, so voice information that contains no elevator taking instruction no longer has to be annotated, which shortens the annotation process and improves annotation efficiency.
Specific implementations of the above steps are described below.
In some embodiments, in step 210, voice information at the elevator is obtained.
The voice information at the elevator may be voice information uttered by a user while taking the elevator, which is not limited herein. For example, the user may say "I want to go to the 10th floor", "The weather is nice today", or other voice information, and the elevator can acquire, through voice recognition, the voice information uttered by the user; this is not limited herein.
In some embodiments, before step 210, the following steps may be further included:
receiving a first input instruction of a user to a preset saving control;
and responding to the first input instruction, and starting a voice information storage function.
The preset saving control may be a control for the saving function that is added to the background of the elevator in advance. Specifically, the preset saving control may be a background interface button that turns the saving function on, which is not limited herein.
In some embodiments, the first input instruction of the user may be an operation of turning on the preset saving control, so that the preset saving control is turned on by the first input instruction of the user.
In some embodiments, in response to the first input instruction of the user, the voice information saving function of the elevator is turned on, after which voice information uttered by users can be saved.
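A minimal Python sketch of such a switch is given below, assuming that the saving function is off by default and that the first input instruction simply turns it on. The class `VoiceInfoStore` and its methods are hypothetical names chosen for illustration, and strings stand in for voice information.

```python
class VoiceInfoStore:
    """Hypothetical background-side store guarded by a preset saving control."""

    def __init__(self) -> None:
        self.saving_enabled = False  # assumed default: saving function is off
        self.saved = []

    def handle_first_input(self) -> None:
        """Respond to the first input instruction by turning the saving function on."""
        self.saving_enabled = True

    def save(self, voice_info: str) -> bool:
        """Save the voice information only while the saving function is on."""
        if not self.saving_enabled:
            return False
        self.saved.append(voice_info)
        return True
```

With this shape, voice information offered before the control is operated is simply dropped, while anything offered afterwards is kept.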
In some embodiments, in step 220, the speech information is input to a pre-trained speech recognition model, and a recognition result corresponding to the speech information is obtained.
The voice recognition model is obtained by training on training samples of the voice recognition model, and the training sample of the voice recognition model comprises a historical voice information sample and a label sample in which the elevator taking instruction in the historical voice information sample is labeled.
In some embodiments, the acquired voice information at the elevator is input into the pre-trained voice recognition model, and the voice recognition model recognizes the voice information to obtain the corresponding recognition result. For how the recognition result is obtained through the voice recognition model, reference may be made to the training process of the voice recognition model, which is not described herein again.
In some embodiments, the voice information at the elevator may be voice information spoken by the user, such as "I want to go to the 10th floor" or "The weather is nice today".
In some embodiments, when the voice information "I want to go to the 10th floor" spoken by the user is input into the voice recognition model, the voice recognition model recognizes "I want to go to the 10th floor", and the recognition result may be voice information containing an elevator taking instruction. When the voice information "The weather is nice today" spoken by the user is input into the voice recognition model, the voice recognition model recognizes "The weather is nice today", and the recognition result may be other voice information that does not contain an elevator taking instruction.
In some embodiments, in step 230, in the case that the first voice information in the recognition result contains an elevator taking instruction, the first voice information is saved.
The first voice information is used for training samples of the elevator taking instruction recognition model.
In some embodiments, the first voice information may be voice information, uttered by the user while taking the elevator, that contains an elevator taking instruction. For example, the user says "I want to go to the 10th floor", and the voice recognition model recognizes that this voice information contains the elevator taking instruction "10th floor"; this voice information is then the first voice information.
In some embodiments, the first speech information contains elevator taking instructions and can therefore be used as training samples for the elevator taking instruction recognition model.
In some embodiments, the first voice message containing the elevator taking instruction is saved, and the user can obtain the saved first voice message.
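Steps 210 to 230 can be sketched as a small filter that keeps only the utterances in which an elevator taking instruction is recognized. In the Python sketch below, a regular expression is a deliberately simple stand-in for the pre-trained voice recognition model, and text strings stand in for the voice information; `recognize`, `save_first_voice_info` and `FLOOR_PATTERN` are hypothetical names.

```python
import re
from typing import List, Optional, Tuple

# Stand-in for the recognition model: matches "floor 10" or "10th floor".
FLOOR_PATTERN = re.compile(
    r"\b(?:floor\s+(\d+)|(\d+)(?:st|nd|rd|th)\s+floor)\b", re.IGNORECASE
)


def recognize(utterance: str) -> Optional[str]:
    """Return the recognized elevator taking instruction, or None if there is none."""
    match = FLOOR_PATTERN.search(utterance)
    if match is None:
        return None
    return f"floor {match.group(1) or match.group(2)}"


def save_first_voice_info(utterances: List[str]) -> List[Tuple[str, str]]:
    """Keep only utterances that contain an elevator taking instruction."""
    saved = []
    for utterance in utterances:
        instruction = recognize(utterance)
        if instruction is not None:
            saved.append((utterance, instruction))
    return saved
```

Because utterances without an instruction are never appended, the saved list grows only with material that is worth labeling later.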
In some embodiments, after step 230, the following steps may be further included:
acquiring a response operation of the elevator in response to the elevator taking instruction in the first voice information.
In some embodiments, a response operation of the elevator in response to the elevator taking instruction in the first voice information may be acquired. Specifically, the first voice information may be "I want to go to the 10th floor" spoken by the user: when the voice recognition model recognizes "10th floor", the corresponding response operation of the elevator may be to go to the 10th floor, and when the voice recognition model recognizes "4th floor", the corresponding response operation of the elevator may be to go to the 4th floor. The first voice information may also be "The company on the 10th floor has an event today" spoken by the user; the voice recognition model recognizes "10th floor", and the corresponding response operation of the elevator may be to go to the 10th floor. Based on the response operation, the operating state of the central control system of the elevator can thus be determined, that is, whether the central control system responds correctly to the elevator taking instruction.
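One way to picture this check is to log, for each saved utterance, the floor that was recognized and the floor the elevator actually went to, and then compare the two. The Python sketch below assumes such a log exists; `ResponseRecord`, `responded_correctly` and `summarize` are hypothetical names, and the comparison says nothing about whether the recognition itself matched what the user meant.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ResponseRecord:
    """Hypothetical log entry pairing a recognized instruction with the elevator's response."""
    first_voice_info: str
    recognized_floor: int
    responded_floor: int


def responded_correctly(record: ResponseRecord) -> bool:
    """True if the elevator went to the floor named in the recognized instruction."""
    return record.recognized_floor == record.responded_floor


def summarize(records: List[ResponseRecord]) -> Dict[str, int]:
    """Count correct and incorrect responses of the central control system."""
    correct = sum(responded_correctly(r) for r in records)
    return {"total": len(records), "correct": correct, "incorrect": len(records) - correct}
```

A non-zero `incorrect` count would point at the central control system rather than at the recognition model, since the recognized floor is taken as given here.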
To make it convenient for a user to label the first voice information, Fig. 3 shows a schematic flowchart of another information processing method provided in an embodiment of the present application.
As shown in Fig. 3, after step 230 in Fig. 2, the information processing method may further include the following step:
Step 310: labeling the elevator taking instruction in the first voice information to obtain a training sample for training the elevator taking instruction recognition model.
In some embodiments, the first voice information is saved voice information that contains an elevator taking instruction and serves as a training sample for the elevator taking instruction recognition model; for example, the first voice information may be voice information containing "I want to go to the 10th floor" spoken by the user.
In some embodiments, the elevator taking instruction in the first voice information is labeled in response to a labeling operation performed by a user on that instruction.
In some embodiments, the labeling operation performed by the user on the elevator taking instruction in the first voice information may be that the user labels "10th floor" as "10th floor".
In some embodiments, step 310 may include:
labeling the elevator taking instruction as a correct elevator taking instruction, an incorrect elevator taking instruction, or an invalid elevator taking instruction, where a correct elevator taking instruction indicates that the elevator taking instruction in the first voice information is consistent with the labeled elevator taking instruction, an incorrect elevator taking instruction indicates that the elevator taking instruction in the first voice information is inconsistent with the labeled elevator taking instruction, and an invalid elevator taking instruction indicates that the elevator taking instruction in the first voice information is invalid.
The elevator taking instruction recognition model includes a first sub-model and a second sub-model. First voice information corresponding to correct and incorrect elevator taking instructions is used as training samples for the first sub-model, and first voice information corresponding to invalid elevator taking instructions is used as training samples for the second sub-model. The first sub-model is used to recognize elevator taking instructions, and the second sub-model is used to eliminate invalid elevator taking instructions from among them.
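The three labels and the routing of labeled samples to the two sub-models can be sketched as below. The representation is an assumption made for illustration: an annotator's label of `None` is taken to mean "this utterance is not an elevator taking instruction", and `Label`, `label_sample`, and `split_for_submodels` are hypothetical names.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Label(Enum):
    CORRECT = "correct"      # recognized instruction matches the annotation
    INCORRECT = "incorrect"  # recognized instruction differs from the annotation
    INVALID = "invalid"      # utterance is not a real elevator taking instruction


def label_sample(recognized: str, annotated: Optional[str]) -> Label:
    """Derive the label from the recognized and the annotated instruction."""
    if annotated is None:  # assumption: None marks an invalid instruction
        return Label.INVALID
    return Label.CORRECT if recognized == annotated else Label.INCORRECT


def split_for_submodels(
    samples: List[Tuple[str, Label]],
) -> Tuple[List[str], List[str]]:
    """Route correct/incorrect samples to the first sub-model, invalid to the second."""
    first: List[str] = []
    second: List[str] = []
    for utterance, label in samples:
        (second if label is Label.INVALID else first).append(utterance)
    return first, second


labeled = [
    ("I want to go to the 10th floor", label_sample("10th floor", "10th floor")),
    ("I want to go to the 10th floor", label_sample("4th floor", "10th floor")),
    ("The company on floor 10 has an event today", label_sample("floor 10", None)),
]
first_set, second_set = split_for_submodels(labeled)
```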
In some embodiments, after the voice recognition model recognizes the elevator taking instruction "I want to go to the 10th floor" spoken by user A, that instruction is input into the first sub-model of the elevator taking instruction recognition model. The first sub-model may recognize it as "I want to go to the 10th floor" or as "I want to go to the 4th floor". When the result is "I want to go to the 10th floor", the instruction is labeled as a correct elevator taking instruction; when the result is "I want to go to the 4th floor", it is labeled as an incorrect elevator taking instruction.
In some examples, the first sub-model may be a language recognition model that is concerned only with the accuracy of the recognized elevator taking instruction, that is, with whether the pronunciation of the instruction input into the first sub-model is recognized accurately. It is used to distinguish correct elevator taking instructions from incorrect ones.
In some examples, after the voice recognition model recognizes the utterance "the 10th floor has an event today" spoken by user B as an elevator taking instruction, the utterance is input into the first sub-model of the elevator taking instruction recognition model. The first sub-model recognizes it as "the 10th floor has an event today" (that is, "10th floor" is recognized without error and the pronunciation recognition is correct). This recognition result is then input into the second sub-model, which performs semantic analysis on it and determines that it is not an elevator taking instruction, that is, it is an invalid elevator taking instruction. The second sub-model then deletes the instruction "the 10th floor has an event today".
In other embodiments, the second sub-model may also retain correct elevator taking instructions. Specifically, after the voice recognition model recognizes the elevator taking instruction "go to the 4th floor" spoken by user C, the instruction is input into the first sub-model of the elevator taking instruction recognition model, which recognizes it as "go to the 4th floor". This recognition result is then input into the second sub-model, which analyzes the semantics of the surrounding context and determines that it is an elevator taking instruction, specifically an instruction to travel to the 4th floor. Because it is a correct elevator taking instruction, the second sub-model may output "go to the 4th floor" and send it to the central control system of the elevator, so that the central control system runs the elevator to the 4th floor based on that instruction. The second sub-model is therefore equivalent to an intention recognition model: it identifies the specific intent of an instruction, which prevents the central control system of the elevator from operating the elevator according to an invalid elevator taking instruction.
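The two-stage flow described in these examples can be sketched as a small pipeline. Both sub-models are replaced here by trivial stand-ins: the first returns its input unchanged, and the second uses a hypothetical "go to ... floor" pattern in place of real semantic analysis. The function names and the pattern are assumptions for illustration only.

```python
import re
from typing import Optional

# Hypothetical intent pattern standing in for the second sub-model's
# semantic analysis: a travel request such as "go to the 4th floor".
TRAVEL_INTENT = re.compile(r"\bgo to (?:the )?(?:floor )?(\d+)")


def first_submodel(transcript: str) -> str:
    """Stand-in for instruction recognition; returns the text unchanged."""
    return transcript.lower()


def second_submodel(recognized: str) -> Optional[int]:
    """Return the target floor for a genuine request, or None if invalid."""
    match = TRAVEL_INTENT.search(recognized)
    return int(match.group(1)) if match else None


def handle(transcript: str) -> Optional[int]:
    """Run both stages; a None result means nothing is sent to the elevator."""
    return second_submodel(first_submodel(transcript))


floor_for_user_c = handle("Go to the 4th floor")
floor_for_user_b = handle("The 10th floor has an event today")
```

In this sketch user C's request yields floor 4, while user B's remark is filtered out because it mentions a floor without expressing an intent to travel there.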
In some embodiments, the accuracy of identification of the elevator taking instruction is improved through training of the first sub-model and the second sub-model.
For other related technical details of the above steps, refer to the descriptions of steps 210 to 230, which are not repeated here.
Based on the information processing method provided by the above embodiment, correspondingly, the embodiment of the present application further provides a specific implementation manner of the information processing apparatus, please refer to the following embodiments.
Referring to fig. 4 in particular, an information processing apparatus provided in an embodiment of the present application includes the following units:
the obtaining module 410, used for obtaining voice information at the elevator;
the recognition module 420, used for inputting the voice information into a pre-trained voice recognition model and recognizing the voice information through the voice recognition model to obtain a recognition result corresponding to the voice information, where the voice recognition model is trained on training samples of the voice recognition model, and each training sample includes a historical voice information sample and a label sample annotating the elevator taking instruction in the historical voice information sample;
the storing module 430, used for storing the first voice information when the first voice information in the recognition result contains an elevator taking instruction, where the first voice information is used as a training sample for an elevator taking instruction recognition model.
In the embodiment of the present application, the obtaining module obtains voice information at the elevator; the recognition module inputs the voice information into a pre-trained voice recognition model and recognizes it through that model to obtain the corresponding recognition result; and the storing module stores the first voice information when the first voice information in the recognition result contains an elevator taking instruction. The information processing apparatus therefore stores only the first voice information that contains an elevator taking instruction rather than all voice information, which saves storage space. An annotator can retrieve this first voice information in the background and label it, which reduces the labeling of voice information that contains no elevator taking instruction, shortens the labeling process, and improves labeling efficiency.
In some embodiments of the present application, the information processing apparatus may further include:
the first receiving module, used for receiving a first input instruction from a user on a preset storage control before the voice information at the elevator is acquired;
and the first starting module, used for starting the voice information storage function in response to the first input instruction.
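The interaction between the storage control and the storage function can be illustrated with a simple on/off gate. This is a sketch under the assumption that the storage function is disabled until the first input instruction arrives; the class and method names are hypothetical.

```python
from typing import List


class VoiceInformationStorage:
    """Stores first voice information only after the save function is started."""

    def __init__(self) -> None:
        self.enabled = False  # assumption: storage is off by default
        self.saved: List[str] = []

    def on_first_input_instruction(self) -> None:
        """Called when the user operates the preset storage control."""
        self.enabled = True

    def save(self, first_voice_information: str) -> bool:
        """Save the sample if storage is enabled; report whether it was saved."""
        if not self.enabled:
            return False
        self.saved.append(first_voice_information)
        return True


storage = VoiceInformationStorage()
before = storage.save("I want to go to floor 10")  # ignored: not yet enabled
storage.on_first_input_instruction()
after = storage.save("I want to go to floor 10")   # stored
```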
In some embodiments of the present application, the information processing apparatus may further include:
and the elevator taking instruction marking module is used for marking the elevator taking instruction in the first voice information after the first voice information is stored to obtain a training sample for training the elevator taking instruction recognition model.
In some embodiments of the present application, the elevator taking instruction labeling module is specifically configured to:
labeling the elevator taking instruction as a correct elevator taking instruction, an incorrect elevator taking instruction, or an invalid elevator taking instruction, where a correct elevator taking instruction indicates that the elevator taking instruction in the first voice information is consistent with the labeled elevator taking instruction, an incorrect elevator taking instruction indicates that the elevator taking instruction in the first voice information is inconsistent with the labeled elevator taking instruction, and an invalid elevator taking instruction indicates that the elevator taking instruction in the first voice information is invalid;
the elevator taking instruction recognition model includes a first sub-model and a second sub-model; first voice information corresponding to the correct and incorrect elevator taking instructions is used as training samples for the first sub-model, and first voice information corresponding to the invalid elevator taking instruction is used as training samples for the second sub-model; the first sub-model is used to recognize elevator taking instructions, and the second sub-model is used to eliminate invalid elevator taking instructions from among them.
The elevator taking instruction labeling module may be specifically configured to:
label the elevator taking instruction in the first voice information in response to a labeling operation performed by the user on that instruction.
In some embodiments of the present application, the information processing apparatus may further include:
and the response operation acquisition module is used for acquiring the response operation of the elevator in response to the elevator taking instruction in the first voice message.
In some embodiments of the present application, the information processing apparatus may further include:
the training sample acquisition module, used for acquiring training samples of a plurality of voice recognition models before the voice information at the elevator is acquired, where the training sample of each voice recognition model includes a historical voice information sample and a label sample annotating the elevator taking instruction in the historical voice information sample;
and the voice recognition model training module, used for training the voice recognition model according to the training samples of the plurality of voice recognition models until a training stop condition is met, to obtain the trained voice recognition model.
The speech recognition model training module may be specifically configured to:
respectively executing the following steps for the training sample of each speech recognition model:
inputting a historical voice information sample into a preset voice recognition model to obtain a prediction recognition result of a training sample of the voice recognition model;
determining a loss function value of the voice recognition model according to the predicted recognition result and a label sample corresponding to a training sample of the voice recognition model;
and under the condition that the loss function value does not meet the training stopping condition, adjusting the model parameters of the voice recognition model, training the voice recognition model after parameter adjustment by using the training sample of the voice recognition model until the training stopping condition is met, and obtaining the trained voice recognition model.
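The predict, compute-loss, adjust-parameters loop described in these steps can be sketched with a deliberately tiny model. Everything specific below is an assumption made for illustration: the patent does not name a model family, loss function, or stop condition, so the sketch uses logistic regression on made-up two-dimensional features, a cross-entropy loss, and a loss threshold as the stop condition.

```python
import math
from typing import List, Sequence, Tuple

# (feature vector, label); label 1 = sample contains an elevator taking instruction.
Sample = Tuple[Sequence[float], int]


def predict(weights: List[float], bias: float, features: Sequence[float]) -> float:
    """Predicted probability that the sample contains an instruction."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(p: float, label: int) -> float:
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))


def train(
    samples: List[Sample],
    lr: float = 0.5,
    loss_threshold: float = 0.05,  # hypothetical training stop condition
    max_epochs: int = 10_000,
) -> Tuple[List[float], float, float]:
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    loss = float("inf")
    for _ in range(max_epochs):
        # Loss of the current model against the label samples.
        loss = sum(cross_entropy(predict(weights, bias, x), y) for x, y in samples)
        loss /= len(samples)
        if loss < loss_threshold:
            break  # stop condition met
        # Otherwise adjust the model parameters and continue training.
        for x, y in samples:
            error = predict(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias, loss


toy_samples: List[Sample] = [
    ([1.0, 0.0], 1),
    ([0.9, 0.1], 1),
    ([0.0, 1.0], 0),
    ([0.1, 0.9], 0),
]
trained_weights, trained_bias, final_loss = train(toy_samples)
```

A real speech model would replace `predict` and the per-sample update with a neural network and an optimizer, but the control flow (evaluate the loss, stop if the condition is met, otherwise adjust and repeat) is the part this sketch is meant to show.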
Based on the method provided by the above embodiment, the embodiment of the present application further provides a specific implementation manner of the information processing electronic device. Fig. 5 shows a hardware structure diagram of an information processing electronic device provided in an embodiment of the present application.
The information processing electronic device may include a processor 501 and a memory 502 storing computer program instructions.
Specifically, the processor 501 may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.
Memory 502 may include mass storage for data or instructions. By way of example, and not limitation, memory 502 may include a Hard Disk Drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 502 may include removable or non-removable (or fixed) media, where appropriate. The memory 502 may be internal or external to the information processing electronic device, where appropriate. In a particular embodiment, the memory 502 is non-volatile solid-state memory.
The memory may include Read Only Memory (ROM), Random Access Memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, electrical, optical, or other physical/tangible memory storage devices. Thus, in general, the memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software comprising computer-executable instructions and when the software is executed (e.g., by one or more processors), it is operable to perform operations described with reference to the methods according to an aspect of the present disclosure.
The processor 501 reads and executes the computer program instructions stored in the memory 502 to implement any one of the information processing methods in the above-described embodiments.
In one example, the information processing electronic device may also include a communication interface 503 and a bus 510. As shown in Fig. 5, the processor 501, the memory 502, and the communication interface 503 are connected via the bus 510 and communicate with one another through it.
The communication interface 503 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiments of the present application.
Bus 510 comprises hardware, software, or both, and couples the components of the information processing electronic device to each other. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or other suitable bus, or a combination of two or more of these. Bus 510 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
In addition, in combination with the information processing method in the foregoing embodiments, an embodiment of the present application may provide a computer storage medium for implementing it. The computer storage medium has computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement any of the information processing methods in the above embodiments.
It is to be understood that the present application is not limited to the particular arrangements and instrumentality described above and shown in the attached drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions or change the order between the steps after comprehending the spirit of the present application.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As described above, only the specific embodiments of the present application are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered within the scope of the present application.

Claims (12)

1. An information processing method characterized by comprising:
acquiring voice information at an elevator;
inputting the voice information into a pre-trained voice recognition model, and recognizing the voice information through the voice recognition model to obtain a recognition result corresponding to the voice information, wherein the voice recognition model is trained on training samples of the voice recognition model, and each training sample of the voice recognition model comprises a historical voice information sample and a label sample annotating the elevator taking instruction in the historical voice information sample;
and when first voice information in the recognition result contains an elevator taking instruction, saving the first voice information, wherein the first voice information is used as a training sample for an elevator taking instruction recognition model.
2. The information processing method according to claim 1, characterized in that, before the acquiring of the voice information at the elevator, the method further comprises:
receiving a first input instruction of a user to a preset saving control;
and responding to the first input instruction, and starting a voice information storage function.
3. The information processing method according to claim 1, wherein after said saving the first voice information, the method further comprises:
and labeling the elevator taking instruction in the first voice information to obtain a training sample for training the elevator taking instruction recognition model.
4. The information processing method according to claim 3, wherein the labeling the elevator taking instruction in the first speech information to obtain a training sample for training an elevator taking instruction recognition model includes:
labeling the elevator taking instruction as a correct elevator taking instruction, an incorrect elevator taking instruction, or an invalid elevator taking instruction, wherein the correct elevator taking instruction indicates that the elevator taking instruction in the first voice information is consistent with the labeled elevator taking instruction, the incorrect elevator taking instruction indicates that the elevator taking instruction in the first voice information is inconsistent with the labeled elevator taking instruction, and the invalid elevator taking instruction indicates that the elevator taking instruction in the first voice information is invalid;
the elevator taking instruction identification model comprises a first sub-model and a second sub-model, first voice information corresponding to the correct elevator taking instruction and the wrong elevator taking instruction is used as a training sample of the first sub-model, first voice information corresponding to the invalid elevator taking instruction is used as a training sample of the second sub-model, the first sub-model is used for identifying the elevator taking instruction, and the second sub-model is used for eliminating the invalid elevator taking instruction in the elevator taking instruction.
5. The information processing method according to claim 3, wherein the labeling of the elevator taking instruction in the first voice information includes:
labeling the elevator taking instruction in the first voice information in response to a labeling operation performed by a user on the elevator taking instruction in the first voice information.
6. The information processing method according to claim 1, wherein after the saving of the first voice information when the first voice information in the recognition result contains an elevator taking instruction, the method further comprises:
and acquiring response operation of the elevator in response to the elevator taking instruction in the first voice message.
7. The information processing method according to claim 1, characterized in that, before the acquiring of the voice information at the elevator, the method further comprises:
acquiring training samples of a plurality of voice recognition models, wherein the training sample of each voice recognition model comprises a historical voice information sample and a label sample annotating the elevator taking instruction in the historical voice information sample;
and training the voice recognition model according to the training samples of the plurality of voice recognition models until a training stopping condition is met to obtain the trained voice recognition model.
8. The information processing method according to claim 7, wherein the training the speech recognition model according to the training samples of the plurality of speech recognition models until a training stop condition is satisfied to obtain a trained speech recognition model, includes:
respectively executing the following steps for the training sample of each speech recognition model:
inputting the historical voice information sample into a preset voice recognition model to obtain a prediction recognition result of a training sample of the voice recognition model;
determining a loss function value of the voice recognition model according to the predicted recognition result and a label sample corresponding to a training sample of the voice recognition model;
and under the condition that the loss function value does not meet the training stopping condition, adjusting the model parameters of the voice recognition model, training the voice recognition model after parameter adjustment by using the training sample of the voice recognition model until the training stopping condition is met, and obtaining the trained voice recognition model.
9. An information processing apparatus characterized in that the apparatus comprises:
the acquisition module is used for acquiring voice information at the elevator;
the recognition module is used for inputting the voice information into a pre-trained voice recognition model and recognizing the voice information through the voice recognition model to obtain a recognition result corresponding to the voice information, wherein the voice recognition model is trained on training samples of the voice recognition model, and each training sample of the voice recognition model comprises a historical voice information sample and a label sample annotating the elevator taking instruction in the historical voice information sample;
and the storage module is used for storing the first voice information when the first voice information in the recognition result contains an elevator taking instruction, wherein the first voice information is used as a training sample for an elevator taking instruction recognition model.
10. An electronic device, characterized in that the device comprises: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements an information processing method as claimed in any one of claims 1 to 8.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon computer program instructions which, when executed by a processor, implement the information processing method according to any one of claims 1 to 8.
12. A computer program product, characterized in that instructions in the computer program product, when executed by a processor of an electronic device, cause the electronic device to perform the information processing method according to any one of claims 1 to 8.
CN202111501494.8A 2021-12-09 2021-12-09 Information processing method, information processing apparatus, electronic device, information processing medium, and computer program product Pending CN114360515A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111501494.8A CN114360515A (en) 2021-12-09 2021-12-09 Information processing method, information processing apparatus, electronic device, information processing medium, and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111501494.8A CN114360515A (en) 2021-12-09 2021-12-09 Information processing method, information processing apparatus, electronic device, information processing medium, and computer program product

Publications (1)

Publication Number Publication Date
CN114360515A true CN114360515A (en) 2022-04-15

Family

ID=81099705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111501494.8A Pending CN114360515A (en) 2021-12-09 2021-12-09 Information processing method, information processing apparatus, electronic device, information processing medium, and computer program product

Country Status (1)

Country Link
CN (1) CN114360515A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination