WO2022213986A1 - Voice recognition method and apparatus, electronic device, and readable storage medium - Google Patents

Info

Publication number
WO2022213986A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
voice
input
target application
key information
Prior art date
Application number
PCT/CN2022/085338
Other languages
French (fr)
Chinese (zh)
Inventor
梁浩
Original Assignee
维沃移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 维沃移动通信有限公司
Publication of WO2022213986A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • the present application belongs to the field of communication technologies, and in particular relates to a method, an apparatus, an electronic device and a readable storage medium for speech recognition.
  • voice chat has gradually become one of the main ways of remote chat.
  • the voice chat mainly includes phone calls, voice calls and voice short messages.
  • text editing software can be used to perform speech recognition on the stored voice information to generate text information, and the user can then manually filter the key information in the text information.
  • the proportion of key information in the voice information corresponding to the voice chat is small, resulting in the above-mentioned text information containing a lot of redundant information, which reduces the efficiency of obtaining key information.
  • the purpose of the embodiments of the present application is to provide a speech recognition method, apparatus, electronic device, and readable storage medium, which can solve the problem of low efficiency in acquiring key information during speech recognition.
  • an embodiment of the present application provides a method for speech recognition.
  • the method includes: in the case of acquiring voice information, receiving a first input from a user; and in response to the first input, displaying first key information in the voice information through a target application, where the first key information is associated with the type of the target application.
  • an embodiment of the present application provides a device for speech recognition.
  • the device includes a first receiving module and a first display module. The first receiving module is configured to receive a first input from the user when voice information is acquired; the first display module is configured to display, in response to the first input, first key information in the voice information through a target application, where the first key information is associated with the type of the target application.
  • an embodiment of the present application provides an electronic device, the electronic device includes a processor, a memory, and a program or instruction stored in the memory and executable on the processor; when the program or instruction is executed by the processor, the steps of the method as provided in the first aspect are implemented.
  • an embodiment of the present application provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or instruction is executed by a processor, the steps of the method provided in the first aspect are implemented.
  • an embodiment of the present application provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement the method provided in the first aspect.
  • an embodiment of the present application provides a computer program product, where the program product is stored in a non-volatile storage medium, and the program product is executed by at least one processor to implement the method provided in the first aspect.
  • in the embodiments of the present application, when voice information is acquired and a first input is received, the first key information in the voice information can be displayed through the target application in response to the first input, where the first key information is associated with the type of the target application.
  • in this way, the first key information associated with the target application is extracted directly from the voice information and displayed, which improves the efficiency of extracting key information, avoids invalid recognition of the voice information, and improves the human-computer interaction performance of the electronic device.
  • FIG. 1 is one of the schematic diagrams of a method for speech recognition provided by an embodiment of the present application.
  • FIG. 2 is a second schematic diagram of a method for speech recognition provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a chat interface provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of receiving a screen recognition gesture on a chat interface provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of displaying an application identifier according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a method for receiving a first input provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a method for displaying first key information provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a display interface of the first key information provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a method for acquiring first key information provided by an embodiment of the present application.
  • FIG. 10 is a third schematic diagram of a method for speech recognition provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of editing first key information according to an embodiment of the present application.
  • FIG. 13 is the second schematic structural diagram of the apparatus for speech recognition provided by the embodiment of the application.
  • FIG. 14 is one of the hardware schematic diagrams of the electronic device provided by the embodiment of the application.
  • FIG. 15 is the second schematic diagram of the hardware of the electronic device provided by the embodiment of the present application.
  • the terms "first", "second" and the like in the description and claims of the present application are used to distinguish similar objects, and are not used to describe a specific order or sequence. It is to be understood that the data so used are interchangeable under appropriate circumstances, so that the embodiments of the present application can be practiced in sequences other than those illustrated or described herein. The objects distinguished by "first", "second", etc. are usually of one type, and the number of objects is not limited; for example, the first object may be one or more than one.
  • "and/or" in the description and claims indicates at least one of the connected objects, and the character "/" generally indicates that the associated objects are in an "or" relationship.
  • for example, user A calls customer B through an electronic device and asks customer B whether to buy a product. If customer B wants to buy the product, user A needs to record attribute information such as the quantity, model, color, and delivery time of the product.
  • the recording function is activated, and the content of the call is recorded to generate voice information.
  • if customer B wants to purchase a product, user A starts a text editing application, selects the above voice information in the text editing application, has the voice information recognized as text information, and saves the text information.
  • however, the recognized text information may not be accurate, and the key information that user A actually needs to record is mixed in with the rest of the text information, resulting in low efficiency in obtaining key information.
  • in the embodiments of the present application, when the electronic device obtains the voice information of the call between user A and customer B, it can receive a first input from user A, so that the electronic device displays the first key information in the voice information through a text editing application.
  • the first key information associated with the text editing application is extracted directly from the voice information and displayed, which improves the efficiency of extracting key information, avoids invalid recognition of the voice information, and improves the human-computer interaction performance of the electronic device.
  • because the electronic device displays the first key information, user A can immediately confirm with customer B whether the key information is recorded correctly, which helps ensure that the recorded information about the customer and the purchased products is consistent.
  • an embodiment of the present application provides a method for speech recognition.
  • the method may include steps 101 and 102 described below.
  • the method is described below by way of example, taking a speech recognition apparatus as the execution subject.
  • Step 101 The apparatus for speech recognition receives the first input of the user in the case of acquiring the speech information.
  • the above-mentioned first input is an input that triggers the startup of the target application.
  • the above-mentioned first input may include at least one of the following: clicking an icon corresponding to the target application, a screen gesture, clicking a mechanical button corresponding to the target application, clicking a virtual button corresponding to the target application, or other feasible inputs.
  • the method for speech recognition provided in this embodiment of the present application may further include step 201 and step 202.
  • Step 201 The apparatus for speech recognition receives a third input from the user.
  • Step 202 The apparatus for speech recognition displays at least one application identifier in response to the third input.
  • each application identifier is used to respectively indicate an application, and the at least one application identifier includes a target application identifier.
  • the above-mentioned third input may include at least one of the following: clicking an icon corresponding to the target application, screen recognition gesture, clicking a mechanical button corresponding to the target application, or other feasible inputs.
  • when the apparatus for voice recognition detects that the electronic device is making a voice call normally, it displays a voice chat interface corresponding to the voice call.
  • as shown in FIG. 4 and FIG. 5, when the electronic device detects the user's screen recognition gesture on the voice chat interface (that is, the above-mentioned third input), in response to the screen recognition gesture, it displays on the application identification interface the application identifiers corresponding to five candidate applications, from which the user can select the application that he or she wants to start (that is, the above-mentioned target application).
  • displaying multiple optional applications makes it easier for the user to select a target application suitable for recording the text information corresponding to the voice information, and also improves the human-computer interaction performance of the electronic device.
  • step 101 may be implemented through steps 601 and 602:
  • Step 601 During the voice call, the voice recognition device performs voice recording on the call content of the voice call, and acquires voice information.
  • the above-mentioned voice information is voice information recorded by a voice recognition device during a voice call.
  • Step 602 After the voice call ends or during the voice call, the device for voice recognition receives the first input from the user.
  • the electronic device can respond to the user's first input at any time to extract key information related to the target application, without waiting for the end of the voice call, which improves both the voice recognition efficiency and the human-computer interaction performance of the electronic device.
  • when the voice recognition device records the call content of the voice call, it can use any of the following recording and storage modes: automatically recording and storing all voice calls, recording and storing voice calls in response to user input, or automatically recording and caching all voice calls.
  • since a voice call includes multiple voice call modes such as phone calls, voice chats, or voice messages, the embodiments of the present application may select different voice recording and storage modes according to the different implementation mechanisms of the call modes.
  • for example, when the voice call is implemented as a phone call, the voice signal is usually transmitted through a base station. Because the voice signal usually disappears immediately after it is transmitted to the voice recognition device and played, the call content can be stored by recording and storing the voice call in response to user input.
  • when a voice call is implemented in the form of voice chat, it depends on the recording and forwarding of the voice by the network cloud. Because network transmission is unstable, part of the voice call content is usually cached in the voice recognition device, so this caching approach can be used to automatically record and cache all voice calls.
  • when voice calls are implemented in the form of voice messages, the data volume is small and the possibility of repeated playback is high, so all voice calls can be automatically recorded and stored to facilitate repeated playback by the user.
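  • the pairing of call modes and storage modes described above can be sketched as a simple lookup table. This is only an illustrative sketch: the names `CallMode`, `StorageMode`, and `select_storage_mode` are hypothetical and do not come from the application, and the sketch assumes that the first case above refers to ordinary phone calls.

```python
from enum import Enum


class CallMode(Enum):
    PHONE_CALL = "phone call"
    VOICE_CHAT = "voice chat"
    VOICE_MESSAGE = "voice message"


class StorageMode(Enum):
    STORE_ALL = "automatically record and store all voice calls"
    STORE_ON_INPUT = "record and store voice calls in response to user input"
    CACHE_ALL = "automatically record and cache all voice calls"


# Hypothetical mapping that mirrors the three cases described in the text.
STORAGE_BY_CALL_MODE = {
    CallMode.PHONE_CALL: StorageMode.STORE_ON_INPUT,  # signal is gone once played
    CallMode.VOICE_CHAT: StorageMode.CACHE_ALL,       # content is partly cached anyway
    CallMode.VOICE_MESSAGE: StorageMode.STORE_ALL,    # small and often replayed
}


def select_storage_mode(call_mode: CallMode) -> StorageMode:
    """Return the recording and storage mode assumed for a given call mode."""
    return STORAGE_BY_CALL_MODE[call_mode]
```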
  • the voice information stored in the cache mode is temporarily stored in the cache space of the voice recognition device. Since the cache space is limited, the cache space occupied by the voice information needs to be cleared periodically or irregularly so that other processes in the voice recognition device can run normally.
  • the cache may be cleared on any one of the following conditions: the voice call ends, the caching time of the voice information reaches a preset duration, the first input corresponding to the voice information is received, or an input for clearing the cached data is received.
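  • the cache-clearing conditions listed above can be sketched as a small policy check. This is a minimal sketch under stated assumptions: `ClearPolicy`, `CacheEntry`, `should_clear`, and the 600-second default are hypothetical names and values chosen for illustration, and the device is assumed to be configured with exactly one of the listed conditions.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ClearPolicy(Enum):
    ON_CALL_END = auto()     # clear after the voice call ends
    ON_TIMEOUT = auto()      # clear when the caching time reaches a preset duration
    ON_FIRST_INPUT = auto()  # clear once the first input for this recording is received
    ON_CLEAR_INPUT = auto()  # clear when the user explicitly asks to clear the cache


@dataclass
class CacheEntry:
    cached_at_s: float                   # time the recording entered the cache
    call_ended: bool = False
    first_input_received: bool = False
    clear_input_received: bool = False


def should_clear(entry: CacheEntry, policy: ClearPolicy, now_s: float,
                 preset_duration_s: float = 600.0) -> bool:
    """Decide whether a cached recording should be cleared under one policy."""
    if policy is ClearPolicy.ON_CALL_END:
        return entry.call_ended
    if policy is ClearPolicy.ON_TIMEOUT:
        return now_s - entry.cached_at_s >= preset_duration_s
    if policy is ClearPolicy.ON_FIRST_INPUT:
        return entry.first_input_received
    return entry.clear_input_received
```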
  • Step 102 In response to the first input, the apparatus for speech recognition displays the first key information in the speech information through the target application program.
  • the first key information is associated with the type of the target application, where the type of the target application may refer to the function that the target application can implement.
  • for example, if the type of the target application is the address book type, the name and phone number in the voice information are the first key information associated with the address book type; if the type of the target application is the purchase type, the buyer, brand, product, product model, and product quantity in the voice information are the first key information associated with the purchase type.
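  • the association between application type and key information can be sketched as a mapping from type to field names. This is an illustrative sketch only: the dictionary keys, the function `select_key_information`, and the sample values are hypothetical, and the recognized fields are assumed to have been produced by an earlier recognition step.

```python
from typing import Dict, Tuple

# Hypothetical field lists taken from the two examples in the text.
KEY_FIELDS_BY_APP_TYPE: Dict[str, Tuple[str, ...]] = {
    "address_book": ("name", "phone_number"),
    "purchase": ("buyer", "brand", "product", "product_model", "product_quantity"),
}


def select_key_information(app_type: str, recognized: Dict[str, str]) -> Dict[str, str]:
    """Keep only the recognized fields that matter for the given application type."""
    fields = KEY_FIELDS_BY_APP_TYPE.get(app_type, ())
    return {name: recognized[name] for name in fields if name in recognized}
```

  • for instance, with recognized content containing a name, a phone number, and a brand, the address book type keeps only the name and the phone number, while the purchase type keeps only the brand.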
  • the application program corresponding to the application identifier targeted by the first input is directly determined as the target application program.
  • the voice recognition method provided by this embodiment of the present application may further include step 102a or step 102b for determining the target application.
  • Step 102a When detecting that a voice call is in progress, the apparatus for voice recognition determines the application associated with the first parameter of the first input as the target application.
  • Step 102b When detecting that the voice call has ended, the apparatus for voice recognition determines the application associated with the second parameter of the first input as the target application.
  • in different call states, the above-mentioned first input corresponds to different target applications. When a voice call is in progress, the target application is the first application corresponding to the first input; when the voice call has ended, the target application is the second application corresponding to the first input. In other words, the same screen gesture may perform different operations.
  • the voice recognition device may be configured so that, when the end of the voice call is detected and the time difference between the end time of the voice call and the current time falls within a preset time period, the device determines that the application associated with the second parameter of the first input is the target application.
  • for example, when a voice call is detected to be in progress at the input moment of the first input, the target application is determined to be a translation program, so as to facilitate rapid translation of the voice information; when the end of the voice call is detected at the input moment of the first input, the target application is determined to be a memo program, so as to record and save the voice information.
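  • the state-dependent choice of target application in steps 102a and 102b can be sketched as follows. This is a minimal sketch, not the applicant's implementation: `CallState`, `FirstInput`, `determine_target_application`, and the optional time-window arguments are hypothetical, and returning `None` outside the preset period is an assumption made only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CallState(Enum):
    IN_PROGRESS = "in_progress"
    ENDED = "ended"


@dataclass(frozen=True)
class FirstInput:
    first_parameter: str   # application used while the call is in progress
    second_parameter: str  # application used after the call has ended


def determine_target_application(first_input: FirstInput,
                                 call_state: CallState,
                                 seconds_since_call_end: Optional[float] = None,
                                 preset_period_s: Optional[float] = None) -> Optional[str]:
    """Pick the target application from the call state (steps 102a / 102b)."""
    if call_state is CallState.IN_PROGRESS:
        return first_input.first_parameter
    # Call has ended: optionally require the input to arrive within a preset period.
    if (preset_period_s is not None and seconds_since_call_end is not None
            and seconds_since_call_end > preset_period_s):
        return None  # assumption: the gesture is not treated as a first input
    return first_input.second_parameter
```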
  • after determining the target application corresponding to the first input, the voice recognition apparatus starts the target application in response to the first input.
  • when a voice call is detected to be in progress, the voice call program is suspended and the target application is switched to the foreground to run.
  • when the end of the voice call is detected, the voice call program is ended and the target application is switched to the foreground to run.
  • after receiving the first input, the voice recognition device first determines the target application corresponding to the first input, and then determines the specific voice content contained in the voice information. A new interface may be opened at the same time as the target application is started, or may be opened in response to user input after the target application is started.
  • the voice recognition apparatus may display the first key information in the voice information on the newly created interface of the target application.
  • the voice recognition device may perform at least one of the following operations on the first key information: saving, jumping to the dialing page, jumping to the short message sending page, and re-editing. It should be noted that the operation on the first key information can be realized by the target application.
  • in the voice recognition method provided by the embodiments of the present application, in the case of acquiring voice information, after the first input is received, the first key information in the voice information can be displayed through the target application in response to the first input, where the first key information is associated with the type of the target application.
  • the first key information associated with the target application is extracted directly from the voice information and displayed, which improves the efficiency of extracting key information, avoids invalid recognition of the voice information, and improves the human-computer interaction performance of the electronic device.
  • step 102 may be implemented through steps 701 to 703.
  • Step 701 The apparatus for speech recognition starts the target application in response to the first input.
  • the target application is determined, and then the target application is started. It should be noted that, in order to reduce the operation steps, when the target application program is started, the newly created information interface can be directly started, so as to conveniently display the first key information.
  • Step 702 The apparatus for speech recognition acquires the first key information corresponding to the target application in the speech information.
  • step 702 may be implemented by step 702a and step 702b.
  • Step 702a The apparatus for speech recognition extracts the key fields included in the newly created information interface in the target application program.
  • Step 702b The apparatus for speech recognition determines the first key information corresponding to each key field in the first text information according to the above key field and the rule matching method of the key field.
  • the first key information in the first text information can be extracted according to the key fields by using a template, a vocabulary, and a rule matching method.
  • for example, the target application is an address book, and the key fields of the address book include name, phone number, and remarks, where the name is Li Si, the phone number is 135xxxxxxxx, and the address is No. 1 Shuyuan Street. Based on the fact that a phone number is either a 7- to 8-digit fixed-line number or an 11-digit mobile phone number, a matching rule is set, and the first key information corresponding to the key field "phone number" is extracted.
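  • the phone-number rule described above can be sketched with a regular expression. This is only an illustrative sketch: `PHONE_RULE`, `FIELD_RULES`, and `extract_by_key_fields` are hypothetical names, rules for the other key fields are deliberately left out, and the sample numbers are made up.

```python
import re
from typing import Dict, List

# A run of exactly 11 digits (mobile) or 7 to 8 digits (fixed line),
# not embedded in a longer run of digits.
PHONE_RULE = re.compile(r"(?<!\d)(?:\d{11}|\d{7,8})(?!\d)")

# Hypothetical rule table: only the "phone number" field has a rule here.
FIELD_RULES = {"phone number": PHONE_RULE}


def extract_by_key_fields(first_text: str, key_fields: List[str]) -> Dict[str, List[str]]:
    """Apply the matching rule of each key field that has one to the recognized text."""
    result: Dict[str, List[str]] = {}
    for field in key_fields:
        rule = FIELD_RULES.get(field)
        if rule is not None:
            result[field] = rule.findall(first_text)
    return result
```

  • key fields without a rule (here, name and remarks) are simply skipped; a fuller version would add a template or vocabulary lookup for them.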
  • the display method used to display the above-mentioned first key information in the interface of the above-mentioned target application program includes but is not limited to a bold display method, an oblique display method, and a highlight display method.
  • Step 703 The device for speech recognition displays the first key information in the target application.
  • the first key information corresponding to the key field in the voice information is identified according to the key field in the newly created information interface.
  • the first key information is displayed on the newly created information interface of the target application.
  • the first key information corresponding to the key fields in the newly created information interface of the target application program is obtained, so as to realize the purpose of real-time speech recognition.
  • acquiring the first key information specifically includes acquiring the voice information, and identifying the first key information corresponding to the target application in the voice information. Each of these steps is explained separately below.
  • in step 702, to reduce the amount of voice data that needs to be recognized and the proportion of redundant information, and thereby improve speech recognition efficiency, acquiring the voice information can be achieved through step 702c or step 702d.
  • Step 702c When detecting that a voice call is in progress, the voice recognition device determines that the voice information is the first voice information, where the first voice information is the voice information corresponding to a preset time period before the input time of the first input.
  • Step 702d When detecting that the voice call ends, the voice recognition device determines that the voice information is the second voice information; wherein the second voice information is all the voice information recorded during the voice call.
  • when a voice call is in progress, if the user hears key content (such as a phone number, address, or appointment time) mentioned by the other party, the user performs the first input.
  • the voice information corresponding to the preset time period before the input time of the first input usually includes the information that the user needs to record. Therefore, taking the input time of the first input as the reference time node makes it possible to obtain voice information with a smaller amount of data and a lower proportion of redundant information, which improves the efficiency of voice recognition.
  • when the end of the voice call is detected, the voice information to be recognized is the entire voice information recorded during the voice call.
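  • the choice between the recent window (step 702c) and the whole recording (step 702d) can be sketched as a slice over the recorded samples. This is a minimal sketch under stated assumptions: the function name `select_voice_segment`, the representation of the recording as a flat list of samples, and the 30-second default window are all hypothetical.

```python
from typing import List, Sequence


def select_voice_segment(samples: Sequence[int], sample_rate: int,
                         call_in_progress: bool, input_time_s: float,
                         window_s: float = 30.0) -> List[int]:
    """Return the part of the recording that should be sent to recognition."""
    if not call_in_progress:
        # Step 702d: the call has ended, so use everything that was recorded.
        return list(samples)
    # Step 702c: use only the preset period that precedes the first input.
    end = int(input_time_s * sample_rate)
    start = max(0, int((input_time_s - window_s) * sample_rate))
    return list(samples[start:end])
```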
  • the device for speech recognition can avoid missing content that needs to be recorded in the voice call by updating the voice information in real time. Similar to step 702c, acquiring the voice information in step 702 may further include step 702e.
  • Step 702e When detecting that a voice call is in progress, the voice recognition device extracts and updates the voice information at preset intervals.
  • the above-mentioned updated voice information includes: voice information of the voice call recording corresponding to the preset interval.
  • the voice recognition device extracts the second key information from the updated voice information based on the target application, and then updates the content displayed in the interface of the target application so that it includes both the first key information and the second key information.
  • for example, A and B have a conversation about purchasing computers. A informs B that he needs to buy 10 X-brand computers with model number 1566. The key call content is displayed on the interface, for example, "B needs to buy 10 X-brand computers with model number 1566" (that is, the above-mentioned first key information). Then, B reconfirms the order with A during the call, and A tells B that he needs to buy 5 Y-brand computers with model number 1588.
  • the electronic device recognizes the call content between the two parties again and obtains new key call content, such as "five Y-brand computers with model number 1588" (that is, the above-mentioned second key information), and, based on the new key call content, updates the key call content displayed in the application interface of the target application.
  • as updated voice information is continuously generated, the first key information and the second key information can be extracted from it and displayed in real time by the target application.
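  • the incremental display update described above can be sketched as merging newly extracted key information into what is already shown. This is an illustrative sketch only: `update_displayed_key_information` is a hypothetical name, and the extraction of key information from each updated voice segment is assumed to have happened already.

```python
from typing import List


def update_displayed_key_information(displayed: List[str], new_items: List[str]) -> List[str]:
    """Append newly extracted key information, keeping order and skipping repeats."""
    merged = list(displayed)
    for item in new_items:
        if item not in merged:
            merged.append(item)
    return merged
```

  • calling the function once per preset interval keeps the first key information on screen while the second key information is added beneath it.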
  • acquiring the voice information in step 702 may further include: the voice recognition device filters out interference information in the voice information according to a preset voiceprint recognition algorithm, and regenerates the voice information.
  • the interference information may come from the environment and may include whistle sounds, animal roars, rain, wind, and the like. Filtering out such interference information can improve the accuracy of speech recognition.
  • Example 2 Identify the first key information corresponding to the target program in the voice information
  • the text information included in the voice information is recognized in step 702, which specifically includes steps 901 to 903.
  • Step 901 The apparatus for speech recognition converts the above-mentioned speech information into target text information, and extracts first text information corresponding to the target application program from the target text information.
  • Step 902 The apparatus for speech recognition acquires at least one type of information.
  • Step 903 The apparatus for speech recognition deletes the text information whose type matches the preset type in the first text information according to the above at least one type of information, so as to obtain the above-mentioned first key information.
  • the above-mentioned first text information includes first key information.
  • the voice information is converted into target text information by performing audio feature extraction on the voice information, and then converting the audio features into text information through scoring by an acoustic model and a language model.
  • the above-mentioned at least one type of information is used to indicate the type of information included in the first text information.
  • the above-mentioned text information types may include abnormal reduplicated words, colloquial (spoken) words, time words, and place words. Take the sentence "An annual meeting will be held at a certain hotel on the south side side of a certain bank on Yellow River Street, this this starting at 16:00 on March 2 and ending at 21:00; do you think such a time arrangement is okay?" as an example: "side side" is an abnormal reduplicated word, "this this" is a spoken word, "March 2, 16:00, 21:00" are time words, and "Yellow River Street, a certain bank, a certain hotel" are place words.
  • the types of reduplicated words, spoken words and local dialect words can be determined as preset types.
  • the device for speech recognition deleting the text information whose type matches the preset type in the first text information means deleting the redundant "side" and "this this" in the above example, which yields the first key information "An annual meeting will be held at a certain hotel on the south side of a certain bank on Yellow River Street, starting at 16:00 on March 2 and ending at 21:00; do you think such a time arrangement is okay?"
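  • the type-based deletion in step 903 can be sketched as a filter over text segments that already carry a type label. This is a minimal sketch under stated assumptions: the segment labels, `PRESET_TYPES`, and `delete_preset_types` are hypothetical, and the labelling itself (step 902) is assumed to be done by some earlier component.

```python
from typing import Iterable, List, Tuple

# Hypothetical labels for the preset types named in the text.
PRESET_TYPES = frozenset({"reduplicated_word", "spoken_word", "dialect_word"})


def delete_preset_types(tagged_segments: Iterable[Tuple[str, str]],
                        preset_types: frozenset = PRESET_TYPES) -> str:
    """Drop every segment whose type is a preset type and join what remains."""
    kept: List[str] = [text for text, kind in tagged_segments if kind not in preset_types]
    return " ".join(kept)
```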
  • the spoken word optimization process in the above text optimization processing method includes:
  • Mode 1 Perform oral text analysis on the first text information by using a preset spoken word list.
  • the preset spoken word list can be built as follows: the user records a piece of colloquial speech through voice input; text recognition is performed on the speech to obtain spoken text information; the spoken text information is displayed; the user edits the spoken text information to retain the spoken words corresponding to his or her own pet phrases; finally, the retained spoken words are combined to obtain the spoken word list.
  • Mode 2 Perform oral text analysis on the first text information by adding a language model (specially training a language model for commonly used spoken words).
  • the recognized spoken words can be displayed to the user in a highlighted manner, so that the user can choose whether to delete them from the final output of the voice input. Alternatively, depending on the settings of the voice input method, the recognized spoken words can be deleted with one click or deleted automatically.
  • the readability of the first key information can be improved.
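  • Mode 1 above, deleting spoken words with a preset spoken word list, can be sketched with whole-word matching. This is only an illustrative sketch for English text: `remove_spoken_words` is a hypothetical name, and the sample word list is made up rather than taken from the application.

```python
import re
from typing import Iterable


def remove_spoken_words(text: str, spoken_words: Iterable[str]) -> str:
    """Delete every whole-word occurrence of the listed spoken words from the text."""
    cleaned = text
    # Handle longer phrases first so that "you know" is removed before "you".
    for phrase in sorted(spoken_words, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        cleaned = re.sub(pattern, " ", cleaned, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by the deletions.
    return " ".join(cleaned.split())
```

  • the word boundaries keep a short filler such as "um" from being removed inside a longer word such as "number".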
  • the speech recognition method provided in this embodiment of the present application may further include steps 1001 and 1002 .
  • Step 1001 The apparatus for speech recognition receives a second input from the user.
  • Step 1002 In response to the second input, the apparatus for speech recognition uses an editing processing method corresponding to the second input to process the first key information.
  • the above-mentioned second input is an editing input of the above-mentioned first key information by the user.
  • the editing processing method corresponding to the above second input is: after the user clicks to select the information to be modified, the information to be modified appears in the editing column, and the user deletes, splices, or re-enters the information to be modified.
  • the original information content of the first key information may be updated in real time with the user's modification, or may be replaced after the user's modification is completed.
  • if the information to be modified is repeated information, then after the user deletes, splices, or re-enters the information to be modified, all of the repeated parts are corrected uniformly, which enables rapid information integration.
  • the speech recognition method provided in the embodiment of the present application may further include: generating a temporary cache control, where the temporary cache control is used to cache the first key information.
  • the phone number can be dialed directly through the temporary cache control, so that the displayed first key information can be applied directly, sparing the user the tedious process of selecting, copying, and pasting the phone number in order to make the call.
  • the execution subject may be a speech recognition apparatus, or a control module in the speech recognition apparatus for performing the speech recognition method.
  • a speech recognition apparatus performing the speech recognition method is taken as an example to describe the speech recognition apparatus provided by the embodiments of the present application.
  • the execution subject of the above speech recognition method may also be other devices or apparatuses that can perform the speech recognition method, which is not limited in this embodiment of the present application.
  • an embodiment of the present application provides a device for speech recognition.
  • the device for speech recognition includes: a first receiving module 1201 and a first display module 1202;
  • the above-mentioned first receiving module 1201 is configured to receive the first input of the user when the voice information is obtained;
  • the above-mentioned first display module 1202 is configured to display, through the target application, the first key information in the above-mentioned voice information in response to the first input received by the above-mentioned first receiving module 1201, where the first key information is associated with the type of the target application.
  • the above-mentioned first receiving module 1201 is configured to: during a voice call, record the call content of the voice call to obtain the above-mentioned voice information; and receive the user's first input after the voice call ends or during the voice call.
  • the above apparatus further includes: a determining module 1203;
  • the above-mentioned determining module 1203 is configured to, before the above-mentioned first display module 1202 displays the first key information in the above-mentioned voice information through the target application in response to the above-mentioned first input, determine the application associated with the first parameter of the first input as the above-mentioned target application when it is detected that a voice call is in progress;
  • the above-mentioned determining module 1203 is further configured to, before the above-mentioned first display module 1202 displays the first key information in the above-mentioned voice information through the target application in response to the above-mentioned first input, determine the application associated with the second parameter of the first input as the above-mentioned target application when it is detected that the voice call has ended.
  • the above-mentioned first display module 1202 is configured to: start the target application in response to the above-mentioned first input; obtain the first key information corresponding to the target application in the above-mentioned voice information; and display the first key information in the target application.
  • the above-mentioned first display module 1202 is specifically configured to: convert the above-mentioned voice information into target text information, and extract the first text information corresponding to the target application from the target text information, where the first text information includes the above-mentioned first key information; obtain at least one piece of type information, where the at least one piece of type information indicates the types of information contained in the first text information; and, according to the at least one piece of type information, delete from the first text information the text information whose type matches a preset type, to obtain the first key information.
  • the above apparatus further includes: a second receiving module 1204 and a first processing module 1205;
  • the above-mentioned second receiving module 1204 is configured to receive the user's second input after the above-mentioned first display module 1202 displays the first key information in the above-mentioned voice information through the target application, where the second input is the user's editing input on the first key information;
  • the above-mentioned first processing module 1205 is configured to, in response to the second input received by the above-mentioned second receiving module 1204, use an editing processing method corresponding to the above-mentioned second input to process the above-mentioned first key information.
  • when the voice information has been acquired and the first input is received, the first input can be responded to and the first key information in the voice information can be displayed through the target application, where the first key information is associated with the type of the target application.
  • in this way, the first key information associated with the target application is extracted from the voice information and displayed directly, which improves the efficiency of extracting key information, avoids invalid recognition of the voice information, and improves the human-computer interaction performance of the electronic device.
  • the apparatus for speech recognition in this embodiment of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in a terminal.
  • the apparatus may be a mobile electronic device or a non-mobile electronic device.
  • the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, an in-vehicle electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA).
  • the non-mobile electronic device may be a server, a network attached storage (NAS) device, a personal computer (PC), a television (TV), a teller machine, or a self-service machine, which is not specifically limited in the embodiments of the present application.
  • the apparatus for speech recognition in this embodiment of the present application may be an apparatus having an operating system.
  • the operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
  • the apparatus for speech recognition provided in this embodiment of the present application can implement each process implemented by the foregoing method embodiment, and to avoid repetition, details are not described herein again.
  • an embodiment of the present application further provides an electronic device 1400, including a processor 1401, a memory 1402, and a program or instruction stored in the memory 1402 and executable on the processor 1401; when the program or instruction is executed by the processor 1401, each process of the above-mentioned speech recognition method embodiment is implemented and the same technical effect can be achieved. To avoid repetition, details are not repeated here.
  • the electronic devices in the embodiments of the present application include the above-mentioned mobile electronic devices and non-mobile electronic devices.
  • FIG. 15 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
  • the electronic device 1500 includes but is not limited to: a radio frequency unit 1501, a network module 1502, an audio output unit 1503, an input unit 1504, a sensor 1505, a display unit 1506, a user input unit 1507, an interface unit 1508, a memory 1509, a processor 1510, and other components.
  • the electronic device 1500 may also include a power supply (such as a battery) for supplying power to the various components, and the power supply may be logically connected to the processor 1510 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system.
  • the structure of the electronic device shown in FIG. 15 does not constitute a limitation to the electronic device.
  • the electronic device may include more or fewer components than shown, combine some components, or arrange the components differently, which will not be repeated here.
  • the above-mentioned user input unit 1507 is configured to receive the first input of the user when the voice information is acquired;
  • the processor 1510 is configured to display, through the target application, the first key information in the voice information in response to the first input, where the first key information is associated with the type of the target application.
  • the above-mentioned processor 1510 is further configured to perform voice recording on the call content of the above-mentioned voice call in the process of conducting a voice call, and obtain the above-mentioned voice information;
  • the above-mentioned user input unit 1507 is further configured to receive the first input of the user after the above-mentioned voice call ends or during the above-mentioned voice call.
  • the processor 1510 is further configured to determine the application associated with the first parameter of the first input as the target application when it is detected that a voice call is in progress, and to determine the application associated with the second parameter of the first input as the target application when it is detected that the voice call has ended.
  • the above-mentioned processor 1510 is further configured to start the target application in response to the above-mentioned first input; obtain the first key information corresponding to the target application in the above-mentioned voice information; and display the first key information in the target application.
  • the above-mentioned processor 1510 is further configured to convert the above-mentioned voice information into target text information, and extract the first text information corresponding to the target application from the target text information, where the first text information includes the above-mentioned first key information; obtain at least one piece of type information, where the at least one piece of type information indicates the types of information contained in the first text information; and, according to the at least one piece of type information, delete from the first text information the text information whose type matches a preset type, to obtain the first key information.
  • the above-mentioned user input unit 1507 is further configured to receive the second input of the user, where the second input is the user's editing input on the above-mentioned first key information;
  • the above-mentioned processor 1510 is further configured to, in response to the above-mentioned second input, process the above-mentioned first key information using the editing processing method corresponding to the second input.
  • in the electronic device, when the voice information has been acquired and the first input is received, the electronic device can respond to the first input and display the first key information in the voice information through the target application, where the first key information is associated with the type of the target application.
  • in this way, the first key information associated with the target application is extracted from the voice information and displayed directly, which improves the efficiency of extracting key information, avoids invalid recognition of the voice information, and improves the human-computer interaction performance of the electronic device.
  • the input unit 1504 may include a graphics processing unit (GPU) 15041 and a microphone 15042. The graphics processing unit 15041 processes image data of still pictures or video obtained by an image capture device (such as a camera).
  • the display unit 1506 may include a display panel 15061, which may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like.
  • the user input unit 1507 includes a touch panel 15071 and other input devices 15072.
  • the touch panel 15071 is also called a touch screen.
  • the touch panel 15071 may include two parts, a touch detection device and a touch controller.
  • Other input devices 15072 may include but are not limited to physical keyboards, function keys (such as volume control keys, switch keys, etc.), trackballs, mice, and joysticks, which will not be repeated here.
  • Memory 1509 may be used to store software programs as well as various data, including but not limited to application programs and operating systems.
  • the processor 1510 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 1510.
  • Embodiments of the present application further provide a readable storage medium, where a program or an instruction is stored on the readable storage medium.
  • a readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, and the like.
  • An embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement each process of the above-mentioned speech recognition method embodiment and achieve the same technical effect. To avoid repetition, details are not repeated here.
  • the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.
  • the method of the above embodiment can be implemented by means of software plus a necessary general-purpose hardware platform, and of course can also be implemented by hardware, although in many cases the former is the better implementation.
  • the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a computer software product that is stored in a storage medium (such as ROM/RAM, a magnetic disk, or a CD-ROM) and includes several instructions to enable a terminal (which may be a mobile phone, a computer, a server, or a network device, etc.) to execute the methods of the various embodiments of the present application.
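
Mode 1 of the spoken word optimization described in the list above (matching the recognized text against a preset spoken word list, then highlighting or deleting the matches) can be sketched roughly as follows. This is only an illustrative sketch and not the implementation of the application: the English filler words, the bracket-style highlighting, and the function names `highlight_spoken_words` and `delete_spoken_words` are all assumptions introduced for the example.

```python
import re

# Hypothetical preset spoken word list. In the text above it is built from the
# user's own recorded speech; here it is hard-coded purely for illustration.
PRESET_SPOKEN_WORDS = ["um", "uh", "you know"]


def _pattern(spoken_words, trailing=""):
    # Longest entries first, so multi-word fillers win over shorter ones.
    ordered = sorted(spoken_words, key=len, reverse=True)
    alternatives = "|".join(re.escape(word) for word in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b" + trailing, re.IGNORECASE)


def highlight_spoken_words(text, spoken_words=PRESET_SPOKEN_WORDS):
    """Mark recognized spoken words with brackets so the user can review them."""
    return _pattern(spoken_words).sub(lambda m: "[" + m.group(0) + "]", text)


def delete_spoken_words(text, spoken_words=PRESET_SPOKEN_WORDS):
    """One-click deletion: drop each spoken word plus the comma/space after it."""
    cleaned = _pattern(spoken_words, trailing=r"[,\s]*").sub("", text)
    return re.sub(r"\s+", " ", cleaned).strip()


if __name__ == "__main__":
    sample = "Um, the meeting starts at 16:00, you know, on March 2."
    print(highlight_spoken_words(sample))
    print(delete_spoken_words(sample))
```

A list lookup of this kind only covers words the user has registered; Mode 2 in the text replaces the lookup with a trained language model, which this sketch does not attempt to show.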

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)
  • Telephone Function (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A method and apparatus for collecting key information during a voice call, an electronic device, and a readable storage medium, the method comprising: receiving a first input by a user when voice information has been acquired, the first input being an input that triggers and starts a target application program (101); and in response to the first input, displaying first key information in the voice information by means of the target application program, the first key information being associated with the type of the target application program (102).

Description

Method, apparatus, electronic device and readable storage medium for speech recognition
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese Patent Application No. 202110369099.2, filed in China on April 6, 2021, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present application belongs to the field of communication technologies, and in particular relates to a speech recognition method, apparatus, electronic device, and readable storage medium.
BACKGROUND
With the development of electronic technology, voice chat has gradually become one of the main forms of remote chat. Voice chat mainly includes phone calls, voice calls, and short voice messages. During a voice chat, key information such as names, mobile phone numbers, addresses, meeting times, and meeting locations usually has to be recorded by hand with pen and paper or typed manually into text editing software.
In the related art, after a voice chat ends, text editing software can perform speech recognition on the saved voice information to generate text information, and the user then manually filters the text information and saves the key information in it.
However, key information makes up only a small proportion of the voice information corresponding to a voice chat, so the text information contains a large amount of redundant information, which reduces the efficiency of obtaining the key information.
SUMMARY OF THE INVENTION
The purpose of the embodiments of the present application is to provide a speech recognition method, apparatus, electronic device, and readable storage medium that can solve the problem of low efficiency in acquiring key information during speech recognition.
In a first aspect, an embodiment of the present application provides a method for speech recognition. The method includes: when voice information has been acquired, receiving a first input from a user; and, in response to the first input, displaying first key information in the voice information through a target application, where the first key information is associated with the type of the target application.
In a second aspect, an embodiment of the present application provides an apparatus for speech recognition. The apparatus includes a first receiving module and a first display module. The first receiving module is configured to receive a first input from a user when voice information has been acquired. The first display module is configured to display, in response to the first input, first key information in the voice information through a target application, where the first key information is associated with the type of the target application.
In a third aspect, an embodiment of the present application provides an electronic device. The electronic device includes a processor, a memory, and a program or instruction stored in the memory and executable on the processor, where the program or instruction, when executed by the processor, implements the steps of the method provided in the first aspect.
In a fourth aspect, an embodiment of the present application provides a readable storage medium on which a program or instruction is stored, where the program or instruction, when executed by a processor, implements the steps of the method provided in the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip. The chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or instruction to implement the method provided in the first aspect.
In a sixth aspect, an embodiment of the present application provides a computer program product, where the program product is stored in a non-volatile storage medium and is executed by at least one processor to implement the method provided in the first aspect.
In the embodiments of the present application, when voice information has been acquired and the first input is received, the first input can be responded to and the first key information in the voice information can be displayed through the target application, where the first key information is associated with the type of the target application. In this way, the first key information associated with the target application is extracted from the voice information and displayed directly, which improves the efficiency of extracting key information, avoids invalid recognition of the voice information, and improves the human-computer interaction performance of the electronic device.
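The association between the first key information and the type of the target application can be illustrated with a small sketch. It assumes the voice information has already been converted to text, and everything specific in it is hypothetical: the application types `contacts` and `calendar`, the information types, and the regular expressions are placeholders chosen for the example, not details taken from the application.

```python
import re

# Hypothetical extractors, one per information type (illustrative patterns only).
EXTRACTORS = {
    "phone_number": re.compile(r"\b\d{11}\b"),
    "time": re.compile(r"\b\d{1,2}:\d{2}\b"),
}

# Hypothetical association between an application type and the information
# types that count as key information for that type.
APP_TYPE_TO_INFO_TYPES = {
    "contacts": ["phone_number"],
    "calendar": ["time"],
}


def first_key_information(recognized_text, app_type):
    """Return only the key information associated with the given application type."""
    result = {}
    for info_type in APP_TYPE_TO_INFO_TYPES.get(app_type, []):
        result[info_type] = EXTRACTORS[info_type].findall(recognized_text)
    return result


if __name__ == "__main__":
    text = "Call me at 13800138000, the meeting starts at 16:00 and ends at 21:00."
    print(first_key_information(text, "contacts"))
    print(first_key_information(text, "calendar"))
```

The point of the sketch is that the same recognized text yields different key information depending on which application the first input starts, so text unrelated to that application is never shown to the user.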
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a first schematic diagram of a speech recognition method provided by an embodiment of the present application;
FIG. 2 is a second schematic diagram of a speech recognition method provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a chat interface provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a chat interface receiving a screen recognition gesture provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of displaying application identifiers provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a method for receiving a first input provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a method for displaying first key information provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of a display interface of the first key information provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of a method for acquiring first key information provided by an embodiment of the present application;
FIG. 10 is a third schematic diagram of a speech recognition method provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of editing first key information provided by an embodiment of the present application;
FIG. 12 is a first schematic structural diagram of an apparatus for speech recognition provided by an embodiment of the present application;
FIG. 13 is a second schematic structural diagram of an apparatus for speech recognition provided by an embodiment of the present application;
FIG. 14 is a first hardware schematic diagram of an electronic device provided by an embodiment of the present application;
FIG. 15 is a second hardware schematic diagram of an electronic device provided by an embodiment of the present application.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are some, rather than all, of the embodiments of the present application.
The terms "first", "second", and the like in the description and claims of the present application are used to distinguish similar objects and are not used to describe a specific order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments of the present application can be implemented in sequences other than those illustrated or described herein. Objects distinguished by "first", "second", and the like are usually of one type, and the number of objects is not limited; for example, there may be one first object or more than one. In addition, "and/or" in the description and claims indicates at least one of the connected objects, and the character "/" generally indicates that the associated objects are in an "or" relationship.
The speech recognition method provided by the embodiments of the present application is described in detail below through specific embodiments and their application scenarios, with reference to the accompanying drawings.
Consider a telephone shopping scenario: user A calls customer B through an electronic device and asks whether customer B wants to buy a product. If customer B does, attribute information such as the quantity, model, color, and delivery time of the product needs to be recorded. In the related art, user A starts the recording function during the call with customer B, and the call content is recorded to generate voice information. After the call ends, if customer B wants to buy the product, user A starts a text editing application, selects the voice information in the text editing application, has the voice information recognized as text information, and then saves the text information. During speech recognition, issues such as customer B's colloquial speech, accent, and sentence breaks may make the recognized text information inaccurate, so the text information contains deviations. In addition, the key information that user A actually needs to record is mixed into the rest of the text information, which makes obtaining the key information inefficient.
In the above scenario, in the embodiments of the present application, if customer B wants to buy the product, user A can provide the first input once the electronic device has acquired the voice information of the call between user A and customer B, so that the electronic device displays the first key information in the voice information through the text editing application. In this way, the first key information associated with the text editing application is extracted from the voice information and displayed directly, which improves the efficiency of extracting key information, avoids invalid recognition of the voice information, and improves the human-computer interaction performance of the electronic device. Moreover, because the electronic device displays the first key information during the call between user A and customer B, user A can confirm with customer B immediately whether the key information has been recorded correctly, which ensures that the recorded customer information and the attribute information of the purchased product are consistent with what customer B stated.
如图1所示,本申请实施例提供一种语音识别的方法。该方法可以包括下述的步骤101和步骤102。下面以执行主体为语音识别的装置为例对该方法进行示例性说明。As shown in FIG. 1 , an embodiment of the present application provides a method for speech recognition. The method may include steps 101 and 102 described below. The method is exemplarily described below by taking an apparatus whose main body is speech recognition as an example.
步骤101:语音识别的装置在获取到语音信息的情况下,接收用户的第一输入。Step 101: The apparatus for speech recognition receives the first input of the user in the case of acquiring the speech information.
在本申请实施例中,上述第一输入为触发启动目标应用程序的输入。示例性的,上述第一输入可以包括以下至少一项:单击目标应用程序对应的图标、屏幕手势、点击目标应用程序对应的机械按键、点击目标应用程序对应的虚拟按键,也可以为其他可行性输入。In this embodiment of the present application, the above-mentioned first input is an input that triggers the startup of the target application. Exemplarily, the above-mentioned first input may include at least one of the following: clicking an icon corresponding to the target application, screen gestures, clicking a mechanical button corresponding to the target application, clicking a virtual button corresponding to the target application, or other feasible options. Sexual input.
Optionally, in this embodiment of the present application, if the first input is a user input on a target application identifier, and the target application identifier indicates the target application, then, as shown in FIG. 2, before the first input of the user is received in step 101, the speech recognition method provided in this embodiment may further include step 201 and step 202.
Step 201: The speech recognition apparatus receives a third input from the user.
Step 202: In response to the third input, the speech recognition apparatus displays at least one application identifier.
In this embodiment of the present application, each application identifier indicates one application, and the at least one application identifier includes the target application identifier.
It should be noted that the third input may include at least one of the following: tapping an icon corresponding to the target application, a screen recognition gesture, or pressing a mechanical key corresponding to the target application; it may also be any other feasible input.
For example, as shown in FIG. 3, when the speech recognition apparatus detects that the electronic device is in a voice call, it displays the voice chat interface corresponding to the call. Then, as shown in FIG. 4 and FIG. 5, when the electronic device detects a screen recognition gesture by the user on the voice chat interface (that is, the third input described above), it responds to the gesture by displaying, on an application identifier interface, the application identifiers of five candidate applications, from which the user can select the application to launch (that is, the target application described above).
In this way, displaying the application identifiers of multiple applications presents the user with several candidate applications, which not only makes it easy for the user to pick a target application suited to recording the text information corresponding to the voice information, but also improves the human-computer interaction performance of the electronic device.
Optionally, in this embodiment of the present application, as shown in FIG. 6, step 101 may be implemented through step 601 and step 602:
Step 601: During a voice call, the speech recognition apparatus records the content of the voice call and thereby acquires the voice information.
The voice information is the voice information recorded by the speech recognition apparatus during the voice call.
Step 602: After the voice call has ended or while it is in progress, the speech recognition apparatus receives the first input from the user.
In this way, because the content of the voice call is recorded in real time, the electronic device can respond to the user's first input at any time and extract the key information related to the target application without waiting for the voice call to end, which improves both speech recognition efficiency and the human-computer interaction performance of the electronic device.
Further optionally, in this embodiment of the present application, when recording the content of a voice call, the speech recognition apparatus may use any one of the following recording and storage modes: automatically recording and storing all voice calls; recording and storing a voice call in response to user input; or automatically recording and caching all voice calls.
It should be noted that, because voice calls take several forms, such as phone calls, voice chats, and voice messages, the embodiments of the present application may select different recording and storage modes according to how each form of call is implemented.
For example, when the voice call is a phone call, the voice signal is usually transmitted through a base station. Because the voice signal usually disappears as soon as it has been transmitted to the speech recognition apparatus and played, the call content can be kept by recording and storing the voice call in response to user input.
For example, when the voice call is a voice chat, it relies on the network cloud to record and forward the voice. Because network transmission is unstable, part of the call content is usually already cached on the speech recognition apparatus, so this partial caching can be used as the basis for automatically recording and caching all voice calls.
For example, when the voice call takes the form of voice messages, the data volume is small and the messages are likely to be replayed, so all voice calls can be automatically recorded and stored to make replay convenient for the user.
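The choice of recording mode by call type described above can be sketched as a simple lookup. This is only an illustrative sketch: the names `CallType`, `RecordingMode`, `DEFAULT_MODE`, and `select_recording_mode` are hypothetical and do not come from the application, and the mapping simply mirrors the three examples given.

```python
from enum import Enum


class CallType(Enum):
    PHONE_CALL = "phone_call"
    VOICE_CHAT = "voice_chat"
    VOICE_MESSAGE = "voice_message"


class RecordingMode(Enum):
    AUTO_RECORD_AND_STORE = "auto_record_and_store"
    RECORD_ON_USER_INPUT = "record_on_user_input"
    AUTO_RECORD_AND_CACHE = "auto_record_and_cache"


# Hypothetical defaults that follow the three examples in the text.
DEFAULT_MODE = {
    CallType.PHONE_CALL: RecordingMode.RECORD_ON_USER_INPUT,
    CallType.VOICE_CHAT: RecordingMode.AUTO_RECORD_AND_CACHE,
    CallType.VOICE_MESSAGE: RecordingMode.AUTO_RECORD_AND_STORE,
}


def select_recording_mode(call_type: CallType) -> RecordingMode:
    """Return the recording and storage mode assumed for this call type."""
    return DEFAULT_MODE[call_type]
```

A real device could equally let the user override these defaults; the text only says that the mode may differ by call type.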
Further optionally, in this embodiment of the present application, voice information stored in cache mode is held temporarily in the cache space of the speech recognition apparatus. Because the cache space is limited, the cache space occupied by voice information needs to be cleared periodically or from time to time so that other processes on the apparatus can run normally. Generally, clearing is triggered by any one of the following: the voice information has been cached for a preset duration after the voice call ended; the first input corresponding to the voice information has been received; or an input to clear cached data has been received.
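The three clearing conditions can be expressed as a single predicate. The sketch below is an assumption about how such a check might look; `CachedVoice`, its fields, and `should_clear` are hypothetical names, and times are plain seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedVoice:
    # Time at which the call ended, in seconds; None while the call is ongoing.
    call_ended_at: Optional[float]
    # True once the first input for this voice information has been received.
    first_input_received: bool = False
    # True once the user has asked for cached data to be cleared.
    clear_requested: bool = False


def should_clear(entry: CachedVoice, now: float, max_age_seconds: float) -> bool:
    """Return True if any one of the three clearing conditions holds."""
    if entry.clear_requested or entry.first_input_received:
        return True
    if entry.call_ended_at is None:
        return False
    # Cached for at least the preset duration since the call ended.
    return now - entry.call_ended_at >= max_age_seconds
```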
Step 102: In response to the first input, the speech recognition apparatus displays the first key information in the voice information through the target application.
In this embodiment of the present application, the first key information is associated with the type of the target application, where the type of the target application may also be described as the function the target application can perform. For example, if the target application is of the address book type, the name and phone number in the voice information are the first key information associated with that type; if the target application is of the purchasing type, the buyer, brand, product, product model, and product quantity in the voice information are the first key information associated with that type.
In one example, if the front-end interface of the electronic device is the application identifier display interface when the first input is received, the application corresponding to the application identifier targeted by the first input is directly determined as the target application.
In another example, if the front-end interface of the electronic device is the voice chat interface when the first input is received, then before step 102 the speech recognition method provided in this embodiment may further include step 102a or step 102b for determining the target application.
Step 102a: When it detects that a voice call is in progress, the speech recognition apparatus determines the application associated with a first parameter of the first input as the target application.
Step 102b: When it detects that the voice call has ended, the speech recognition apparatus determines the application associated with a second parameter of the first input as the target application.
For example, the first input corresponds to different target applications at different points in a voice call: during the voice call, the target application is a first application corresponding to the first input; after the voice call has ended, the target application is a second application corresponding to the first input.
It should be noted that the same screen gesture may trigger different operations depending on the interface the electronic device is displaying. For example, a three-finger downward swipe on the regular initial interface of the electronic device launches the camera application, whereas the same gesture on the display interface corresponding to the speech recognition method launches the memo application. To distinguish the regular initial interface from the initial interface shown after a voice call has ended, the speech recognition apparatus may be configured so that, when it detects that the voice call has ended, it determines the application associated with the second parameter of the first input as the target application only if the time elapsed between the end of the voice call and the current moment falls within a preset time period.
For example, take a two-finger horizontal swipe as the first input. If a voice call is detected to be in progress at the moment of the first input, the target application is determined to be a translation program, so that the voice information can be translated quickly; if instead the voice call is detected to have ended at the moment of the first input, the target application is determined to be a memo program, so that the voice information can be recorded and saved.
In this way, first inputs with the same content but different input times resolve to different target applications; that is, one first input maps to multiple responses, so that a small number of first-input types can produce a larger number of responses.
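Steps 102a and 102b, together with the preset time period after the call ends, can be sketched as follows. This is a minimal illustration under stated assumptions: `GESTURE_BINDINGS`, `resolve_target_app`, the gesture name, and the application names are hypothetical, and the binding shown only reproduces the two-finger swipe example.

```python
from typing import Optional

# Hypothetical bindings: one gesture, two parameters (in call / after call).
GESTURE_BINDINGS = {
    "two_finger_horizontal_swipe": {"in_call": "translator", "after_call": "memo"},
}


def resolve_target_app(
    gesture: str,
    call_active: bool,
    call_ended_at: Optional[float],
    now: float,
    grace_seconds: float,
) -> Optional[str]:
    """Pick the target application for a gesture based on the call state."""
    binding = GESTURE_BINDINGS.get(gesture)
    if binding is None:
        return None
    if call_active:
        # Step 102a: the call is in progress, use the first parameter.
        return binding["in_call"]
    if call_ended_at is not None and 0 <= now - call_ended_at <= grace_seconds:
        # Step 102b: the call ended within the preset time period.
        return binding["after_call"]
    # Outside the preset period the gesture keeps its ordinary meaning.
    return None
```

Returning `None` outside the preset period stands in for "fall back to the regular behavior of the gesture", which the text leaves to the device.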
Optionally, after determining the target application corresponding to the first input, the speech recognition apparatus launches the target application in response to the first input. If a voice call is detected to be in progress, it suspends the voice call program and switches the target application to the foreground. If the voice call is detected to have ended, it closes the voice call program and switches the target application to the foreground.
Optionally, after receiving the first input, the speech recognition apparatus first determines the target application corresponding to the first input and then determines the specific voice content contained in the voice information. It may then open the new-entry interface at the same time as it launches the target application, or open the new-entry interface in response to user input after the target application has been launched.
Further optionally, the speech recognition apparatus may display the first key information in the voice information on the new-entry interface of the target application.
Optionally, after the target application displays the first key information, the speech recognition apparatus performs at least one of the following operations on the first key information: saving it, jumping to the dialing page, jumping to the SMS sending page, or editing it again. It should be noted that these are operations that the target application is able to perform.
In the speech recognition method provided in the embodiments of the present application, once voice information has been acquired and the first input has been received, the apparatus can respond to the first input and display the first key information in the voice information through the target application, where the first key information is associated with the type of the target application. In this way, the first key information associated with the target application is extracted from the voice information and displayed directly, which not only improves the efficiency of key information extraction, but also avoids invalid recognition of the voice information and improves the human-computer interaction performance of the electronic device.
Optionally, as shown in FIG. 7, in this embodiment of the present application, step 102 may be implemented through steps 701 to 703.
Step 701: In response to the first input, the speech recognition apparatus launches the target application.
In this embodiment of the present application, the target application is determined according to whether the front-end interface of the electronic device is the application identifier display interface or the voice chat interface when the first input is received, and the target application is then launched. It should be noted that, to reduce the number of operations, the new-entry interface can be opened directly when the target application is launched, so that the first key information can be displayed conveniently.
Step 702: The speech recognition apparatus acquires the first key information in the voice information that corresponds to the target application.
In this embodiment of the present application, step 702 may be implemented through step 702a and step 702b.
Step 702a: The speech recognition apparatus extracts the key fields included in the new-entry interface of the target application.
Step 702b: The speech recognition apparatus determines the first key information corresponding to each key field in the first text information according to the key fields and the rule matching method of each key field.
In this embodiment of the present application, once the key fields included in the new-entry interface of the target application have been extracted, the first key information can be extracted from the first text information by template, vocabulary-list, or rule matching, according to each key field.
For example, as shown in FIG. 8, assume that the target application is an address book whose key fields include name, phone number, and remarks, where the name is Li Si, the phone number is 135xxxxxxxx, and the address is No. 1 Shuyuan Street. A rule matching method is set based on the fact that a phone number is either a 7- or 8-digit landline number or an 11-digit mobile number, and the first key information corresponding to the key field "phone number" is extracted accordingly.
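The phone number rule in this example can be sketched with a regular expression. This is a minimal illustration, not the matching method of the application: `PHONE_PATTERN` and `extract_phone_numbers` are hypothetical names, and the numbers in the usage note are made up.

```python
import re
from typing import List

# A run of exactly 11 digits (mobile) or exactly 7 to 8 digits (landline).
# The lookarounds reject matches that are part of a longer digit run.
PHONE_PATTERN = re.compile(r"(?<!\d)(?:\d{11}|\d{7,8})(?!\d)")


def extract_phone_numbers(text: str) -> List[str]:
    """Return every substring of the recognized text that fits the rule."""
    return PHONE_PATTERN.findall(text)
```

For instance, `extract_phone_numbers("My name is Li Si, call me at 13500000000 or 5551234.")` yields the two digit strings, while a 12-digit order number is ignored. Other key fields such as the name would need their own template or vocabulary rules.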
It should be noted that, in this embodiment of the present application, the display style used for the first key information in the interface of the target application includes, but is not limited to, bold, italic, and highlighted display.
Step 703: The speech recognition apparatus displays the first key information in the target application.
In this embodiment of the present application, the first key information corresponding to the key fields is identified in the voice information according to the key fields in the new-entry interface, and is displayed on the new-entry interface of the target application.
In this way, the target application interface is launched directly and the first key information corresponding to the key fields in its new-entry interface is obtained, which achieves real-time speech recognition.
Further optionally, in this embodiment of the present application, acquiring the first key information specifically includes acquiring the voice information and identifying, in the voice information, the first key information corresponding to the target application. Each of these is described below.
Example 1: Acquiring the voice information
Further optionally, in this embodiment of the present application, to reduce the amount of voice information that needs to be recognized and the proportion of redundant information, and thereby improve speech recognition efficiency, acquiring the voice information in step 702 may be implemented through step 702c or step 702d.
Step 702c: When it detects that a voice call is in progress, the speech recognition apparatus determines that the voice information is first voice information, where the first voice information is the voice information corresponding to a preset time period before the input time of the first input.
Step 702d: When it detects that the voice call has ended, the speech recognition apparatus determines that the voice information is second voice information, where the second voice information is all of the voice information recorded during the voice call.
For example, when a voice call is in progress, the user performs the first input upon hearing the other party mention key content (such as a phone number, an address, or an agreed time). Because the voice information in the preset time period before the input time of the first input usually contains the information the user wants to record, using the input time of the first input as the reference point yields a smaller amount of voice information with a lower proportion of redundant information, which improves speech recognition efficiency.
For example, when the voice call has ended, most of the content of the voice information is first key information from the user's point of view, or the first key information is scattered throughout the call. The voice information is therefore taken to be all of the voice information recorded during the voice call.
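The choice between steps 702c and 702d can be sketched as selecting a time range of the recording. The function name `select_segment` and its parameters are hypothetical; times are seconds measured on the same clock as the recording.

```python
from typing import Optional, Tuple


def select_segment(
    call_start: float,
    call_end: Optional[float],
    input_time: float,
    window_seconds: float,
) -> Tuple[float, float]:
    """Return the (start, end) range of the recording to be recognized."""
    if call_end is None:
        # Step 702c: call in progress, keep only the preset window that
        # precedes the first input, clipped to the start of the call.
        return (max(call_start, input_time - window_seconds), input_time)
    # Step 702d: call has ended, use the whole recording.
    return (call_start, call_end)
```

With a 30-second window, a first input 125 seconds into an ongoing call selects the range from 95 to 125 seconds.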
Further optionally, in this embodiment of the present application, after the target application is launched and before it is closed, the speech recognition apparatus may update the voice information in real time to avoid missing content of the voice call that needs to be recorded. Similarly to step 702c, acquiring the voice information in step 702 may further include step 702e.
Step 702e: When it detects that a voice call is in progress, the speech recognition apparatus extracts updated voice information at a preset interval.
The updated voice information includes the voice information of the call recording corresponding to the preset interval.
For example, after extracting the updated voice information, the speech recognition apparatus extracts second key information from it based on the target application, and then displays both the first key information and the second key information in the interface of the target application.
To illustrate, A and B talk about purchasing computers. A tells B that 10 Brand X computers of model 1566 need to be purchased, and B's electronic device recognizes the content of the call and displays the key call content in the interface of the target application, for example, "B needs to purchase 10 Brand X computers of model 1566" (that is, the first key information described above). B then confirms the order again during the call, after which A tells B that 5 Brand Y computers of model 1588 also need to be purchased. The electronic device recognizes the call content again and obtains new key call content, for example, "5 Brand Y computers of model 1588" (that is, the second key information described above), and updates the key call content displayed in the interface of the target application accordingly.
In this way, updated voice information is generated continuously as the voice call proceeds, so that the target application can extract and display the first key information and the second key information in real time.
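The incremental update in step 702e can be sketched as merging the key information extracted from each interval into what is already displayed. This sketch assumes key information is represented as plain strings; `update_displayed` is a hypothetical helper, and recognition and extraction themselves are out of scope here.

```python
from typing import List


def update_displayed(displayed: List[str], new_items: List[str]) -> List[str]:
    """Append newly extracted key information, skipping exact repeats."""
    merged = list(displayed)
    for item in new_items:
        if item not in merged:
            merged.append(item)
    return merged
```

Applied to the computer purchase example, the first interval contributes the model 1566 order and a later interval adds the model 1588 order, so both are shown.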
Further optionally, in this embodiment of the present application, acquiring the voice information in step 702 may further include: the speech recognition apparatus filters out interference in the voice information according to a preset voiceprint recognition algorithm and regenerates the voice information. It should be noted that, during a voice chat, the surrounding environment may contain car horns, animal cries, rain, wind, and so on, so filtering out such interference can improve the accuracy of speech recognition.
Example 2: Identifying the first key information corresponding to the target application in the voice information
Further optionally, as shown in FIG. 9, in this embodiment of the present application, recognizing the text information contained in the voice information in step 702 specifically includes steps 901 to 903.
Step 901: The speech recognition apparatus converts the voice information into target text information and extracts, from the target text information, first text information corresponding to the target application.
Step 902: The speech recognition apparatus acquires at least one piece of type information.
Step 903: According to the at least one piece of type information, the speech recognition apparatus deletes from the first text information the text whose type matches a preset type, to obtain the first key information.
In this embodiment of the present application, the first text information includes the first key information.
In this embodiment of the present application, audio features are extracted from the voice information and then converted into text through scoring by an acoustic model and a language model, thereby converting the voice information into the target text information.
In this embodiment of the present application, the at least one piece of type information indicates the types of information contained in the first text information. For example, the text information types may include abnormal reduplicated words, filler words, time words, and place words. Take the sentence "We will hold the annual meeting at a hotel on the south side side of a bank on Huanghe Street, this this it starts at 16:00 on March 2 and ends at 21:00, does this schedule work for you": "south side side" is an abnormal reduplication; "this this" is a filler; "March 2, 16:00, 21:00" are time words; and "Huanghe Street, a bank, a hotel" are place words.
It should be noted that, in the above example, abnormal reduplicated words, filler words, and local dialect words get in the way of obtaining the first key information, so these types can be set as preset types. When the speech recognition apparatus deletes the text in the first text information whose type matches a preset type, it deletes the redundant "side" and "this this" from the above example, giving the first key information "We will hold the annual meeting at a hotel on the south side of a bank on Huanghe Street, it starts at 16:00 on March 2 and ends at 21:00, does this schedule work for you".
For example, the filler-word optimization in the above text optimization processing may be performed in either of the following ways:
Method 1: Analyze the first text information for fillers using a preset filler word list. The list can be built as follows: the user records a passage of colloquial speech by voice input; the speech is recognized to obtain colloquial text, which is displayed; the user edits the text to keep only the filler words that correspond to their own verbal habits; and the retained words are merged into the filler word list.
Method 2: Analyze the first text information for fillers using an additional language model (a language model trained specifically on common filler words).
Specifically, simple fillers, such as a consecutive "this this" or a "this" followed by a long pause, are recognized as filler words matching the preset type. The recognized filler words can then be highlighted for the user, who chooses whether to delete them from the final voice-input output; alternatively, the voice input method can be configured to delete them with one tap or to delete them automatically.
In this way, by matching the types contained in the first text information against the preset types and deleting the colloquial words from the first text information, the readability of the first key information is improved.
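Method 1 combined with the removal of abnormal reduplication can be sketched as a small text filter. This is an illustrative sketch only: `clean_transcript` is a hypothetical helper, the example sentence is an English paraphrase of the one used above, and the word-repeat rule is a simplification that would also collapse legitimate repeats such as "that that".

```python
import re
from typing import Iterable


def clean_transcript(text: str, fillers: Iterable[str]) -> str:
    """Drop preset filler phrases and collapse immediately repeated words."""
    # Remove each filler phrase from the preset list, longest first.
    for phrase in sorted(fillers, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", "", text, flags=re.IGNORECASE)
    # Collapse a word that is immediately repeated ("side side" -> "side").
    text = re.sub(r"\b(\w+)(?:\s+\1\b)+", r"\1", text, flags=re.IGNORECASE)
    # Tidy spacing left behind by the deletions.
    text = re.sub(r"\s+([,.?!])", r"\1", text)
    text = re.sub(r"\s{2,}", " ", text)
    return text.strip()
```

Because automatic deletion can remove words the user meant to keep, the highlight-and-confirm option described above is the more conservative choice; this sketch corresponds to the automatic variant.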
Optionally, as shown in FIG. 10, in this embodiment of the present application, after step 102, the speech recognition method provided in this embodiment may further include step 1001 and step 1002.
Step 1001: The speech recognition apparatus receives a second input from the user.
Step 1002: In response to the second input, the speech recognition apparatus processes the first key information using the editing processing method corresponding to the second input.
In this embodiment of the present application, the second input is the user's editing input on the first key information. It should be noted that, as shown in FIG. 11, assuming the target application is an address book, the editing processing method corresponding to the second input is as follows: after the user taps to select the information to be modified, that information appears in the edit bar, and the user trims, splices, or re-enters it. The original content of the first key information may be updated in real time as the user edits, or replaced once the user has finished editing.
示例性的,如果待修改信息为多次重新的重复信息,则用户对待修改信息进行删减、拼接或重新录入之后,对所有重复部分统一更正,以实现信息的快速整合。Exemplarily, if the information to be modified is repeated repeated information, after the user deletes, splices or re-enters the information to be modified, all the repeated parts are corrected uniformly to realize rapid information integration.
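The uniform correction of repeated information can be illustrated with a short sketch. The function name `correct_all` and the plain-string representation of the key information are assumptions made for illustration only.

```python
def correct_all(key_info, old, new):
    """Replace every occurrence of `old` in the key information with `new`.

    Returns the corrected text and the number of occurrences that changed.
    """
    if not old:
        raise ValueError("the fragment to correct must be non-empty")
    count = key_info.count(old)
    return key_info.replace(old, new), count
```

For example, correcting a misrecognized phone number once updates every place where that number was repeated in the extracted text.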
Optionally, after step 102, the speech recognition method provided in the embodiments of the present application may further include: generating a temporary cache control, where the temporary cache control is used to cache the first key information.
Exemplarily, if the target application is a memo and the first key information is a phone number, the phone number can be dialed directly through the temporary cache control. The displayed first key information can thus be used directly, sparing the user the tedious process of copying and pasting the phone number before making the call.
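One way to picture the temporary cache control is as a small object that holds the first key information and exposes an action for it. The sketch below is hypothetical: the class name `TempCacheControl`, the method `dial_uri`, and the 3-to-15-digit check are assumptions, and producing a `tel:` URI is just one plausible way for a platform dialer to consume the cached number.

```python
import re
from dataclasses import dataclass

@dataclass
class TempCacheControl:
    """Caches the first key information so it can be acted on directly."""
    key_info: str

    def dial_uri(self):
        """Return a tel: URI if the cached text looks like a phone number."""
        # Keep only digits and a possible leading plus sign.
        digits = re.sub(r"[^\d+]", "", self.key_info)
        if not re.fullmatch(r"\+?\d{3,15}", digits):
            return None  # cached information is not a dialable number
        return "tel:" + digits
```

Returning `None` for non-numeric content keeps the control usable for other kinds of key information, which the embodiment does not restrict.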
It should be noted that the speech recognition method provided in the embodiments of the present application may be executed by a speech recognition apparatus, or by a control module in the speech recognition apparatus that is used to execute the speech recognition method. In the embodiments of the present application, the speech recognition apparatus is described by taking, as an example, the case in which the apparatus itself executes the speech recognition method. In practical applications, however, the method may also be executed by other devices or apparatuses capable of executing it, which is not limited in the embodiments of the present application.
As shown in FIG. 12, an embodiment of the present application provides a speech recognition apparatus. The speech recognition apparatus includes: a first receiving module 1201 and a first display module 1202;
The first receiving module 1201 is configured to receive a first input from the user when voice information has been acquired;
The first display module 1202 is configured to, in response to the first input received by the first receiving module 1201, display first key information in the voice information through a target application, where the first key information is associated with the type of the target application.
Optionally, the first receiving module 1201 is configured to: during a voice call, record the call content of the voice call to acquire the voice information; and receive the first input from the user after the voice call ends or during the voice call.
Optionally, as shown in FIG. 13, the apparatus further includes: a determining module 1203;
The determining module 1203 is configured to, before the first display module 1202 displays the first key information in the voice information through the target application in response to the first input, determine, when it is detected that a voice call is in progress, the application associated with a first parameter of the first input as the target application;
The determining module 1203 is further configured to, before the first display module 1202 displays the first key information in the voice information through the target application in response to the first input, determine, when it is detected that the voice call has ended, the application associated with a second parameter of the first input as the target application.
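The selection logic of the determining module can be sketched as below. Everything concrete here is assumed for illustration: this passage does not say what the first and second parameters are, so the gesture names in `PARAM_TO_APP`, the `CallState` enum, and the function name are hypothetical.

```python
from enum import Enum

class CallState(Enum):
    IN_PROGRESS = "in_progress"
    ENDED = "ended"

# Hypothetical table mapping an input parameter to an application.
PARAM_TO_APP = {
    "swipe_left": "memo",
    "swipe_right": "contacts",
    "long_press": "calendar",
}

def determine_target_app(call_state, first_param, second_param):
    """Pick the target application from the call state and the first input.

    During a call the first parameter decides; after the call has ended
    the second parameter decides. Unknown parameters yield None.
    """
    if call_state is CallState.IN_PROGRESS:
        param = first_param
    else:
        param = second_param
    return PARAM_TO_APP.get(param)
```

Keeping the mapping in a table separates the call-state rule, which the embodiment specifies, from the parameter-to-application association, which it leaves open.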
Optionally, the first display module 1202 is configured to: start the target application in response to the first input; acquire the first key information corresponding to the target application from the voice information; and display the first key information in the target application.
Optionally, the first display module 1202 is specifically configured to: convert the voice information into target text information, and extract, from the target text information, first text information corresponding to the target application, where the first text information includes the first key information; acquire at least one piece of type information, where the at least one piece of type information indicates the types of information contained in the first text information; and delete, according to the at least one piece of type information, the text information in the first text information whose type matches a preset type, so as to obtain the first key information.
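The three stages described above (extract the first text information, obtain type information, delete text of the preset type) can be sketched as a toy pipeline. This is a sketch under stated assumptions rather than the disclosed method: speech-to-text is assumed to have already produced `target_text`, and the keyword table `APP_KEYWORDS`, the word list `FILLER_WORDS`, and the two type labels are all hypothetical.

```python
import re

# Hypothetical keywords tying sentences to an application type.
APP_KEYWORDS = {
    "contacts": ("number", "phone"),
    "calendar": ("meeting", "tomorrow"),
}
FILLER_WORDS = {"um", "uh", "er"}        # assumed colloquial words
PRESET_TYPES_TO_DELETE = {"filler"}      # assumed preset type

def first_text_info(target_text, app):
    """Keep the sentences of the transcript that relate to the application."""
    sentences = [s.strip() for s in re.split(r"[.!?]", target_text) if s.strip()]
    return [s for s in sentences
            if any(k in s.lower() for k in APP_KEYWORDS[app])]

def type_info(sentence):
    """Label each word with a type: 'filler' or 'content'."""
    return [(w, "filler" if w.lower().strip(",") in FILLER_WORDS else "content")
            for w in sentence.split()]

def first_key_info(target_text, app):
    """Delete words whose type matches a preset type; return what remains."""
    result = []
    for sentence in first_text_info(target_text, app):
        words = [w for w, kind in type_info(sentence)
                 if kind not in PRESET_TYPES_TO_DELETE]
        result.append(" ".join(words))
    return result
```

A real system would likely replace the keyword match and the word list with trained models; the sketch only shows how the stages feed one another.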
Optionally, as shown in FIG. 13, the apparatus further includes: a second receiving module 1204 and a first processing module 1205;
The second receiving module 1204 is configured to receive a second input from the user after the first display module 1202 displays the first key information in the voice information through the target application, where the second input is an editing input performed by the user on the first key information;
The first processing module 1205 is configured to, in response to the second input received by the second receiving module 1204, process the first key information using the editing processing manner corresponding to the second input.
In the speech recognition apparatus provided in the embodiments of the present application, once voice information has been acquired and a first input is received, the apparatus responds to the first input and displays the first key information in the voice information through the target application, where the first key information is associated with the type of the target application. In this way, the first key information associated with the target application is extracted from the voice information and displayed directly, which improves the efficiency of extracting key information, avoids ineffective recognition of the voice information, and improves the human-computer interaction performance of the electronic device.
The speech recognition apparatus in the embodiments of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in a terminal. The apparatus may be a mobile electronic device or a non-mobile electronic device. Exemplarily, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, an in-vehicle electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA), and the non-mobile electronic device may be a server, a network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine, or a self-service machine, which is not specifically limited in the embodiments of the present application.
The speech recognition apparatus in the embodiments of the present application may be an apparatus having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
The speech recognition apparatus provided in the embodiments of the present application can implement each process implemented by the foregoing method embodiments. To avoid repetition, details are not described here again.
For the beneficial effects of the various implementations in this embodiment, reference may be made to the beneficial effects of the corresponding implementations in the foregoing method embodiments. To avoid repetition, details are not described here again.
Optionally, as shown in FIG. 14, an embodiment of the present application further provides an electronic device 1400, including a processor 1401, a memory 1402, and a program or instruction stored in the memory 1402 and executable on the processor 1401. When executed by the processor 1401, the program or instruction implements each process of the foregoing speech recognition method embodiments and can achieve the same technical effect. To avoid repetition, details are not described here again.
It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and non-mobile electronic devices described above.
FIG. 15 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 1500 includes but is not limited to: a radio frequency unit 1501, a network module 1502, an audio output unit 1503, an input unit 1504, a sensor 1505, a display unit 1506, a user input unit 1507, an interface unit 1508, a memory 1509, a processor 1510, and other components.
Those skilled in the art can understand that the electronic device 1500 may further include a power supply (such as a battery) that supplies power to the components. The power supply may be logically connected to the processor 1510 through a power management system, so that charging, discharging, and power consumption are managed through the power management system. The structure shown in FIG. 15 does not limit the electronic device; the electronic device may include more or fewer components than shown, combine some components, or use a different component arrangement, which is not described in detail here.
The user input unit 1507 is configured to receive a first input from the user when voice information has been acquired;
The processor 1510 is configured to, in response to the first input, display first key information in the voice information through a target application, where the first key information is associated with the type of the target application.
Optionally, the processor 1510 is further configured to, during a voice call, record the call content of the voice call to acquire the voice information;
Optionally, the user input unit 1507 is further configured to receive the first input from the user after the voice call ends or during the voice call.
Optionally, the processor 1510 is further configured to: when it is detected that a voice call is in progress, determine the application associated with a first parameter of the first input as the target application; and when it is detected that the voice call has ended, determine the application associated with a second parameter of the first input as the target application.
Optionally, the processor 1510 is further configured to: start the target application in response to the first input; acquire the first key information corresponding to the target application from the voice information; and display the first key information in the target application.
Optionally, the processor 1510 is further configured to: convert the voice information into target text information, and extract, from the target text information, first text information corresponding to the target application, where the first text information includes the first key information; acquire at least one piece of type information, where the at least one piece of type information indicates the types of information contained in the first text information; and delete, according to the at least one piece of type information, the text information in the first text information whose type matches a preset type, so as to obtain the first key information.
Optionally, the user input unit 1507 is further configured to receive a second input from the user, where the second input is an editing input performed by the user on the first key information;
Optionally, the processor 1510 is further configured to, in response to the second input, process the first key information using the editing processing manner corresponding to the second input.
In the electronic device provided in the embodiments of the present application, once voice information has been acquired and a first input is received, the device responds to the first input and displays the first key information in the voice information through the target application, where the first key information is associated with the type of the target application. In this way, the first key information associated with the target application is extracted from the voice information and displayed directly, which improves the efficiency of extracting key information, avoids ineffective recognition of the voice information, and improves the human-computer interaction performance of the electronic device.
For the beneficial effects of the various implementations in this embodiment, reference may be made to the beneficial effects of the corresponding implementations in the foregoing method embodiments. To avoid repetition, details are not described here again.
It should be understood that, in the embodiments of the present application, the input unit 1504 may include a graphics processing unit (GPU) 15041 and a microphone 15042. The graphics processor 15041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The display unit 1506 may include a display panel 15061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 1507 includes a touch panel 15071 and other input devices 15072. The touch panel 15071 is also called a touch screen and may include two parts: a touch detection device and a touch controller. The other input devices 15072 may include but are not limited to a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not described in detail here. The memory 1509 may be used to store software programs and various data, including but not limited to application programs and an operating system. The processor 1510 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 1510.
An embodiment of the present application further provides a readable storage medium storing a program or instruction. When executed by a processor, the program or instruction implements each process of the foregoing speech recognition method embodiments and can achieve the same technical effect. To avoid repetition, details are not described here again.
The processor is the processor in the electronic device in the foregoing embodiments. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present application further provides a chip. The chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or instruction to implement each process of the foregoing speech recognition method embodiments, with the same technical effect. To avoid repetition, details are not described here again.
It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.
It should be noted that, in this document, the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element qualified by the phrase "including a ..." does not preclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element. In addition, it should be pointed out that the scope of the methods and apparatuses in the embodiments of the present application is not limited to performing functions in the order shown or discussed; depending on the functions involved, functions may also be performed in a substantially simultaneous manner or in the reverse order. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Furthermore, features described with reference to some examples may be combined in other examples.
From the description of the foregoing embodiments, those skilled in the art can clearly understand that the methods of the foregoing embodiments can be implemented by software together with a necessary general-purpose hardware platform, and can of course also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a computer software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods of the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the foregoing specific embodiments, which are merely illustrative rather than restrictive. Under the inspiration of the present application, and without departing from the purpose of the present application or the scope protected by the claims, those of ordinary skill in the art can devise many other forms, all of which fall within the protection of the present application.

Claims (17)

  1. A speech recognition method, the method comprising:
    receiving a first input from a user when voice information has been acquired;
    in response to the first input, displaying first key information in the voice information through a target application, wherein the first key information is associated with the type of the target application.
  2. The method according to claim 1, wherein the receiving a first input from a user when voice information has been acquired comprises:
    during a voice call, recording the call content of the voice call to acquire the voice information;
    receiving the first input from the user after the voice call ends or during the voice call.
  3. The method according to claim 1 or 2, wherein, before the displaying first key information in the voice information through a target application in response to the first input, the method further comprises:
    when it is detected that a voice call is in progress, determining the application associated with a first parameter of the first input as the target application;
    when it is detected that the voice call has ended, determining the application associated with a second parameter of the first input as the target application.
  4. The method according to claim 1, wherein the displaying first key information in the voice information through a target application in response to the first input comprises:
    in response to the first input, starting the target application, or switching the target application to the foreground;
    acquiring the first key information corresponding to the target application from the voice information;
    displaying the first key information in the target application.
  5. The method according to claim 4, wherein the acquiring the first key information corresponding to the target application from the voice information comprises:
    converting the voice information into target text information, and extracting, from the target text information, first text information corresponding to the target application, wherein the first text information includes the first key information;
    acquiring at least one piece of type information, wherein the at least one piece of type information indicates the types of information contained in the first text information;
    deleting, according to the at least one piece of type information, the text information in the first text information whose type matches a preset type, so as to obtain the first key information.
  6. The method according to claim 1, wherein, after the displaying first key information in the voice information through a target application, the method further comprises:
    receiving a second input from the user, wherein the second input is an editing input performed by the user on the first key information;
    in response to the second input, processing the first key information using the editing processing manner corresponding to the second input.
  7. A speech recognition apparatus, the apparatus comprising: a first receiving module and a first display module;
    the first receiving module is configured to receive a first input from a user when voice information has been acquired;
    the first display module is configured to, in response to the first input received by the first receiving module, display first key information in the voice information through a target application, wherein the first key information is associated with the type of the target application.
  8. The apparatus according to claim 7, wherein the first receiving module is configured to:
    during a voice call, record the call content of the voice call to acquire the voice information;
    receive the first input from the user after the voice call ends or during the voice call.
  9. The apparatus according to claim 7 or 8, wherein the apparatus further comprises: a determining module;
    the determining module is configured to, before the first display module displays the first key information in the voice information through the target application in response to the first input, determine, when it is detected that a voice call is in progress, the application associated with a first parameter of the first input as the target application;
    the determining module is further configured to, before the first display module displays the first key information in the voice information through the target application in response to the first input, determine, when it is detected that the voice call has ended, the application associated with a second parameter of the first input as the target application.
  10. The apparatus according to claim 7, wherein the first display module is configured to:
    start the target application in response to the first input;
    acquire the first key information corresponding to the target application from the voice information;
    display the first key information in the target application.
  11. The apparatus according to claim 10, wherein the first display module is specifically configured to:
    convert the voice information into target text information, and extract, from the target text information, first text information corresponding to the target application, wherein the first text information includes the first key information;
    acquire at least one piece of type information, wherein the at least one piece of type information indicates the types of information contained in the first text information;
    delete, according to the at least one piece of type information, the text information in the first text information whose type matches a preset type, so as to obtain the first key information.
  12. The apparatus according to claim 7, wherein the apparatus further comprises: a second receiving module and a first processing module;
    the second receiving module is configured to receive a second input from the user after the first display module displays the first key information in the voice information through the target application, wherein the second input is an editing input performed by the user on the first key information;
    the first processing module is configured to, in response to the second input received by the second receiving module, process the first key information using the editing processing manner corresponding to the second input.
  13. An electronic device, comprising a processor, a memory, and a program or instructions stored in the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the speech recognition method according to any one of claims 1 to 6.
  14. A readable storage medium storing a program or instructions, wherein the program or instructions, when executed by a processor, implement the steps of the speech recognition method according to any one of claims 1 to 6.
  15. A computer program product, wherein the program product is executed by at least one processor to implement the speech recognition method according to any one of claims 1 to 6.
  16. A chip, comprising a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is configured to run a program or instructions to implement the speech recognition method according to any one of claims 1 to 6.
  17. An electronic device, wherein the electronic device is configured to perform the speech recognition method according to any one of claims 1 to 6.
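
The type-based filtering recited in claim 11 can be illustrated with a minimal sketch. It assumes the voice information has already been converted to text and split into typed segments; the `Segment` class, the type labels ("greeting", "time", "place") and the function name are hypothetical and do not come from the application, which does not specify how types are assigned.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Segment:
    """One piece of the first text information with its type label (hypothetical)."""
    text: str
    info_type: str


def extract_key_information(segments: Iterable[Segment], preset_types: Set[str]) -> str:
    """Delete segments whose type matches a preset type and join the rest.

    The remaining text plays the role of the "first key information".
    """
    kept: List[str] = [s.text for s in segments if s.info_type not in preset_types]
    return " ".join(kept)


# Illustrative input: typed segments of a transcribed call (assumed data).
first_text = [
    Segment("Hello, how are you", "greeting"),
    Segment("meet at 3 pm tomorrow", "time"),
    Segment("at the north gate", "place"),
]
key_info = extract_key_information(first_text, preset_types={"greeting"})
print(key_info)
```

In this sketch the preset type set acts as a block list, so any segment type not listed is kept; an allow list would be an equally plausible reading of the claim.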
PCT/CN2022/085338 2021-04-06 2022-04-06 Voice recognition method and apparatus, electronic device, and readable storage medium WO2022213986A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110369099.2 2021-04-06
CN202110369099.2A CN113299290A (en) 2021-04-06 2021-04-06 Method and device for speech recognition, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
WO2022213986A1 (en) 2022-10-13

Family

ID=77319546

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/085338 WO2022213986A1 (en) 2021-04-06 2022-04-06 Voice recognition method and apparatus, electronic device, and readable storage medium

Country Status (2)

Country Link
CN (1) CN113299290A (en)
WO (1) WO2022213986A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113299290A (en) * 2021-04-06 2021-08-24 维沃移动通信有限公司 Method and device for speech recognition, electronic equipment and readable storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102541603A (en) * 2011-12-28 2012-07-04 华为终端有限公司 Method, system and terminal equipment for starting of application programs
US8606576B1 (en) * 2012-11-02 2013-12-10 Google Inc. Communication log with extracted keywords from speech-to-text processing
CN103440866A (en) * 2013-07-30 2013-12-11 广东明创软件科技有限公司 Method and mobile terminal for executing tasks according to communication information
CN104184870A (en) * 2014-07-29 2014-12-03 小米科技有限责任公司 Call log marking method and device and electronic equipment
CN108287815A (en) * 2017-12-29 2018-07-17 重庆小雨点小额贷款有限公司 Information input method, device, terminal and computer readable storage medium
CN109167884A (en) * 2018-10-31 2019-01-08 维沃移动通信有限公司 A kind of method of servicing and device based on user speech
CN109462697A (en) * 2018-11-27 2019-03-12 努比亚技术有限公司 Method of speech processing, device, mobile terminal and storage medium
CN112182197A (en) * 2020-11-09 2021-01-05 北京明略软件系统有限公司 Method, device and equipment for recommending dialect and computer readable medium
CN113299290A (en) * 2021-04-06 2021-08-24 维沃移动通信有限公司 Method and device for speech recognition, electronic equipment and readable storage medium

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103365671A (en) * 2012-03-28 2013-10-23 宇龙计算机通信科技(深圳)有限公司 Method for displaying application icons at time interval and mobile terminal
CN106445567A (en) * 2015-08-04 2017-02-22 西安中兴新软件有限责任公司 Starting method for terminal application and terminal
CN105100360B (en) * 2015-08-26 2019-05-03 百度在线网络技术(北京)有限公司 Call householder method and device for voice communication
CN105260393A (en) * 2015-09-15 2016-01-20 北京金山安全软件有限公司 Information pushing method and device and electronic equipment
CN105278863A (en) * 2015-11-10 2016-01-27 广东欧珀移动通信有限公司 Time interval-based method for detecting and preventing misoperation, device and mobile terminal
CN105677202A (en) * 2015-12-31 2016-06-15 宇龙计算机通信科技(深圳)有限公司 Application program wakeup method and device
CN106250014A (en) * 2016-07-22 2016-12-21 广东欧珀移动通信有限公司 The recommendation method and device of application program
CN106325867A (en) * 2016-08-24 2017-01-11 努比亚技术有限公司 Mobile terminal and interface display method thereof
CN107580143B (en) * 2017-09-30 2019-03-01 维沃移动通信有限公司 A kind of display methods and mobile terminal
CN108073437B (en) * 2017-12-20 2021-06-08 维沃移动通信有限公司 Application recommendation method and mobile terminal
CN108920471A (en) * 2018-06-29 2018-11-30 联想(北京)有限公司 A kind of voice translation method and electronic equipment
CN111046680B (en) * 2018-10-15 2022-05-24 华为技术有限公司 Translation method and electronic equipment
CN109286728B (en) * 2018-11-29 2021-01-08 维沃移动通信有限公司 Call content processing method and terminal equipment
CN110933225B (en) * 2019-11-04 2022-03-15 Oppo(重庆)智能科技有限公司 Call information acquisition method and device, storage medium and electronic equipment
CN111596818A (en) * 2020-04-24 2020-08-28 维沃移动通信有限公司 Message display method and electronic equipment

Also Published As

Publication number Publication date
CN113299290A (en) 2021-08-24

Similar Documents

Publication Publication Date Title
CN109522419B (en) Session information completion method and device
US9542949B2 (en) Satisfying specified intent(s) based on multimodal request(s)
CN109309751B (en) Voice recording method, electronic device and storage medium
CN101998107B (en) Information processing apparatus, conference system and information processing method
US9454964B2 (en) Interfacing device and method for supporting speech dialogue service
WO2022199543A1 (en) Message processing method and apparatus, and electronic device
WO2022089594A1 (en) Information display method and apparatus, and electronic device
WO2022135474A1 (en) Information recommendation method and apparatus, and electronic device
CN114020197B (en) Cross-application message processing method, electronic device and readable storage medium
JP2019053566A (en) Display control device, display control method, and program
CN112311658A (en) Voice information processing method and device and electronic equipment
WO2022213986A1 (en) Voice recognition method and apparatus, electronic device, and readable storage medium
CN112882623A (en) Text processing method and device, electronic equipment and storage medium
WO2022105754A1 (en) Character input method and apparatus, and electronic device
CN117253478A (en) Voice interaction method and related device
CN114827068A (en) Message sending method and device, electronic equipment and readable storage medium
CN113055529B (en) Recording control method and recording control device
CN113986574A (en) Comment content generation method and device, electronic equipment and storage medium
CN113241097A (en) Recording method, recording device, electronic equipment and readable storage medium
CN112165627A (en) Information processing method, device, storage medium, terminal and system
CN112306450A (en) Information processing method and device
CN110175063B (en) Operation assisting method, device, mobile terminal and storage medium
WO2023131290A1 (en) Information interaction methods and apparatuses, electronic device and medium
WO2023169361A1 (en) Information recommendation method and apparatus and electronic device
WO2023134599A1 (en) Voice information sending method and apparatus, and electronic device

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22784052

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE