CN112863495A - Information processing method and device and electronic equipment - Google Patents
- Publication number
- CN112863495A
- Authority
- CN
- China
- Prior art keywords
- voice
- segment
- keyword
- input
- user
- Prior art date
- 2020-12-31
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3343—Query execution using phonetics
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/338—Presentation of query results
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/02—Input arrangements using manually operated switches, e.g. using keyboards or dials
- G06F3/023—Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
- G06F3/0233—Character input methods
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Acoustics & Sound (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- General Health & Medical Sciences (AREA)
- Multimedia (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The application discloses an information processing method and apparatus and electronic equipment, and belongs to the field of communications. The method includes: in a case that a first voice input by a user includes a keyword, acquiring a target segment associated with the keyword in the first voice; acquiring, from a preset database, first input content matching the target segment; and replacing the text information corresponding to the target segment and the keyword with the first input content, so as to update the recognition result corresponding to the first voice and display the updated recognition result. Embodiments of the application can improve confidentiality when the user performs voice input.
Description
Technical Field
The application belongs to the field of communications, and in particular relates to an information processing method and apparatus and electronic equipment.
Background
With the development of voice recognition technology, it has become increasingly common for users to input text through the voice-to-text function of a mobile terminal.
At present, when a user of a mobile terminal needs to input text through the voice-to-text function, the mobile terminal directly displays the content spoken by the user in an input box through voice recognition. In the process of implementing the present application, the applicant found that the prior art has at least the following problem: when a user needs to input private content, it is difficult to protect the user's privacy with the existing voice input processing method. It can be seen that the user's confidentiality is poor when performing voice input.
Summary
Embodiments of the application aim to provide an information processing method, an information processing apparatus and electronic equipment, which can solve the problem that the existing voice input processing method provides poor confidentiality.
In a first aspect, an embodiment of the present application provides an information processing method, including:
under the condition that first voice input by a user comprises a keyword, acquiring a target segment associated with the keyword in the first voice;
acquiring first input content matched with the target segment in a preset database;
replacing the text information corresponding to the target segment and the keyword with the first input content to update the recognition result corresponding to the first voice and display the updated recognition result.
In a second aspect, an embodiment of the present application provides an information processing apparatus, including:
the device comprises a first acquisition module, configured to acquire a target segment associated with a keyword in a first voice under the condition that the first voice input by a user comprises the keyword;
the second acquisition module is used for acquiring first input content matched with the target segment in a preset database;
and the display module is used for replacing the text information corresponding to the target segment and the keyword with the first input content so as to update the recognition result corresponding to the first voice and display the updated recognition result.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, and when executed by the processor, the program or instructions implement the steps of the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.
In the embodiments of the application, the first voice of the user is recognized; in a case that the first voice includes the keyword, the target segment associated with the keyword is obtained from the first voice, the first input content matching the target segment is searched for in the preset database, and the text information corresponding to the target segment and the keyword is replaced with the first input content, so that the recognition result corresponding to the first voice is updated and the updated recognition result is displayed. In this way, the user can call the preset database by speaking the keyword; the text corresponding to the target segment and the keyword is replaced before being displayed in the input box, so the user does not need to speak the private content directly, which improves the confidentiality of the user's voice input.
Drawings
FIG. 1 is a flowchart of an information processing method provided in an embodiment of the present application;
fig. 2 is a second flowchart of an information processing method according to an embodiment of the present application;
FIG. 3 is a diagram of a voiceprint spectrogram provided by an embodiment of the present application;
FIG. 4 is a second voiceprint spectrogram provided in the present embodiment;
FIG. 5 is a schematic view of an operation interface of an electronic device according to an embodiment of the present disclosure;
fig. 6 is a second schematic view of an operation interface of the electronic device according to the embodiment of the present application;
fig. 7 is a structural diagram of an information processing apparatus provided in an embodiment of the present application;
FIG. 8 is a block diagram of an electronic device according to an embodiment of the present disclosure;
fig. 9 is a second structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second" and the like in the specification and claims of the present application are used to distinguish similar objects, and are not necessarily used to describe a particular order or sequence. It is to be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments of the application can be implemented in orders other than those illustrated or described herein. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates that the objects before and after it are in an "or" relationship.
The information processing method provided by the embodiment of the present application is described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
Referring to fig. 1, fig. 1 is a flowchart of an information processing method provided in an embodiment of the present application. As shown in fig. 1, the method includes the following steps:

Step 101: in a case that a first voice input by a user includes a keyword, acquire a target segment associated with the keyword in the first voice.
In the embodiment of the present application, the electronic device's reception of the first voice of the user may be triggered by an operation of the user. For example, in an optional implementation, the user may input the first voice after clicking a voice input key on the input interface of the input method of the electronic device; of course, in other optional implementations, the user may also input the first voice after speaking a preset vocabulary or performing a preset operation gesture, which is not further limited herein.
It can be understood that, in the embodiment of the present application, the electronic device needs to convert the first voice into text for output. When the electronic device receives the first voice, it may perform feature extraction on the first voice to extract feature data of the first voice. Specifically, the first voice may be encoded and converted into a data format for storage, and the feature data may be voiceprint information of the first voice, that is, a sound wave spectrum carrying the speech information, so that the electronic device can determine whether the keyword exists in the first voice by comparing feature data.
In this embodiment of the application, the keyword may be a trigger word that triggers the electronic device to call the preset database in step 102; it may be a word preset by the user or a system default word. For example, it may be a repeated word such as "small v, small v", so that the user is unlikely to speak the keyword during normal voice input.
The electronic device may perform feature extraction on the first voice, compare the voiceprint information corresponding to the keyword with the voiceprint information corresponding to the first voice, and determine that the first voice includes the keyword if the voiceprint information corresponding to the keyword exists in the first voice. Of course, the electronic device may also perform character recognition on the first voice, and determine that the first voice includes the keyword if the keyword exists in the recognized text content corresponding to the first voice.
The target segment associated with the keyword may be a segment adjacent to the keyword in the first voice. For example, when the keyword is "small v, small v" and the user says "XX, please have small v send my identification number to XX", the electronic device may intercept the voice segment corresponding to "send my identification number to XX" as the target segment, or intercept "my identification number" as the target segment, and match the target segment against the voice segments in the preset database.
Of course, in other alternative embodiments, the target segment may also be a segment located before the keyword in the first voice, and the target segment may be set according to a requirement of a user, and is not further limited herein.
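For a concrete picture of the character-recognition variant described above, the following is a minimal Python sketch; the wake word, the function name and the example sentence are illustrative assumptions rather than part of the original disclosure:

```python
KEYWORD = "small v"  # assumed wake word; the application uses a user-preset vocabulary

def find_keyword(recognized_text: str, keyword: str = KEYWORD) -> int:
    """Return the index just after the keyword, or -1 if the keyword is absent."""
    pos = recognized_text.find(keyword)
    return pos + len(keyword) if pos != -1 else -1

# The segment adjacent to the keyword becomes the candidate target segment.
text = "XX, please have small v send my identification number to XX"
end = find_keyword(text)
if end != -1:
    candidate_segment = text[end:].strip(" ,")
    print(candidate_segment)  # -> "send my identification number to XX"
```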
Step 102: acquire, from a preset database, first input content matching the target segment.
The preset database may be established based on a setting operation of the user. It should be understood that, in some embodiments, the preset database may store only second text content in association with third text content, for example, the numeric character combination "123456789876543210" with the text content "my identification number", "123456" with "my payment password", "personal introduction" with "I am XX, I work at XXX company, and the main business is XXX", and so on.
The electronic device may perform text recognition on the target segment to obtain the corresponding text content, search the preset database for identical second text content, and then, according to the association relationship in the preset database, find the corresponding third text content as the first input content.
In some embodiments, the preset database may also store voice segments and text content in association with each other. Illustratively, referring to fig. 6, the preset database may include the user's identification number and a voice segment associated with it, specifically the numeric character combination "123456789876543210" and the voice segment "my identification number" spoken by the user. The preset database may also include the user's payment password and an associated voice segment spoken by the user, or, for example, the character combination "123@123.com" associated with the voice segment "my mailbox", and so on. In this way, the user can obtain text content with a high privacy degree by speaking voice content with a low privacy degree, which improves the confidentiality of voice input.
Of course, in other optional embodiments, the preset database may further include a short voice segment and detailed text content associated with it, for example, the voice segment "personal introduction" spoken by the user and the text content "I am XX, I work at XXX company, and the main business is XXX". In this way, the user can obtain detailed text content by speaking a short voice, which avoids inputting a long voice and improves the convenience of voice input.
It should be noted that the voice segments and text content stored in association may be in the one-to-one correspondence described above, that is, one voice segment corresponds to one text content. In other optional embodiments, the preset database may also store multiple voice segments in association with the same text content; for example, the same identification number may correspond to several voice segments spoken by the user, such as "my identification number" and "my identity card number". This avoids the situation where the user cannot obtain the corresponding text content because a different wording with the same meaning was used, and further improves the convenience of voice input. Similarly, if second text content and third text content are stored in association in the preset database, one piece of third text content may also correspond to multiple pieces of second text content.
Further, the voice segments in the preset database may be encoded and stored in the form of feature data. The electronic device may compare the feature data of the target segment with the feature data of each voice segment in the preset database, and when the preset database includes a third voice segment whose feature data matches that of the target segment, the text content associated with the third voice segment may be used as the first input content for the target segment.
For example, the feature data may be the voiceprint information of a voice segment. Referring to fig. 3, fig. 3 shows a voiceprint spectrum of each voice segment; if segment a corresponds to the first voice and segment b corresponds to the keyword, the voice segment after segment b may be the target segment.
Following the above example, when the keyword is "small v" and the target segment is "my identification number" spoken by the user, segment b in fig. 3 may be matched against the voiceprint information of the preset keyword voice segment. When it is determined that the text content corresponding to segment b is the keyword, the target segment corresponding to "my identification number" after segment b is obtained; if the preset database includes a third voice segment matching the voiceprint information of the target segment, the text content associated with the third voice segment is determined as the first input content "123456789876543210" matching the target segment.
In this way, if the speaker of the third voice segment is not the speaker of the target segment, the third voice segment does not match the target segment even if the text content of the two segments is identical. Therefore, by comparing the feature data of the recognized voice input with the feature data of the voice segments in the preset database, other users are prevented from acquiring the private data of the device's user, which improves the security of user data.
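The feature-data matching described above can be pictured with the following sketch, in which voiceprint features are stored as vectors next to their replacement text; the cosine-similarity criterion, the threshold value and all names are assumptions for illustration rather than the method actually claimed:

```python
import numpy as np

# Assumed database shape: (stored voiceprint feature vector, associated text content).
PRESET_DB: list[tuple[np.ndarray, str]] = [
    (np.random.rand(128), "123456789876543210"),  # e.g. "my identification number"
    (np.random.rand(128), "123@123.com"),         # e.g. "my mailbox"
]

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def lookup(target_features: np.ndarray, threshold: float = 0.85) -> str | None:
    """Return the text of the best-matching third voice segment, or None when
    nothing clears the threshold (for example, a different speaker)."""
    best_score, best_text = threshold, None
    for stored_features, text in PRESET_DB:
        score = cosine_sim(target_features, stored_features)
        if score > best_score:
            best_score, best_text = score, text
    return best_text
```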
Step 103: replace the text information corresponding to the target segment and the keyword with the first input content, so as to update the recognition result corresponding to the first voice and display the updated recognition result.

In this embodiment, the target segment and the keyword may be converted into a first text recognition result through voice recognition. Since the keyword is merely a preset vocabulary that triggers calling of the preset database, its corresponding text content does not need to appear in the recognition result, so the text information corresponding to the target segment and the keyword can be replaced with the first input content.
It should be understood that, when the first voice includes only the voice segments corresponding to the keyword and the target segment, the recognition result is just the first input content. When the first voice also includes other voice segments, voice recognition may be performed on the whole first voice to obtain a second text recognition result, and the first text recognition result within the second text recognition result is then replaced with the first input content to obtain the finally displayed recognition result.
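At the text level, the replacement in step 103 can be sketched as follows; the helper name and the example strings are assumptions, reusing the "my mailbox" example from this description:

```python
def update_recognition_result(full_text: str, keyword: str,
                              target_text: str, replacement: str) -> str:
    """Drop the keyword (it carries no meaning for the output) and swap the
    target segment's literal text for the stored first input content."""
    without_keyword = " ".join(full_text.replace(keyword, "").split())
    return without_keyword.replace(target_text, replacement)

second_result = "send this file to small v my mailbox"
print(update_recognition_result(second_result, "small v",
                                "my mailbox", "123@123.com"))
# -> "send this file to 123@123.com"
```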
Optionally, the electronic device may display the recognition result corresponding to the first voice on an input interface, specifically in an input box, so that the user can intuitively view the recognition result and edit it.
In the embodiment of the application, the first voice of the user is recognized; in a case that the first voice includes the keyword, the target segment associated with the keyword is obtained from the first voice, the first input content matching the target segment is searched for in the preset database, and the text information corresponding to the target segment and the keyword is replaced with the first input content, so that the recognition result corresponding to the first voice is updated and the updated recognition result is displayed. The private or lengthy content is thus produced by replacement rather than spoken aloud, which improves the confidentiality of the user's voice input.
Optionally, the step 101 may specifically include:
intercepting a first voice segment associated with the keyword in the first voice;
acquiring a second voice segment with the highest matching degree with the first voice segment from N voice segments of the preset database, wherein N is a positive integer;
and determining a voice segment corresponding to the second voice segment in the first voice segment as the target segment.
In this embodiment, the electronic device may intercept, according to the position of the keyword, the first voice segment associated with the keyword in the first voice.
As described above, in the embodiment of the present application, the first voice segment associated with the keyword may be the voice segment adjacent to the voice segment corresponding to the keyword. For example, when the keyword is a vocabulary such as "small v" that triggers calling of the preset database, and the user says "XX, please have small v send my identification number to XX", the first voice segment may be the voice segment corresponding to "send my identification number to XX", that is, all of the voice after the keyword.
It should be understood that, because the first voice segment may be long, multiple voice segments in the preset database may match it. The electronic device may obtain the voice segment with the highest matching degree, that is, the second voice segment, then take the part of the first voice segment corresponding to the second voice segment as the target segment, and use the text content associated with the target segment in the preset database as the first input content.
Specifically, the matching degree may be determined by the matching duration of the voice segments: by selecting, from the N voice segments matching the first voice segment, the one with the longest matching duration, the text content the user actually needs can be output more accurately, which improves the user experience.
It is to be understood that the matching degree may also be determined by the number of matching words when the first voice segment and the N voice segments are recognized as text. Of course, in other optional embodiments, other criteria, such as the shortest matching duration, may be used instead, and the criterion may be set according to actual needs.
Specifically, referring to fig. 4, fig. 4 shows a voiceprint spectrum of each voice segment. If segment a corresponds to the first voice segment, and the feature data of both segment c and segment d match the feature data of segment a, then, taking the highest matching degree to mean the longest matching duration, segment d corresponds to the second voice segment; in this case the second voice segment matches the feature data of the target segment within the first voice segment. In other optional embodiments, if matching is performed on the result of character recognition, the text information of the second voice segment is identical to that of the target segment in the first voice segment.
For example, suppose the text content corresponding to the first voice is "XX, please have small v send my account and password to XX"; the text content of the first voice segment intercepted after "small v" is then "send my account and password to XX". Suppose the preset database contains voice segments for both "my account" and "my account and password", where the text content associated with "my account" is "account: aaa" and the text content associated with "my account and password" is "account: aaa, password: abc123". In this case, to take the voice segment with the highest matching degree, the segment corresponding to "my account and password" is used as the second voice segment; the text information of the target segment is likewise "my account and password", and the electronic device uses the associated text content "account: aaa, password: abc123" as the first input content of the target segment, finally displaying the recognition result "XX, please send account: aaa, password: abc123 to XX".
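A minimal sketch of this longest-match selection, operating on recognized text for readability (the stored instruction texts and the prefix-matching criterion are illustrative assumptions; the method itself may match on voice feature data instead):

```python
INSTRUCTION_DB = {
    "my account": "account: aaa",
    "my account and password": "account: aaa, password: abc123",
}

def longest_match(first_segment_text: str) -> tuple[str, str] | None:
    """Among stored instructions that prefix the first voice segment,
    pick the longest one, following the longest-match principle."""
    hits = [(spoken, stored) for spoken, stored in INSTRUCTION_DB.items()
            if first_segment_text.startswith(spoken)]
    return max(hits, key=lambda kv: len(kv[0])) if hits else None

print(longest_match("my account and password to XX"))
# -> ('my account and password', 'account: aaa, password: abc123')
```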
In the embodiment of the application, the voice segment with the highest matching degree is obtained from the N voice segments of the preset database, and its associated text content is used as the first input content. Thus, when multiple voice segments match the first voice segment, the one the user most likely intended is selected, which makes voice input more intelligent and better meets the user's needs.
Optionally, before the obtaining of the target segment associated with the keyword in the first voice, the method further includes:
acquiring voiceprint information of the first voice;
the obtaining of the target segment associated with the keyword in the first voice includes:
and under the condition that the voiceprint information is matched with preset voiceprint information, acquiring a target segment associated with the keyword in the first voice.
In this embodiment of the application, the feature data may be the voiceprint information of a voice segment. The electronic device may first perform voiceprint recognition when the user inputs the first voice, and acquire the target segment associated with the keyword only when the voiceprint information matches preset voiceprint information, thereby verifying the identity of the speaker. The voiceprint information may be the voiceprint information of the keyword in the first voice, or voiceprint information of another voice input by the user in advance.
Therefore, when a user other than the preset user speaks the same content, the voiceprint information does not match, so the electronic device will not call the preset database to obtain the first input content corresponding to the target segment. This avoids disclosure of the user's private data and improves the security of voice input processing.
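This gating step can be sketched as follows; the similarity measure, the threshold and the function names are again assumptions for illustration:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def gate_by_voiceprint(voice_features: np.ndarray,
                       preset_features: np.ndarray,
                       threshold: float = 0.85) -> bool:
    """Proceed to target-segment acquisition only when the speaker's
    voiceprint matches the preset voiceprint information."""
    return cosine_sim(voice_features, preset_features) >= threshold
```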
Optionally, before the step 101, the method may further include:
receiving a second voice input of the user under the condition that a first operation of the user on the input setting interface is detected;
receiving a first input of a user to obtain a first text content;
establishing a mapping relation between the first text content and the second voice input;
and storing the mapping relation between the first text content and the second voice input in the preset database.
In the embodiment of the present application, referring to fig. 5 and fig. 6, the user can store a voice segment and text content in the preset database in association with each other through operations on the input setting interface of the electronic device. Specifically, the interface may be the input method setting interface of the electronic device.
Furthermore, the user can perform the second voice input by clicking a recording key on the input setting interface, input the first text content by clicking a text box on the input setting interface, and store the first text content and the second voice input in the preset database in association by clicking a save key. Of course, in some embodiments, the first text content may also be converted from a voice input of the user, which is not limited herein. Optionally, the input setting interface may further include a play key, through which the user can confirm the content of the recorded voice segment.
Specifically, the first text content may be stored in the form of a character string, and the second voice input may be stored in an audio format or in the form of feature data, which is not further limited herein.
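One possible shape for persisting this mapping, assuming features are kept as vectors and display text as strings (the file format and all names are illustrative, not part of the application):

```python
import json
import numpy as np

def save_instruction(db_path: str, label: str,
                     voice_features: np.ndarray, display_text: str) -> None:
    """Persist one custom instruction: a label, the voiceprint features of the
    second voice input, and the first text content to display on screen."""
    try:
        with open(db_path) as f:
            db = json.load(f)
    except FileNotFoundError:
        db = []
    db.append({"label": label,
               "features": voice_features.tolist(),
               "text": display_text})
    with open(db_path, "w") as f:
        json.dump(db, f, ensure_ascii=False)

save_instruction("instructions.json", "my mailbox",
                 np.random.rand(128), "123@123.com")
```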
In the embodiment of the application, through operations on the input setting interface of the electronic device, the mapping between the first text content and the second voice input is stored in the preset database, so that associated text content can be output according to a user-defined voice segment, making voice input more intelligent and better meeting user needs.
In order to better understand the present application, a specific implementation process of the present application will be described in detail below by taking a specific implementation mode of a user performing voice input through a voice input method as an example.
As shown in fig. 2, fig. 2 is a flowchart of a possible information processing method according to an embodiment of the present disclosure. The method can comprise the following steps:
Step 201: receive a voice input of the user.

During speaking, the user may, for example, need "send this file to 123@123.com" to be displayed in the input box on the screen. If no custom instruction needs to be called, the content can be dictated directly. If a custom instruction needs to be called, the dictation is changed to "send this file to small v my mailbox", where "small v" belongs to the category of wakeup words.
Step 202: detect whether the sentence includes a wakeup word segment.

The method for detecting whether the sentence includes the wakeup word segment can refer to fig. 3 and the description above, and is not repeated here. The encoded feature data is compared with the feature data of the wakeup word; if the segment is included, it is determined that the sentence needs to call a custom instruction, and step 203 is executed; otherwise, step 205 is executed.
Step 203: intercept the voice segment immediately following the wakeup word, compare it with all voices in the custom instruction library, and judge whether the voiceprints match. If there is a match, step 204 is executed; otherwise, step 205 is executed.
The voiceprint comparison is similar to the wakeup word detection in step 202: the voice segment after the wakeup word is compared frame by frame with the audio in the instruction library, following the longest-match principle, and the comparison stops once matching fails. If multiple instructions in the instruction library match, the instruction with the longest match is taken as the result; the final result is segment d, as shown in fig. 4.
Step 204: recognize the audio and replace the custom instruction content.
The audio without the wakeup word is recognized, and the segments related to the instruction in the decoded result are then replaced with the content the user entered in advance in the instruction library. Following the example of step 201: the user says "send this file to small v my mailbox", and once it is determined that the user's voice is consistent with the voice stored in the library, the screen displays "send this file to 123@123.com". The call of the custom instruction is thus completed.
Step 205: recognize the audio and directly display the final result on the screen.
The whole audio segment is recognized and the result is passed directly to the input method client without any replacement operation, generating the content finally rendered on the display screen.
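Putting steps 201 to 205 together, the control flow can be sketched over already-recognized text as follows; every name here is a hypothetical stand-in for the components described above:

```python
WAKEUP = "small v"
INSTRUCTIONS = {"my mailbox": "123@123.com"}  # spoken text -> display text

def process_text(recognized: str) -> str:
    """Steps 202-205: detect the wakeup word, longest-match the segment that
    follows it, and either replace the instruction or pass the text through."""
    pos = recognized.find(WAKEUP)
    if pos == -1:                                  # step 205: no wakeup word
        return recognized
    tail = recognized[pos + len(WAKEUP):].strip()
    matches = [s for s in INSTRUCTIONS if tail.startswith(s)]
    if not matches:                                # step 205: no instruction matches
        return recognized
    best = max(matches, key=len)                   # step 203: longest match wins
    replaced = recognized[:pos] + INSTRUCTIONS[best] + tail[len(best):]  # step 204
    return " ".join(replaced.split())

print(process_text("send this file to small v my mailbox"))
# -> "send this file to 123@123.com"
```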
Through the above steps, the embodiments of the application solve the problem that the privacy of a user's voice input cannot be protected in public or otherwise inconvenient situations. Meanwhile, frequently used long text can be stored in a custom manner, reducing the number of times the user has to speak the same content. Moreover, only the user's own voice can trigger a custom instruction, which ensures data security and prevents misuse by others.
Optionally, before the step 201, the method may further include:
Step 301: the user creates a custom instruction.
The user first defines the required instruction: the user can open the settings of the voice input method and enter the required shortcut instruction. First, the user fills in a note, that is, the label content in the personal instruction management system, on the interface shown in fig. 5. Next, the user records audio, a short sentence or phrase in his or her own voice recorded in a quiet scene, such as reciting the four words "identity card number". Finally, the user fills in the content to be displayed on the screen, that is, the content the user does not want to speak in public or repeatedly, on the interface shown in fig. 5. After saving, the instruction can be managed in the list of custom instructions shown in fig. 6.
Step 302: parse and store the instruction saved by the user.
Features are extracted from the audio and stored, and each feature is stored in one-to-one correspondence with the content the user needs displayed, as a mapping from feature to text.
It should be noted that the information processing method provided in the embodiments of the present application may be executed by an information processing apparatus, or by a control module in the information processing apparatus for executing the information processing method. In the embodiments of the present application, an information processing apparatus executing the information processing method is taken as an example to describe the information processing method provided in the embodiments of the present application.
Referring to fig. 7, fig. 7 is a structural diagram of an information processing apparatus 700 according to an embodiment of the present application, and as shown in fig. 7, the information processing apparatus 700 includes:
a first obtaining module 701, configured to acquire, in a case that a first voice input by a user includes a keyword, a target segment associated with the keyword in the first voice;
a second obtaining module 702, configured to obtain a first input content matched with the target segment in a preset database;
a display module 703, configured to replace the text information corresponding to the target segment and the keyword with the first input content, so as to update the recognition result corresponding to the first voice, and display the updated recognition result.
In the embodiment of the application, the first obtaining module 701 obtains the target segment associated with the keyword in the first voice; the second obtaining module 702 obtains, from the preset database, the first input content matching the target segment; and finally the display module 703 replaces the text information corresponding to the target segment and the keyword with the first input content, so as to update the recognition result corresponding to the first voice and display the updated recognition result. In this way, the user can call the preset database by speaking content that includes the keyword, and the text content corresponding to the target segment and the keyword is replaced when displayed in the input box, so the user does not need to speak that content directly, which improves the confidentiality of the user's voice input.
Optionally, the first obtaining module 701 may specifically include:
the intercepting unit is used for intercepting a first voice segment which is associated with the keyword in the first voice;
the first acquisition unit is used for acquiring a second voice segment with the highest matching degree with the first voice segment from N voice segments of the preset database, wherein N is a positive integer;
and the second acquisition unit is used for determining a voice segment corresponding to the second voice segment in the first voice segment as the target segment.
Optionally, the information processing apparatus 700 may further include:
a third obtaining module, configured to obtain voiceprint information of the first voice;
the first obtaining module is specifically configured to:
and under the condition that the voiceprint information is matched with preset voiceprint information, acquiring a target segment associated with the keyword in the first voice.
Optionally, the information processing apparatus 700 may further include:
the first receiving module is used for receiving a second voice input of the user under the condition of detecting a first operation of the user on the input setting interface;
the second receiving module is used for receiving a first input of a user to obtain a first text content;
the establishing module is used for establishing a mapping relation between the first text content and the second voice input;
and the storage module is used for storing the mapping relation between the first text content and the second voice input into the preset database.
The information processing apparatus 700 in the embodiment of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in a terminal. The device can be mobile electronic equipment or non-mobile electronic equipment. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine or a self-service machine, and the like, and the embodiments of the present application are not particularly limited.
The information processing apparatus in the embodiments of the present application may be an apparatus having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
The information processing apparatus provided in the embodiment of the present application can implement each process implemented by the information processing method in the method embodiments of fig. 1 to 2, and is not described here again to avoid repetition.
Optionally, as shown in fig. 8, an embodiment of the present application further provides an electronic device, including a processor 802, a memory 801, and a program or instructions stored in the memory 801 and executable on the processor 802. When executed by the processor 802, the program or instructions implement each process of the above information processing method embodiment and achieve the same technical effect, which is not repeated here to avoid repetition.
It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and the non-mobile electronic devices described above.
Fig. 9 is a schematic diagram of a hardware structure of an electronic device implementing various embodiments of the present application.
The electronic device 900 includes, but is not limited to: a radio frequency unit 901, a network module 902, an audio output unit 903, an input unit 904, a sensor 905, a display unit 906, a user input unit 907, an interface unit 908, a memory 909, and a processor 910.
Those skilled in the art will appreciate that the electronic device 900 may further include a power source (for example, a battery) for supplying power to the various components; the power source may be logically connected to the processor 910 through a power management system, so that charging, discharging, and power consumption management functions are handled through the power management system. The electronic device structure shown in fig. 9 does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or arrange the components differently, which is not repeated here.
The processor 910 is configured to: in a case that a first voice input by a user includes a keyword, acquire a target segment associated with the keyword in the first voice; acquire, from a preset database, first input content matching the target segment; and replace the text information corresponding to the target segment and the keyword with the first input content, so as to update the recognition result corresponding to the first voice and display the updated recognition result.

It should be understood that, in the embodiment of the present application, the input unit 904 may include a graphics processing unit (GPU) 9041 and a microphone 9042; the graphics processing unit 9041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in video capture mode or image capture mode. The display unit 906 may include a display panel 9061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 907 includes a touch panel 9071, also referred to as a touch screen, and other input devices 9072. The touch panel 9071 may include two parts: a touch detection device and a touch controller. Other input devices 9072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not described in detail here. The memory 909 may be used to store software programs and various data, including but not limited to application programs and an operating system. The processor 910 may integrate an application processor, which mainly handles the operating system, user interfaces, and applications, and a modem processor, which mainly handles wireless communication. It is to be appreciated that the modem processor may not be integrated into the processor 910.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the above-mentioned information processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to execute a program or an instruction to implement each process of the information processing method embodiment, and can achieve the same technical effect, and the details are not repeated here to avoid repetition.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as system-on-chip, system-on-chip or system-on-chip, etc.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed; the functions may also be performed in a substantially simultaneous manner or in a reverse order depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (10)
1. An information processing method characterized by comprising:
under the condition that first voice input by a user comprises a keyword, acquiring a target segment related to the keyword in the first voice;
acquiring first input content matched with the target segment in a preset database;
replacing the text information corresponding to the target segment and the keyword with the first input content to update the recognition result corresponding to the first voice and display the updated recognition result.
2. The method of claim 1, wherein the obtaining of the target segment of the first speech associated with the keyword comprises:
intercepting a first voice segment associated with the keyword in the first voice;
acquiring a second voice segment with the highest matching degree with the first voice segment from N voice segments of the preset database, wherein N is a positive integer;
and determining a voice segment corresponding to the second voice segment in the first voice segment as the target segment.
3. The method according to claim 1 or 2, wherein before the obtaining of the target segment associated with the keyword in the first speech, the method further comprises:
acquiring voiceprint information of the first voice;
the obtaining of the target segment associated with the keyword in the first voice includes:
and under the condition that the voiceprint information is matched with preset voiceprint information, acquiring a target segment associated with the keyword in the first voice.
4. The method according to claim 1 or 2, wherein before the obtaining of the target segment associated with the keyword in the first speech, the method further comprises:
receiving a second voice input of the user under the condition that a first operation of the user on the input setting interface is detected;
receiving a first input of a user to obtain a first text content;
establishing a mapping relation between the first text content and the second voice input;
and storing the mapping relation between the first text content and the second voice input in the preset database.
5. An information processing apparatus characterized by comprising:
the device comprises a first acquisition module, configured to acquire a target segment associated with a keyword in a first voice under the condition that the first voice input by a user comprises the keyword;
the second acquisition module is used for acquiring first input content matched with the target segment in a preset database;
and the display module is used for replacing the text information corresponding to the target segment and the keyword with the first input content so as to update the recognition result corresponding to the first voice and display the updated recognition result.
6. The apparatus of claim 5, wherein the first obtaining module comprises:
the intercepting unit is used for intercepting a first voice segment which is associated with the keyword in the first voice;
the first acquisition unit is used for acquiring a second voice segment with the highest matching degree with the first voice segment from N voice segments of the preset database, wherein N is a positive integer;
and the second acquisition unit is used for determining a voice segment corresponding to the second voice segment in the first voice segment as the target segment.
7. The apparatus of claim 5 or 6, further comprising:
a third obtaining module, configured to obtain voiceprint information of the first voice;
the first obtaining module is specifically configured to:
and under the condition that the voiceprint information is matched with preset voiceprint information, acquiring a target segment associated with the keyword in the first voice.
8. The apparatus of claim 5 or 6, further comprising:
the first receiving module is used for receiving a second voice input of the user under the condition of detecting a first operation of the user on the input setting interface;
the second receiving module is used for receiving a first input of a user to obtain a first text content;
the establishing module is used for establishing a mapping relation between the first text content and the second voice input;
and the storage module is used for storing the mapping relation between the first text content and the second voice input into the preset database.
9. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions when executed by the processor implementing the steps of the method of any one of claims 1 to 4.
10. A readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011620096.3A | 2020-12-31 | 2020-12-31 | Information processing method and device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112863495A | 2021-05-28 |
Family ID: 75999038
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011620096.3A | Information processing method and device and electronic equipment | 2020-12-31 | 2020-12-31 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112863495A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107451131A (en) * | 2016-05-30 | 2017-12-08 | 贵阳朗玛信息技术股份有限公司 | A kind of audio recognition method and device |
CN106202204A (en) * | 2016-06-24 | 2016-12-07 | 维沃移动通信有限公司 | The lookup method of a kind of voice document and mobile terminal |
CN107193391A (en) * | 2017-04-25 | 2017-09-22 | 北京百度网讯科技有限公司 | The method and apparatus that a kind of upper screen shows text message |
CN108958810A (en) * | 2018-02-09 | 2018-12-07 | 北京猎户星空科技有限公司 | A kind of user identification method based on vocal print, device and equipment |
CN111009240A (en) * | 2019-12-06 | 2020-04-14 | 广州易来特自动驾驶科技有限公司 | Voice keyword screening method and device, travel terminal, equipment and medium |
CN111899741A (en) * | 2020-08-06 | 2020-11-06 | 上海明略人工智能(集团)有限公司 | Audio keyword encryption method and device, storage medium and electronic device |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113488025A (en) * | 2021-07-14 | 2021-10-08 | 维沃移动通信(杭州)有限公司 | Text generation method and device, electronic equipment and readable storage medium |
CN113488025B (en) * | 2021-07-14 | 2024-05-14 | 维沃移动通信(杭州)有限公司 | Text generation method, device, electronic equipment and readable storage medium |
CN114826734A (en) * | 2022-04-25 | 2022-07-29 | 维沃移动通信有限公司 | Character recognition method and device and electronic equipment |
CN114826734B (en) * | 2022-04-25 | 2024-10-01 | 维沃移动通信有限公司 | Character recognition method and device and electronic equipment |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210528 |