CN108874172B - Input method and device - Google Patents


Info

Publication number: CN108874172B
Application number: CN201710334619.XA
Authority: CN (China)
Prior art keywords: target, voice segment, screen, target voice, input
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN108874172A
Inventors: 马尔胡甫·曼苏尔, 张扬
Assignee: Beijing Sogou Technology Development Co Ltd
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201710334619.XA

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 — Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from the processing unit to the output unit, e.g. interface arrangements
    • G06F 3/01 — Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/02 — Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F 3/023 — Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard-generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F 3/0233 — Character input methods
    • G06F 3/0236 — Character input methods using selection techniques to select from displayed items

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Embodiments of the present invention provide an input method and apparatus. The method specifically includes: acquiring an input string of a user; determining a target voice segment corresponding to the input string according to a pre-established mapping relationship between preset character strings and voice segments; and displaying the target voice segment as a candidate corresponding to the input string. Because the target voice segment corresponding to the input string is presented as a candidate, the user can complete input of the segment simply by selecting that candidate, which reduces the operations required to input a voice segment and improves the efficiency with which the user inputs voice segments.

Description

Input method and device
Technical Field
The invention relates to the technical field of input methods, in particular to an input method and device.
Background
With the continuous development of communication technology, users can communicate not only by inputting text information, but also by inputting voice information.
In some communication scenarios, a user needs to input the same voice message many times. For example, in a customer service scenario, customer service staff need to send the same opening message, such as the voice message "Welcome to our store", to different customers. However, because existing communication applications do not support copying or forwarding voice messages, the customer service staff must repeatedly record the same message, which both increases the operation cost and lowers the input efficiency of voice messages.
Disclosure of Invention
In view of the above problems, embodiments of the present invention provide an input method and apparatus that overcome, or at least partially solve, the above problems: they reduce the operation cost required for a user to input a voice segment and improve the efficiency with which the user inputs voice segments.
To solve the above problems, the present invention discloses an input method, comprising:
acquiring an input string of a user;
determining a target voice segment corresponding to the input string according to a pre-established mapping relationship between preset character strings and voice segments; and
displaying the target voice segment as a candidate corresponding to the input string.
Optionally, the method further comprises:
in response to a screen-on operation performed by the user on the target voice segment, putting the target voice segment, or text information corresponding to the target voice segment, on the screen.
Optionally, the putting the target voice segment or the text information corresponding to the target voice segment on the screen includes:
acquiring configuration information, where the configuration information represents a target screen-on format corresponding to the target voice segment; and
putting the target voice segment, or the text information corresponding to the target voice segment, on the screen according to the target screen-on format represented by the configuration information.
Optionally, the putting the target voice segment or the text information corresponding to the target voice segment on the screen includes:
determining a target screen-on format corresponding to the target voice segment according to the application program environment corresponding to the input string; and
putting the target voice segment, or the text information corresponding to the target voice segment, on the screen according to the target screen-on format corresponding to the target voice segment.
Optionally, after the determining a target voice segment corresponding to the input string according to a pre-established mapping relationship between preset character strings and voice segments, the method further includes:
acquiring text information corresponding to the target voice segment;
and the displaying the target voice segment as the candidate corresponding to the input string includes:
displaying the target voice segment and the text information corresponding to the target voice segment as a candidate corresponding to the input string.
Optionally, the method further comprises:
acquiring a voice segment input by a user;
acquiring a preset character string, input by the user, corresponding to the voice segment; and
establishing a mapping relationship between the voice segment and the preset character string.
Optionally, after the acquiring a voice segment input by a user, the method further includes:
acquiring text information corresponding to the voice segment;
and the establishing a mapping relationship between the voice segment and the preset character string includes:
establishing a mapping relationship among the voice segment, the text information, and the preset character string.
Optionally, the method further comprises:
sending the mapping relationship between the preset character string and the voice segment to a server.
Optionally, before the determining a target voice segment corresponding to the input string according to a pre-established mapping relationship between preset character strings and voice segments, the method further includes:
sending a correspondence synchronization request to a server; and
receiving the mapping relationship between the preset character string and the voice segment returned by the server in response to the correspondence synchronization request.
In yet another aspect, the present invention discloses an input device, comprising:
a first acquisition module, configured to acquire an input string of a user;
a determining module, configured to determine a target voice segment corresponding to the input string according to a pre-established mapping relationship between preset character strings and voice segments; and
a display module, configured to display the target voice segment as a candidate corresponding to the input string.
Optionally, the apparatus further comprises:
a screen-on module, configured to put the target voice segment, or text information corresponding to the target voice segment, on the screen in response to a screen-on operation performed by the user on the target voice segment.
Optionally, the screen-on module comprises:
an acquisition submodule, configured to acquire configuration information, where the configuration information represents a target screen-on format corresponding to the target voice segment; and
a screen-on submodule, configured to put the target voice segment, or the text information corresponding to the target voice segment, on the screen according to the target screen-on format represented by the configuration information.
Optionally, the screen-on module further comprises:
a determining submodule, configured to determine a target screen-on format corresponding to the target voice segment according to the application program environment corresponding to the input string; and
a screen-on submodule, configured to put the target voice segment, or the text information corresponding to the target voice segment, on the screen according to the target screen-on format corresponding to the target voice segment.
Optionally, the apparatus further comprises:
a second acquisition module, configured to acquire text information corresponding to the target voice segment;
and the presentation module includes:
a display submodule, configured to display the target voice segment and the text information corresponding to the target voice segment as the candidate corresponding to the input string.
Optionally, the apparatus further comprises:
a third acquisition module, configured to acquire a voice segment input by a user;
a fourth acquisition module, configured to acquire a preset character string, input by the user, corresponding to the voice segment; and
a mapping relationship establishing module, configured to establish a mapping relationship between the voice segment and the preset character string.
Optionally, the apparatus further comprises:
a fifth acquisition module, configured to acquire text information corresponding to the voice segment;
and the mapping relationship establishing module includes:
a mapping relationship establishing submodule, configured to establish a mapping relationship among the voice segment, the text information, and the preset character string.
Optionally, the apparatus further comprises:
a first sending module, configured to send the mapping relationship between the preset character string and the voice segment to a server.
Optionally, the apparatus further comprises:
a second sending module, configured to send a correspondence synchronization request to the server; and
a receiving module, configured to receive the mapping relationship between the preset character string and the voice segment returned by the server in response to the correspondence synchronization request.
In yet another aspect, an input device is disclosed, comprising a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
acquiring an input string of a user;
determining a target voice segment corresponding to the input string according to a pre-established mapping relationship between preset character strings and voice segments; and
displaying the target voice segment as a candidate corresponding to the input string.
In yet another aspect, the present invention discloses a machine-readable medium having instructions stored thereon that, when executed by one or more processors, cause an apparatus to perform the input method described in the first aspect.
The embodiment of the invention has the following advantages:
the embodiment of the invention can show the target voice fragment corresponding to the input string as the candidate item, so that the user can complete the input of the target voice fragment by selecting the candidate item, thereby reducing the operation cost required by the user for inputting the voice fragment and improving the efficiency of inputting the voice fragment by the user.
Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. Apparently, the drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
FIG. 1 is an exemplary block diagram of an input method system in accordance with an embodiment of the present invention;
FIG. 2 is a flow chart of the steps of a first embodiment of an input method of the present invention;
FIG. 3 is a flowchart illustrating the steps of a second embodiment of an input method of the present invention;
FIG. 4 is a flowchart illustrating steps of a method for establishing a mapping relationship between a preset string and a speech segment according to an embodiment of the present invention;
FIG. 5 is a block diagram of an input device according to an embodiment of the present invention;
FIG. 6 is a block diagram of an input device 600 according to the present invention; and
fig. 7 is a schematic diagram of a server in some embodiments of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings in the embodiments. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The term "and/or" in the present invention is only an association relationship describing an associated object, and means that there may be three relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
Embodiments of the present invention provide an input scheme that acquires an input string of a user, determines a target voice segment corresponding to the input string according to a pre-established mapping relationship between preset character strings and voice segments, and finally presents the target voice segment as a candidate corresponding to the input string. Because the target voice segment is presented as a candidate, the user can complete input of the segment by selecting that candidate, which reduces the operation cost required to input a voice segment and improves the efficiency with which the user inputs voice segments.
It should be noted that the embodiments of the present invention may be applied to input method programs for various input modes, such as keyboard symbols, voice, and handwriting; that is, a user may input by means of a coded character string (the input string in the embodiments of the present invention). In the field of input methods, an input method program for Chinese, Japanese, Korean, or another language may convert an input string entered by the user into candidates in the corresponding language. The embodiments of the present invention are described mainly using Chinese as an example, and other languages can be handled by analogy. It should be understood that Chinese input modes may include, but are not limited to, full pinyin, simplified pinyin, strokes, wubi (five-stroke), and the like; the embodiments of the present invention are not limited to an input method program for a specific language.
Referring to fig. 1, an exemplary block diagram of an input method system according to an embodiment of the present invention is shown, and as shown in fig. 1, the input method system may include: at least one client 100 and at least one server 200. As shown in fig. 1, the client 100 and the server 200 are located in a wired or wireless network through which the client 100 and the server 200 perform data interaction.
The client 100 may be a client corresponding to the input method program. In practical applications, the client 100 may run on an intelligent terminal, which specifically includes but is not limited to: smartphones, tablet computers, e-book readers, sound recorders, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, vehicle-mounted computers, desktop computers, set-top boxes, smart televisions, wearable devices, and the like.
Specifically, the client 100 may acquire an input string of a user, determine a target voice segment corresponding to the input string according to a pre-established mapping relationship between preset character strings and voice segments, and finally display the target voice segment as a candidate corresponding to the input string.
The server 200 may receive the mapping relationship between preset character strings and voice segments sent by the client 100 and store this user-preset mapping relationship; it may also send the stored mapping relationship to the client 100, so that the client 100 can provide the voice segments to the user.
It should be noted that the mapping relationship between preset character strings and voice segments in the embodiments of the present invention may be one-to-one, one-to-many, or many-to-many; that is, one preset character string may correspond to one voice segment or to multiple voice segments, and multiple different preset character strings may correspond to the same voice segment.
Referring to Table 1, a schematic diagram of a mapping relationship between preset character strings and voice segments in an embodiment of the present invention is shown: the preset character string "hy" may correspond to the voice segment "Welcome to our store", and the preset character string "xx" may correspond to the voice segment "Thank you, goodbye".
TABLE 1
Preset character string    Voice segment
hy                         Welcome to our store
xx                         Thank you, goodbye
Referring to Table 2, another schematic diagram of a mapping relationship between preset character strings and voice segments in an embodiment of the present invention is shown: the preset character string "welcome" may correspond to the voice segment "Welcome to our store", and the preset character string "thank you" may correspond to the voice segment "Thank you, goodbye".
TABLE 2
Preset character string    Voice segment
welcome                    Welcome to our store
thank you                  Thank you, goodbye
It should be noted that a preset character string is composed of characters, and may include a single character or a combination of multiple characters; the embodiments of the present invention do not limit the specific characters that constitute a preset character string.
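To make the data relationships concrete, the following Kotlin sketch shows one possible in-memory representation of the mapping described above; it supports one-to-many and many-to-many relationships, but the class and member names (VoiceSegment, VoiceMappingStore, and so on) are illustrative assumptions, not structures named by the patent.

```kotlin
// Hypothetical data model for the mapping between preset character
// strings and voice segments. One preset string may map to several
// segments, and several strings may map to the same segment.
data class VoiceSegment(
    val audioPath: String,   // where the recorded audio is stored
    val durationMs: Long,    // playback duration, reused later for the candidate icon
    val text: String? = null // optional text information for the segment
)

class VoiceMappingStore {
    private val mapping = mutableMapOf<String, MutableList<VoiceSegment>>()

    // Establish a mapping relationship between a preset character string
    // and a voice segment (step 403 below).
    fun put(presetString: String, segment: VoiceSegment) {
        mapping.getOrPut(presetString) { mutableListOf() }.add(segment)
    }

    // Voice segments mapped to exactly this preset character string.
    fun lookup(presetString: String): List<VoiceSegment> =
        mapping[presetString].orEmpty()

    // Read-only view of all entries, used for prefix matching later.
    fun entries(): Map<String, List<VoiceSegment>> = mapping
}
```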
Method Embodiment One
Referring to fig. 2, a flowchart illustrating steps of a first embodiment of an input method according to the present invention is shown, which may specifically include:
step 201, obtaining an input string of a user.
In practical applications, the client of the input method program may acquire the user's input string according to input operations triggered by the user. For example, when an input operation of the user is detected, the character corresponding to the triggered key may be obtained, and the characters corresponding to one or more keys may be combined to finally obtain the input string entered by the user.
It should be noted that the input string input by the user may be one character or multiple characters, which is not limited in the embodiment of the present invention.
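As a minimal sketch of this accumulation (the callback shape is an assumption; a real input method would hook into the platform's key-event API):

```kotlin
// Hypothetical accumulation of the input string from the user's key presses.
class InputStringBuilder {
    private val buffer = StringBuilder()

    // Called once for each key the user triggers on the keyboard.
    fun onKeyPressed(keyChar: Char) {
        buffer.append(keyChar)
    }

    // The current input string, e.g. "hy" after pressing 'h' then 'y'.
    fun current(): String = buffer.toString()

    fun clear() { buffer.setLength(0) }
}
```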
Step 202, determining a target voice segment corresponding to the input string according to a pre-established mapping relationship between preset character strings and voice segments.
To enable the user to input a voice segment quickly and conveniently, a mapping relationship between preset character strings and voice segments may be established in advance, and a target voice segment corresponding to the input string is provided to the user according to this mapping relationship, so that the user can input the target voice segment. A preset character string is the character string preset for a voice segment.
In an optional embodiment of the present invention, after the input string entered by the user is acquired, the mapping relationship may be searched according to the input string to obtain the preset character string matching the input string, and the voice segment corresponding to that preset character string may then be used as the target voice segment corresponding to the user's input string.
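A hedged sketch of this lookup, reusing the hypothetical VoiceMappingStore above. Exact matching follows the paragraph above directly; the prefix branch is an assumed extension, since the patent does not fix the matching rule:

```kotlin
// Step 202 in miniature: search the mapping relationship with the user's
// input string and take the matching voice segments as the targets.
fun determineTargetVoiceSegments(
    store: VoiceMappingStore,
    inputString: String
): List<VoiceSegment> {
    val exact = store.lookup(inputString)
    val byPrefix = store.entries()
        .filterKeys { it.startsWith(inputString) && it != inputString }
        .values
        .flatten()
    return exact + byPrefix
}
```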
Step 203, presenting the target voice segment as a candidate corresponding to the input string.
After the target voice segment is obtained, it may be presented to the user as a candidate. Optionally, the process of presenting the target voice segment as the candidate corresponding to the input string may include: displaying a preset identifier corresponding to the target voice segment, where the preset identifier may include an icon, a symbol, or the like, so that the user can recognize the target voice segment from the preset identifier. In particular, when the number of target voice segments is greater than 1, the preset identifier may also be the text information corresponding to each target voice segment, to make the target voice segments easier to distinguish.
In an application example of the present invention, after obtaining a target voice segment, the client may create a corresponding rectangular icon for it, set the length and width of the icon according to the duration of the target voice segment, and add the duration of the segment at one end of the icon.
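A sketch of that candidate icon; the sizing constants and the CandidateIcon type are invented purely for illustration:

```kotlin
// Hypothetical sizing of the rectangular candidate icon: longer voice
// segments get wider icons, clamped to sensible bounds, and the
// segment's duration is added as a label at one end of the icon.
data class CandidateIcon(val widthDp: Int, val heightDp: Int, val durationLabel: String)

fun iconFor(segment: VoiceSegment): CandidateIcon {
    val seconds = segment.durationMs / 1000.0
    val widthDp = (40 + seconds * 10).toInt().coerceIn(40, 160)
    return CandidateIcon(widthDp, heightDp = 24, durationLabel = "%.1f s".format(seconds))
}
```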
In an optional embodiment of the present invention, the target voice segment may also be played in response to a play trigger operation performed by the user on the candidate corresponding to it. The playback can serve as a basis for the user to decide whether to select that candidate. The play trigger operation may be any operation distinct from the screen-on operation, such as a click, a slide, or a double-click; for example, the play trigger operation and the screen-on operation may be presses of different force. The specific play trigger operation is not limited in the embodiments of the present invention.
In an optional embodiment of the present invention, the preset identifier corresponding to an obtained target voice segment may be presented as a candidate. Optionally, the candidate corresponding to the target voice segment may be displayed in the first position of the candidate area of the input interface of the input method program, or the target voice segment may be displayed to the right of the syllable area of the input interface.
In an optional embodiment of the present invention, the candidate corresponding to the target voice segment may include a candidate in voice format and/or a candidate in text format, so that the user can select the format required by the actual application for screen-on.
In an optional embodiment, when the target voice segment is determined according to the mapping relationship between preset character strings and voice segments, the text information corresponding to the target voice segment may also be acquired and displayed for the user to select. The text information may be information set by the user in advance, or information obtained by the client through speech recognition of the target voice segment.
It should be noted that, in the embodiments of the present invention, the locally stored mapping relationship between preset character strings and voice segments may also be sent to the server, so that after logging in to his or her account on another client, the user can continue to input voice segments according to that mapping relationship.
In an optional embodiment, the mapping relationship between the preset character string and the voice segment may be sent to the server, and the server receives and stores it. The mapping relationship may be sent to the server at intervals of a first preset duration; it may be sent after an update to the mapping relationship is detected; or it may be sent after a synchronization operation triggered by the user is detected, where the synchronization operation instructs the client to send the mapping relationship to the server. Optionally, a person skilled in the art may set the first preset duration according to how frequently users set mapping relationships between preset character strings and voice segments; the specific first preset duration is not limited in the embodiments of the present invention.
Optionally, after a synchronization operation triggered by the user is detected, the mapping relationship between the preset character string and the voice segment may be obtained and sent to the server together with identification information, where the identification information represents the user account logged in on the client, so that the server receives the mapping relationship and the identification information and stores the mapping relationship in a designated storage space according to the identification information.
Of course, when the mapping relationship between preset character strings and voice segments is not stored on the client, the client can request the mapping relationship stored on the server, provide voice segments to the user according to the mapping relationship returned by the server, and complete the screen-on of a voice segment.
In an optional embodiment, a correspondence synchronization request may be sent to the server; the server obtains the mapping relationship between the preset character string and the voice segment according to the request and sends it to the client, and the client finally receives the mapping relationship returned by the server. The correspondence synchronization request may be sent to the server at intervals of a second preset duration, or after a correspondence request operation triggered by the user is detected. The second preset duration is similar to the first preset duration, and its specific value is not limited in the present invention.
Optionally, after the client detects a correspondence request operation triggered by the user, the client may send a correspondence synchronization request to the server; the server obtains the corresponding mapping relationship according to the request and sends it to the client, and the client receives the mapping relationship fed back by the server. The correspondence synchronization request is used to request that the server send the mapping relationship between preset character strings and voice segments to the client.
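The synchronization traffic might look like the following sketch. The endpoint path, the use of the user's identification information in the URL, and the JSON payload shape are all assumptions; the patent specifies no wire format.

```kotlin
// Hypothetical client-server synchronization of the mapping relationship,
// using Java 11's HttpClient from Kotlin. Endpoint names are invented.
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

val http: HttpClient = HttpClient.newHttpClient()

// First sending module: upload the locally stored mapping relationship
// together with the identification information of the logged-in account.
fun uploadMapping(serverUrl: String, userId: String, mappingJson: String) {
    val request = HttpRequest.newBuilder(URI.create("$serverUrl/mapping/$userId"))
        .header("Content-Type", "application/json")
        .PUT(HttpRequest.BodyPublishers.ofString(mappingJson))
        .build()
    http.send(request, HttpResponse.BodyHandlers.discarding())
}

// Second sending module: send a correspondence synchronization request
// and receive the mapping relationship stored on the server.
fun downloadMapping(serverUrl: String, userId: String): String {
    val request = HttpRequest.newBuilder(URI.create("$serverUrl/mapping/$userId")).GET().build()
    return http.send(request, HttpResponse.BodyHandlers.ofString()).body()
}
```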
In an application example of the present invention, suppose user A uses a mobile phone to set, in the client, a mapping relationship between the voice segment "I am busy now, I will contact you later" and the preset character string "mang", and the client sends this mapping relationship to the server. Later, while user A is inputting on a PC, the client may send a correspondence synchronization request to the server and receive the mapping relationship between that voice segment and the preset character string "mang". Thus, after user A produces the input string "mang" on the PC, the embodiment of the present invention can provide a candidate corresponding to the voice segment "I am busy now, I will contact you later", so that user A completes input of that voice segment. It should be noted that mobile terminals such as mobile phones and tablet computers are usually equipped with voice acquisition devices such as microphones, and instant messaging applications running on them, such as the WeChat application, provide voice acquisition interfaces, so a user's voice can be conveniently captured on a mobile terminal. By contrast, some fixed terminals such as PCs are not equipped with voice acquisition devices, and web-based instant messaging applications provide no voice acquisition interface, so it is difficult to capture the user's voice on such fixed terminals. Through the mapping relationship between preset character strings and voice segments, the embodiments of the present invention can output, on a second terminal, voice that the user recorded on a first terminal; for example, the first terminal and the second terminal may be a mobile phone and a PC respectively. This facilitates voice communication on the second terminal and improves the user experience.
It should be noted that, in some embodiments of the present invention, the method may further include: obtaining text candidates corresponding to the user's input string and presenting them to the user. For example, the text candidates and the candidate corresponding to the target voice segment may be presented at the same time, in the same area or in different areas; for instance, the candidate corresponding to the target voice segment may be located to the right of the syllable area, and the text candidates below the syllable area. In practical applications, the text candidates corresponding to the input string may be obtained from a thesaurus; optionally, the thesaurus may include a cloud thesaurus and/or a local thesaurus, and its types may include a system thesaurus, a user thesaurus, an N-gram thesaurus, a cell thesaurus, and the like. It can be understood that the embodiments of the present invention do not limit the order in which the target voice segment and the text candidates are obtained.
To sum up, the embodiment of the present invention provides an input method that acquires an input string of a user, determines a target voice segment corresponding to the input string according to a pre-established mapping relationship between preset character strings and voice segments, and finally displays the target voice segment as a candidate corresponding to the input string. Because the target voice segment is presented as a candidate, the user can complete input of the segment by selecting that candidate, which reduces the operation cost required to input a voice segment and improves the efficiency with which the user inputs voice segments.
Method Embodiment Two
Referring to FIG. 3, a flowchart illustrating the steps of a second embodiment of the input method of the present invention is shown. On the basis of the first method embodiment shown in FIG. 2, this embodiment describes in detail the process of putting a target voice segment on the screen, and may specifically include the following steps:
Step 301, acquiring an input string of a user;
Step 302, determining a target voice segment corresponding to the input string according to a pre-established mapping relationship between preset character strings and voice segments;
Step 303, presenting the target voice segment as a candidate corresponding to the input string.
Compared with the first method embodiment shown in FIG. 2, the method of this embodiment may further include:
Step 304, in response to a screen-on operation performed by the user on the target voice segment, putting the target voice segment, or text information corresponding to the target voice segment, on the screen.
After the candidate corresponding to the target voice segment is displayed to the user, the user's screen-on operation on that candidate can be detected, so that the target voice segment, or the text information corresponding to it, can be put on the screen according to the screen-on operation triggered by the user, achieving rapid input of the target voice segment or its text information.
In one embodiment, after a screen-on operation on the target voice segment triggered by the user is detected, the target voice segment may be put on the screen.
In another embodiment, since the target voice segment has corresponding text information, the target screen-on format of the target voice segment may be determined according to the user's actual input intention, and the segment put on the screen in that format. Screen-on formats may include a voice format and/or a text format, and the target screen-on format is one of the voice format and the text format.
Embodiments of the present invention may provide the following screen-on schemes for putting the target voice segment, or its corresponding text information, on the screen in response to the user's screen-on operation on the target voice segment:
scheme 1 for screen mounting,
The screen-up scheme 1 may include: acquiring configuration information, wherein the configuration information is used for representing a target screen-on format corresponding to a target voice fragment; and according to the target on-screen format represented by the configuration information, the target voice fragment or the text information corresponding to the target voice fragment is on-screen.
The configuration information may be a target on-screen format corresponding to a target voice segment set by a user at a client in advance.
In an optional embodiment, in response to a user performing a screen-up operation on a target voice segment, configuration information of a client may be obtained first, and a target screen-up format corresponding to the target voice segment is determined according to the configuration information of the client. When the on-screen format set by the user in the configuration information is a voice format, determining that the target on-screen format corresponding to the target voice fragment is the voice format; and when the on-screen format set by the user in the configuration information is a text format, determining that the target on-screen format corresponding to the target voice fragment is the text format.
After the target screen-on format is determined, screen-on of the candidate item corresponding to the target voice fragment can be performed according to the target screen-on format. For example, if the target screen-on format is a voice format, the target voice fragment may be displayed on the screen, and if the target screen-on format is a text format, the text information corresponding to the target voice fragment may be displayed on the screen, so that the target voice fragment or the text information corresponding to the target voice fragment may be quickly input.
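A minimal sketch of screen-on scheme 1, assuming the configuration is a simple stored preference; ScreenOnFormat and the commit placeholders are illustrative names, not APIs from the patent:

```kotlin
// Hypothetical screen-on decision for scheme 1: the target screen-on
// format is read from configuration information set by the user in advance.
enum class ScreenOnFormat { VOICE, TEXT }

data class Configuration(val screenOnFormat: ScreenOnFormat)

fun screenOn(segment: VoiceSegment, config: Configuration) {
    when (config.screenOnFormat) {
        ScreenOnFormat.VOICE -> commitVoice(segment)            // put the voice segment on the screen
        ScreenOnFormat.TEXT  -> commitText(segment.text ?: "")  // put its text information on the screen
    }
}

// Placeholders for the host application's actual commit calls.
fun commitVoice(segment: VoiceSegment) { println("commit voice: ${segment.audioPath}") }
fun commitText(text: String) { println("commit text: $text") }
```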
Screen-on scheme 2
Screen-on scheme 2 may include: determining the target screen-on format corresponding to the target voice segment according to the application program environment corresponding to the input string; and putting the target voice segment, or the text information corresponding to it, on the screen according to that target screen-on format.
In an optional embodiment, after a screen-on operation triggered by the user is detected, identification information of the application program corresponding to the input string may be acquired, and the type of the application program determined from that identification information. Types of application programs may include, for example, text editing applications and instant messaging applications. When the identification information indicates a text editing application, the target screen-on format corresponding to the target voice segment may be determined to be the text format; when it indicates an instant messaging application, the target screen-on format may be determined to be the voice format. The application program corresponding to the input string is the application program of the active foreground window of the intelligent terminal, that is, the application program hosting the input method program.
For example, when the user is currently inputting in a text editing application, the client may, after detecting a screen-on operation triggered by the user, obtain the identification information of the application, match it against an identification list pre-stored on the client, determine that the application's type is a text editing application, and accordingly determine that the target screen-on format of the target voice segment is the text format. The identification list records a mapping relationship between the identification information of at least one application program and the type of that application program; of course, the identification list may also record the screen-on format corresponding to each application's identification information, and the recorded format may be adjusted according to the user's actual input habits.
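A sketch of the identification list for screen-on scheme 2; the package names are hypothetical examples of host-application identification information:

```kotlin
// Hypothetical identification list: the identification information of the
// host application (here, its package name) determines the application
// type and hence the target screen-on format.
val identificationList: Map<String, ScreenOnFormat> = mapOf(
    "com.example.noteeditor" to ScreenOnFormat.TEXT,  // text editing application
    "com.example.messenger"  to ScreenOnFormat.VOICE  // instant messaging application
)

fun formatForApplication(packageName: String): ScreenOnFormat =
    identificationList[packageName] ?: ScreenOnFormat.TEXT  // assumed fallback
```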
Of course, other manners may also be used to determine the target on-screen format corresponding to the target voice segment, which is not limited in the embodiment of the present invention.
To sum up, the embodiment of the present invention provides an input method that can determine the target screen-on format of a target voice segment according to preset configuration information, or according to the application program environment, and put the target voice segment or its corresponding text information on the screen in the determined format. Because different target screen-on formats can put either the voice segment or its text information on the screen, the content the client commits matches the user's current input environment more closely, improving input efficiency.
On the basis of the embodiment shown in FIG. 2, this embodiment details the process of establishing the mapping relationship between a voice segment and a preset character string. Referring to FIG. 4, a flowchart illustrating the steps of a method for establishing a mapping relationship between a preset character string and a voice segment according to an embodiment of the present invention is shown; the method may specifically include the following steps.
When providing a target voice segment to the user, the client needs to obtain the target voice segment corresponding to the input string according to the mapping relationship between preset character strings and voice segments; therefore, the client needs to establish this mapping relationship in advance.
Step 401, acquiring a voice segment input by a user.
To make it convenient for the user to establish a mapping relationship between a preset character string and a voice segment, a preset interface for establishing the mapping relationship may be provided. When a selection operation triggered by the user on the preset interface is detected, the user may be prompted to record a voice segment, and the voice segment is then obtained from the user's recording. The preset interface is used for establishing the mapping relationship between the preset character string and the voice segment; when the selection operation is detected, an interface for setting the preset character string and the voice segment may be displayed to guide the user.
In an optional embodiment, after the voice segment recorded by the user is obtained, the text information corresponding to it can also be obtained, so that when the voice segment is later displayed, its text information can be displayed as well.
It should be noted that the text information corresponding to a voice segment can be obtained through speech recognition of the segment, or the text information set by the user for the segment can be obtained. That is, the text information may be converted by the client from the voice segment, or it may be text actively set by the user for the segment.
Step 402, acquiring a preset character string, input by the user, corresponding to the voice segment.
Because the mapping relationship the user expects is established from the voice segment and the preset character string, after the user records the voice segment, the preset character string must also be acquired, so that the mapping relationship between the two can be established in a subsequent step.
In an optional embodiment, after obtaining the voice segment recorded by the user, the client may prompt the user to input the preset character string corresponding to it, so that when an input string entered later by the user matches that preset character string, the voice segment can be provided to the user.
Step 403, establishing a mapping relationship between the voice segment and the preset character string.
A mapping relationship between the preset character string and the voice segment is established according to the acquired voice segment and preset character string. Further, if text information corresponding to the voice segment was also acquired in step 401, a mapping relationship among the voice segment, the text information, and the preset character string may be established in step 403.
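Steps 401 through 403 can be sketched as follows, again reusing the hypothetical VoiceSegment and VoiceMappingStore; whether the text information comes from the user or from speech recognition is left to the caller:

```kotlin
// Steps 401-403 in miniature: take the recorded voice segment, the preset
// character string entered by the user, and optional text information, and
// establish the mapping relationship.
fun registerVoiceShortcut(
    store: VoiceMappingStore,
    recordedAudioPath: String,  // step 401: the voice segment recorded by the user
    durationMs: Long,
    presetString: String,       // step 402: the preset character string
    textInfo: String? = null    // optional, user-supplied or from speech recognition
) {
    val segment = VoiceSegment(recordedAudioPath, durationMs, textInfo)
    store.put(presetString, segment)  // step 403: establish the mapping relationship
}
```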
It should be noted that after the mapping relationship between the preset character string and the voice segment is established, it may also be maintained: for example, the mapping relationship may be sent to the server at intervals of a preset duration so that the server can store the mapping relationships set by the user, and the mapping relationship stored on the server may likewise be requested at intervals, so that the mappings stored on the client and the server stay up to date. Of course, the mapping relationship may also be maintained in other manners, which is not limited in the embodiments of the present invention.
It should be noted that the embodiments of the present invention do not limit the execution order of step 401 and step 402; they may be performed one after the other in either order, or in parallel.
To sum up, the embodiment of the present invention provides an input method that can establish a mapping relationship between a voice segment and a preset character string according to the voice segment and the preset character string input by the user. With this mapping relationship established, when a target voice segment is determined from the user's input string, the target voice segment corresponding to the input string can be provided to the user conveniently and quickly, improving the user's input efficiency.
It should be noted that, for simplicity of description, the method embodiments are described as a series of action combinations. However, those skilled in the art should understand that the present invention is not limited by the described sequence of actions, because some steps may be performed in other orders or simultaneously according to the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and the actions involved are not necessarily required by the present invention.
Device Embodiment
Referring to fig. 5, a block diagram of an embodiment of an input device according to the present invention is shown, which may specifically include:
a first acquisition module 501, configured to acquire an input string of a user;
a determining module 502, configured to determine a target voice segment corresponding to the input string according to a pre-established mapping relationship between preset character strings and voice segments; and
a presentation module 503, configured to present the target voice segment as a candidate corresponding to the input string.
To sum up, the embodiment of the present invention provides an input apparatus that can acquire an input string of a user, determine a target voice segment corresponding to the input string according to a pre-established mapping relationship between preset character strings and voice segments, and finally display the target voice segment as a candidate corresponding to the input string. Because the target voice segment is presented as a candidate, the user can complete input of the segment by selecting that candidate, which reduces the operation cost required to input a voice segment and improves the efficiency with which the user inputs voice segments.
Optionally, the apparatus further comprises:
a screen-on module, configured to put the target voice segment, or text information corresponding to the target voice segment, on the screen in response to a screen-on operation performed by the user on the target voice segment.
Optionally, the screen-on module comprises:
an acquisition submodule, configured to acquire configuration information, where the configuration information represents a target screen-on format corresponding to the target voice segment; and
a screen-on submodule, configured to put the target voice segment, or the text information corresponding to the target voice segment, on the screen according to the target screen-on format represented by the configuration information.
Optionally, the screen-on module further comprises:
a determining submodule, configured to determine a target screen-on format corresponding to the target voice segment according to the application program environment corresponding to the input string; and
a screen-on submodule, configured to put the target voice segment, or the text information corresponding to the target voice segment, on the screen according to the target screen-on format corresponding to the target voice segment.
Optionally, the apparatus further comprises:
the second acquisition module is used for acquiring the text information corresponding to the target voice fragment;
the presentation module includes:
and the display sub-module is used for displaying the target voice fragment and the text information corresponding to the target voice fragment as the candidate item corresponding to the input string.
Optionally, the apparatus further comprises:
the third acquisition module is used for acquiring a voice segment input by a user;
the fourth acquisition module is used for acquiring the preset character string which is input by the user and corresponds to the voice segment;
and the mapping relation establishing module is used for establishing the mapping relation between the voice fragments and the preset character strings.
Optionally, the apparatus further comprises:
a fifth obtaining module, configured to obtain text information corresponding to the voice segment;
the mapping relation establishing module comprises:
and the mapping relation establishing submodule is used for establishing the mapping relation among the voice fragments, the text information and the preset character strings.
Optionally, the apparatus further comprises:
and the first sending module is used for sending the mapping relation between the preset character string and the voice fragment to the server.
Optionally, the apparatus further comprises:
the second sending module is used for sending a corresponding relation synchronization request to the server;
and the receiving module is used for receiving the mapping relation between the preset character string and the voice fragment returned by the server according to the corresponding relation synchronization request.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
Embodiments of the present invention also provide an input device, comprising a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for: acquiring an input string of a user; determining a target voice segment corresponding to the input string according to a pre-established mapping relationship between preset character strings and voice segments; and displaying the target voice segment as a candidate corresponding to the input string.
Optionally, the device is further configured such that the one or more programs, when executed by the one or more processors, include instructions for:
putting the target voice segment, or text information corresponding to the target voice segment, on the screen in response to a screen-on operation performed by the user on the target voice segment.
Optionally, the putting the target voice segment or the text information corresponding to the target voice segment on the screen includes:
acquiring configuration information, where the configuration information represents a target screen-on format corresponding to the target voice segment; and
putting the target voice segment, or the text information corresponding to the target voice segment, on the screen according to the target screen-on format represented by the configuration information.
Optionally, the putting the target voice segment or the text information corresponding to the target voice segment on the screen includes:
determining a target screen-on format corresponding to the target voice segment according to the application program environment corresponding to the input string; and
putting the target voice segment, or the text information corresponding to the target voice segment, on the screen according to the target screen-on format corresponding to the target voice segment.
Optionally, after the determining a target voice segment corresponding to the input string according to a pre-established mapping relationship between preset character strings and voice segments, the device is further configured such that the one or more programs, when executed by the one or more processors, include instructions for:
acquiring text information corresponding to the target voice segment;
and the displaying the target voice segment as the candidate corresponding to the input string includes:
displaying the target voice segment and the text information corresponding to the target voice segment as a candidate corresponding to the input string.
Optionally, the device is further configured such that the one or more programs, when executed by the one or more processors, include instructions for:
acquiring a voice segment input by a user;
acquiring a preset character string, input by the user, corresponding to the voice segment; and
establishing a mapping relationship between the voice segment and the preset character string.
Optionally, after the acquiring a voice segment input by a user, the device is further configured such that the one or more programs, when executed by the one or more processors, include instructions for:
acquiring text information corresponding to the voice segment;
and the establishing a mapping relationship between the voice segment and the preset character string includes:
establishing a mapping relationship among the voice segment, the text information, and the preset character string.
Optionally, the device is also configured to execute the one or more programs by the one or more processors including instructions for:
and sending the mapping relation between the preset character string and the voice fragment to a server.
Optionally, before the determining of the target voice fragment corresponding to the input string according to the pre-established mapping relationship between preset character strings and voice fragments, the one or more programs further include instructions, executable by the one or more processors, for:
sending a correspondence synchronization request to a server;
and receiving, from the server, the mapping relationship between the preset character string and the voice fragment returned according to the correspondence synchronization request. A sketch of this synchronization is given below.
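The synchronization exchange could then be the mirror image of the upload; again the endpoint and the JSON shape are assumptions for illustration.

    import requests

    def sync_mappings(engine: CandidateEngine, user_id: str,
                      url: str = "https://example.com/ime/mappings/sync") -> None:
        # Send the correspondence synchronization request and merge the
        # returned preset-string -> voice-fragment mappings locally.
        resp = requests.get(url, params={"user": user_id}, timeout=5)
        resp.raise_for_status()
        for item in resp.json():  # e.g. [{"key": ..., "audio": ..., "text": ...}]
            engine.mapping[item["key"]] = VoiceFragment(
                item["audio"], text=item.get("text"))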
Fig. 6 is a block diagram illustrating an input device 600 according to an exemplary embodiment; the input device 600 may be an intelligent terminal or a server. For example, the apparatus 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, exercise equipment, a personal digital assistant, or the like.
Referring to Fig. 6, the apparatus 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.
The processing component 602 generally controls overall operation of the device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions so as to perform all or part of the steps of the methods described above. Further, the processing component 602 may include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operation at the device 600. Examples of such data include instructions for any application or method operating on device 600, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 604 may be implemented by any type or combination of volatile and non-volatile storage devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power supply component 606 provides power to the various components of device 600. The power components 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 600.
The multimedia component 608 includes a screen that provides an output interface between the device 600 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touch, slide, and gesture actions on the touch panel. The touch sensors may sense not only the boundary of a touch or slide action but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 608 includes a front-facing camera and/or a rear-facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 600 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focus and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, audio component 610 includes a Microphone (MIC) configured to receive external audio signals when apparatus 600 is in an operating mode, such as a call mode, a record mode, and a voice recognition mode. The received audio signal may further be stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessments of various aspects of the apparatus 600. For example, the sensor component 614 may detect an open/closed state of the device 600 and the relative positioning of components, such as the display and keypad of the apparatus 600. The sensor component 614 may also detect a change in position of the apparatus 600 or of a component of the apparatus 600, the presence or absence of user contact with the apparatus 600, the orientation or acceleration/deceleration of the apparatus 600, and a change in temperature of the apparatus 600. The sensor component 614 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate wired or wireless communication between the apparatus 600 and other devices. The apparatus 600 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 616 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components for performing the above-described methods.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium comprising instructions, such as the memory 604 comprising instructions, executable by the processor 620 of the apparatus 600 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Fig. 7 is a schematic diagram of a server in some embodiments of the invention. The server 700 may vary significantly in configuration or performance and may include one or more Central Processing Units (CPUs) 722 (e.g., one or more processors), memory 732, and one or more storage media 730 (e.g., one or more mass storage devices) storing applications 742 or data 744. The memory 732 and the storage media 730 may be transient or persistent storage. The programs stored on a storage medium 730 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processing unit 722 may be configured to communicate with the storage medium 730 and to execute, on the server 700, the series of instruction operations in the storage medium 730.
The server 700 may also include one or more power supplies 726, one or more wired or wireless network interfaces 750, one or more input/output interfaces 758, one or more keyboards 756, and/or one or more operating systems 741, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
Also provided is a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform an input method as described in one or more of Figs. 2-4.
Also provided is a non-transitory computer-readable storage medium whose instructions, when executed by a processor of an apparatus (an intelligent terminal or a server), enable the apparatus to perform an input method, the method comprising: acquiring an input string of a user; determining a target voice fragment corresponding to the input string according to a pre-established mapping relationship between preset character strings and voice fragments; and presenting the target voice fragment as a candidate item corresponding to the input string.
Optionally, the method further comprises: displaying, in response to a screen-on operation performed by the user on the target voice fragment, the target voice fragment or the text information corresponding to the target voice fragment on the screen.
Optionally, the displaying of the target voice fragment or the text information corresponding to the target voice fragment on the screen includes: acquiring configuration information, wherein the configuration information represents a target screen-on format corresponding to the target voice fragment; and displaying the target voice fragment or the text information corresponding to the target voice fragment on the screen according to the target screen-on format represented by the configuration information.
Optionally, the displaying of the target voice fragment or the text information corresponding to the target voice fragment on the screen includes: determining a target screen-on format corresponding to the target voice fragment according to the application program environment corresponding to the input string; and displaying the target voice fragment or the text information corresponding to the target voice fragment on the screen according to the target screen-on format corresponding to the target voice fragment.
Optionally, after the target voice fragment corresponding to the input string is determined according to the pre-established mapping relationship between preset character strings and voice fragments, the method further includes: acquiring text information corresponding to the target voice fragment; and the presenting of the target voice fragment as the candidate item corresponding to the input string includes: presenting the target voice fragment and the text information corresponding to the target voice fragment together as a candidate item corresponding to the input string.
Optionally, the method further comprises: acquiring a voice fragment input by a user; acquiring a preset character string, input by the user, corresponding to the voice fragment; and establishing a mapping relationship between the voice fragment and the preset character string.
Optionally, after the acquiring of the voice fragment input by the user, the method further includes: acquiring text information corresponding to the voice fragment;
and the establishing of the mapping relationship between the voice fragment and the preset character string includes: establishing a mapping relationship among the voice fragment, the text information, and the preset character string.
Optionally, the method further comprises: sending the mapping relationship between the preset character string and the voice fragment to a server.
Optionally, before the determining of the target voice fragment corresponding to the input string according to the pre-established mapping relationship between preset character strings and voice fragments, the method further includes: sending a correspondence synchronization request to a server; and receiving, from the server, the mapping relationship between the preset character string and the voice fragment returned according to the correspondence synchronization request.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
The input method and the input device provided by the invention have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the invention, and the description of these embodiments is intended only to aid understanding of the method and its core idea. Meanwhile, a person skilled in the art may, following the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, the contents of this specification should not be construed as limiting the invention.

Claims (25)

1. An input method, characterized in that the method comprises:
acquiring an input string of a user;
determining a target voice fragment corresponding to the input string according to a pre-established mapping relationship between a preset character string and a voice fragment;
presenting the target voice fragment as a candidate item corresponding to the input string;
the method further comprises the following steps:
acquiring a voice fragment input by a user;
acquiring a preset character string, input by the user, corresponding to the voice fragment;
and establishing a mapping relationship between the voice fragment and the preset character string.
2. The method of claim 1, further comprising:
displaying, in response to a screen-on operation performed by the user on the target voice fragment, the target voice fragment or the text information corresponding to the target voice fragment on the screen.
3. The method according to claim 2, wherein the displaying of the target voice fragment or the text information corresponding to the target voice fragment on the screen comprises:
acquiring configuration information, wherein the configuration information represents a target screen-on format corresponding to the target voice fragment;
and displaying the target voice fragment or the text information corresponding to the target voice fragment on the screen according to the target screen-on format represented by the configuration information.
4. The method according to claim 2, wherein the displaying of the target voice fragment or the text information corresponding to the target voice fragment on the screen comprises:
determining a target screen-on format corresponding to the target voice fragment according to the application program environment corresponding to the input string;
and displaying the target voice fragment or the text information corresponding to the target voice fragment on the screen according to the target screen-on format corresponding to the target voice fragment.
5. The method according to claim 1, wherein, after the determining of the target voice fragment corresponding to the input string according to the pre-established mapping relationship between the preset character string and the voice fragment, the method further comprises:
acquiring text information corresponding to the target voice fragment;
the presenting of the target voice fragment as the candidate item corresponding to the input string includes:
and presenting the target voice fragment and the text information corresponding to the target voice fragment as a candidate item corresponding to the input string.
6. The method of claim 1, wherein, after the acquiring of the voice fragment input by the user, the method further comprises:
acquiring text information corresponding to the voice fragment;
the establishing of the mapping relationship between the voice fragment and the preset character string comprises:
establishing a mapping relationship among the voice fragment, the text information, and the preset character string.
7. The method of claim 1, further comprising:
sending the mapping relationship between the preset character string and the voice fragment to a server.
8. The method according to claim 1, wherein, before the determining of the target voice fragment corresponding to the input string according to the pre-established mapping relationship between the preset character string and the voice fragment, the method further comprises:
sending a correspondence synchronization request to a server;
and receiving, from the server, the mapping relationship between the preset character string and the voice fragment returned according to the correspondence synchronization request.
9. An input device, the device comprising:
the first acquisition module is used for acquiring an input string of a user;
the determining module is used for determining a target voice fragment corresponding to the input string according to a pre-established mapping relationship between a preset character string and a voice fragment;
the display module is used for displaying the target voice fragment as a candidate item corresponding to the input string;
the device further comprises:
the third acquisition module is used for acquiring a voice fragment input by a user;
the fourth acquisition module is used for acquiring a preset character string which is input by a user and corresponds to the voice fragment;
and the mapping relationship establishing module is used for establishing the mapping relationship between the voice fragment and the preset character string.
10. The apparatus of claim 9, further comprising:
and the screen-on module is used for displaying, in response to a screen-on operation performed by the user on the target voice fragment, the target voice fragment or the text information corresponding to the target voice fragment on the screen.
11. The apparatus of claim 10, wherein the screen-on module comprises:
the acquisition submodule is used for acquiring configuration information, wherein the configuration information represents a target screen-on format corresponding to the target voice fragment;
and the screen-on submodule is used for displaying the target voice fragment or the text information corresponding to the target voice fragment on the screen according to the target screen-on format represented by the configuration information.
12. The apparatus of claim 10, wherein the screen-on module further comprises:
the determining submodule is used for determining a target screen-on format corresponding to the target voice fragment according to the application program environment corresponding to the input string;
and the screen-on submodule is used for displaying the target voice fragment or the text information corresponding to the target voice fragment on the screen according to the target screen-on format corresponding to the target voice fragment.
13. The apparatus of claim 9, further comprising:
the second acquisition module is used for acquiring text information corresponding to the target voice fragment;
the presentation module comprises:
the display submodule is used for displaying the target voice fragment and the text information corresponding to the target voice fragment as the candidate item corresponding to the input string.
14. The apparatus of claim 9, further comprising:
a fifth obtaining module, configured to obtain text information corresponding to the voice fragment;
the mapping relationship establishing module comprises:
the mapping relationship establishing submodule is used for establishing the mapping relationship among the voice fragment, the text information, and the preset character string.
15. The apparatus of claim 9, further comprising:
and the first sending module is used for sending the mapping relationship between the preset character string and the voice fragment to a server.
16. The apparatus of claim 9, further comprising:
the second sending module is used for sending a correspondence synchronization request to the server;
and the receiving module is used for receiving the mapping relationship between the preset character string and the voice fragment returned by the server according to the correspondence synchronization request.
17. An input device comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
acquiring an input string of a user;
determining a target voice fragment corresponding to the input string according to a pre-established mapping relationship between a preset character string and a voice fragment;
presenting the target voice fragment as a candidate item corresponding to the input string;
wherein the one or more programs further include instructions, executable by the one or more processors, for:
acquiring a voice fragment input by a user;
acquiring a preset character string, input by the user, corresponding to the voice fragment;
and establishing a mapping relationship between the voice fragment and the preset character string.
18. The apparatus of claim 17, wherein the one or more programs further include instructions, executable by the one or more processors, for:
displaying, in response to a screen-on operation performed by the user on the target voice fragment, the target voice fragment or the text information corresponding to the target voice fragment on the screen.
19. The apparatus of claim 17, wherein the displaying of the target voice fragment or the text information corresponding to the target voice fragment on the screen comprises:
acquiring configuration information, wherein the configuration information represents a target screen-on format corresponding to the target voice fragment;
and displaying the target voice fragment or the text information corresponding to the target voice fragment on the screen according to the target screen-on format represented by the configuration information.
20. The apparatus of claim 18, wherein the displaying of the target voice fragment or the text information corresponding to the target voice fragment on the screen comprises:
determining a target screen-on format corresponding to the target voice fragment according to the application program environment corresponding to the input string;
and displaying the target voice fragment or the text information corresponding to the target voice fragment on the screen according to the target screen-on format corresponding to the target voice fragment.
21. The apparatus of claim 17, wherein, after the determining of the target voice fragment corresponding to the input string according to the pre-established mapping relationship between the preset character string and the voice fragment, the one or more programs further include instructions, executable by the one or more processors, for:
acquiring text information corresponding to the target voice fragment;
the presenting of the target voice fragment as the candidate item corresponding to the input string includes:
and displaying the target voice fragment and the text information corresponding to the target voice fragment as a candidate item corresponding to the input string.
22. The apparatus of claim 17, wherein, after the acquiring of the voice fragment input by the user, the one or more programs further include instructions, executable by the one or more processors, for:
acquiring text information corresponding to the voice fragment;
the establishing of the mapping relationship between the voice fragment and the preset character string comprises:
establishing a mapping relationship among the voice fragment, the text information, and the preset character string.
23. The apparatus of claim 17, wherein the one or more programs further include instructions, executable by the one or more processors, for:
sending the mapping relationship between the preset character string and the voice fragment to a server.
24. The apparatus of claim 17, wherein, before the determining of the target voice fragment corresponding to the input string according to the pre-established mapping relationship between the preset character string and the voice fragment, the one or more programs further include instructions, executable by the one or more processors, for:
sending a correspondence synchronization request to a server;
and receiving, from the server, the mapping relationship between the preset character string and the voice fragment returned according to the correspondence synchronization request.
25. A machine-readable medium having stored thereon instructions, which when executed by one or more processors, cause an apparatus to perform an input method as recited in one or more of claims 1-8.
CN201710334619.XA 2017-05-12 2017-05-12 Input method and device Active CN108874172B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710334619.XA CN108874172B (en) 2017-05-12 2017-05-12 Input method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710334619.XA CN108874172B (en) 2017-05-12 2017-05-12 Input method and device

Publications (2)

Publication Number Publication Date
CN108874172A CN108874172A (en) 2018-11-23
CN108874172B true CN108874172B (en) 2022-12-13

Family

ID=64319888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710334619.XA Active CN108874172B (en) 2017-05-12 2017-05-12 Input method and device

Country Status (1)

Country Link
CN (1) CN108874172B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488267B (en) * 2019-01-25 2024-03-12 北京搜狗科技发展有限公司 Interface test script generation method and device and electronic equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103699530A (en) * 2012-09-27 2014-04-02 百度在线网络技术(北京)有限公司 Method and equipment for inputting texts in target application according to voice input information
CN103354089B (en) * 2013-06-25 2015-10-28 天津三星通信技术研究有限公司 A kind of voice communication management method and device thereof
CN106202204A (en) * 2016-06-24 2016-12-07 维沃移动通信有限公司 The lookup method of a kind of voice document and mobile terminal
CN106357929A (en) * 2016-11-10 2017-01-25 努比亚技术有限公司 Previewing method based on audio file and mobile terminal

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2006100362A4 (en) * 2006-05-04 2006-06-08 Sms Technology Pty Ltd mdata
CN102929492A (en) * 2012-11-25 2013-02-13 上海量明科技发展有限公司 Method for adjusting candidate item sequence in input methods as well as client terminal and character library
CN104346127A (en) * 2013-08-02 2015-02-11 腾讯科技(深圳)有限公司 Realization method, realization device and terminal for voice input
CN105302925A (en) * 2015-12-10 2016-02-03 百度在线网络技术(北京)有限公司 Method and device for pushing voice search data
CN105607757A (en) * 2015-12-28 2016-05-25 北京搜狗科技发展有限公司 Input method and device and device used for input
AU2017100059A4 (en) * 2017-01-18 2017-03-02 Footprint Solutions Code Mapping refers to the technique of visually 'mapping' phonemes (English speech sounds) with the letter or letter string that represents each separate phoneme, on paper. Text displays this 'Code Mapping' technique using 2 colours, with a third to segment split vowel digraphs. These show the 'reader' where speech sounds will change as the word is decoded. The ability to segment words in this way, orally, is known to be difficult for Dyslexic learners, who generally have poor phonemic awareness. Therefore the typing or writing of text using this technique is highly beneficial.

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Playing audio from characters of a textbox using JavaScript";mathigatti;《stackoverflow.com/questions/21190496/playing-audio-from-characters-of-a-textbox-using-javascript》;20140117;1-4 *
"TagSpaces – 用标签管理你的文件[多平台]";scavin;《https://www.appinn.com/tagspaces/》;20140419;1-3 *
汉语语音识别中的拼音多候选问题;杨浩荣等;《电子学报》;19990425(第04期);59-63 *

Also Published As

Publication number Publication date
CN108874172A (en) 2018-11-23

Similar Documents

Publication Publication Date Title
US11281363B2 (en) Method and device for setting identity image
CN105786507B (en) Display interface switching method and device
CN106897937B (en) Method and device for displaying social sharing information
CN107423386B (en) Method and device for generating electronic card
CN109976861B (en) Interactive interface display method and device and storage medium
US11836342B2 (en) Method for acquiring historical information, storage medium, and system
CN107229403B (en) Information content selection method and device
CN106331328B (en) Information prompting method and device
CN107729098B (en) User interface display method and device
CN108803892B (en) Method and device for calling third party application program in input method
CN111046210A (en) Information recommendation method and device and electronic equipment
CN113254784A (en) Information display method and device, electronic equipment and storage medium
CN108803891B (en) Information display method and device, electronic equipment and storage medium
CN109521938B (en) Method and device for determining data evaluation information, electronic device and storage medium
CN104951522B (en) Method and device for searching
CN112616053B (en) Transcoding method and device for live video and electronic equipment
CN106447747B (en) Image processing method and device
CN110213062B (en) Method and device for processing message
CN108874172B (en) Input method and device
US11284127B2 (en) Method and apparatus for pushing information in live broadcast room
CN113568551A (en) Picture saving method and device
CN109120499B (en) Information processing method and device
CN111724398A (en) Image display method and device
CN109144336B (en) Data processing method, device and equipment and readable storage medium
CN111796690A (en) Data processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant