CN114518805A - Text generation method and device, electronic equipment and storage medium

Info

Publication number: CN114518805A
Application number: CN202210148464.1A
Authority: CN
Inventors: 王璟铭; 葛翔
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2022-02-17
Filing date: 2022-02-17
Publication date: 2022-05-20

Abstract

The disclosure provides a text generation method, a text generation device, electronic equipment and a storage medium, and relates to the technical field of computers, in particular to the fields of voice technology, intelligent search and the like. The specific implementation scheme is as follows: in response to receiving the voice information, generating initial text information corresponding to the voice information; in response to detecting that the initial text information comprises first digital information, generating at least one piece of replacement information, wherein the at least one piece of replacement information is used for representing that the first digital information is replaced by second digital information in other formats; and in response to receiving a selection operation for target replacement information in the at least one piece of replacement information, replacing the first digital information in the initial text information with the target digital information corresponding to the target replacement information in the second digital information to obtain target text information corresponding to the voice information.

Description

Text generation method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a text generation method and apparatus, an electronic device, and a storage medium.

Background

An input method is an encoding scheme used to enter symbols into a computer or other electronic device. Encoding schemes for Chinese character input associate the sound, shape and meaning of characters with specific keys, and then combine those keys according to the Chinese character to be entered. With the development of computer technology, speech input has also become a way to enter Chinese characters. Speech input is a simple, easy-to-use input method in which the computer recognizes Chinese characters from the operator's speech.

Disclosure of Invention

The disclosure provides a text generation method, a text generation device, an electronic device and a storage medium.

According to an aspect of the present disclosure, there is provided a text generation method including: in response to receiving voice information, generating initial text information corresponding to the voice information; in response to detecting that first digital information is included in the initial text information, generating at least one piece of replacement information, wherein the at least one piece of replacement information is used for representing that the first digital information is replaced by second digital information in other formats; and in response to receiving a selection operation aiming at target replacement information in the at least one piece of replacement information, replacing first digital information in the initial text information with target digital information corresponding to the target replacement information in the second digital information to obtain target text information corresponding to the voice information.

According to another aspect of the present disclosure, there is provided a text generation apparatus including: a first generating module, configured to generate, in response to receiving voice information, initial text information corresponding to the voice information; a second generating module, configured to generate at least one piece of replacement information in response to detecting that the initial text information includes first digital information, where the at least one piece of replacement information is used to represent that the first digital information is replaced with second digital information in another format; and a first replacing module, configured to replace, in response to receiving a selection operation for target replacement information in the at least one piece of replacement information, the first digital information in the initial text information with target digital information corresponding to the target replacement information in the second digital information, so as to obtain target text information corresponding to the voice information.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the text generation method of the present disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the text generation method of the present disclosure.

According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the text generation method of the present disclosure.

It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:

FIG. 1 schematically illustrates an exemplary system architecture to which the text generation method and apparatus may be applied, according to an embodiment of the disclosure;

FIG. 2 schematically shows a flow diagram of a text generation method according to an embodiment of the disclosure;

FIG. 3 schematically shows a schematic diagram of a text generation method according to one embodiment of the present disclosure;

FIG. 4 schematically shows a schematic diagram of generating at least one replacement information according to an embodiment of the present disclosure;

FIG. 5 schematically shows a schematic diagram of a text generation method according to another embodiment of the present disclosure;

FIG. 6 schematically shows a block diagram of a text generation apparatus according to an embodiment of the present disclosure; and

FIG. 7 shows a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of embodiments of the present disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and application of the personal information of the users involved all comply with the relevant laws and regulations, necessary confidentiality measures are taken, and public order and good customs are not violated.

In the technical scheme of the disclosure, before the personal information of the user is acquired or collected, the authorization or the consent of the user is acquired.

With the development of voice technology and the improvement of voice input accuracy, more and more users choose voice input when entering text. Voice input is especially convenient when the user cannot easily operate the device by hand. When the voice input contains digital information, existing input methods, after recognizing the numbers, mainly rely on various predefined rules to decide whether to display them as Arabic numerals, as Chinese characters, or in a specific format. For example, when a long string of digits is detected, it is displayed in Arabic numerals.

In implementing the disclosed concept, the inventors found that presenting numbers in a suitable format is an unavoidable difficulty in speech input, for example: the collocations and contexts of numbers vary widely, so a display strategy can hardly cover all situations, and when the same digital information is processed by several rules, unreasonable display results may be produced. Users may have personal preferences, there is no absolute standard for how numbers should be presented, and as the context changes a user may prefer different presentation formats for the same digital information. During voice input, differing spoken-language habits make correct recognition and display of numbers still more difficult, and some scenarios cannot be determined from the voice alone. As a result, the displayed result cannot always meet the user's needs, even though numbers are often key information whose correct expression is crucial to communicating the information correctly.

The disclosure provides a text generation method, a text generation device, an electronic device and a storage medium. The text generation method comprises the following steps: in response to receiving the voice information, initial text information corresponding to the voice information is generated. In response to detecting that the first digital information is included in the initial text information, at least one piece of replacement information is generated, wherein the at least one piece of replacement information is used for representing that the first digital information is replaced by second digital information in other formats. And in response to receiving a selection operation aiming at the target replacing information in the at least one replacing information, replacing the first digital information in the initial text information with the target digital information corresponding to the target replacing information in the second digital information to obtain the target text information corresponding to the voice information.

Fig. 1 schematically illustrates an exemplary system architecture to which the text generation method and apparatus may be applied, according to an embodiment of the present disclosure.

It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios. For example, in another embodiment, an exemplary system architecture to which the text generation method and apparatus may be applied may include a terminal device, but the terminal device may implement the text generation method and apparatus provided in the embodiments of the present disclosure without interacting with a server.

As shown in fig. 1, a system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. Network 104 is the medium used to provide communication links between terminal devices 101, 102, 103 and server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.

A user may use terminal devices 101, 102, 103 to interact with a server 105 over a network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a knowledge reading application, a web browser application, a search application, an instant messaging tool, a mailbox client, and/or social platform software, etc. (by way of example only).

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 105 may be a server providing various services, such as a background management server (for example only) providing support for content browsed by the user using the terminal devices 101, 102, 103. The background management server may analyze and otherwise process received data such as a user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and addresses the high management difficulty and weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.

It should be noted that the text generation method provided by the embodiment of the present disclosure may generally be executed by the terminal device 101, 102, or 103. Accordingly, the text generation apparatus provided in the embodiment of the present disclosure may also be provided in the terminal device 101, 102, or 103.

Alternatively, the text generation method provided by the embodiment of the present disclosure may generally be executed by the server 105. Accordingly, the text generation apparatus provided by the embodiment of the present disclosure may generally be disposed in the server 105. The text generation method provided by the embodiment of the present disclosure may also be executed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the text generation apparatus provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.

For example, when text information needs to be generated, the terminal devices 101, 102, 103 may acquire voice information and then transmit the acquired voice information to the server 105. The server 105 generates initial text information corresponding to the voice information in response to receiving the voice information; generates at least one piece of replacement information in response to detecting that first digital information is included in the initial text information, the at least one piece of replacement information being used to represent replacement of the first digital information with second digital information in another format; and, in response to receiving a selection operation for target replacement information in the at least one piece of replacement information, replaces the first digital information in the initial text information with target digital information corresponding to the target replacement information in the second digital information, to obtain the target text information corresponding to the voice information. Alternatively, the voice information may be processed by a server or server cluster capable of communicating with the terminal devices 101, 102, 103 and/or the server 105, which finally obtains the target text information corresponding to the voice information.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.

Fig. 2 schematically shows a flow chart of a text generation method according to an embodiment of the present disclosure.

As shown in fig. 2, the method includes operations S210 to S230.

In operation S210, in response to receiving the voice information, initial text information corresponding to the voice information is generated.

In operation S220, in response to detecting that the first digital information is included in the initial text information, at least one replacement information is generated, the at least one replacement information being used to represent replacement of the first digital information with second digital information in another format.

In operation S230, in response to receiving a selection operation for target replacement information of the at least one replacement information, first digital information of the initial text information is replaced with target digital information of the second digital information corresponding to the target replacement information, resulting in target text information corresponding to the voice information.

According to an embodiment of the present disclosure, the first digital information may represent every piece of number-related information in the initial text information. The other formats may include at least one of a Chinese character format, an Arabic numeral format, a Roman numeral format, and the like. The replacement information may include at least one of: replacement information representing that the first digital information is replaced with target digital information in the Chinese character format, replacement information representing that the first digital information is replaced with target digital information in the Arabic numeral format, replacement information representing that the first digital information is replaced with target digital information in the Roman numeral format, and the like. The second digital information may represent digital information in formats other than the format of the first digital information. The number of pieces of second digital information may be the same as the number of pieces of replacement information; that is, each piece of replacement information corresponds to one piece of second digital information, which has the same voice information as the first digital information. The target text information may be determined from the initial text information after the number format has been replaced.
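Operations S220 and S230 can be illustrated with a minimal Python sketch. It is not the implementation of the disclosure: the detection pattern, the function names (`generate_replacements`, `apply_selection`), the restriction to Chinese numerals below 100, and the sample sentence are all assumptions made for illustration, and the Roman numeral format is omitted.

```python
import re

CN_DIGITS = "零一二三四五六七八九"

# Hypothetical detector for "first digital information": ASCII digit runs,
# Chinese numerals from 10 to 99, or runs of single Chinese digits.
NUMERIC = re.compile(
    r"[0-9]+|[一二三四五六七八九]?十[一二三四五六七八九]?|[零一二三四五六七八九]+"
)

def chinese_to_arabic(token):
    """Convert e.g. '二十' -> '20', '十五' -> '15', '三二' -> '32'."""
    if "十" in token:
        tens, _, ones = token.partition("十")
        tens_value = CN_DIGITS.index(tens) if tens else 1
        ones_value = CN_DIGITS.index(ones) if ones else 0
        return str(tens_value * 10 + ones_value)
    return "".join(str(CN_DIGITS.index(ch)) for ch in token)

def arabic_to_chinese(token):
    """Convert e.g. '20' -> '二十'; other values are mapped digit by digit."""
    value = int(token)
    if 10 <= value <= 99 and not token.startswith("0"):
        tens, ones = divmod(value, 10)
        head = "" if tens == 1 else CN_DIGITS[tens]
        tail = CN_DIGITS[ones] if ones else ""
        return head + "十" + tail
    return "".join(CN_DIGITS[int(ch)] for ch in token)

def convert(token, fmt):
    """Render one detected number in the requested format."""
    is_arabic = token[0] in "0123456789"
    if fmt == "arabic":
        return token if is_arabic else chinese_to_arabic(token)
    return arabic_to_chinese(token) if is_arabic else token

def generate_replacements(initial_text):
    """S220 (sketch): one candidate text per target format, or {} if no numbers."""
    if not NUMERIC.search(initial_text):
        return {}
    return {
        fmt: NUMERIC.sub(lambda m, fmt=fmt: convert(m.group(), fmt), initial_text)
        for fmt in ("chinese", "arabic")
    }

def apply_selection(initial_text, replacements, target):
    """S230 (sketch): return the text for the selected replacement option."""
    return replacements.get(target, initial_text)

# Illustrative input only; not taken from the disclosure.
options = generate_replacements("大概十几20个人")
print(apply_selection("大概十几20个人", options, "chinese"))
```

Each option rewrites every detected number at once, which mirrors the "replace all" behaviour of the replacement information described for fig. 3; a production system would need a far more complete numeral parser.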

Fig. 3 schematically shows a schematic diagram of a text generation method according to an embodiment of the present disclosure.

As shown in fig. 3, clicking on microphone 310 may trigger voice input page 300 and begin detecting voice information. Detected voice information may be converted in real time into corresponding initial text information, which may be presented in text entry box 320. In the case where it is detected that the first digital information is included in the initial text information, the replacement information 330, 340 may be generated, without being limited thereto. The replacement information 330 may, for example, represent that all of the first digital information is replaced with numbers in the Chinese character format for presentation. The replacement information 340 may, for example, represent that all of the first digital information is replaced with numbers in the Arabic numeral format for presentation.

According to an embodiment of the present disclosure, as shown in fig. 3, initial text information 321, for example "I teach senior class three-two, roughly ten-odd or 20 people", may be obtained from the voice information, and it may be determined that the initial text information 321 includes first digital information such as "three-two" and "ten-odd or 20". When the user clicks the replacement information 330, the first digital information in the initial text information 321 may be replaced with Chinese characters, and the obtained target text information may be, for example, "I teach senior class three-two, roughly ten-odd or twenty people". When the user clicks the replacement information 340, the first digital information in the initial text information 321 may be replaced with Arabic numerals, and the obtained target text information may be, for example, "I teach senior class 32, roughly 10-odd or 20 people".

Through the embodiment of the present disclosure, a scheme is provided that lets a user quickly modify number formats when numbers are recognized and displayed in a voice input scenario. With this scheme, at least one piece of replacement information is generated when number-related content is recognized, so that quick replacement options are offered, the user's cost of modifying numbers is reduced, and text information that better meets actual application needs can be obtained.

The method shown in fig. 2 is further described below with reference to specific examples.

According to an embodiment of the present disclosure, generating the at least one piece of replacement information in response to detecting that the first digital information is included in the initial text information may include: generating intermediate information in response to detecting that the first digital information is included in the initial text information; and generating the at least one piece of replacement information in response to receiving a selection operation for the intermediate information.

According to embodiments of the present disclosure, the intermediate information may include, for example, a popup for characterizing the replaceable number format. Selecting the pop-up window may, for example, characterize an operation that allows subsequent replacement of the first numerical information in the initial textual information.

Fig. 4 schematically shows a schematic diagram of generating at least one replacement information according to an embodiment of the present disclosure.

As shown in fig. 4, clicking on the microphone 410 may trigger the voice input page 400 to start detecting voice information, and the detected voice information may be converted in real time into corresponding initial text information, such as the initial text information 421, which is shown in the text input box 420. In the case where it is detected that the first digital information is included in the initial text information 421, a popup 430 for obtaining the replacement information 431, 432, etc. may first be generated. The popup 430 may represent that a format replacement operation is to be performed on the first digital information in the initial text information 421. When the user clicks the popup 430, for example, and without limitation, the replacement information 431, 432 for implementing format replacement may be generated.

Through the embodiment of the present disclosure, obtaining the at least one piece of replacement information by way of the intermediate information adds a confirmation step before the replacement information is selected. This effectively reduces erroneous replacement of the first digital information caused by accidentally selecting a piece of replacement information, thereby improving the user experience.
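The two-step interaction (intermediate information first, replacement information only after confirmation) can be sketched as a small state machine. The class name `ReplacementFlow`, its state labels, and the injected callables are hypothetical; the disclosure does not prescribe any particular structure.

```python
class ReplacementFlow:
    """Sketch of: detect numbers -> show popup -> confirm -> show options -> select."""

    def __init__(self, initial_text, detect_numbers, build_options):
        self.initial_text = initial_text
        self.detect_numbers = detect_numbers    # callable: text -> bool
        self.build_options = build_options      # callable: text -> dict of candidates
        self.state = "idle"
        self.options = {}

    def detect(self):
        """Show the intermediate information (popup) only if numbers are present."""
        if self.detect_numbers(self.initial_text):
            self.state = "intermediate_shown"
            return True
        return False

    def confirm(self):
        """Selection operation on the intermediate information: build the options."""
        if self.state != "intermediate_shown":
            raise RuntimeError("no intermediate information to confirm")
        self.options = self.build_options(self.initial_text)
        self.state = "options_shown"
        return self.options

    def select(self, key):
        """Selection operation on one option: return the target text."""
        if self.state != "options_shown":
            raise RuntimeError("replacement options have not been generated")
        self.state = "done"
        return self.options[key]


# Illustrative callables; a real system would plug in its own detector and converter.
flow = ReplacementFlow(
    "about 20 people",
    detect_numbers=lambda text: any(ch in "0123456789" for ch in text),
    build_options=lambda text: {"words": text.replace("20", "twenty"), "digits": text},
)
if flow.detect():
    flow.confirm()
    print(flow.select("words"))
```

Because `select` refuses to run before `confirm`, a stray tap cannot replace text until the user has explicitly opened the options, which is the safeguard the paragraph above describes.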

According to an embodiment of the present disclosure, the text generation method may further include: generating at least one piece of presentation information in response to detecting that the cursor has moved to a position corresponding to third digital information in the initial text information, where the voice information corresponding to each piece of presentation information is the same as the voice information corresponding to the third digital information; and replacing the third digital information with target presentation information in response to receiving a selection operation for the target presentation information.

According to an embodiment of the present disclosure, the third digital information may represent at least one of: any piece of number-related information in the initial text information, other information in a predefined format, and the like. The other information in a predefined format may include, for example, information in a format such as "X time". The position corresponding to the third digital information may include, without limitation: the position before the first digit of the third digital information, the position between any two adjacent digits of the third digital information, the position after the last digit of the third digital information, and the like, at each of which the cursor can be placed.
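The cursor rule above (before the first digit, between adjacent digits, or after the last digit) amounts to an inclusive range check on the span of the number. The following sketch assumes a cursor expressed as an index between characters and a simple hypothetical pattern `NUMBER_LIKE`; neither is specified by the disclosure.

```python
import re

# Hypothetical pattern for "third digital information": digits, optionally
# followed by ':' or '.' and more digits (e.g. "8:30", "3.1", "20").
NUMBER_LIKE = re.compile(r"[0-9]+(?:[:.][0-9]+)?")

def span_at_cursor(text, cursor, pattern=NUMBER_LIKE):
    """Return (start, end, matched_text) for the number the cursor touches.

    `cursor` is an index between characters (0 .. len(text)). The check is
    inclusive on both ends, so positions just before the first digit and just
    after the last digit also count.
    """
    for match in pattern.finditer(text):
        if match.start() <= cursor <= match.end():
            return match.start(), match.end(), match.group()
    return None

def replace_span(text, span, target):
    """Replace the located number with the selected presentation information."""
    start, end, _ = span
    return text[:start] + target + text[end:]

# Illustrative text only.
text = "at 8:30 today"
span = span_at_cursor(text, 5)          # cursor between ':' and '3'
if span is not None:
    print(replace_span(text, span, "half past eight"))
```

Only the number under the cursor is touched, in contrast to the one-click options that rewrite every number in the text.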

According to an embodiment of the present disclosure, the presentation information may include at least one of time-type presentation information, quality-type (mass-related) presentation information, formula-type presentation information, and the like. The time-type presentation information may include, for example, at least one of XX:XX and the like. The quality-type presentation information may include, for example, at least one of X.X grams, X.X kg, and the like. The formula-type presentation information may include, for example, various calculation forms such as X × Y = Z.

According to the embodiment of the present disclosure, third digital information and presentation information having the same voice information may be stored in a database in advance in the form of matching pairs. Each matching item in a matching pair may be digital information with a preset structure, such as XX:XX, X time X (that is, X hours X minutes), X.X grams, and the like, so as to match the numbers that occur in actual use. For example, "work for 30 days" and "work for 31 days", or "three points and one moment" (a literal rendering of 三点一刻, "a quarter past three") and "3.1 grams" (3.1克), cannot be told apart by voice.

According to an embodiment of the present disclosure, the presentation information may also be presentation information whose semantic information is the same as the semantic information corresponding to the third digital information. Third digital information and presentation information with the same semantic information may be stored in the database in advance in the form of matching pairs, and each matching item in a matching pair may be digital information with a preset structure, such as XX:XX, X time X (that is, X hours X minutes), "half past X", and the like, so as to match the numbers that occur in actual use. For example, "half past eight", "8:30", and "8 hours 30 minutes" have the same semantics.

According to the embodiment of the present disclosure, the cursor can be used to locate third digital information having a corresponding structure. When the user moves the cursor to the vicinity of a piece of third digital information, at least one piece of presentation information corresponding to that third digital information can be generated from the matching items stored in the database for it.
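The matching-pair lookup can be sketched as a small table of structural patterns with renderers for their counterparts. The table `MATCHING_PAIRS`, its two entries, and the function name `presentation_candidates` are hypothetical stand-ins for the database described above; the entries rely on 刻 (quarter hour) and 克 (gram) being pronounced alike, and on "3:15" meaning the same as 三点一刻.

```python
import re

CN_DIGITS = "零一二三四五六七八九"

# Hypothetical in-memory stand-in for the matching-pair database. Each entry is
# (structure of the third digital information, renderer for its counterparts).
MATCHING_PAIRS = [
    # "N.1克" (N.1 grams) sounds like "N点一刻" (a quarter past N), i.e. "N:15".
    (re.compile(r"(?P<n>[1-9])\.1克"),
     lambda n: [f"{CN_DIGITS[n]}点一刻", f"{n}:15"]),
    # The reverse direction, starting from the clock format.
    (re.compile(r"(?P<n>[1-9]):15"),
     lambda n: [f"{n}.1克", f"{CN_DIGITS[n]}点一刻"]),
]

def presentation_candidates(third_info):
    """Return presentation information for the located third digital information."""
    for pattern, render in MATCHING_PAIRS:
        match = pattern.fullmatch(third_info)
        if match:
            return render(int(match.group("n")))
    return []  # no stored structure matches, so nothing is offered

print(presentation_candidates("3.1克"))
```

Keeping the structures in a table rather than in code paths means new homophone or same-meaning pairs can be added without changing the lookup function.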

For example, when the user says "liu yi" (六一), the user may mean "six one" as in Children's Day on June 1, or, because of the user's speaking habits, may actually mean "61". In the case where the initial text information is displayed as "six one", presentation information "61" may be generated. Based on the position of the cursor, it is also possible to switch between Arabic numerals and Chinese characters, such as replacing "20" in the initial text information with "twenty", and to switch between ambiguous pieces of digital information, such as replacing "3:15" with "3.1 grams".

Fig. 5 schematically shows a schematic diagram of a text generation method according to another embodiment of the present disclosure.

As shown in fig. 5, clicking on the microphone 510 may trigger the voice input page 500 to start detecting voice information, and the detected voice information may be converted in real time into corresponding initial text information, which is shown in the text input box 520. In the case where it is detected that the first digital information is included in the initial text information, the replacement information 531, 532 may be generated.

According to an embodiment of the present disclosure, referring to fig. 5, initial text information 521, for example "3.1 grams in ten minutes", may be obtained from the voice information. When the user moves the cursor 540 to the position immediately after "grams" in the initial text information 521, the third digital information "3.1 grams" may be located and the presentation information 541, 542 may be generated accordingly. The user may, for example, select the presentation information 541 to replace "3.1 grams" in the initial text information 521 with "three points and one moment" (a literal rendering of 三点一刻, "a quarter past three"), and the obtained target text information may be, for example, "a quarter past three in ten minutes".

Through the embodiment of the disclosure, the specific digital information in the initial text information can be positioned, the corresponding display information is generated, the quick replacement option is provided, the modification of the specific digital information is realized, the problem of inaccurate replacement result possibly caused by one-key replacement is solved, and the user experience is improved.

Fig. 6 schematically shows a block diagram of a text generation apparatus according to an embodiment of the present disclosure.

As shown in fig. 6, the text generating apparatus 600 includes a first generating module 610, a second generating module 620, and a first replacing module 630.

A first generating module 610 is configured to generate initial text information corresponding to the voice information in response to receiving the voice information.

A second generating module 620, configured to generate at least one replacement information in response to detecting that the initial text information includes the first numerical information. At least one replacement information is used to characterize replacement of the first digital information with second digital information in another format.

A first replacing module 630, configured to replace, in response to receiving a selection operation for a target replacement information in the at least one replacement information, the first digital information in the initial text information with the target digital information corresponding to the target replacement information in the second digital information, resulting in target text information corresponding to the voice information.

According to an embodiment of the present disclosure, the text generation apparatus further includes a third generation module and a second replacement module.

The third generating module is configured to generate at least one piece of presentation information in response to detecting that the cursor has moved to a position corresponding to third digital information in the initial text information. The voice information corresponding to each piece of presentation information is the same as the voice information corresponding to the third digital information.

The second replacing module is configured to replace the third digital information with target presentation information in response to receiving a selection operation for the target presentation information.

According to an embodiment of the present disclosure, the second generating module includes a first generating unit and a second generating unit.

The first generating unit is configured to generate intermediate information in response to detecting that the initial text information includes the first digital information.

The second generating unit is configured to generate the at least one piece of replacement information in response to receiving a selection operation for the intermediate information.
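The two-stage flow of the first and second generating units can be illustrated with the following minimal sketch. The `IntermediateInfo` class, the prompt wording, and the list of format labels are assumptions introduced here; the disclosure only requires that replacement information be generated after the intermediate information is selected.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IntermediateInfo:
    """Hypothetical prompt shown before the replacement options are listed."""
    first_digital_information: str
    label: str


def first_generating_unit(initial_text: str) -> Optional[IntermediateInfo]:
    """Emit intermediate information only when a digit run is detected."""
    match = re.search(r"\d+", initial_text)
    if match is None:
        return None
    number = match.group()
    return IntermediateInfo(number, f"Convert '{number}'?")


def second_generating_unit(selected: IntermediateInfo) -> List[str]:
    """Called after the user selects the intermediate information."""
    formats = ["Chinese character format", "Roman numeral format"]
    return [
        f"Replace {selected.first_digital_information} with {fmt}"
        for fmt in formats
    ]


info = first_generating_unit("Chapter 12 begins")
replacements = second_generating_unit(info) if info is not None else []
```

Deferring the option list until the intermediate information is selected keeps the input surface uncluttered when the user does not want any conversion.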

According to an embodiment of the present disclosure, the replacement information may include at least one of: replacement information representing that the first digital information is replaced with target digital information in a Chinese character format, replacement information representing that the first digital information is replaced with target digital information in an Arabic numeral format, and replacement information representing that the first digital information is replaced with target digital information in a Roman numeral format.
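For illustration only, conversion of an Arabic-numeral value into the Chinese character format and the Roman numeral format could be sketched as below. The function names and the supported ranges (0 to 9999 for Chinese characters, 1 to 3999 for Roman numerals) are assumptions of this sketch and are not taken from the disclosure.

```python
CHINESE_DIGITS = "零一二三四五六七八九"
CHINESE_UNITS = ["", "十", "百", "千"]
ROMAN_PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_chinese(n: int) -> str:
    """Convert 0..9999 to Chinese numerals (simplified characters)."""
    if not 0 <= n <= 9999:
        raise ValueError("supported range is 0..9999")
    if n == 0:
        return "零"
    digits = str(n)
    parts = []
    pending_zero = False
    for i, ch in enumerate(digits):
        d = int(ch)
        if d == 0:
            pending_zero = True  # emit a single 零 only if a digit follows
            continue
        if pending_zero:
            parts.append("零")
            pending_zero = False
        parts.append(CHINESE_DIGITS[d] + CHINESE_UNITS[len(digits) - 1 - i])
    result = "".join(parts)
    # 10..19 are conventionally written 十, 十一, ... rather than 一十...
    return result[1:] if result.startswith("一十") else result


def to_roman(n: int) -> str:
    """Convert 1..3999 to a Roman numeral."""
    if not 1 <= n <= 3999:
        raise ValueError("supported range is 1..3999")
    out = []
    for value, symbol in ROMAN_PAIRS:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def second_digital_information(first: str) -> dict:
    """Map format labels to candidate replacements for an Arabic digit string."""
    value = int(first)
    return {"chinese": to_chinese(value), "roman": to_roman(value)}
```

For example, `second_digital_information("12")` yields the candidates `十二` and `XII`, from which the user would select the target digital information.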

According to an embodiment of the present disclosure, the presentation information may include at least one of: time-based presentation information, quality-based presentation information, and formula-based presentation information.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the text generation methods of the present disclosure.

According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions for causing a computer to execute the text generation method of the present disclosure.

According to an embodiment of the present disclosure, a computer program product includes a computer program which, when executed by a processor, implements the text generation method of the present disclosure.

FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in Fig. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

The computing unit 701 may be any of a variety of general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 executes the respective methods and processes described above, such as the text generation method. For example, in some embodiments, the text generation method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of a computer program may be loaded onto and/or installed onto the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the text generation method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the text generation method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general purpose processor that receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

It should be understood that the various forms of flows shown above may be used with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited in this respect.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A text generation method, comprising:

in response to receiving voice information, generating initial text information corresponding to the voice information;

in response to detecting that first digital information is included in the initial text information, generating at least one piece of replacement information, wherein the at least one piece of replacement information is used for representing that the first digital information is replaced by second digital information in other formats; and

in response to receiving a selection operation for target replacement information in the at least one piece of replacement information, replacing the first digital information in the initial text information with target digital information corresponding to the target replacement information in the second digital information, to obtain target text information corresponding to the voice information.

2. The method of claim 1, further comprising:

generating at least one piece of presentation information in response to detecting that the cursor moves to a position corresponding to third digital information in the initial text information, wherein the voice information corresponding to each piece of presentation information in the at least one piece of presentation information is the same as the voice information corresponding to the third digital information; and

in response to receiving a selection operation for target presentation information, replacing the third digital information with the target presentation information.

3. The method of claim 1, wherein the generating at least one piece of replacement information in response to detecting that first digital information is included in the initial text information comprises:

generating intermediate information in response to detecting that the initial text information comprises first digital information; and

generating the at least one piece of replacement information in response to receiving a selection operation for the intermediate information.

4. The method of any of claims 1 to 3, wherein the replacement information comprises at least one of:

replacement information representing that the first digital information is replaced with target digital information in a Chinese character format;

replacement information representing that the first digital information is replaced with target digital information in an Arabic numeral format; and

replacement information representing that the first digital information is replaced with target digital information in a Roman numeral format.

5. The method of claim 2, wherein the presentation information comprises at least one of: time-based presentation information, quality-based presentation information, and formula-based presentation information.

6. A text generation apparatus comprising:

a first generating module, configured to generate initial text information corresponding to voice information in response to receiving the voice information;

a second generating module, configured to generate at least one piece of replacement information in response to detecting that the initial text information includes first digital information, wherein the at least one piece of replacement information is used for representing that the first digital information is replaced by second digital information in other formats; and

a first replacing module, configured to replace, in response to receiving a selection operation for target replacement information in the at least one piece of replacement information, the first digital information in the initial text information with target digital information corresponding to the target replacement information in the second digital information, to obtain target text information corresponding to the voice information.

7. The apparatus of claim 6, further comprising:

a third generating module, configured to generate at least one piece of presentation information in response to detecting that the cursor moves to a position corresponding to third digital information in the initial text information, where voice information corresponding to each piece of presentation information in the at least one piece of presentation information is the same as voice information corresponding to the third digital information; and

a second replacement module, configured to replace the third digital information with the target presentation information in response to receiving a selection operation for the target presentation information.

8. The apparatus of claim 6, wherein the second generating module comprises:

a first generating unit, configured to generate intermediate information in response to detecting that the initial text information includes the first digital information; and

a second generating unit, configured to generate the at least one piece of replacement information in response to receiving a selection operation for the intermediate information.

9. The apparatus of any of claims 6 to 8, wherein the replacement information comprises at least one of:

replacement information representing that the first digital information is replaced with the target digital information in the Chinese character format,

10. The apparatus of claim 7, wherein the presentation information comprises at least one of: time-based presentation information, quality-based presentation information, and formula-based presentation information.

11. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.

12. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-5.

13. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-5.