
Method, system, computer equipment and storage medium for reading text information by voice

Info

Publication number
CN111798829A
Authority
CN
China
Prior art keywords
user
voice
text information
audio
management platform
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010617492.4A
Other languages
Chinese (zh)
Inventor
赵慧
陈蛟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd
Priority to CN202010617492.4A
Publication of CN111798829A
Legal status: Pending

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The embodiments of the present disclosure provide a method, a system, a computer device, and a computer-readable storage medium for reading text information by voice. The method is applied to a communication management platform and comprises: receiving, from a sender user terminal, text information edited by the sender user and the identities of both the sender and recipient users; sending the text information to an edge server so that the edge server converts the text information into audio having the voiceprint characteristics of the sender user; receiving the audio sent by the edge server; and establishing a communication connection with the recipient user terminal according to the identities of the sender and recipient users, and playing the audio after the connection succeeds. According to the present disclosure, on the one hand, no voice control operation is required in the process of reading the text information aloud; on the other hand, the text information is read in a personalized voice that matches the voiceprint characteristics of the sender user, so the reading voice is rich in emotion and affinity and the user experience is good.

Description

Method, system, computer equipment and storage medium for reading text information by voice
Technical Field
The present disclosure relates to the field of communications technologies, and in particular, to a method, a system, a computer device, and a computer-readable storage medium for reading text information with voice.
Background
Because reading text information normally requires a person's visual attention, text short messages are often inconvenient to view, for example while driving or otherwise in motion, and they are also inconvenient for people with visual impairments, the elderly, and so on. Although some vehicle-mounted systems can currently be connected to a mobile phone so that a received short message is read aloud, this way of reading text information by voice requires voice control operations from the user, and the reading voice is rather stiff, has no emotion, and lacks affinity; it therefore needs to be improved.
Therefore, providing a scheme that reads text information in a personalized voice without requiring voice control operations is an urgent problem to be solved.
Disclosure of Invention
The present disclosure has been made to at least partially solve the technical problems occurring in the prior art.
According to an aspect of the embodiments of the present disclosure, a method for reading text information by voice is provided, which is applied to a communication management platform, and the method includes:
receiving, from a sender user terminal, text information edited by the sender user and the identities of both the sender and recipient users;
sending the text information to an edge server so that the edge server converts the text information into audio having the voiceprint characteristics of the sender user;
receiving the audio sent by the edge server; and
establishing a communication connection with the recipient user terminal according to the identities of the sender and recipient users, and playing the audio after the connection succeeds.
According to another aspect of the embodiments of the present disclosure, there is provided a method for reading text information in voice, which is applied to an edge server, the method including:
receiving text information sent by a communication management platform, wherein the text information is text information edited by a sender user and received by the communication management platform from a sender user terminal;
converting the text information into audio having the voiceprint characteristics of the sender user; and
sending the audio to the communication management platform, so that the communication management platform establishes a communication connection with the recipient user terminal according to the identities of the sender and recipient users and plays the audio after the connection succeeds.
According to another aspect of the embodiments of the present disclosure, there is provided a system for reading text information by voice, the system including a communication management platform, the communication management platform including:
a first receiving module configured to receive, from a sender user terminal, text information edited by the sender user and the identities of both the sender and recipient users;
a forwarding module configured to send the text information to an edge server so that the edge server converts the text information into audio having the voiceprint characteristics of the sender user;
the first receiving module being further configured to receive the audio sent by the edge server; and
a communication module configured to establish a communication connection with the recipient user terminal according to the identities of the sender and recipient users, and to play the audio after the connection succeeds.
According to another aspect of the embodiments of the present disclosure, there is provided a system for reading text information by voice, the system including an edge server, the edge server including:
a second receiving module configured to receive text information sent by a communication management platform, wherein the text information is text information edited by a sender user and received by the communication management platform from a sender user terminal; and
a conversion module configured to convert the text information into audio having the voiceprint characteristics of the sender user and to send the audio to the communication management platform, so that the communication management platform establishes a communication connection with the recipient user terminal according to the identities of the sender and recipient users and plays the audio after the connection succeeds.
According to a further aspect of the embodiments of the present disclosure, there is provided a computer device, including a memory and a processor, where the memory stores a computer program, and when the processor runs the computer program stored in the memory, the processor executes the method for reading text information by voice.
According to a further aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the processor executes the method for reading text information by voice.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
according to the method for reading the text information by voice, the edge server converts the text information edited by the sender user into the audio with the voiceprint characteristics of the sender user, and the audio is automatically played to the receiver user after the communication management platform is in communication connection with the receiver user terminal.
Additional features and advantages of the disclosure will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the disclosure. The objectives and other advantages of the disclosure may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the disclosed embodiments and are incorporated in and constitute a part of this specification; they illustrate embodiments of the disclosure and, together with the description, serve to explain the principles of the disclosure rather than to limit it.
Fig. 1 is a schematic flow chart illustrating a method for reading text information by voice according to an embodiment of the present disclosure;
fig. 2 is a schematic flow chart illustrating another method for reading text messages with voice according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a system for reading text information by voice according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of another system for reading text messages with voice according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of another system for reading text information by voice according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a computer device provided in an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, specific embodiments of the present disclosure are described below in detail with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order; also, the embodiments and features of the embodiments in the present disclosure may be arbitrarily combined with each other without conflict.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for the convenience of explanation of the present disclosure and have no specific meaning in themselves. Thus, "module", "component", and "unit" may be used interchangeably.
Fig. 1 is a schematic flow chart of a method for reading text information by voice according to an embodiment of the present disclosure. The method for reading the text information by voice is applied to a communication management platform, and as shown in fig. 1, the method includes the following steps S101 to S104.
S101, receiving, from a sender user terminal, text information edited by the sender user and the identities of both the sender and recipient users, wherein the identities at least comprise a mobile phone number;
S102, sending the text information to an edge server so that the edge server converts the text information into audio having the voiceprint characteristics of the sender user;
S103, receiving the audio sent by the edge server;
S104, establishing a communication connection with the recipient user terminal according to the identities of the sender and recipient users, and playing the audio after the connection succeeds.
In step S104, the communication management platform dials a voice call to the recipient user using the identity of the sender user; when the recipient user answers, the communication management platform establishes a communication connection with the recipient user terminal, immediately plays the audio, and automatically hangs up after the audio has finished playing.
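As a rough illustration of the platform-side flow in steps S101 to S104, the following Python sketch shows one way such orchestration might look; the helper objects and method names (edge_server, call_gateway, dial, play and so on) are assumptions made for illustration and are not defined by the disclosure.

    # Illustrative sketch only: the edge-server client and voice-call gateway
    # interfaces assumed here are not APIs defined by the disclosure.
    from dataclasses import dataclass

    @dataclass
    class TextMessage:
        sender_id: str      # e.g. the sender user's mobile phone number
        recipient_id: str   # e.g. the recipient user's mobile phone number
        text: str           # text information edited by the sender user

    class CommunicationManagementPlatform:
        def __init__(self, edge_server, call_gateway):
            self.edge_server = edge_server    # converts text into personalized audio (S102/S103)
            self.call_gateway = call_gateway  # dials and manages voice calls (S104)

        def handle_text_message(self, msg: TextMessage) -> None:
            # S102/S103: forward the text to the edge server and receive the audio back
            audio = self.edge_server.text_to_personalized_audio(msg.sender_id, msg.text)
            # S104: dial the recipient with the sender's identity as caller ID,
            # play the audio once the call is answered, then hang up automatically
            call = self.call_gateway.dial(caller_id=msg.sender_id, callee=msg.recipient_id)
            if call.connected():
                call.play(audio)
                call.hang_up()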
It should be noted that, before the above steps for reading a text message by voice are executed, the sender user must first subscribe to a "personalized voice reading of text messages" service on the communication management platform. The service means that, when the recipient user is not in a position to view a text short message from the sender user, the communication management platform and the edge server cooperate to convert the text short message into audio having the voiceprint characteristics of the sender user and then play that audio to the recipient user by dialing a call, so that the text short message is read out in the sender user's own voice.
In the embodiments of the present disclosure, the text information edited by the sender user is converted by the edge server into audio having the voiceprint characteristics of the sender user, and the audio is automatically played to the recipient user after the communication management platform establishes a communication connection with the recipient user terminal. On the one hand, no voice control operation is required in the process of reading the text information aloud; on the other hand, the text information is read in a personalized voice that matches the voiceprint characteristics of the sender user, so the reading voice is rich in emotion and affinity and the user experience is good.
In one embodiment, the method further includes the following step S105:
S105, making the audio into a voice short message and sending the voice short message to the recipient user terminal, so that the recipient user can conveniently review it later.
The voice short message is likewise rendered in the sender user's voice. It may be sent to the recipient user terminal while the audio is being played or after the audio has finished playing.
In one embodiment, the method further includes the following step S106:
S106, collecting voice call data between the sender user and other users according to the identity of the sender user to obtain a plurality of sound samples whose duration exceeds a preset duration threshold, and sending the sound samples to the edge server, so that the edge server trains on the sound samples to obtain a neural network speech model and inputs the text information into the neural network speech model to generate audio having the voiceprint characteristics of the sender user.
In the embodiments of the present disclosure, before the communication management platform collects sound samples of the sender user, the sender user must also authorize the communication management platform to obtain the voice call data between the sender user and other users, so that the sound samples can be collected. Specifically, after the sender user grants authorization and establishes calls with other users, the communication management platform captures N seconds of the sender user's voice call data at a time while the call data is being transmitted, stores each capture as a sound sample, and then sends the stored sound samples to the edge server. In this way, the communication management platform accumulates a large number of sound samples of the sender user during calls, so that the edge server can generate audio having the voiceprint characteristics of the sender user from the text information.
By training on a large number of samples of the sender user's real, natural speech with a neural network based speech method, the edge server can obtain a personalized voice that matches the voiceprint characteristics of the sender user, so that the text information can be read out in the sender user's personalized voice. The edge server has functions such as data transmission and reception, data processing and decision-making, object recognition, and data storage.
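As a rough illustration of the sample collection described in step S106, the following Python sketch shows one possible shape of the logic; the call-stream, capture and upload helpers are hypothetical names, and the chunk length ("N seconds") and sample cap are assumed values rather than figures from the disclosure.

    # Hypothetical sketch of the S106 sample collection; capture() and
    # upload_samples() are assumed helpers, not APIs defined by the disclosure.
    SAMPLE_SECONDS = 10          # "N seconds" captured per sample (assumed value)
    MIN_DURATION_SECONDS = 5     # preset duration threshold (assumed value)

    def collect_sound_samples(call_stream, sender_id, edge_client, max_samples=200):
        """Capture chunks of the authorized sender user's call audio and forward
        samples whose duration exceeds the threshold to the edge server."""
        samples = []
        for chunk in call_stream.capture(seconds_per_chunk=SAMPLE_SECONDS):
            if chunk.duration >= MIN_DURATION_SECONDS:
                samples.append(chunk)
            if len(samples) >= max_samples:
                break
        edge_client.upload_samples(sender_id, samples)
        return len(samples)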
In one embodiment, before step S102, the method further comprises the following step S107:
S107, querying, according to the identity of the recipient user, whether the recipient user is in a state in which it is convenient to view the text information.
Step S102 then specifically includes: sending the text information to the edge server if the recipient user is in a state in which it is inconvenient to view the text information.
In one embodiment, step S107 includes at least one of the following steps S107-1, S107-2, and S107-3.
S107-1, issuing a detection instruction to the recipient user terminal according to the identity of the recipient user, so that, after receiving the detection instruction, the recipient user terminal detects whether it is in a moving state or connected to a vehicle-mounted system (such as CarPlay or CarLife); if the detection result is that the recipient user terminal is moving or connected to a vehicle-mounted system, the terminal feeds back to the communication management platform that the recipient user is in a state in which it is inconvenient to view the text information;
S107-2, obtaining the age of the recipient user according to the identity of the recipient user, determining whether the age of the recipient user is greater than a preset age threshold (for example, 60 years old), and, if so, determining that the recipient user is in a state in which it is inconvenient to view the text information;
S107-3, sending a query instruction to a medical service platform according to the identity of the recipient user, so that the medical service platform queries, based on the query instruction, whether the recipient user suffers from a visual impairment; if the recipient user suffers from a visual impairment, the medical service platform feeds back to the communication management platform that the recipient user is in a state in which it is inconvenient to view the text information. The medical service platform is a platform that holds medical information such as personal health data.
In the embodiments of the present disclosure, the communication management platform may use at least one of steps S107-1, S107-2, and S107-3 to query whether the recipient user is in a state in which it is convenient to view the text information; as long as any query returns the result that the recipient user is in a state in which it is inconvenient to view the text information, the communication management platform sends the text information to the edge server. Conversely, if the communication management platform obtains no such result, the recipient user can currently view the text information conveniently, and the platform sends the text information directly to the recipient user terminal.
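The following Python sketch illustrates how the three checks of S107-1 to S107-3 might be combined on the communication management platform; the terminal, profile and medical-service interfaces used here are illustrative assumptions rather than APIs defined by the disclosure.

    # Hedged sketch of the S107 decision logic; terminal_api, profile_api and
    # medical_api (and their methods) are illustrative assumptions.
    AGE_THRESHOLD = 60  # preset age threshold, taken from the example in S107-2

    def recipient_cannot_view_text(recipient_id, terminal_api, profile_api, medical_api):
        """Return True if any S107-1/2/3 check indicates that viewing text is inconvenient."""
        # S107-1: the terminal reports it is moving or connected to an in-car system
        status = terminal_api.detect(recipient_id)
        if status.is_moving or status.connected_to_car_system:
            return True
        # S107-2: the recipient user is older than the preset age threshold
        if profile_api.get_age(recipient_id) > AGE_THRESHOLD:
            return True
        # S107-3: the medical service platform reports a visual impairment
        return medical_api.has_visual_impairment(recipient_id)

    def route_text(platform, msg):
        # Inconvenient: convert to personalized audio; otherwise send the text directly.
        if recipient_cannot_view_text(msg.recipient_id, platform.terminal_api,
                                      platform.profile_api, platform.medical_api):
            platform.forward_to_edge_server(msg)
        else:
            platform.send_text_directly(msg)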
Fig. 2 is a flowchart illustrating another method for reading text information by voice according to an embodiment of the present disclosure. This method is applied to an edge server and, as shown in fig. 2, includes the following steps S201 to S203.
S201, receiving text information sent by a communication management platform, wherein the text information is text information edited by the sender user and received by the communication management platform from the sender user terminal;
S202, converting the text information into audio having the voiceprint characteristics of the sender user;
S203, sending the audio to the communication management platform, so that the communication management platform establishes a communication connection with the recipient user terminal according to the identities of the sender and recipient users and plays the audio after the connection succeeds.
In the embodiments of the present disclosure, the text information edited by the sender user is converted by the edge server into audio having the voiceprint characteristics of the sender user, and the audio is automatically played to the recipient user after the communication management platform establishes a communication connection with the recipient user terminal. On the one hand, no voice control operation is required in the process of reading the text information aloud; on the other hand, the text information is read in a personalized voice that matches the voiceprint characteristics of the sender user, so the reading voice is rich in emotion and affinity and the user experience is good.
In one embodiment, the method further includes the following steps S204 and S205.
S204, receiving a plurality of sound samples sent by the communication management platform, wherein the sound samples are obtained by the communication management platform, according to the identity of the sender user, from voice call data between the sender user and other users, and the duration of each sound sample exceeds a preset duration threshold;
S205, training on the plurality of sound samples to obtain a neural network speech model.
Step S202 then specifically includes: inputting the text information into the neural network speech model to generate audio having the voiceprint characteristics of the sender user.
In the embodiments of the present disclosure, the neural network speech model is exclusive to the sender user. The edge server inputs the received text information into the sender user's exclusive neural network speech model: the character sequence of the text information is fed into an encoder, which extracts a sequence representation of the text; each character is represented as a one-hot vector and embedded into a continuous vector, a nonlinear transformation is then applied, and dropout is added to reduce overfitting, which substantially reduces pronunciation errors. The decoder used by the neural network speech model is a tanh decoder based on content attention, and the waveform is then generated with the Griffin-Lim algorithm; in this way the text information is converted into audio in the sender user's speaking tone.
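The following is a minimal PyTorch sketch of a Tacotron-style pipeline of the kind described above (character embedding and a prenet with dropout, a decoder with content-based tanh attention, and Griffin-Lim waveform reconstruction); the layer sizes, class names and the use of librosa's Griffin-Lim implementation are illustrative assumptions, not the exact model of the disclosure.

    # Simplified sketch of a Tacotron-like text-to-speech model; hyperparameters
    # and structure are assumptions for illustration only.
    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        def __init__(self, n_chars, emb_dim=256, hidden=256, dropout=0.5):
            super().__init__()
            self.embed = nn.Embedding(n_chars, emb_dim)   # one-hot index -> continuous vector
            self.prenet = nn.Sequential(                  # nonlinear transform plus dropout
                nn.Linear(emb_dim, hidden), nn.ReLU(), nn.Dropout(dropout))
            self.rnn = nn.GRU(hidden, hidden, batch_first=True)

        def forward(self, char_ids):                      # char_ids: (batch, text_len)
            x = self.prenet(self.embed(char_ids))
            out, _ = self.rnn(x)                          # (batch, text_len, hidden)
            return out

    class AttentionDecoder(nn.Module):
        """Decoder with content-based (tanh) attention emitting spectrogram frames."""
        def __init__(self, hidden=256, n_mels=80):
            super().__init__()
            self.query = nn.Linear(hidden, hidden)
            self.key = nn.Linear(hidden, hidden)
            self.score = nn.Linear(hidden, 1)
            self.cell = nn.GRUCell(hidden + n_mels, hidden)
            self.frame_out = nn.Linear(hidden, n_mels)
            self.n_mels = n_mels

        def forward(self, enc_out, n_frames=200):
            state = enc_out.new_zeros(enc_out.size(0), enc_out.size(2))
            frame = enc_out.new_zeros(enc_out.size(0), self.n_mels)
            keys, frames = self.key(enc_out), []
            for _ in range(n_frames):
                # tanh-based content attention over the encoder outputs
                w = torch.softmax(self.score(torch.tanh(keys + self.query(state).unsqueeze(1))), dim=1)
                context = (w * enc_out).sum(dim=1)
                state = self.cell(torch.cat([context, frame], dim=-1), state)
                frame = self.frame_out(state)
                frames.append(frame)
            return torch.stack(frames, dim=1)             # (batch, n_frames, n_mels)

    # Once a magnitude spectrogram is predicted, a waveform can be reconstructed
    # with Griffin-Lim, e.g. librosa.griffinlim(spectrogram, n_iter=60).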
In one embodiment, step S205 includes steps S205-1 through S205-3 as follows.
S205-1, preprocessing each sound sample to form a standardized digital speech feature file;
S205-2, converting each sound sample into a speech text, wherein the speech text comprises the characters, intonation, and duration;
S205-3, training and parameter-fitting the standardized digital speech feature files and the speech texts with a neural network to form the neural network speech model.
In the embodiments of the present disclosure, the preprocessing of the sound samples in step S205-1 specifically includes processes such as noise removal, dimensionality reduction, framing, and insertion of silence frames. In step S205-3, the standardized digital speech feature files are used as the input set, the speech texts are used as the output set, and a neural network (such as a convolutional network or a semi-hidden Markov network) is used for training and parameter fitting to form the neural network speech model.
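As a rough sketch of the preparation in steps S205-1 to S205-3, the following Python fragment shows one way the feature files and speech texts could be paired for training; librosa is assumed for feature extraction, the silence trimming only approximates the noise-removal and silence-frame handling described above, and the transcribe() annotation step is a hypothetical placeholder.

    # Rough sketch of S205-1/2/3 data preparation under assumed helpers.
    import librosa

    def preprocess_sample(path, sr=16000, n_mfcc=20):
        """S205-1: turn one sound sample into a standardized digital speech
        feature file (here: framed MFCC features after crude silence trimming)."""
        y, _ = librosa.load(path, sr=sr)
        y, _ = librosa.effects.trim(y)                            # crude silence removal
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)    # dimensionality reduction
        return mfcc.T                                             # shape: (frames, n_mfcc)

    def build_training_pairs(sample_paths, transcribe):
        """S205-2/3: pair each feature file (input set) with its speech text
        (output set: characters, intonation, duration) for neural network fitting."""
        pairs = []
        for p in sample_paths:
            features = preprocess_sample(p)
            speech_text = transcribe(p)   # hypothetical annotation/ASR step
            pairs.append((features, speech_text))
        return pairs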
It should be noted that the sequence of the above steps is only a specific example provided for illustrating the embodiment of the present disclosure, and the present disclosure does not limit the sequence of the above steps, and those skilled in the art can adjust the sequence as required in practical application.
According to the method for reading text information by voice provided by the embodiments of the present disclosure, the communication management platform collects voice call data between the sender user and other users to obtain a plurality of sound samples whose duration exceeds a preset duration threshold, and the edge server trains on these sound samples to obtain a neural network speech model, into which the text information is input to generate audio having the voiceprint characteristics of the sender user. The communication management platform determines whether the recipient user is in a state in which it is inconvenient to view the text information; if so, the edge server inputs the text information into the neural network speech model to generate audio with the user's personalized voice characteristics, and the communication management platform calls the recipient user according to the identity of the sender user and plays the audio. At the same time, a voice message is delivered to the recipient user so that it can conveniently be reviewed later, which improves the user experience.
Fig. 3 is a schematic structural diagram of a system for reading text information by voice according to an embodiment of the present disclosure. As shown in fig. 3, the system includes a communication management platform 3, and the communication management platform 3 includes: a first receiving module 31, a forwarding module 32 and a communication module 33.
The first receiving module 31 is configured to receive, from the sender user terminal, the text information edited by the sender user and the identities of both the sender and recipient users; the forwarding module 32 is configured to send the text information to the edge server so that the edge server converts the text information into audio having the voiceprint characteristics of the sender user; the first receiving module 31 is further configured to receive the audio sent by the edge server; and the communication module 33 is configured to establish a communication connection with the recipient user terminal according to the identities of the sender and recipient users and to play the audio after the connection succeeds.
In the embodiments of the present disclosure, the text information edited by the sender user is converted by the edge server into audio having the voiceprint characteristics of the sender user, and the audio is automatically played to the recipient user after the communication management platform establishes a communication connection with the recipient user terminal. On the one hand, no voice control operation is required in the process of reading the text information aloud; on the other hand, the text information is read in a personalized voice that matches the voiceprint characteristics of the sender user, so the reading voice is rich in emotion and affinity and the user experience is good.
In one embodiment, the communication management platform 3 further comprises a voice short message module 34 configured to make the audio into a voice short message and send the voice short message to the recipient user terminal.
In one embodiment, the communication management platform 3 further comprises an acquisition module 35 configured to collect voice call data between the sender user and other users according to the identity of the sender user, obtain a plurality of sound samples whose duration exceeds a preset duration threshold, and send the sound samples to the edge server, so that the edge server trains on the sound samples to obtain a neural network speech model and inputs the text information into the neural network speech model to generate audio having the voiceprint characteristics of the sender user.
In one embodiment, the communication management platform 3 further comprises a query module 36 configured to query, according to the identity of the recipient user, whether the recipient user is in a state in which it is convenient to view the text information.
The forwarding module 32 is then specifically configured to send the text information to the edge server if the query module 36 finds that the recipient user is in a state in which it is inconvenient to view the text information.
In one embodiment, the query module 36 includes at least one of a first query unit, a second query unit, and a third query unit.
The first query unit is configured to issue a detection instruction to the recipient user terminal according to the identity of the recipient user, so that, after receiving the detection instruction, the recipient user terminal detects whether it is in a moving state or connected to a vehicle-mounted system; if the detection result is that the recipient user terminal is moving or connected to a vehicle-mounted system, the terminal feeds back to the first query unit that the recipient user is in a state in which it is inconvenient to view the text information;
the second query unit is configured to obtain the age of the recipient user according to the identity of the recipient user, determine whether the age is greater than a preset age threshold, and, if so, determine that the recipient user is in a state in which it is inconvenient to view the text information;
the third query unit is configured to send a query instruction to a medical service platform according to the identity of the recipient user, so that the medical service platform queries, based on the query instruction, whether the recipient user suffers from a visual impairment; if the recipient user suffers from a visual impairment, the medical service platform feeds back to the third query unit that the recipient user is in a state in which it is inconvenient to view the text information.
In the embodiments of the present disclosure, the query module 36 may use at least one of the first query unit, the second query unit, and the third query unit to query whether the recipient user is in a state in which it is convenient to view the text information; as long as any one of them obtains the result that the recipient user is in a state in which it is inconvenient to view the text information, the text information is sent to the edge server.
Fig. 4 is a schematic structural diagram of another system for reading text information by voice according to an embodiment of the present disclosure. As shown in fig. 4, the system includes an edge server 4, and the edge server 4 includes: a second receiving module 41 and a converting module 42.
The second receiving module 41 is configured to receive the text information sent by the communication management platform, wherein the text information is text information edited by the sender user and received by the communication management platform from the sender user terminal; the conversion module 42 is configured to convert the text information into audio having the voiceprint characteristics of the sender user and to send the audio to the communication management platform, so that the communication management platform establishes a communication connection with the recipient user terminal according to the identities of the sender and recipient users and plays the audio after the connection succeeds.
In the embodiments of the present disclosure, the text information edited by the sender user is converted by the edge server into audio having the voiceprint characteristics of the sender user, and the audio is automatically played to the recipient user after the communication management platform establishes a communication connection with the recipient user terminal. On the one hand, no voice control operation is required in the process of reading the text information aloud; on the other hand, the text information is read in a personalized voice that matches the voiceprint characteristics of the sender user, so the reading voice is rich in emotion and affinity and the user experience is good.
In one embodiment, the second receiving module 41 is further configured to receive a plurality of sound samples sent by the communication management platform, wherein the sound samples are obtained by the communication management platform, according to the identity of the sender user, from voice call data between the sender user and other users, and the duration of each sound sample exceeds a preset duration threshold.
The edge server 4 further includes a training module 43 configured to train on the plurality of sound samples to obtain a neural network speech model.
The conversion module 42 is then specifically configured to input the text information into the neural network speech model to generate audio having the voiceprint characteristics of the sender user.
In one embodiment, the training module 43 includes a preprocessing unit, a conversion unit, and a training unit.
The preprocessing unit is configured to preprocess each sound sample to form a standardized digital speech feature file; the conversion unit is configured to convert each sound sample into a speech text that comprises the characters, intonation, and duration; and the training unit is configured to train and parameter-fit the standardized digital speech feature files and the speech texts with a neural network to form the neural network speech model.
Based on the same technical concept, the embodiments of the present disclosure correspondingly provide a system for reading text information by voice. As shown in fig. 5, the system includes: a sender user terminal 1, a recipient user terminal 2, a communication management platform 3, and an edge server 4.
The sender user terminal 1 is configured to send the text information edited by the sender user and the identities of both the sender and recipient users to the communication management platform 3; the communication management platform 3 is configured to send the text information to the edge server 4; the edge server 4 is configured to convert the text information into audio having the voiceprint characteristics of the sender user and to send the audio to the communication management platform 3; and the communication management platform 3 is further configured to establish a communication connection with the recipient user terminal 2 according to the identities of the sender and recipient users and to play the audio after the connection succeeds.
In one embodiment, the communication management platform 3 is further configured to make the audio into a voice short message and send the voice short message to the recipient user terminal 2.
In one embodiment, the communication management platform 3 is further configured to collect voice call data between the sender user and other users according to the identity of the sender user, obtain a plurality of sound samples whose duration exceeds a preset duration threshold, and send the sound samples to the edge server 4; the edge server 4 is further configured to train on the plurality of sound samples to obtain a neural network speech model.
The edge server 4 converts the text information into audio having the voiceprint characteristics of the sender user specifically by inputting the text information into the neural network speech model to generate the audio.
In one embodiment, the edge server 4 trains on the plurality of sound samples to obtain the neural network speech model specifically by:
preprocessing each sound sample to form a standardized digital speech feature file;
converting each sound sample into a speech text, wherein the speech text comprises the characters, intonation, and duration; and
training and parameter-fitting the standardized digital speech feature files and the speech texts with a neural network to form the neural network speech model.
In an embodiment, the communication management platform 3 is further configured to query, according to the identity of the recipient user, whether the recipient user is in a state in which it is convenient to view the text information, and to send the text information to the edge server 4 if the recipient user is in a state in which it is inconvenient to view the text information.
In an embodiment, the communication management platform 3 queries, according to the identity of the recipient user, whether the recipient user is in a state in which it is convenient to view the text information in at least one of the following three ways:
the communication management platform 3 sends a detection instruction to the recipient user terminal 2 according to the identity of the recipient user, so that, after receiving the detection instruction, the recipient user terminal 2 detects whether it is in a moving state or connected to a vehicle-mounted system; if the detection result is that the recipient user terminal 2 is moving or connected to a vehicle-mounted system, the terminal feeds back to the communication management platform 3 that the recipient user is in a state in which it is inconvenient to view the text information;
the communication management platform 3 obtains the age of the recipient user according to the identity of the recipient user, determines whether the age is greater than a preset age threshold, and, if so, determines that the recipient user is in a state in which it is inconvenient to view the text information;
the communication management platform 3 sends a query instruction to a medical service platform according to the identity of the recipient user, so that the medical service platform queries, based on the query instruction, whether the recipient user suffers from a visual impairment; if the recipient user suffers from a visual impairment, the medical service platform feeds back to the communication management platform 3 that the recipient user is in a state in which it is inconvenient to view the text information.
In the embodiments of the present disclosure, the communication management platform 3 may query in at least one of the above three ways whether the recipient user is in a state in which it is convenient to view the text information; as long as it obtains the result that the recipient user is in a state in which it is inconvenient to view the text information, the communication management platform 3 sends the text information to the edge server 4. Of course, if the communication management platform 3 obtains no such result, the recipient user can currently view the text information conveniently, and the platform sends the text information directly to the recipient user terminal 2.
According to the system for reading text information by voice provided by the embodiments of the present disclosure, the communication management platform collects voice call data between the sender user and other users to obtain a plurality of sound samples whose duration exceeds a preset duration threshold, and the edge server trains on these sound samples to obtain a neural network speech model, into which the text information is input to generate audio having the voiceprint characteristics of the sender user. The communication management platform determines whether the recipient user is in a state in which it is inconvenient to view the text information; if so, the edge server inputs the text information into the neural network speech model to generate audio with the user's personalized voice characteristics, and the communication management platform calls the recipient user according to the identity of the sender user and plays the audio. At the same time, a voice message is delivered to the recipient user so that it can conveniently be reviewed later, which improves the user experience.
Based on the same technical concept, the embodiment of the present disclosure correspondingly provides a computer device, as shown in fig. 6, the computer device 6 includes a memory 61 and a processor 62, the memory 61 stores a computer program, and when the processor 62 runs the computer program stored in the memory 61, the processor 62 executes the foregoing method for reading text information by voice.
Based on the same technical concept, embodiments of the present disclosure correspondingly provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the processor executes the method for reading text information by voice.
To sum up, the method, system, computer device, and computer-readable storage medium for reading text information by voice provided by the embodiments of the present disclosure identify the sender and recipient users, train on a large number of sound samples collected during calls between the sender user who has subscribed to the service and other users, and use a neural network speech training method to derive a neural network speech model specific to the sender user, so that text information input into the model is converted into audio with the sender user's personalized voice characteristics. The communication management platform then determines whether the recipient user is in a state in which it is inconvenient to view text information; if so, the edge server inputs the text information into the neural network speech model to generate the audio, and the communication management platform calls the recipient user according to the identity of the sender user, plays the audio, and at the same time sends a voice message to the recipient user so that it can conveniently be reviewed later, which improves the user experience.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present disclosure, and not for limiting the same; while the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present disclosure.

Claims (12)

1. A method for reading text information by voice, applied to a communication management platform, characterized in that the method comprises:
receiving, from a sender user terminal, text information edited by the sender user and the identities of both the sender and recipient users;
sending the text information to an edge server so that the edge server converts the text information into audio having the voiceprint characteristics of the sender user;
receiving the audio sent by the edge server; and
establishing a communication connection with the recipient user terminal according to the identities of the sender and recipient users, and playing the audio after the connection succeeds.
2. The method of claim 1, further comprising:
making the audio into a voice short message, and sending the voice short message to the recipient user terminal.
3. The method of claim 1, further comprising:
collecting voice call data between the sender user and other users according to the identity of the sender user to obtain a plurality of sound samples whose duration exceeds a preset duration threshold, and sending the sound samples to the edge server, so that the edge server trains on the sound samples to obtain a neural network speech model and inputs the text information into the neural network speech model to generate audio having the voiceprint characteristics of the sender user.
4. The method of claim 1, wherein, before sending the text information to the edge server, the method further comprises:
querying, according to the identity of the recipient user, whether the recipient user is in a state in which it is convenient to view the text information;
wherein the sending of the text information to the edge server comprises:
sending the text information to the edge server if the recipient user is in a state in which it is inconvenient to view the text information.
5. The method of claim 4, wherein querying, according to the identity of the recipient user, whether the recipient user is in a state in which it is convenient to view the text information comprises:
sending a detection instruction to the recipient user terminal according to the identity of the recipient user, so that, after receiving the detection instruction, the recipient user terminal detects whether it is in a moving state or connected to a vehicle-mounted system, and, if the detection result is that the recipient user terminal is moving or connected to a vehicle-mounted system, feeding back to the communication management platform that the recipient user is in a state in which it is inconvenient to view the text information;
and/or,
obtaining the age of the recipient user according to the identity of the recipient user, determining whether the age of the recipient user is greater than a preset age threshold, and, if so, determining that the recipient user is in a state in which it is inconvenient to view the text information;
and/or,
sending a query instruction to a medical service platform according to the identity of the recipient user, so that the medical service platform queries, based on the query instruction, whether the recipient user suffers from a visual impairment, and, if the recipient user suffers from a visual impairment, feeding back to the communication management platform that the recipient user is in a state in which it is inconvenient to view the text information.
6. A method for reading text information by voice, applied to an edge server, characterized in that the method comprises:
receiving text information sent by a communication management platform, wherein the text information is text information edited by a sender user and received by the communication management platform from a sender user terminal;
converting the text information into audio having the voiceprint characteristics of the sender user; and
sending the audio to the communication management platform, so that the communication management platform establishes a communication connection with the recipient user terminal according to the identities of the sender and recipient users and plays the audio after the connection succeeds.
7. The method of claim 6, further comprising:
receiving a plurality of sound samples sent by the communication management platform, wherein the sound samples are obtained by the communication management platform, according to the identity of the sender user, from voice call data between the sender user and other users, and the duration of each sound sample exceeds a preset duration threshold; and
training on the plurality of sound samples to obtain a neural network speech model;
wherein converting the text information into audio having the voiceprint characteristics of the sender user comprises:
inputting the text information into the neural network speech model to generate audio having the voiceprint characteristics of the sender user.
8. The method of claim 7, wherein training on the plurality of sound samples to obtain a neural network speech model comprises:
preprocessing each sound sample to form a standardized digital speech feature file;
converting each sound sample into a speech text, wherein the speech text comprises the characters, intonation, and duration; and
training and parameter-fitting the standardized digital speech feature files and the speech texts with a neural network to form the neural network speech model.
9. A system for reading text information by voice, comprising a communication management platform, characterized in that the communication management platform comprises:
a first receiving module configured to receive, from a sender user terminal, text information edited by the sender user and the identities of both the sender and recipient users;
a forwarding module configured to send the text information to an edge server so that the edge server converts the text information into audio having the voiceprint characteristics of the sender user;
the first receiving module being further configured to receive the audio sent by the edge server; and
a communication module configured to establish a communication connection with the recipient user terminal according to the identities of the sender and recipient users and to play the audio after the connection succeeds.
10. A system for reading text information by voice, comprising an edge server, characterized in that the edge server comprises:
a second receiving module configured to receive text information sent by a communication management platform, wherein the text information is text information edited by a sender user and received by the communication management platform from a sender user terminal; and
a conversion module configured to convert the text information into audio having the voiceprint characteristics of the sender user and to send the audio to the communication management platform, so that the communication management platform establishes a communication connection with the recipient user terminal according to the identities of the sender and recipient users and plays the audio after the connection succeeds.
11. A computer device comprising a memory and a processor, the memory having a computer program stored therein, wherein, when the processor runs the computer program stored in the memory, the processor performs the method for reading text information by voice according to any one of claims 1 to 8.
12. A computer-readable storage medium having a computer program stored thereon, wherein, when the computer program is executed by a processor, the processor performs the method for reading text information by voice according to any one of claims 1 to 8.
CN202010617492.4A (priority date and filing date 2020-06-30): Method, system, computer equipment and storage medium for reading text information by voice. Status: Pending. Published as CN111798829A.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010617492.4A CN111798829A (en) 2020-06-30 2020-06-30 Method, system, computer equipment and storage medium for reading text information by voice

Publications (1)

Publication Number Publication Date
CN111798829A 2020-10-20

Family

ID=72810891

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010617492.4A Pending CN111798829A (en) 2020-06-30 2020-06-30 Method, system, computer equipment and storage medium for reading text information by voice

Country Status (1)

Country Link
CN (1) CN111798829A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101072376A (en) * 2006-05-12 2007-11-14 上海乐金广电电子有限公司 Mobile communication het voice information service method
CN101076154A (en) * 2007-07-16 2007-11-21 华为技术有限公司 Interactive SMS service system, phonetic interactive center and its realization
CN103051785A (en) * 2012-12-10 2013-04-17 广东欧珀移动通信有限公司 Method and system for mobile terminal speak message service
CN104464716A (en) * 2014-11-20 2015-03-25 北京云知声信息技术有限公司 Voice broadcasting system and method
CN105635452A (en) * 2015-12-28 2016-06-01 努比亚技术有限公司 Mobile terminal and contact person identification method thereof
CN108735198A (en) * 2018-05-29 2018-11-02 杭州认识科技有限公司 Phoneme synthesizing method, device based on medical conditions data and electronic equipment
CN110493461A (en) * 2019-08-02 2019-11-22 RealMe重庆移动通信有限公司 Message playback method and device, electronic equipment, storage medium
CN111261139A (en) * 2018-11-30 2020-06-09 上海擎感智能科技有限公司 Character personification broadcasting method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination