CN112187721A - Voice processing method and device, intelligent voice message leaving equipment and storage medium

Info

Publication number: CN112187721A
Application number: CN202010908296.2A
Authority: CN
Inventors: 王立颖; 王沅召; 杨丰玮; 葛春光
Original assignee: Gree Electric Appliances Inc of Zhuhai; Zhuhai Lianyun Technology Co Ltd
Current assignee: Gree Electric Appliances Inc of Zhuhai; Zhuhai Lianyun Technology Co Ltd
Priority date: 2020-09-01
Filing date: 2020-09-01
Publication date: 2021-01-05
Anticipated expiration: 2040-09-01
Also published as: CN112187721B

Abstract

Embodiments of the invention relate to a voice processing method, a voice processing device, intelligent voice message equipment and a storage medium. The method comprises the following steps: when the existence of audio is detected, obtaining a first voice fragment through a voice receiving module of the intelligent voice message equipment; performing an encryption operation on the first voice fragment to obtain a second voice fragment; and sending the second voice fragment to a server for storage. In this way, the message information received by the intelligent voice message equipment is encrypted and stored remotely, which prevents the message from being stolen or eavesdropped on and protects information security.

Description

Voice processing method and device, intelligent voice message leaving equipment and storage medium

Technical Field

The embodiment of the invention relates to the field of intelligent home information security, in particular to a voice processing method and device, intelligent voice message equipment and a storage medium.

Background

With the arrival of the Internet of Things era, household facilities are integrated using Internet of Things technology to provide people with a comfortable, convenient and safe living environment. However, while smart homes bring many benefits to people's lives, they also face serious security problems.

At present, many smart home devices have a recording function: people can leave recorded messages for other members of the household, so that family members can be notified conveniently and promptly of things to handle. However, current smart home systems rarely encrypt recordings, or they use encryption methods that are easily stolen and modified. As a result, a user's recordings can easily be eavesdropped on by criminals, the user's privacy is leaked, and the leaked private information may cause property loss to the user.

Disclosure of Invention

In view of this, in order to solve the technical problem that the security of the recording information of the intelligent device is low, embodiments of the present invention provide a voice processing method and apparatus, an intelligent voice message device, and a storage medium.

In a first aspect, an embodiment of the present invention provides a speech processing method, including:

when the existence of audio is detected, a first voice fragment is obtained through a voice receiving module of the intelligent voice message equipment;

performing encryption operation on the first voice fragment to obtain a second voice fragment;

and sending the second voice fragment to a server for storage.

In one possible embodiment, the method further comprises:

sending the second voice fragment to a server so that the server checks the second voice fragment and returns first verification information corresponding to the second voice fragment to the intelligent voice message device;

verifying the received first verification information;

if the verification is passed, prompting a message of successful storage;

and if the verification fails, prompting a message that storage failed due to a security risk.

In one possible embodiment, the method further comprises:

sending a reading request of the second voice fragment to the server;

receiving the second voice segment sent by the server in response to the reading request and verifying the second voice segment to obtain second verification information;

verifying the second verification information;

if the verification is passed, based on a secret key for decrypting the second voice segment, executing decryption operation on the second voice segment to obtain the first voice segment and playing the first voice segment;

and if the verification fails, displaying a message of abnormal verification and deleting the second voice segment.

In one possible embodiment, the method further comprises:

receiving and storing a plurality of pieces of personal information, wherein each piece of personal information carries corresponding facial features;

setting a corresponding label for the first voice fragment based on the plurality of pieces of personal information;

and setting part or all of the plurality of pieces of personal information as the reading authority of the first voice fragment.

In one possible embodiment, the method further comprises:

determining the reading authority of the personal information based on the facial features;

and based on the reading authority, executing the step of receiving the second voice fragment sent by the server in response to the reading request of the second voice fragment and verifying the second voice fragment to obtain second verification information.

In a second aspect, an embodiment of the present invention provides a speech processing apparatus, including:

the acquisition module is used for acquiring a first voice fragment through the voice receiving module of the intelligent voice message equipment when the existence of the audio is detected;

the encryption module is used for carrying out encryption operation on the first voice fragment to obtain a second voice fragment;

and the sending module is used for sending the second voice fragment to a server for storage.

In a possible implementation manner, the obtaining module is specifically configured to receive and store a plurality of pieces of personal information, where each piece of personal information carries a corresponding facial feature; set a corresponding label for the first voice fragment based on the plurality of pieces of personal information; and set part or all of the plurality of pieces of personal information as the reading authority of the first voice fragment.

In a possible implementation manner, the sending module is specifically configured to send the second voice fragment to a server, so that the server checks the second voice fragment and returns first verification information corresponding to the second voice fragment to the intelligent voice message apparatus.

In a third aspect, an embodiment of the present invention provides an intelligent voice message apparatus, including: a processor and a memory, the processor being configured to execute a speech processing program stored in the memory to implement the speech processing method described in the above first aspect.

In a fourth aspect, an embodiment of the present invention provides a storage medium, including: the storage medium stores one or more programs that are executable by one or more processors to implement the speech processing method described in the above first aspect.

According to the voice processing scheme provided by the embodiment of the invention, when the existence of audio is detected, a first voice fragment is obtained through a voice receiving module of the intelligent voice message equipment; an encryption operation is performed on the first voice fragment to obtain a second voice fragment; and the second voice fragment is sent to a server for storage. The method thus encrypts the message information received by the intelligent voice message equipment and stores it remotely, which prevents the information from being stolen or eavesdropped on and protects information security.

Drawings

Fig. 1 is a schematic flow chart of a speech processing method according to an embodiment of the present invention;

Fig. 2 is a schematic flow chart of another speech processing method according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a speech processing apparatus according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an intelligent voice message leaving device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

For the convenience of understanding of the embodiments of the present invention, the following description will be further explained with reference to specific embodiments, which are not to be construed as limiting the embodiments of the present invention.

Fig. 1 is a schematic flow chart of a speech processing method according to an embodiment of the present invention, and as shown in fig. 1, the method specifically includes:

This scheme can be applied to smart home devices equipped with an intelligent voice message module. For example, a smart refrigerator can provide a voice message board function, with a display screen on the refrigerator door on which the user operates to record and listen to messages. A voice message board function can likewise be provided on devices such as a household security door, a mobile phone or a smart TV, so that family members can use these devices to leave messages for other members of the household, making family life more convenient.

S11, when the existence of audio is detected, acquiring a first voice fragment through the voice receiving module of the intelligent voice message equipment.

When a user records a message with the voice message equipment, the voice receiving module detects that audio is present and records it, and the acquired audio is stored locally to obtain the first voice fragment.

S12, carrying out encryption operation on the first voice fragment to obtain a second voice fragment.

After the recording is stored locally, the display screen pops up a prompt box asking whether the user wants to upload the recording. If the user chooses to upload the first voice fragment to the server, the system encrypts the first voice fragment according to a preset encryption mode to obtain the encrypted first voice fragment, that is, the second voice fragment.

S13, sending the second voice fragment to a server for storage.

In response to the upload instruction selected by the user, the built-in processor of the voice message equipment sends the second voice fragment to the remote server. After receiving the second voice fragment, the remote server checks it and, if the check result shows that the second voice fragment has not been modified, stores it.
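The flow of steps S11 to S13 can be sketched as follows. This is only an illustrative outline under stated assumptions: the function name `process_recording` and the injected `encrypt` and `send` callables are hypothetical and do not appear in the embodiment, which does not prescribe any particular programming interface.

```python
from typing import Callable, Optional


def process_recording(
    first_fragment: bytes,
    upload_confirmed: bool,
    encrypt: Callable[[bytes], bytes],
    send: Callable[[bytes], None],
) -> Optional[bytes]:
    """Run S12 and S13 for a fragment that was captured and stored in S11.

    Returns the second voice fragment, or None if the user declined to upload.
    All names here are hypothetical placeholders.
    """
    if not upload_confirmed:
        return None  # the recording stays in local storage only
    second_fragment = encrypt(first_fragment)  # S12: preset encryption mode
    send(second_fragment)                      # S13: hand over to the server
    return second_fragment
```

Passing the encryption and transport steps in as callables keeps the sketch independent of the concrete cipher and network protocol, which the first embodiment leaves open.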

Fig. 2 is a schematic flow chart of another speech processing method according to an embodiment of the present invention, and as shown in fig. 2, the method specifically includes:

S21, receiving and storing a plurality of pieces of personal information, wherein each piece of personal information carries corresponding facial features.

In the embodiment of the invention, family members need to enter all the personal information into the intelligent voice message equipment in advance. The personal information includes at least the name, gender and identity of each person, together with that person's facial features.

For example, suppose that, from the child's point of view, the family members are dad, mom, grandpa, grandma and the child. When dad enters his information, he needs to enter at least three items: name, gender (male) and identity (dad). His face is then photographed; the system stores the photo, extracts his facial features by a facial feature recognition method, and stores them in the personal identity information corresponding to dad.

S22, when the existence of audio is detected, acquiring a first voice fragment through the voice receiving module of the intelligent voice message equipment.

When a user needs to leave a message for a family member, the user first logs in to the intelligent voice message equipment with the pre-recorded information, and the equipment system verifies the entered login information. After the login information is verified, the face acquisition module of the equipment captures and recognizes the face of the current user. Once the current user is recognized as a member of the family, the user can proceed. When the voice receiving module (for example, a microphone) of the voice message equipment detects audio, the audio is recorded and first stored locally to obtain the first voice fragment.

For example, when mom leaves a message for the family, the message content may be "At three pm, hang out the clothes in the washing machine to dry".

S23, setting a corresponding label for the first voice fragment based on the plurality of pieces of personal information.

S24, setting part or all of the plurality of pieces of personal information as the reading authority of the first voice fragment.

After the first voice fragment is recorded, the system pops up a prompt box for choosing who is allowed to listen to it, and the user can grant the listening authority to one or more persons. The system then sets a label for the first voice fragment according to the personal information selected by the user; the label indicates to whom the first voice fragment is addressed. After the person leaving the message has set the listening authority, the persons granted that authority receive a new-message prompt on their mobile terminals.

For example, after mom leaves a message, she can choose to allow only dad to listen, dad and the child to listen, or all family members to listen.
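Steps S23 and S24 can be illustrated with a small data structure. The sketch below is an assumption about how labels and reading authority might be represented; the class `VoiceMessage`, its fields, and the function `set_recipients` are hypothetical names that do not appear in the embodiment.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceMessage:
    sender: str
    audio: bytes
    tags: set[str] = field(default_factory=set)     # to whom the message is addressed
    readers: set[str] = field(default_factory=set)  # who holds the listening authority


def set_recipients(message: VoiceMessage, selected: list[str], family: set[str]) -> list[str]:
    """Label the message and grant listening authority to the selected persons.

    Returns the persons who should receive a new-message prompt.
    """
    unknown = set(selected) - family
    if unknown:
        # Only pre-recorded family members can be selected (assumption of this sketch).
        raise ValueError(f"not enrolled: {sorted(unknown)}")
    message.tags = set(selected)
    message.readers = set(selected)
    return sorted(message.readers)
```

For instance, with the family from the example above, selecting dad and the child would label the message for those two persons and return them as the persons to notify.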

S25, carrying out encryption operation on the first voice fragment to obtain a second voice fragment.

After the first voice fragment is stored locally, the display screen pops up a prompt box asking whether the user wants to upload the first voice fragment to the server. If the user chooses to upload it, the system first applies first-layer encryption to the first voice fragment using Base64, and then verifies the device key. After the device key is verified to be correct, the system applies second-layer encryption to the Base64-processed first voice fragment using the AES (Advanced Encryption Standard) method. The second voice fragment is obtained once both layers of encryption are complete.
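The order of operations in S25 (Base64 layer, device key check, second layer) might be sketched as below. To keep the example self-contained with the Python standard library only, the second layer is NOT AES: it is a stand-in XOR keystream derived from SHA-256, used purely as a placeholder for the AES step, and it must not be used as real protection. The function names and the way the device key is supplied are assumptions.

```python
import base64
import hashlib
import hmac


def _keystream(key: bytes, length: int) -> bytes:
    """Stand-in keystream (SHA-256 over key + counter). Placeholder for AES only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def second_layer(data: bytes, key: bytes) -> bytes:
    """Placeholder for the AES layer: XOR with the keystream (self-inverse)."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def encrypt_fragment(first_fragment: bytes, supplied_key: bytes, device_key: bytes) -> bytes:
    """First voice fragment -> second voice fragment, in the order given in S25."""
    layer1 = base64.b64encode(first_fragment)  # first layer: Base64
    if not hmac.compare_digest(supplied_key, device_key):  # verify the device key
        raise PermissionError("device key verification failed")
    return second_layer(layer1, device_key)  # second layer (AES in the embodiment)


def decrypt_fragment(second_fragment: bytes, device_key: bytes) -> bytes:
    """Undo both layers in reverse order to recover the first voice fragment."""
    return base64.b64decode(second_layer(second_fragment, device_key))
```

In a real implementation the `second_layer` function would be replaced by an AES routine from a vetted cryptographic library; the surrounding control flow would stay the same.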

S26, sending the second voice fragment to a server so that the server checks the second voice fragment and returns first verification information corresponding to the second voice fragment to the intelligent voice message device.

The second voice fragment is sent to the remote server according to an agreed protocol, and the data to be sent are packaged and uploaded in the following format: data header information, second voice fragment, and verification information. The data header information consists of the hexadecimal characters of the user id encrypted with Base64, and the verification information consists of hexadecimal characters representing the data length of the second voice fragment. After the upload, the remote server back end first decrypts the data header information, then decrypts the verification information to obtain the original data length L1, decrypts the uploaded second voice fragment to obtain the length L2, and returns all the decrypted data information to the intelligent voice message equipment.
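The packet layout "data header information + second voice fragment + verification information" can be illustrated as follows. The embodiment does not specify field delimiters or widths, so the `|` separator after the header and the fixed 8-character length field are assumptions made only for this sketch, as are the names `pack` and `unpack`. As a further simplification, L2 is taken here as the length of the received payload rather than being obtained through a decryption step.

```python
import base64

CHECK_WIDTH = 8  # assumed fixed width of the hexadecimal length field


def pack(user_id: str, second_fragment: bytes) -> bytes:
    """Build header + second voice fragment + verification information."""
    # Header: hexadecimal characters of the Base64-processed user id.
    header = base64.b64encode(user_id.encode("utf-8")).hex().encode("ascii")
    # Verification information: payload length as hexadecimal characters.
    check = format(len(second_fragment), f"0{CHECK_WIDTH}x").encode("ascii")
    return header + b"|" + second_fragment + check


def unpack(packet: bytes) -> tuple[str, int, int]:
    """Return (user_id, L1, L2) recovered from a packet."""
    # The header contains only hex digits, so the first "|" ends it.
    header, rest = packet.split(b"|", 1)
    payload, check = rest[:-CHECK_WIDTH], rest[-CHECK_WIDTH:]
    user_id = base64.b64decode(bytes.fromhex(header.decode("ascii"))).decode("utf-8")
    l1 = int(check.decode("ascii"), 16)  # declared length
    l2 = len(payload)                    # observed length
    return user_id, l1, l2
```

If a byte of the payload is lost or added in transit, `unpack` returns different values for L1 and L2, which is the condition the later verification steps test.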

S27, verifying the received first verification information.

S28, if the verification is passed, prompting a message of successful storage.

Based on the received data information, the processing system of the intelligent voice message equipment judges whether the decrypted user id exists and whether L1 and L2 are equal. If the user id exists and the data lengths L1 and L2 are equal, it determines that the packaged data have not been modified, and the packaged data are stored.

S29, if the verification fails, prompting a message that storage failed due to a security risk.

If the user id exists but the data lengths are not equal, it is determined that the packaged data have been modified; the user is prompted that the data pose a security risk and that storage has failed, and the packaged data are deleted.

Optionally, if the user id does not exist, it is determined that the second voice fragment may have been sent by an abnormal device; the user is prompted that the data pose a security risk and that storage has failed, the packaged data are deleted, and non-initialization files in the system are detected and deleted, so as to ensure system security.
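The three outcomes of steps S27 to S29 can be summarized in a small decision function. This is a sketch only: the enumeration `Verdict`, the function `verify`, and the idea of passing a set of known user ids are assumptions introduced for illustration.

```python
from enum import Enum


class Verdict(Enum):
    STORED = "storage succeeded"
    MODIFIED = "security risk: data modified, storage failed"
    UNKNOWN_SENDER = "security risk: unknown user id, storage failed"


def verify(user_id: str, l1: int, l2: int, known_user_ids: set[str]) -> Verdict:
    """Classify the returned verification information."""
    if user_id not in known_user_ids:
        # Possibly sent by an abnormal device: delete the packet and scan the system.
        return Verdict.UNKNOWN_SENDER
    if l1 != l2:
        # Declared and observed lengths differ: the packet was modified.
        return Verdict.MODIFIED
    return Verdict.STORED
```

Note that a length comparison of this kind only detects modifications that change the size of the payload.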

S210, determining the reading authority of the personal information based on the facial features.

In the embodiment of the invention, when one or more users want to listen to messages, they first log in to the intelligent voice message equipment with the pre-recorded information, and the equipment system verifies the entered login information. After the login information is verified, the face acquisition module of the equipment captures and recognizes the face of the current user. If the current user is recognized as a member of the family, the subsequent operations can be carried out; if the recognized face does not belong to any of the pre-recorded family members, the subsequent operations cannot be carried out.
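The embodiment does not state how facial features are compared. One common approach, assumed here purely for illustration, is to represent each enrolled face as a numeric feature vector and accept the closest enrolled member if the cosine similarity exceeds a threshold. The function names, the vector representation, and the threshold value are all hypothetical.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(
    probe: list[float],
    enrolled: dict[str, list[float]],
    threshold: float = 0.9,  # assumed acceptance threshold
) -> Optional[str]:
    """Return the best-matching enrolled family member, or None if nobody matches."""
    best_name, best_score = None, threshold
    for name, features in enrolled.items():
        score = cosine_similarity(probe, features)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

A return value of `None` corresponds to the case in which the current user is not one of the pre-recorded family members and the subsequent operations are refused.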

S211, sending a reading request of the second voice fragment to the server.

After the identity information of the current user is verified, the current user issues a message listening request instruction by clicking. On receiving the instruction, the system responds by sending a request to read the second voice fragment to the remote server; the request carries the identity information of the current user.

S212, receiving the second voice segment sent by the server in response to the reading request and verifying the second voice segment to obtain second verification information.

After receiving the request to read the second voice fragment, the remote server finds all messages addressed to the current user according to the identity information carried in the request and sends them to the intelligent voice message equipment. Each message is packaged and sent in the format 'data header information + second voice fragment + verification information'. After receiving a data packet from the remote server, the equipment system checks the packet to obtain the second verification information, which includes the user id, the original data length L1 represented by the verification information, and the length L2 obtained by decrypting the uploaded second voice fragment.
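The server-side lookup described above, in which the server finds all messages addressed to the requesting user, might look like the following sketch. The record layout (a dictionary with `packet` and `readers` keys) and the function name `messages_for` are assumptions; the embodiment does not describe how the server indexes stored messages.

```python
def messages_for(requester_id: str, stored: list[dict]) -> list[bytes]:
    """Return the stored packets whose reader set includes the requester."""
    return [m["packet"] for m in stored if requester_id in m["readers"]]


# Hypothetical server-side store: each record keeps the packaged data
# together with the persons who were granted listening authority.
store = [
    {"packet": b"pkt-1", "readers": {"dad"}},
    {"packet": b"pkt-2", "readers": {"dad", "child"}},
    {"packet": b"pkt-3", "readers": {"mom"}},
]
```

Each returned packet would then be checked on the equipment side before any decryption takes place.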

S213, verifying the second verification information.

S214, if the verification is passed, performing a decryption operation on the second voice segment based on the secret key for decrypting the second voice segment to obtain the first voice segment, and playing the first voice segment.

The intelligent voice message equipment system judges whether the decrypted user id exists and whether L1 is equal to L2. If the user id exists and the data lengths L1 and L2 are equal, it determines that the packaged data sent by the remote server have not been modified.

Further, the secret key is verified against a decryption key preset by the developer. If the secret key is verified to be correct, the second voice segment is decrypted to obtain the original first voice segment recorded by the user who left the message, and the intelligent voice message equipment plays the first voice segment.

S215, if the verification fails, displaying a message of abnormal verification and deleting the second voice segment.

If the user id exists but the data lengths L1 and L2 are not equal, it is determined that the packaged data were modified during transmission. The system prompts the user that the downloaded packaged data pose a security risk, deletes the packaged data, and automatically searches for and deletes non-initialization files in the system.

Optionally, if the user id does not exist, it is determined that the packaged data may have been sent by an abnormal device; the user is prompted that the packaged data pose a security risk, the packaged data are deleted, and dangerous files in the system are automatically detected and deleted.

According to the voice processing scheme provided by the embodiment of the invention, when the existence of audio is detected, a first voice fragment is obtained through the voice receiving module of the intelligent voice message equipment; an encryption operation is performed on the first voice fragment to obtain a second voice fragment; and the second voice fragment is sent to a server for storage, the server checking and verifying the integrity of the data after receiving it. When a user needs to listen to a message, the message is downloaded from the server and its data integrity is verified. Through these two verification processes, it can be detected whether the voice data were stolen or modified during transmission, which prevents the information from being stolen or eavesdropped on in transit and protects the security of the voice information.

Fig. 3 is a schematic structural diagram of a speech processing apparatus according to an embodiment of the present invention, which specifically includes:

an obtaining module 301, configured to obtain, when it is detected that audio exists, a first voice segment through a voice receiving module of the intelligent voice message apparatus;

an encryption module 302, configured to perform an encryption operation on the first voice segment to obtain a second voice segment;

a sending module 303, configured to send the second voice segment to a server for storage.

In a possible implementation manner, the obtaining module is specifically configured to receive the second voice segment sent by the server in response to the read request and check the second voice segment to obtain second verification information; verify the second verification information; if the verification is passed, perform a decryption operation on the second voice segment based on a secret key for decrypting the second voice segment to obtain the first voice segment, and play the first voice segment; and if the verification fails, display a message of abnormal verification and delete the second voice segment.

In a possible implementation manner, the obtaining module is further configured to receive and store a plurality of pieces of personal information, where each piece of personal information carries a corresponding facial feature; set a corresponding label for the first voice fragment based on the plurality of pieces of personal information; set part or all of the plurality of pieces of personal information as the reading authority of the first voice fragment; determine the reading authority of the personal information based on the facial features; and execute the step of sending a reading request of the second voice fragment to the server based on the reading authority.

In a possible implementation manner, the sending module is specifically configured to send the second voice fragment to a server, so that the server checks the second voice fragment and returns first verification information corresponding to the second voice fragment to the intelligent voice message leaving device; verify the received first verification information; if the verification is passed, prompt a message of successful storage; and if the verification fails, prompt a message that storage failed due to a security risk.

The speech processing apparatus provided in this embodiment may be the speech processing apparatus shown in fig. 3, and may perform all the steps of the speech processing method shown in fig. 1-2, so as to achieve the technical effect of the speech processing method shown in fig. 1-2, and for brevity, it is not described herein again.

Fig. 4 is a schematic structural diagram of an intelligent voice message apparatus according to an embodiment of the present invention, where the intelligent voice message apparatus 400 shown in fig. 4 includes: at least one processor 401, memory 402, at least one network interface 404, and other user interfaces 403. The various components in the intelligent voice messaging device 400 are coupled together by a bus system 405. It is understood that the bus system 405 is used to enable connection communication between these components. The bus system 405 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 405 in fig. 4.

The user interface 403 may include a display, a keyboard, or a pointing device (e.g., a mouse, trackball, touch pad, or touch screen).

It will be appreciated that the memory 402 in embodiments of the invention may be either volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which functions as an external cache. By way of example, but not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous DRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The memory 402 described herein is intended to comprise, without being limited to, these and any other suitable types of memory.

In some embodiments, memory 402 stores the following elements, executable units or data structures, or a subset thereof, or an expanded set thereof: an operating system 4021 and application programs 4022.

The operating system 4021 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is configured to implement various basic services and process hardware-based tasks. The application 4022 includes various applications, such as a media player (MediaPlayer), a Browser (Browser), and the like, for implementing various application services. A program for implementing the method according to the embodiment of the present invention may be included in the application 4022.

In this embodiment of the present invention, by calling a program or an instruction stored in the memory 402, specifically, a program or an instruction stored in the application 4022, the processor 401 is configured to execute the method steps provided by the method embodiments, for example, including:

when the existence of audio is detected, a first voice fragment is obtained through a voice receiving module of the intelligent voice message equipment; performing encryption operation on the first voice fragment to obtain a second voice fragment; and sending the second voice fragment to a server for storage.

In a possible implementation manner, the second voice fragment is sent to a server, so that the server checks the second voice fragment and returns first verification information corresponding to the second voice fragment to the intelligent voice message leaving device; the received first verification information is verified; if the verification is passed, a message of successful storage is prompted; and if the verification fails, a message that storage failed due to a security risk is prompted.

In one possible embodiment, sending a read request for the second voice segment to the server; receiving the second voice segment sent by the server in response to the reading request and verifying the second voice segment to obtain second verification information; verifying the second verification information; if the verification is passed, based on a secret key for decrypting the second voice segment, executing decryption operation on the second voice segment to obtain the first voice segment and playing the first voice segment; and if the verification fails, displaying a message of abnormal verification and deleting the second voice segment.

In one possible implementation, receiving and storing a plurality of pieces of personal information, wherein each piece of personal information carries a corresponding facial feature; setting a corresponding label for the first voice fragment based on the plurality of pieces of personal information; and setting part or all of the plurality of pieces of personal information as the reading authority of the first voice fragment.

In one possible embodiment, based on facial features, the reading authority of the personal information is determined; and based on the reading authority, executing the step of receiving the second voice fragment sent by the server in response to the reading request of the second voice fragment and verifying the second voice fragment to obtain second verification information.

The method disclosed in the above embodiments of the present invention may be applied to the processor 401, or implemented by the processor 401. The processor 401 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 401. The processor 401 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The processor may implement or perform the various methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software elements in the decoding processor. The software elements may be located in RAM, flash memory, ROM, PROM, EPROM, registers, or other storage media that are well known in the art. The storage medium is located in the memory 402, and the processor 401 reads the information in the memory 402 and completes the steps of the method in combination with its hardware.

It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.

For a software implementation, the techniques described herein may be implemented by means of units performing the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.

The intelligent voice message leaving device provided in this embodiment may be the intelligent voice message leaving device shown in Fig. 4, and may execute all the steps of the voice processing method shown in Figs. 1-2, thereby achieving the technical effect of that method. For brevity, reference is made to the description relating to Figs. 1-2, which is not repeated here.

The embodiment of the invention also provides a storage medium (a computer-readable storage medium). The storage medium stores one or more programs. The storage medium may include volatile memory, such as random access memory; it may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk; it may also comprise a combination of the above kinds of memory.

The one or more programs in the storage medium are executable by one or more processors to implement the voice processing method executed on the intelligent voice message leaving device side.

The processor is used for executing the voice processing program stored in the memory so as to realize the following steps of the voice processing method executed on the intelligent voice message equipment side:

Those of skill in the art would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, the various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The above embodiments describe the objects, technical solutions and advantages of the present invention in further detail. It should be understood that they are merely exemplary embodiments of the present invention and are not intended to limit its scope; any modification, equivalent substitution, improvement or the like made within the spirit and principle of the present invention shall be included in the scope of the present invention.

Claims

1. A voice processing method, applied to intelligent voice message leaving equipment, comprising the following steps:

and sending the second voice fragment to a server for storage.

2. The method of claim 1, wherein sending the second speech segment to a server for storage comprises:

verifying the received first verification information;

if the verification is passed, prompting a message of successful storage;

3. The method of claim 1, further comprising:

sending a reading request of the second voice fragment to the server;

verifying the second verification information;

4. The method according to any one of claims 1-3, further comprising:

5. The method of claim 4, further comprising:

and executing the step of sending a reading request of the second voice fragment to the server based on the reading authority.

6. A speech processing apparatus, comprising:

7. The apparatus of claim 6, wherein the obtaining module is specifically configured to: receive and store a plurality of pieces of person information, each piece of person information carrying a corresponding facial feature; set a corresponding label for the first voice fragment based on the plurality of pieces of person information; and set some or all of the plurality of pieces of person information as the reading authority of the first voice fragment.

8. The apparatus according to claim 6, wherein the sending module is specifically configured to send the second voice fragment to a server, so that the server verifies the second voice fragment and returns first verification information corresponding to the second voice fragment to the intelligent voice message leaving device.

9. An intelligent voice message leaving device, comprising: a processor and a memory, the processor being configured to execute a speech processing program stored in the memory to implement the speech processing method of any one of claims 1 to 5.

10. A storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the speech processing method of any one of claims 1 to 5.