CN107516534B

CN107516534B - Voice information comparison method and device and terminal equipment

Info

Publication number: CN107516534B
Application number: CN201710769644.0A
Authority: CN
Inventors: 吴小龙
Original assignee: Guangdong Genius Technology Co Ltd
Current assignee: Guangdong Genius Technology Co Ltd
Priority date: 2017-08-31
Filing date: 2017-08-31
Publication date: 2020-11-03
Anticipated expiration: 2037-08-31
Also published as: CN107516534A

Abstract

The invention is applicable to the technical field of voice information and provides a method, a device and a terminal device for comparing voice information. The method comprises the following steps: acquiring two pieces of voice information to be compared, and generating a corresponding audio waveform diagram for each piece of voice information to be compared; comparing the audio waveform diagrams corresponding to the two pieces of voice information to obtain the similarity of the two pieces of voice information; if the similarity of the two pieces of voice information is greater than a threshold value, outputting a comparison result that the voice contents of the two pieces of voice information are the same; otherwise, outputting a comparison result that the contents of the two pieces of voice information are different. The invention generates an audio waveform diagram for each of the two pieces of voice information and determines whether their contents are consistent by comparing the audio waveform diagrams, so that the comparison result is more accurate.

Description

Voice information comparison method and device and terminal equipment

Technical Field

The invention belongs to the technical field of voice information, and particularly relates to a method and a device for comparing voice information and terminal equipment.

Background

At present, text and voice data are combined in an increasing variety of ways, and voice information correspondingly exists in many data formats. In applications of voice information, a voice data file in one format is often converted into a voice data file in another format to meet different requirements, which requires that the consistency of the data content be ensured during the format conversion.

However, even if the consistency of the data content is ensured during format conversion, an intermediate voice data file in a certain format may be modified during subsequent processing and use of the voice data. As a result, the data content of voice data files derived from the same voice data source file, in different formats or at different stages, becomes inconsistent, and a user of those files cannot judge whether their data content is consistent.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method, an apparatus, and a terminal device for comparing voice information, so as to solve the problem that whether multiple pieces of voice information are consistent cannot be accurately determined.

A first aspect of the embodiments of the present invention provides a method for comparing voice information, including:

acquiring two pieces of voice information to be compared, and generating a corresponding audio waveform diagram for each piece of voice information to be compared;

comparing the audio waveform diagrams corresponding to the two pieces of voice information to obtain the similarity of the two pieces of voice information;

if the similarity of the two pieces of voice information is greater than a threshold value, outputting a comparison result that the voice contents of the two pieces of voice information are the same;

and if the similarity of the two pieces of voice information is smaller than or equal to the threshold, outputting a comparison result that the contents of the two pieces of voice information are different.

A second aspect of the embodiments of the present invention provides a device for comparing voice information, including:

the waveform diagram generating module is used for acquiring two pieces of voice information to be compared and generating a corresponding audio waveform diagram for each piece of voice information to be compared;

the similarity obtaining module is used for comparing the audio waveform diagrams corresponding to the two pieces of voice information to obtain the similarity of the two pieces of voice information;

the processing module is used for outputting a comparison result that the voice contents of the two pieces of voice information are the same if the similarity of the two pieces of voice information is greater than a threshold value;

the processing module is further configured to output a comparison result that the contents of the two pieces of voice information are different if the similarity of the two pieces of voice information is smaller than or equal to the threshold.

A third aspect of the embodiments of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method provided in the first aspect of the embodiments of the present invention when executing the computer program.

A fourth aspect of embodiments of the present invention provides a computer-readable storage medium storing a computer program which, when executed by one or more processors, performs the steps of the method provided by the first aspect of embodiments of the present invention.

Compared with the prior art, the embodiment of the invention has the following beneficial effects: the embodiment of the invention generates an audio waveform diagram for each of the two pieces of voice information to be compared, and then compares the two audio waveform diagrams to determine whether the contents of the two pieces of voice information are consistent. Because the method confirms whether the voice contents are consistent by comparing the audio waveform diagrams, the comparison result is more accurate.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without inventive effort.

Fig. 1 is a schematic flow chart illustrating an implementation of a method for comparing voice information according to an embodiment of the present invention;

Fig. 2 is a schematic flow chart illustrating an implementation of a method for comparing voice information according to another embodiment of the present invention;

Fig. 3 is a schematic block diagram of a comparison apparatus for voice information according to an embodiment of the present invention;

Fig. 4 is a schematic block diagram of a terminal device according to an embodiment of the present invention.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".

In order to explain the technical means of the present invention, the following description will be given by way of specific examples.

Fig. 1 is a schematic flow chart illustrating an implementation of a method for comparing voice information according to an embodiment of the present invention, where as shown in the figure, the method may include the following steps:

Step S101, acquiring two pieces of voice information to be compared, and generating a corresponding audio waveform diagram for each piece of voice information to be compared.

In the embodiment of the invention, two pieces of voice information to be compared are first obtained. One piece can be used as standard voice information and the other as comparison voice information: with the standard voice information as a reference, the comparison voice information is compared against it to determine whether the voice content of the comparison voice information is consistent with that of the standard voice information. Because voice information can be converted into text information, one way to determine whether the voice contents are consistent is to convert both pieces of voice information into text and compare the text. The embodiment of the invention instead generates a corresponding audio waveform diagram from each piece of voice information to be compared; the audio waveform diagram contains information such as the loudness, timbre and frequency of the audio and can therefore represent the voice information better.

Specifically, generating a corresponding audio waveform map for each piece of voice information to be compared includes:

decompressing the voice information and randomly dividing the voice information into a plurality of data blocks;

acquiring sampling points and amplitude values of the sampling points in each data block according to a preset sampling mode;

and sequencing the sampling points according to time, and generating an audio waveform diagram according to the amplitude value of each sampling point.

In the embodiment of the present invention, when an audio waveform diagram is generated from voice information, sampling points may be taken over time and only the data of the sampling points used to generate the audio waveform diagram, which reduces the amount of calculation. The voice information is first decompressed, and the decompressed voice information is randomly divided into a plurality of data blocks whose sizes are not fixed. Sampling points are then obtained in each data block according to a preset sampling mode. Because the data blocks are divided randomly while sampling within each data block follows a fixed mode, this sampling mode combines randomness and regularity, so that the obtained sampling points are more representative of the voice information. After the sampling points of each data block are obtained, they are ordered according to their time in the voice information, and the data corresponding to the sampling points, such as the amplitude values, are used to generate the audio waveform diagram.
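The sampling procedure described above may be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the embodiment: the voice information is assumed to be already decompressed into a list of integer amplitude values, the preset sampling mode is assumed to be fixed-step sampling, and the function name generate_waveform and the parameters step, min_block, max_block and seed are hypothetical.

```python
import random


def generate_waveform(samples, sample_rate, step=4,
                      min_block=64, max_block=256, seed=None):
    """Return a time-ordered list of (time_in_seconds, amplitude) points.

    samples: decompressed amplitude values (assumption: decoding is done elsewhere).
    step, min_block, max_block, seed: hypothetical parameters for illustration.
    """
    rng = random.Random(seed)
    points = []
    start = 0
    n = len(samples)
    while start < n:
        # Randomly sized data block: block sizes are not fixed.
        size = rng.randint(min_block, max_block)
        end = min(start + size, n)
        # Fixed-step sampling inside the block (assumed preset sampling mode).
        for i in range(start, end, step):
            points.append((i / sample_rate, samples[i]))
        start = end
    # Order the sampling points by their time in the voice information.
    points.sort(key=lambda p: p[0])
    return points
```

Because each block restarts the fixed-step pattern at its own randomly placed boundary, the positions of the sampling points depend on both the random division and the fixed step, which is the combination of randomness and regularity described above.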

Step S102, comparing the audio waveform diagrams corresponding to the two pieces of voice information to obtain the similarity of the two pieces of voice information.

In the embodiment of the present invention, the audio waveform diagram includes information such as the loudness, timbre, and frequency of a segment of voice information. For example, in the audio waveform diagram, the upper and lower amplitudes represent loudness, the combination of frequencies represents timbre, and the period interval represents frequency. The similarity of two pieces of voice information can be obtained by comparing their audio waveform diagrams. For example, an overlap comparison method may be applied to the two audio waveform diagrams: the portions of the two diagrams that overlap represent consistent content, and the portions that cannot overlap represent inconsistent content. The percentage of the overlapped portion in the total waveform diagram may be used as the similarity of the two audio waveform diagrams, which is taken as the similarity of the two pieces of voice information.
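One possible reading of the overlap comparison is sketched below. The sketch assumes that each waveform diagram is represented as a sequence of amplitude values on the same time grid, and that two positions "overlap" when their amplitudes differ by no more than a tolerance; the function name overlap_similarity and the tolerance parameter are hypothetical and are not specified by the embodiment.

```python
def overlap_similarity(wave_a, wave_b, tolerance=0.0):
    """Fraction of the total waveform in which the two waveforms overlap.

    wave_a, wave_b: amplitude sequences on the same time grid (assumption).
    tolerance: hypothetical maximum amplitude difference counted as overlap.
    """
    total = max(len(wave_a), len(wave_b))
    if total == 0:
        return 1.0  # two empty waveforms are treated as identical
    # Positions present in only one waveform cannot overlap, so they
    # count toward the total but not toward the overlapped portion.
    overlap = sum(1 for a, b in zip(wave_a, wave_b) if abs(a - b) <= tolerance)
    return overlap / total
```

For example, overlap_similarity([1, 2, 3, 4], [1, 2, 0, 4]) gives 0.75, since three of the four positions coincide.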

Step S103, if the similarity of the two pieces of voice information is greater than a threshold value, outputting a comparison result that the voice contents of the two pieces of voice information are the same; and if the similarity of the two pieces of voice information is smaller than or equal to the threshold, outputting a comparison result that the contents of the two pieces of voice information are different.

In the embodiment of the present invention, processes such as format conversion and waveform diagram generation may be performed before the similarity of the two pieces of voice information is obtained. Even if the two pieces of voice information have the same voice content, encoding differences between formats during format conversion, or differences in the sampling used when generating the audio waveform diagrams, may cause the two finally generated audio waveform diagrams to be not completely identical. A threshold therefore needs to be set: when the similarity of the two pieces of voice information is greater than the threshold, the voice contents of the two pieces of voice information are considered consistent, and a comparison result that the voice contents are the same is output; otherwise, a comparison result that the contents of the two pieces of voice information are different is output.
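The threshold decision of step S103 may be sketched as follows. The embodiment does not give a threshold value; the default of 0.9 and the names compare_result, "same" and "different" are hypothetical and chosen only for illustration.

```python
def compare_result(similarity, threshold=0.9):
    """Return "same" only if similarity is strictly greater than threshold.

    threshold: hypothetical value; the embodiment does not specify one.
    """
    if similarity > threshold:
        return "same"
    # A similarity equal to the threshold is treated as different,
    # matching the "smaller than or equal to" branch of step S103.
    return "different"
```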

The preset sampling mode may be sampling according to a fixed step length, or sampling according to other existing sampling modes, which is not described herein again.

The embodiment of the invention respectively generates the corresponding audio waveform diagrams from the two pieces of voice information to be compared, then compares the audio waveform diagrams of the two pieces of voice information to obtain the similarity of the two pieces of voice information, and judges whether the contents of the two pieces of voice information are consistent or not according to the similarity.

Fig. 2 is a schematic flow chart illustrating an implementation of a method for comparing voice information according to another embodiment of the present invention, where the method includes the following steps:

Step S201, acquiring two pieces of voice information to be compared, and setting the two pieces of voice information to be compared as standard voice information and comparison voice information, respectively.

Step S202, acquiring the data format of the standard voice information, and converting the data format of the compared voice information into the data format of the standard voice information.

In the embodiment of the present invention, an audio waveform diagram needs to be generated, and because audio files in different formats are encoded differently, the generated audio waveform diagrams may differ slightly. Therefore, after the two pieces of voice information to be compared are obtained, it is first identified whether their data formats are the same; if not, the two pieces of voice information are converted into the same data format. Both may be converted into a predetermined audio format, or one piece of voice information may be converted into the data format of the other. Specifically, the data format of the standard voice information may be obtained, and the data format of the comparison voice information converted into the data format of the standard voice information. The data format of the voice information, i.e., the audio format, may be MP3, WAV, AU, SND, RAW, AFC, etc.
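The format check that precedes conversion may be sketched as follows. The sketch assumes, purely for illustration, that the data format can be identified from the file extension; the embodiment does not state how the format is identified, and the actual conversion (which would require an audio codec) is outside the sketch. The function name plan_format_conversion is hypothetical.

```python
from pathlib import Path


def plan_format_conversion(standard_path, comparison_path):
    """Return the target format for the comparison file, or None.

    The format is read from the file extension (an assumption made for
    illustration). None means both files already share one data format.
    """
    std_fmt = Path(standard_path).suffix.lower().lstrip(".")
    cmp_fmt = Path(comparison_path).suffix.lower().lstrip(".")
    if std_fmt == cmp_fmt:
        return None
    # Convert the comparison voice information to the standard's format.
    return std_fmt
```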

Step S203, recognizing the silent portion in each piece of voice information, and cutting the recognized silent portion out of the voice information.

In the embodiment of the present invention, the voice information may include non-voice portions, such as silent portions, and comparing the silent portions would clearly increase the amount of calculation in the comparison process. Therefore, the silent portions in each piece of voice information may first be identified and then cut out of the voice information.
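Silence removal may be sketched as follows. The embodiment does not specify how silent portions are recognized; the sketch assumes a simple amplitude criterion in which a run of consecutive low-amplitude samples is treated as silence once it is long enough. The function name remove_silence and the parameters silence_level and min_run are hypothetical.

```python
def remove_silence(samples, silence_level=500, min_run=400):
    """Drop runs of at least min_run samples with |amplitude| <= silence_level.

    silence_level, min_run: hypothetical values; the criterion itself is an
    assumption, since the embodiment does not define silence recognition.
    """
    kept = []
    run = []  # current run of consecutive quiet samples
    for s in samples:
        if abs(s) <= silence_level:
            run.append(s)
            continue
        if len(run) < min_run:
            kept.extend(run)  # short quiet gap: keep it as part of the speech
        run = []
        kept.append(s)
    if len(run) < min_run:
        kept.extend(run)  # trailing quiet gap shorter than min_run
    return kept
```

Keeping short quiet gaps avoids cutting the brief low-amplitude moments that occur inside normal speech, such as zero crossings or pauses between syllables.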

Step S204, generating a corresponding audio waveform map for each piece of voice information to be compared.

The generation of the audio waveform diagram in this step is the same as in step S101; reference may be made to the description of step S101, which is not repeated here.

Step S205, generating a dot matrix image according to the audio waveform image corresponding to each piece of voice information, and comparing the dot matrix images corresponding to the two pieces of voice information by a dot matrix comparison method to obtain the similarity of the two pieces of voice information.

In the embodiment of the invention, a dot matrix image can further be generated from the audio waveform diagram. A dot matrix image is also called a bitmap; its minimum unit is a pixel, and it is a picture whose display effect is produced by the arrangement of a pixel array. After the dot matrix images are generated, the dot matrix images corresponding to the two pieces of voice information can be compared by a dot matrix comparison method. Since each dot matrix image is composed of a plurality of pixels, the two pieces of voice information can be compared pixel by pixel, and the similarity can be obtained as the proportion of the number of identical pixels to the total number of pixels.
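The dot matrix comparison may be sketched as follows. The sketch assumes a binary dot matrix image in which each column holds one set pixel at the row corresponding to the amplitude, and assumes that both images have the same dimensions; the embodiment does not specify how the image is rendered. The names waveform_to_bitmap and bitmap_similarity and the parameters width, height and max_amp are hypothetical.

```python
def waveform_to_bitmap(amplitudes, width=64, height=32, max_amp=32768):
    """Rasterize amplitudes into a height x width grid of 0/1 pixels.

    width, height, max_amp: hypothetical rendering parameters.
    """
    bitmap = [[0] * width for _ in range(height)]
    n = len(amplitudes)
    if n == 0:
        return bitmap
    for x in range(width):
        # Pick the amplitude that corresponds to this column.
        a = amplitudes[min(n - 1, x * n // width)]
        # Map [-max_amp, max_amp] onto rows [height - 1, 0], clamping outliers.
        norm = min(1.0, max(0.0, (a + max_amp) / (2 * max_amp)))
        y = int(round((1.0 - norm) * (height - 1)))
        bitmap[y][x] = 1
    return bitmap


def bitmap_similarity(bm_a, bm_b):
    """Proportion of pixel positions whose values are identical.

    Assumes both bitmaps have the same dimensions.
    """
    total = 0
    same = 0
    for row_a, row_b in zip(bm_a, bm_b):
        for pa, pb in zip(row_a, row_b):
            total += 1
            same += pa == pb
    return same / total if total else 1.0
```

Note that in such a sparse image most pixels are background in both images, so counting all pixels tends to give high similarity values; the threshold would need to be chosen with this in mind.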

Step S206, if the similarity of the two pieces of voice information is greater than a threshold value, outputting a comparison result that the voice contents of the two pieces of voice information are the same;

if the similarity of the two pieces of voice information is smaller than or equal to the threshold value, outputting a comparison result that the contents of the two pieces of voice information are different, and either modifying the comparison voice information so that its data content is consistent with that of the standard voice information, or, according to the data format of the comparison voice information, converting the standard voice information into that data format and using the result to replace the comparison voice information.

In the embodiment of the present invention, if the similarity of the two pieces of voice information is smaller than or equal to the threshold, the voice content of the comparison voice information is inconsistent with that of the standard voice information, and the two pieces of voice information need to be brought to the same voice content. The comparison voice information can be modified so that its data content is consistent with that of the standard voice information. Alternatively, the comparison voice information can be deleted, and the standard voice information converted into the data format of the comparison voice information and then used as the comparison voice information, so that the voice contents of the standard voice information and the comparison voice information are consistent.

According to the embodiment of the invention, the acquired voice information to be compared undergoes format conversion, the silent portions are cut out, the audio waveform diagrams are generated, dot matrix images are generated from the waveform diagrams, and the similarity of the two pieces of voice information is obtained by comparing the dot matrix images. With this comparison mode, the similarity of the two pieces of voice information to be compared can be obtained more accurately.

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.

Fig. 3 is a schematic block diagram of a comparison apparatus for voice information according to an embodiment of the present invention, and only the parts related to the embodiment of the present invention are shown for convenience of description.

The voice information comparison device can be a software unit, a hardware unit, or a combined software and hardware unit built into the terminal equipment (such as a mobile phone, a computer, a tablet computer, a notebook computer and the like), and can also be integrated into the terminal equipment as an independent plug-in component.

The voice information comparison device 3 comprises:

the waveform diagram generating module 31 is configured to obtain two pieces of voice information to be compared, and generate a corresponding audio waveform diagram for each piece of voice information to be compared;

a similarity obtaining module 32, configured to compare the audio waveform diagrams corresponding to the two pieces of voice information, and obtain a similarity between the two pieces of voice information;

the processing module 33 is configured to output a comparison result that the voice contents of the two pieces of voice information are the same if the similarity of the two pieces of voice information is greater than a threshold;

the processing module 33 is further configured to output a comparison result that the contents of the two pieces of voice information are different if the similarity of the two pieces of voice information is smaller than or equal to the threshold.

Optionally, the similarity obtaining module 32 includes:

a dot matrix image generating unit 321 configured to generate a dot matrix image according to the audio waveform image corresponding to each piece of voice information;

the comparing unit 322 is configured to compare the dot matrix images corresponding to the two pieces of voice information by using a dot matrix comparison method.

Optionally, the waveform diagram generating module 31 includes:

a data format obtaining unit 311, configured to set the two pieces of voice information to be compared as standard voice information and comparison voice information, respectively, and obtain a data format of the standard voice information;

a format conversion unit 312, configured to convert the data format of the comparison voice information into the data format of the standard voice information.

Optionally, the waveform diagram generating module 31 further includes:

a silence removal unit 313 for recognizing a silence portion in each piece of voice information and removing the silence portion in the recognized voice information.

Optionally, the waveform diagram generating module 31 further includes:

a decompression unit 314, configured to decompress the voice information and randomly divide the voice information into a plurality of data blocks;

the sampling unit 315 is configured to obtain sampling points and amplitude values of the sampling points in each data block according to a preset sampling manner;

the waveform generating unit 316 is configured to sort the sampling points according to time and generate an audio waveform according to the amplitude value of each sampling point.

Optionally, the processing module 33 is further configured to: if the similarity of the two pieces of voice information is smaller than or equal to the threshold value, modifying the comparison voice information to enable the data content of the comparison voice information to be consistent with that of the standard voice information;

or, according to the data format of the comparison voice information, converting the data format of the standard voice information into the data format of the comparison voice information to replace the comparison voice information.

It is obvious to those skilled in the art that, for convenience and simplicity of description, the foregoing functional modules and units are merely illustrated in terms of division, and in practical applications, the foregoing functional allocation may be performed by different functional modules and units as needed, that is, the internal structure of the apparatus for comparing voice information is divided into different functional modules and units to perform all or part of the above described functions. Each functional module and unit in the embodiments may be integrated in one processing module, or each module and unit may exist alone physically, or two or more modules are integrated in one module, and the integrated modules and units may be implemented in a form of hardware, or in a form of software functional modules. In addition, specific names of the functional modules and units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the modules and units in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

Fig. 4 is a schematic block diagram of a terminal device according to an embodiment of the present invention. As shown in Fig. 4, the terminal device 4 of this embodiment includes: one or more processors 40, a memory 41, and a computer program 42 stored in the memory 41 and executable on the processors 40. When the processor 40 executes the computer program 42, the steps in the above embodiments of the voice information comparison method are implemented, for example, steps S101 to S103 shown in Fig. 1. Alternatively, when the processor 40 executes the computer program 42, the functions of the modules in the above embodiment of the voice information comparison device are implemented, for example, the functions of the modules 31 to 33 shown in Fig. 3.

Illustratively, the computer program 42 may be partitioned into one or more modules/units that are stored in the memory 41 and executed by the processor 40 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 42 in the terminal device 4. For example, the computer program 42 may be divided into a waveform map generation module, a similarity obtaining module, and a processing module.

The other modules or units may refer to the description of the modules or units in the apparatus for comparing voice information, and are not described herein again.

The terminal device includes, but is not limited to, a processor 40 and a memory 41. Those skilled in the art will appreciate that Fig. 4 is merely an example of the terminal device 4 and does not constitute a limitation of the terminal device 4, which may include more or fewer components than shown, combine some components, or use different components; for example, the terminal device may also include input devices, output devices, network access devices, buses, etc.

The processor 40 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

The memory 41 may be an internal storage unit of the terminal device 4, such as a hard disk or a memory of the terminal device 4. The memory 41 may also be an external storage device of the terminal device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 4. Further, the memory 41 may also include both an internal storage unit and an external storage device of the terminal device 4. The memory 41 is used for storing the computer program and other programs and data required by the terminal device. The memory 41 may also be used to temporarily store data that has been output or is to be output.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided in the present invention, it should be understood that the disclosed terminal device, apparatus and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; that is, they may be located in one place or may be distributed over a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments are implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be suitably increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims

1. A method for comparing voice information, comprising:

comparing the audio waveform diagrams corresponding to the two pieces of voice information to obtain the similarity of the two pieces of voice information; wherein the comparing of the audio waveform diagrams corresponding to the two pieces of voice information comprises: generating a dot matrix image according to the audio waveform diagram corresponding to each piece of voice information; and comparing the dot matrix images corresponding to the two pieces of voice information by a dot matrix comparison method, the minimum unit of the dot matrix image being a pixel; and wherein the obtaining of the similarity of the two pieces of voice information comprises: comparing the pixels of the dot matrix images of the two pieces of voice information one by one, and obtaining the similarity according to the proportion of the number of identical pixels to the total number of pixels;

2. The method of claim 1, wherein after obtaining the two pieces of voice information to be compared, the method further comprises:

setting the two pieces of voice information to be compared as standard voice information and comparison voice information respectively;

acquiring a data format of the standard voice information;

and converting the data format of the comparison voice information into the data format of the standard voice information.

3. The method of claim 1, wherein before generating the corresponding audio waveform diagram for each piece of voice information to be compared, the method further comprises:

identifying a silent part in each piece of voice information, and cutting off the identified silent part from the voice information.

4. The method of claim 1, wherein the generating a corresponding audio waveform diagram for each piece of voice information to be compared comprises:

5. The method of claim 2, further comprising:

if the similarity of the two pieces of voice information is smaller than or equal to the threshold value, modifying the comparison voice information to enable the data content of the comparison voice information to be consistent with that of the standard voice information;

6. A device for comparing voice information, comprising:

a similarity obtaining module, used for comparing the audio waveform diagrams corresponding to the two pieces of voice information to obtain the similarity of the two pieces of voice information; wherein the similarity obtaining module comprises: a dot matrix image generating unit, used for generating a dot matrix image according to the audio waveform diagram corresponding to each piece of voice information; and a comparison unit, used for comparing the dot matrix images corresponding to the two pieces of voice information by a dot matrix comparison method, the minimum unit of the dot matrix image being a pixel; and wherein the obtaining of the similarity of the two pieces of voice information comprises: comparing the pixels of the dot matrix images of the two pieces of voice information one by one, and obtaining the similarity according to the proportion of the number of identical pixels to the total number of pixels;

7. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 5 when executing the computer program.

8. A computer-readable storage medium, in which a computer program is stored which, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
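
For illustration only, the comparison recited in the claims above may be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the implementation of the invention. The function names (trim_silence, rasterize, pixel_similarity, same_content), the 64 x 32 dot matrix size, the silence floor of 0.02 and the threshold of 0.95 are all hypothetical choices made for the example. The voice information is assumed to be already decoded into samples in the range [-1, 1], and the rendering of one pixel per column is only one simple way of producing a dot matrix image from a waveform.

```python
from typing import List, Sequence

Grid = List[List[int]]  # dot matrix image: rows of 0/1 pixels


def trim_silence(samples: Sequence[float], floor: float = 0.02) -> List[float]:
    """Cut leading and trailing samples whose magnitude is below `floor`.

    Simplified stand-in for the silent-part removal of claim 3;
    `floor` is an assumed value, not taken from the patent.
    """
    voiced = [i for i, s in enumerate(samples) if abs(s) >= floor]
    if not voiced:
        return []
    return list(samples[voiced[0]: voiced[-1] + 1])


def rasterize(samples: Sequence[float], width: int = 64, height: int = 32) -> Grid:
    """Render samples in [-1, 1] as a height x width dot matrix (one pixel per column)."""
    grid = [[0] * width for _ in range(height)]
    n = len(samples)
    if n == 0:
        return grid  # empty signal gives an all-background image
    for x in range(width):
        # Take the sample that falls at the start of this column.
        s = samples[min(n - 1, x * n // width)]
        s = max(-1.0, min(1.0, s))  # clamp to the drawable range
        y = round((1.0 - s) / 2.0 * (height - 1))  # +1 maps to top row, -1 to bottom row
        grid[y][x] = 1
    return grid


def pixel_similarity(a: Grid, b: Grid) -> float:
    """Proportion of identical pixels to the total number of pixels."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("dot matrix images must have the same dimensions")
    total = sum(len(row) for row in a)
    if total == 0:
        raise ValueError("dot matrix images must not be empty")
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def same_content(x: Sequence[float], y: Sequence[float], threshold: float = 0.95) -> bool:
    """Return True when the similarity is greater than the (assumed) threshold."""
    similarity = pixel_similarity(rasterize(trim_silence(x)), rasterize(trim_silence(y)))
    return similarity > threshold
```

In this sketch most pixels of each dot matrix image are background, so the proportion of identical pixels remains high even for quite different waveforms; the threshold would therefore need to be calibrated against the chosen image size and rendering.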