CN115712699A - Voice information extraction method, device, equipment and storage medium - Google Patents

Voice information extraction method, device, equipment and storage medium

Info

Publication number
CN115712699A
Authority
CN
China
Prior art keywords
information
correction
key information
target
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211438892.4A
Other languages
Chinese (zh)
Inventor
Jiang Weihong (姜卫宏)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202211438892.4A
Publication of CN115712699A
Legal status: Pending

Landscapes

  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of artificial intelligence and discloses a voice information extraction method, device, equipment and storage medium. The voice information extraction method comprises the following steps: acquiring audio data generated by human-computer interaction, and converting the audio data into text information using automatic speech recognition; extracting key information from the text information based on an information extraction model; determining a service type according to the extracted key information, and matching preset correction libraries according to the service type to obtain a target correction library; and calling the target correction library to perform correction processing on the key information to obtain corrected target key information. In this way, the method and device improve the accuracy of voice information extraction and the efficiency of correction, and address the problem of inaccurate speech recognition.

Description

Voice information extraction method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a voice information extraction method, device, equipment and storage medium.
Background
Currently, in some financial insurance business scenarios such as car insurance, key information needs to be extracted from the voice content of a call, for example: the client's name, mobile phone number, address, reservation time, vehicle type, product information and the like. This reduces the manual entry performed by the agent in the system and thereby improves operating efficiency. The accuracy of speech content recognition in a speech system is the key constraint on the accuracy of voice information extraction; due to technical limitations, the generated text may contain errors, which causes deviations in subsequent key information extraction and results in low accuracy.
Disclosure of Invention
The invention provides a voice information extraction method, device, equipment and storage medium, which can improve the accuracy and correction efficiency of voice information extraction and address the problem of inaccurate speech recognition.
To solve the above technical problem, the invention adopts the following technical solution: a voice information extraction method is provided, comprising:
acquiring audio data generated by human-computer interaction, and converting the audio data into text information using automatic speech recognition;
extracting key information from the text information based on an information extraction model;
determining a service type according to the extracted key information, and matching preset correction libraries according to the service type to obtain a target correction library;
and calling the target correction library to perform correction processing on the key information to obtain corrected target key information.
According to an embodiment of the present invention, the matching preset correction libraries according to the service type to obtain a target correction library further includes:
matching the service type against the preset correction libraries, and judging whether a matching correction library exists;
if so, determining the matching preset correction library as the target correction library;
if not, creating a new calling interface according to the service type to add a new correction library, and determining the new correction library as the target correction library.
According to an embodiment of the present invention, the calling the target correction library to perform correction processing on the key information to obtain corrected target key information further includes:
calling the target correction library to perform correction processing on the key information;
labeling the key information according to the correction result;
and acquiring the corrected target key information according to the labeling result.
According to an embodiment of the present invention, the labeling the key information according to the correction result further includes:
if the correction result is that the key information matches target key information in the target correction library, labeling the key information with a first label;
if the correction result is that the key information does not match target key information in the target correction library, converting the key information into pinyin information, performing correction processing on the key information again based on the pinyin information and a preset database, and labeling the key information according to the result of this second correction.
According to an embodiment of the present invention, after the extracting key information from the text information based on the information extraction model, the method further includes:
automatically checking whether the format of the extracted key information meets preset format requirements;
if so, determining the service type according to the extracted key information, and matching preset correction libraries according to the service type to obtain a target correction library;
and if not, discarding the key information.
According to an embodiment of the present invention, the converting the audio data into text information using automatic speech recognition includes:
performing voiceprint recognition on the audio data using automatic speech recognition to obtain at least one sound feature;
acquiring the speech duration and speech spectrum of each sound feature in the audio data;
and determining target voice data from the audio data according to the speech duration and the speech spectrum, so as to perform text conversion on the target voice data to obtain the text information.
According to an embodiment of the present invention, the extracting key information from the text information based on the information extraction model includes:
vectorizing the text information through a word embedding layer to obtain a vector sequence corresponding to the text information;
performing feature extraction on the vector sequence through an attention mechanism layer to obtain a feature vector containing context information;
and performing deep feature extraction on the feature vector through a pooling layer and a convolution layer to obtain key information and a corresponding information type label.
To solve the above technical problem, the invention adopts another technical solution: a voice information extraction apparatus is provided, comprising:
an acquisition module, configured to acquire audio data generated by human-computer interaction and convert the audio data into text information using automatic speech recognition;
an extraction module, configured to extract key information from the text information based on an information extraction model;
a matching module, configured to determine a service type according to the extracted key information, and match preset correction libraries according to the service type to obtain a target correction library;
and a correction module, configured to call the target correction library to perform correction processing on the key information to obtain corrected target key information.
To solve the above technical problem, the invention adopts another technical solution: a computer device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above voice information extraction method when executing the computer program.
To solve the above technical problem, the invention adopts another technical solution: a computer storage medium is provided, storing a computer program which, when executed by a processor, implements the above voice information extraction method.
The beneficial effects of the invention are: determining the service type according to the extracted key information and matching preset correction libraries according to the service type to obtain the target correction library reduces the amount of data to be processed and improves correction efficiency; calling the target correction library to correct the key information and obtain corrected target key information improves the accuracy of voice information extraction and addresses the problem of inaccurate speech recognition.
Drawings
fig. 1 is a schematic flowchart of a voice information extraction method according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of step S103 in the voice information extraction method according to an embodiment of the present invention;
fig. 3 is a schematic flowchart of step S104 in the voice information extraction method according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a voice information extraction apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a computer storage medium according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first", "second" and "third" in the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise. In the embodiment of the present invention, all directional indicators (such as up, down, left, right, front, rear \8230;) are used only to explain the relative positional relationship between the components, the motion situation, etc. in a specific posture (as shown in the drawing), and if the specific posture is changed, the directional indicator is changed accordingly. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Fig. 1 is a flowchart illustrating a voice information extraction method according to an embodiment of the present invention. It should be noted that the method of the present invention is not limited to the flow sequence shown in fig. 1 if substantially the same results are obtained. As shown in fig. 1, the method comprises the following steps:
step S101: and audio data generated based on human-computer interaction is acquired, and the audio data is converted into text information by utilizing an automatic voice recognition technology.
In step S101, in a car insurance business scenario, an intelligent robot may be used to communicate with the client during insurance application and verification, so as to obtain the key information these processes require, for example: the client's name, mobile phone number, address, reservation time, vehicle type, product information and the like. The audio data generated by the human-computer interaction at least comprises the voice data of the intelligent robot and the client; this embodiment converts the audio data into text information through Automatic Speech Recognition (ASR) technology.
The aim of automatic speech recognition is to enable a computer to "listen to" continuous speech spoken by different people, that is, the commonly known "speech dictation machine", a technology that converts "sound" into "text". Automatic speech recognition is also known as speech recognition or computer speech recognition. During human-computer interaction, noise may exist in the client's speech background, for example more than two people speaking, background music, a film or television series playing, or a generally noisy environment. Such background noise may itself be converted into text, so the obtained text information is inaccurate and the target client's speech cannot be recognized accurately. In addition, speech recognition is affected by subjective or objective factors such as accent and speaking habits, which also cause inaccurate recognition.
In some embodiments, the text conversion may be performed after removing the noise. Specifically, voiceprint recognition is performed on the audio data using automatic speech recognition to obtain at least one sound feature; the speech duration and speech spectrum of each sound feature in the audio data are acquired; and target voice data is determined from the audio data according to the speech duration and the speech spectrum, so that text conversion is performed on the target voice data to obtain the text information.
Each sound feature in this embodiment is a speaker characteristic that can distinguish one person's voice from another's, such as a voiceprint feature or a timbre feature. The speech duration and speech spectrum of the voice data corresponding to each sound feature are then determined. The longer the speech duration, the higher the possibility that the corresponding voice belongs to the target client, since only the target client answers the intelligent robot's questions, while background voices are generally short. In special cases, however, a background sound lasts longer than the target client's voice, for example background music that plays from beginning to end; judging by speech duration alone therefore has limitations, so the speech spectrum of the voice data corresponding to each sound feature is also used. The speech spectrum is a waveform representation of how loud a segment of sound is over time: in general, where the waveform is large the sound is loud, and where it is small the sound is quiet, so a large waveform sustained over a period of time is usually the target client speaking. After the target voice data is obtained by combining the speech spectrum and the speech duration, text conversion is performed on the target voice data. The target voice data obtained in this way is very likely to belong to the target client, which removes background noise to a certain extent and improves the accuracy of text conversion.
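For illustration, the duration-plus-spectrum selection described above can be pictured with a minimal Python sketch, assuming the audio has already been separated into per-speaker clips by voiceprint recognition; the segment structure, thresholds and function name below are illustrative assumptions, not part of the patent.

```python
import numpy as np

def select_target_voice(segments, min_duration=2.0):
    """Pick the speaker most likely to be the target client.

    `segments` is an assumed structure: a dict mapping a speaker id
    (one per detected sound feature / voiceprint) to a list of
    (duration_seconds, samples) tuples produced by diarization.
    """
    best_speaker, best_score = None, -1.0
    for speaker, clips in segments.items():
        total_duration = sum(d for d, _ in clips)
        # Mean waveform magnitude as a crude stand-in for the
        # "speech spectrum" cue: louder, sustained speech scores higher.
        mean_energy = np.mean([np.abs(s).mean() for _, s in clips])
        score = total_duration * mean_energy
        if total_duration >= min_duration and score > best_score:
            best_speaker, best_score = speaker, score
    return best_speaker
```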
Step S102: extracting key information from the text information based on the information extraction model.
In step S102, the network structure of the information extraction model includes a word embedding layer, an attention mechanism layer, a pooling layer and a convolution layer. The word embedding layer vectorizes the text information to obtain a vector sequence corresponding to the text information; specifically, a BERT word embedding module may vectorize the text information along the three dimensions of character, word and sentence to obtain a vector sequence formed by character embedding vectors, word embedding vectors and sentence embedding vectors. The attention mechanism layer performs feature extraction on the vector sequence to obtain a feature vector containing context information; specifically, the attention mechanism module extracts features from the vector sequence simultaneously along the character, word and sentence dimensions to obtain corresponding character features, word features and sentence features, which are then spliced into a concatenated feature vector. Deep feature extraction is performed on the feature vector through the pooling layer and the convolution layer to obtain the key information and a corresponding information type label; specifically, the concatenated feature vector is compressed with equal-length convolution to obtain key semantic information, deep feature extraction is then performed through a recurrent network unit comprising a pooling layer and a convolution layer, and the key information and its information type label are predicted from the extracted deep features. For example, the key information is a string of digits and the corresponding information type label is "mobile phone number", or the key information is "Li Ming" and the corresponding information type label is "name".
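A minimal sketch of the described stack is given below, assuming PyTorch and a single-granularity embedding standing in for the BERT character/word/sentence embeddings; the layer sizes, per-token label output and class name are illustrative assumptions rather than the patent's actual architecture.

```python
import torch
import torch.nn as nn

class KeyInfoExtractor(nn.Module):
    """Sketch of the described stack: word embedding -> attention ->
    convolution + pooling -> per-token information-type labels."""

    def __init__(self, vocab_size, num_labels, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        # "Equal-length" convolution: padding keeps the sequence length.
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        self.pool = nn.MaxPool1d(kernel_size=3, stride=1, padding=1)
        self.classifier = nn.Linear(dim, num_labels)

    def forward(self, token_ids):
        x = self.embed(token_ids)                  # (B, T, dim)
        ctx, _ = self.attn(x, x, x)                # context-aware features
        h = ctx.transpose(1, 2)                    # (B, dim, T) for conv
        h = self.pool(torch.relu(self.conv(h)))    # deep features
        return self.classifier(h.transpose(1, 2))  # (B, T, num_labels)
```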
Step S103: determining the service type according to the extracted key information, and matching preset correction libraries according to the service type to obtain a target correction library.
In step S103, the service types may be set according to the business scenario; for example, in an insurance business scenario, the service types may include insurance product information, insured vehicle information (such as vehicle type and license plate information), policy mailing address, insured person information (such as name, telephone number and identity information), and the like. A correction library is a database storing the client's historical information records, which include but are not limited to insurance product information, insured vehicle information, policy mailing addresses, insured person information and the like. Assuming the service type of the key information is the policy mailing address, the target correction library is an address library.
In an embodiment, the correction library supports addition and modification, so that correction rules can be added and modified, expanding the range of application and improving user experience. Referring to fig. 2, step S103 further includes the following steps:
step S201: and matching the service type with a preset deviation rectifying base, and judging whether a matched deviation rectifying base exists or not.
Step S202: if so, determining the matching preset correction library as the target correction library.
Step S203: if not, creating a new calling interface according to the service type to add a new correction library, and determining the new correction library as the target correction library.
Specifically, if no preset correction library matches the service type, a correction rule is added according to the service type, a new correction library and a corresponding calling interface are created, and the new correction library is determined as the target correction library. Subsequent steps call this interface directly to correct the key information, further improving the accuracy of the key information. A minimal match-or-create sketch is given below.
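The following sketch summarizes the match-or-create behavior of steps S201 to S203; the class name, dict-backed library and service-type string are assumptions for illustration, not the patent's actual interface.

```python
class CorrectionLibraryRegistry:
    """Illustrative match-or-create lookup for correction libraries."""

    def __init__(self):
        self._libraries = {}  # service type -> correction library

    def get_target_library(self, service_type):
        library = self._libraries.get(service_type)
        if library is None:
            # No preset library matches: create a new one (standing in
            # for creating a new library plus its calling interface).
            library = {}
            self._libraries[service_type] = library
        return library

registry = CorrectionLibraryRegistry()
address_library = registry.get_target_library("policy_mailing_address")
```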
Compared with correcting the text information with an artificial intelligence model, this approach has two advantages. On one hand, correction is performed through correction libraries: when a new correction rule appears, a new correction library can be added, improving correction accuracy without retraining an artificial intelligence model and thus saving time. On the other hand, correction is performed on the extracted key information rather than on the whole text information, which reduces the amount of data processed during correction and therefore improves correction efficiency.
Step S104: calling the target correction library to perform correction processing on the key information to obtain corrected target key information.
In step S104, calling the target correction library to correct the key information essentially means searching whether the target correction library stores information that is the same as or related to the key information. If so, that information is the target key information; if not, a secondary correction is required, and the result of the secondary correction is taken as the target key information.
Further, referring to fig. 3, step S104 further includes the following steps:
step S301: and calling a target deviation rectifying library to carry out deviation rectifying treatment on the key information.
Specifically, the target correction library is searched for information that is the same as or related to the key information, as sketched below.
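As a rough illustration of this lookup, the sketch below treats the target correction library as a list of known values and uses string similarity to stand in for "related" information; the cutoff value and helper name are assumptions.

```python
import difflib

def primary_correction(key_info, target_library, cutoff=0.8):
    """Look up information identical or similar to `key_info`
    in the target correction library (a list of known values)."""
    if key_info in target_library:
        return key_info                  # exact match
    close = difflib.get_close_matches(key_info, target_library,
                                      n=1, cutoff=cutoff)
    return close[0] if close else None   # None -> needs secondary pass
```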
Step S302: labeling the key information according to the correction result.
Specifically, if the correction result is that the key information matches target key information in the target correction library, the key information is labeled with the first label.
If the correction result is that the key information does not match target key information in the target correction library, the key information is converted into pinyin information, correction processing is performed again based on the pinyin information and a preset database, and the key information is labeled according to the result of this second correction. Further, the pinyin information is matched against the preset database to judge whether the database contains a keyword matching the pinyin information; if so, the key information is replaced with the keyword and the replaced key information is labeled with the second label, and if not, the key information is labeled with the third label. For example, the key information is an address "No. xxx, ... Road, a certain city" in which the road name, as transcribed by speech recognition, does not match any address in the target correction library; the road name is converted into the pinyin "an ting lu", and pinyin keyword matching is performed in the ES database. If "安亭路" (Anting Road) is matched, the key information is corrected according to the matching result and the corrected key information is labeled with the second label; if no match is found, the original key information is labeled with the third label. This secondary correction via pinyin conversion can resolve inaccurate speech recognition caused by accent.
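A minimal sketch of this pinyin-based secondary correction follows, using the pypinyin package for the conversion and a plain dict as a stand-in for the ES keyword index; the index contents, label names and the sample mis-transcription "安停路" (a homophone of 安亭路) are illustrative assumptions.

```python
from pypinyin import lazy_pinyin  # pip install pypinyin

# Stand-in for the ES (Elasticsearch) keyword index described above:
# a dict keyed by space-joined pinyin, assumed for this sketch.
PINYIN_INDEX = {"an ting lu": "安亭路"}

def secondary_correction(key_info):
    """Convert unmatched key information to pinyin, retry the match,
    and return (corrected_info, label)."""
    pinyin = " ".join(lazy_pinyin(key_info))
    keyword = PINYIN_INDEX.get(pinyin)
    if keyword is not None:
        return keyword, "second_label"   # replaced, needs confirmation
    return key_info, "third_label"       # unusable, will be discarded

print(secondary_correction("安停路"))    # -> ('安亭路', 'second_label')
```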
By classifying and identifying the key information through the correction results, this embodiment makes it convenient for the system to produce differentiated statistics and presentation at the UI layer for the different types of information, which improves user experience.
Step S303: acquiring the corrected target key information according to the labeling result.
In this embodiment, the first label marks correct, usable information that is provided to the user directly; the second label marks candidate information to be confirmed, presented to the user with a prompt for reference; and the third label marks erroneous, unusable information that is discarded directly.
The voice information extraction method of the embodiment of the invention determines the service type according to the extracted key information and matches preset correction libraries according to the service type to obtain the target correction library, which reduces the amount of data processed and improves correction efficiency; it corrects the key information by calling the target correction library to obtain corrected target key information, which improves the accuracy of voice information extraction and addresses the problem of inaccurate speech recognition.
In a practicable embodiment, the following steps are further included after step S102:
automatically checking whether the format of the extracted key information meets preset format requirements;
if yes, go to step S103; if not, the key information is discarded.
Some key information has a fixed format; for example, a mobile phone number has a fixed format of 11 digits, and a name has a fixed format of a surname followed by a given name. A sketch of such format checks follows.
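The format screening can be pictured as a small rule table; the sketch below is illustrative, and the rule patterns (an 11-digit mainland-style mobile number, a 2-4 character Chinese name) are assumptions standing in for real business rules.

```python
import re

# Assumed format rules for two of the key-information types
# mentioned above; real rules would live in configuration.
FORMAT_RULES = {
    "mobile_phone": re.compile(r"1\d{10}"),        # 11 digits
    "name": re.compile(r"[\u4e00-\u9fa5]{2,4}"),   # 2-4 Chinese characters
}

def check_format(info_type, value):
    """Return True if `value` meets the preset format for its type;
    key information with no registered rule passes by default."""
    rule = FORMAT_RULES.get(info_type)
    return True if rule is None else bool(rule.fullmatch(value))

assert check_format("mobile_phone", "13812345678")
assert not check_format("mobile_phone", "1381234")
```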
In this embodiment, format screening is performed before the extracted key information is corrected, and key information whose format does not meet the requirements is eliminated. This reduces the amount of data processed during correction, improves correction efficiency, and improves the accuracy of the key information.
The scheme of the invention can be applied in the financial field, such as insurance business scenarios, and also in the artificial intelligence field, in particular intelligent speech, deep learning and the like. Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. Artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technology mainly comprises computer vision, robotics, biometric recognition, speech processing, natural language processing and machine learning/deep learning.
Fig. 4 is a schematic structural diagram of a voice information extraction apparatus according to an embodiment of the present invention. As shown in fig. 4, the apparatus 40 includes an acquisition module 41, an extraction module 42, a matching module 43 and a correction module 44.
The acquisition module 41 is configured to acquire audio data generated by human-computer interaction, and convert the audio data into text information using automatic speech recognition;
the extraction module 42 is configured to extract key information from the text information based on the information extraction model;
the matching module 43 is configured to determine a service type according to the extracted key information, and match preset correction libraries according to the service type to obtain a target correction library;
the correction module 44 is configured to call the target correction library to perform correction processing on the key information to obtain the corrected target key information.
Further, the acquisition module 41 performing the step of converting the audio data into text information using automatic speech recognition includes:
performing voiceprint recognition on the audio data using automatic speech recognition to obtain at least one sound feature;
acquiring the speech duration and speech spectrum of each sound feature in the audio data;
and determining target voice data from the audio data according to the speech duration and the speech spectrum, so as to perform text conversion on the target voice data to obtain the text information.
Further, the extraction module 42 performing the step of extracting key information from the text information based on the information extraction model includes:
vectorizing the text information through a word embedding layer to obtain a vector sequence corresponding to the text information;
performing feature extraction on the vector sequence through an attention mechanism layer to obtain a feature vector containing context information;
and performing deep feature extraction on the feature vector through a pooling layer and a convolution layer to obtain key information and a corresponding information type label.
Further, the matching module 43 performing the step of determining the service type according to the extracted key information and matching preset correction libraries according to the service type to obtain the target correction library includes:
matching the service type against the preset correction libraries, and judging whether a matching correction library exists;
if so, determining the matching preset correction library as the target correction library;
if not, creating a new calling interface according to the service type to add a new correction library, and determining the new correction library as the target correction library.
Further, the correction module 44 performing the step of calling the target correction library to perform correction processing on the key information to obtain the corrected target key information includes:
calling the target correction library to perform correction processing on the key information;
labeling the key information according to the correction result;
and acquiring the corrected target key information according to the labeling result.
Furthermore, labeling the key information according to the correction result further comprises:
if the correction result is that the key information matches target key information in the target correction library, labeling the key information with a first label;
if the correction result is that the key information does not match target key information in the target correction library, converting the key information into pinyin information, performing correction processing on the key information again based on the pinyin information and a preset database, and labeling the key information according to the result of this second correction.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention. As shown in fig. 5, the computer device 50 includes a processor 51 and a memory 52 coupled to the processor 51.
The memory 52 stores program instructions for implementing the voice information extraction method according to any of the above embodiments.
The processor 51 is operative to execute the program instructions stored in the memory 52 to extract voice information.
The processor 51 may also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip having signal processing capabilities. The processor 51 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a computer storage medium according to an embodiment of the present invention. The computer storage medium of the embodiment of the present invention stores a program file 61 capable of implementing all of the methods described above. The program file 61 may be stored in the computer storage medium in the form of a software product and includes several instructions for enabling a computer device (which may be a personal computer, a server or a network device) or a processor to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned computer storage media include various media capable of storing program code, such as a USB flash drive, a portable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, as well as terminal devices such as computers, servers, mobile phones and tablets.
In the several embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A voice information extraction method, comprising:
acquiring audio data generated by human-computer interaction, and converting the audio data into text information using automatic speech recognition;
extracting key information from the text information based on an information extraction model;
determining a service type according to the extracted key information, and matching preset correction libraries according to the service type to obtain a target correction library;
and calling the target correction library to perform correction processing on the key information to obtain corrected target key information.
2. The voice information extraction method according to claim 1, wherein the matching preset correction libraries according to the service type to obtain a target correction library further comprises:
matching the service type against the preset correction libraries, and judging whether a matching correction library exists;
if so, determining the matching preset correction library as the target correction library;
if not, creating a new calling interface according to the service type to add a new correction library, and determining the new correction library as the target correction library.
3. The voice information extraction method according to claim 1, wherein the calling the target correction library to perform correction processing on the key information to obtain corrected target key information further comprises:
calling the target correction library to perform correction processing on the key information;
labeling the key information according to the correction result;
and acquiring the corrected target key information according to the labeling result.
4. The voice information extraction method according to claim 3, wherein the labeling the key information according to the correction result further comprises:
if the correction result is that the key information matches target key information in the target correction library, labeling the key information with a first label;
if the correction result is that the key information does not match target key information in the target correction library, converting the key information into pinyin information, performing correction processing on the key information again based on the pinyin information and a preset database, and labeling the key information according to the result of this second correction.
5. The voice information extraction method according to claim 1, wherein after the extracting key information from the text information based on the information extraction model, the method further comprises:
automatically checking whether the format of the extracted key information meets preset format requirements;
if so, determining the service type according to the extracted key information, and matching preset correction libraries according to the service type to obtain a target correction library;
and if not, discarding the key information.
6. The voice information extraction method according to claim 1, wherein the converting the audio data into text information using automatic speech recognition comprises:
performing voiceprint recognition on the audio data using automatic speech recognition to obtain at least one sound feature;
acquiring the speech duration and speech spectrum of each sound feature in the audio data;
and determining target voice data from the audio data according to the speech duration and the speech spectrum, so as to perform text conversion on the target voice data to obtain the text information.
7. The voice information extraction method according to claim 1, wherein the extracting key information from the text information based on the information extraction model comprises:
vectorizing the text information through a word embedding layer to obtain a vector sequence corresponding to the text information;
performing feature extraction on the vector sequence through an attention mechanism layer to obtain a feature vector containing context information;
and performing deep feature extraction on the feature vector through a pooling layer and a convolution layer to obtain key information and a corresponding information type label.
8. A voice information extraction apparatus, comprising:
an acquisition module, configured to acquire audio data generated by human-computer interaction and convert the audio data into text information using automatic speech recognition;
an extraction module, configured to extract key information from the text information based on an information extraction model;
a matching module, configured to determine a service type according to the extracted key information, and match preset correction libraries according to the service type to obtain a target correction library;
and a correction module, configured to call the target correction library to perform correction processing on the key information to obtain corrected target key information.
9. A computer device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the voice information extraction method according to any one of claims 1 to 7 when executing the computer program.
10. A computer storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the voice information extraction method according to any one of claims 1 to 7.
CN202211438892.4A, filed 2022-11-17: Voice information extraction method, device, equipment and storage medium. Pending CN115712699A.

Publications (1)

CN115712699A, published 2023-02-24



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination