CN116186201A - Government affair item searching method and device based on voice recognition, medium and equipment - Google Patents

Government affair item searching method and device based on voice recognition, medium and equipment

Info

Publication number
CN116186201A
CN116186201A (application CN202310148316.4A)
Authority
CN
China
Prior art keywords
text
government affair
voice
sequence
representation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310148316.4A
Other languages
Chinese (zh)
Inventor
邢亮
张兆勇
潘震
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Co Ltd
Original Assignee
Inspur Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Software Co Ltd filed Critical Inspur Software Co Ltd
Priority to CN202310148316.4A priority Critical patent/CN116186201A/en
Publication of CN116186201A publication Critical patent/CN116186201A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G06F16/3343 - Query execution using phonetics
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/338 - Presentation of query results
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 - Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 - Services
    • G06Q50/26 - Government or public services
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a government affair item searching method and device based on voice recognition, as well as a medium and equipment. The method comprises the following steps: acquiring voice for searching government affair item data; performing recognition processing on the voice to obtain a corresponding representation text; performing a full-text search according to the representation text to obtain a plurality of pieces of government affair item data; calculating the similarity between the representation text and each piece of government affair item data; and outputting the government affair item data corresponding to the highest similarity as the search result. The invention improves the convenience of the query operation, particularly for elderly users querying data. Because the full-text search is followed by text similarity calculation on the input representation text, query efficiency and accuracy are also improved.

Description

Government affair item searching method and device based on voice recognition, medium and equipment
Technical Field
The invention relates to the technical field of voice recognition, and in particular to a government affair item searching method and device, a medium and equipment based on voice recognition.
Background
In recent years, initiatives such as "Internet + government services" have greatly promoted the improvement of government services. The government affairs field is broad, and its body of knowledge and regulations is huge, so not every government affair item can be displayed in a suitable manner; the ability to locate government affair items accurately urgently needs to be improved. The current mainstream approach is to query the implementation list of government affair items by manual input, which is inefficient and time-consuming, and is especially inconvenient for elderly users to operate.
Disclosure of Invention
Aiming at at least one of the above technical problems, embodiments of the present invention provide a government affair item searching method and device, a medium and equipment based on voice recognition.
According to a first aspect, a government affair item searching method based on voice recognition provided by an embodiment of the present invention includes:
acquiring voice for searching government affair data;
performing recognition processing on the voice to obtain a corresponding representation text;
performing full-text search according to the representation text to obtain a plurality of government affair data;
calculating the similarity between the representation text and each piece of government affair data;
and outputting government affair item data corresponding to the highest similarity as a search result.
According to a second aspect, a government affair item searching device based on voice recognition provided by an embodiment of the present invention includes:
the voice acquisition module is used for acquiring voice for searching government affair data;
the text representation module is used for carrying out recognition processing on the voice to obtain a corresponding representation text;
the item searching module is used for carrying out full-text searching according to the representation text to obtain a plurality of government affair item data;
the similarity calculation module is used for calculating the similarity between the representation text and each piece of government affair data;
and the result output module is used for outputting government affair item data corresponding to the highest similarity as a search result.
According to a third aspect, embodiments of the present invention provide a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method provided by the first aspect.
According to a fourth aspect, a computing device provided by an embodiment of the present invention includes a memory and a processor, where the memory stores executable code, and the processor implements the method provided by the first aspect when executing the executable code.
According to the government affair item searching method, device, medium and equipment based on voice recognition, voice for searching government affair item data is first acquired, and recognition processing is performed on the voice to obtain a corresponding representation text. A full-text search is then performed according to the representation text to obtain a plurality of pieces of government affair item data, the similarity between the representation text and each piece of government affair item data is calculated, and the government affair item data corresponding to the highest similarity is output as the search result. The embodiments of the present invention optimize the human-computer interaction process through voice recognition technology, improving the convenience of the query operation, particularly for elderly users querying data. Because the full-text search is followed by text similarity calculation on the input representation text, query efficiency and accuracy are also improved.
Drawings
Fig. 1 is a flow chart of a government affair searching method based on voice recognition according to an embodiment of the invention.
Detailed Description
In a first aspect, an embodiment of the present invention provides a government affair item searching method based on voice recognition, referring to fig. 1, the method includes steps S110 to S150 as follows:
s110, acquiring voice for searching government affair data;
It can be understood that the voice is input by the user and is used for searching for the desired government affair item data.
S120, carrying out recognition processing on the voice to obtain a corresponding representation text;
It can be understood that recognition processing is performed on the voice to obtain text; that is, information in speech form is converted into information in text form.
In one embodiment, S120 may specifically include the following steps S121 to S125:
s121, preprocessing the voice;
in one embodiment, S121 may specifically include: the speech is converted into a spectrogram. That is, the preprocessing is to convert the voice signal in the time domain into a frequency domain map, i.e., a frequency domain signal. For example, the conversion from time domain to frequency domain is performed by preprocessing such as framing, windowing, pre-emphasis, etc.
S122, extracting features of the preprocessed voice to obtain a feature vector sequence;
It can be understood that feature extraction is performed on the preprocessed voice, such as the aforementioned spectrogram, to obtain a feature vector sequence characterizing the voice; the feature vector sequence includes at least one feature vector.
In one embodiment, S122 may specifically include: and carrying out feature extraction on the spectrogram according to the Mel cepstrum coefficient to obtain the feature vector sequence.
It can be understood that mel-frequency cepstral coefficients (Mel-Frequency Cepstral Coefficients, MFCCs) are obtained by a linear transform of the logarithmic energy spectrum based on the nonlinear mel scale of sound frequency. The mel-frequency cepstral coefficients are the coefficients that constitute the mel-frequency cepstrum, which is derived from the cepstrum of an audio segment. The band division of the mel-frequency cepstrum is equally spaced on the mel scale, which approximates the human auditory system more closely than the linearly spaced bands used in the normal cepstrum. Such a nonlinear representation can provide a better representation of the sound signal in many fields.
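The MFCC computation just described — triangular filters equally spaced on the mel scale, followed by a linear transform (a DCT) of the log energy spectrum — can be sketched as follows. The filter count and coefficient count are illustrative assumptions, not values from the patent.

```python
import numpy as np

def hz_to_mel(f):
    return 2595 * np.log10(1 + f / 700)

def mel_to_hz(m):
    return 700 * (10 ** (m / 2595) - 1)

def mfcc(power_spectrum, sample_rate=16000, n_filters=26, n_coeffs=13):
    """MFCCs: mel filterbank energies -> log -> DCT-II, keeping the first n_coeffs."""
    n_fft_bins = power_spectrum.shape[-1]
    # Filter center frequencies equally spaced on the mel scale, mapped back to FFT bins
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft_bins - 1) * mel_to_hz(mel_points) / (sample_rate / 2)).astype(int)
    fbank = np.zeros((n_filters, n_fft_bins))
    for i in range(n_filters):  # triangular filters: rise to the center bin, then fall
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_energies = np.log(power_spectrum @ fbank.T + 1e-10)
    # DCT-II is the linear transform of the log energy spectrum; it decorrelates the bands
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), n + 0.5) / n_filters)
    return log_energies @ dct.T

spec = np.random.rand(98, 257)     # stand-in spectrogram (frames x FFT bins)
features = mfcc(spec ** 2)         # power spectrum in, feature vector sequence out
print(features.shape)              # one 13-dimensional feature vector per frame
```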
S123, converting the feature vector sequence into a pinyin sequence;
it is understood that the sequence of feature vectors is converted into a sequence of pinyin forms.
In one embodiment, S123 may specifically include: converting the feature vector sequence into the pinyin sequence through an acoustic model; wherein the acoustic model comprises a hidden Markov model.
Here, the hidden Markov model is a deep neural network-hidden Markov model (Deep Neural Networks-Hidden Markov Model, DNN-HMM).
The hidden Markov model is trained in advance; its input information is the feature vector sequence of the voice, and its output information is the pinyin sequence of the voice. The hidden Markov model converts the feature vector sequence into at least one candidate pinyin sequence, then selects the pinyin sequence with the highest score among the candidates and outputs it as the final pinyin sequence.
Because of different dialects and speaking habits, a segment of voice may correspond to multiple pinyin sequences. The hidden Markov model outputs the pinyin sequence with the greatest likelihood.
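How an HMM decoder picks the single most likely sequence among candidates can be illustrated with a toy Viterbi search. This is a didactic sketch, not the patent's DNN-HMM: the two states, transition matrix, and emission probabilities are invented for illustration.

```python
import numpy as np

# Toy HMM: hidden states are pinyin syllables, observations are quantized feature indices.
states = ["shui", "jiao"]
start = np.array([0.6, 0.4])            # P(first state)
trans = np.array([[0.3, 0.7],           # P(next state | current state)
                  [0.6, 0.4]])
emit = np.array([[0.7, 0.2, 0.1],       # P(observation | "shui")
                 [0.1, 0.3, 0.6]])      # P(observation | "jiao")

def viterbi(obs):
    """Return the highest-probability state sequence for an observation sequence."""
    v = start * emit[:, obs[0]]         # best score of each state after the first observation
    back = []                           # backpointers for path recovery
    for o in obs[1:]:
        scores = v[:, None] * trans * emit[None, :, o]
        back.append(scores.argmax(axis=0))
        v = scores.max(axis=0)
    path = [int(v.argmax())]            # best final state, then follow backpointers
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return [states[s] for s in reversed(path)]

print(viterbi([0, 2]))  # most likely syllable sequence for these observations
```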
S124, converting the pinyin sequence into a phrase sequence;
It can be understood that one pinyin may correspond to multiple characters or phrases. For example, the pinyin "shuijiao" may correspond to 睡觉 (sleep), 水饺 (dumpling), 水窖 (cistern), etc., and the resulting phrase sequences may combine differently depending on where the pinyin sequence is paused or split.
In one embodiment, S124 may specifically include: converting the pinyin sequence into the phrase sequence through a language model; wherein the language model comprises an N-gram model.
Here, the N-gram model is an N-gram statistical language model.
It can be understood that when the N-gram model converts a pinyin sequence without spaces into Chinese character strings, multiple candidate strings are possible. The N-gram model therefore scores each candidate string and selects the one with the highest score as the phrase sequence. This realizes the conversion from pinyin to Chinese characters without manual selection by the user and avoids the problem of multiple Chinese characters sharing the same pinyin.
The N-gram model is trained in advance; its input information is a pinyin sequence, and its output information is a phrase sequence.
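The scoring step can be sketched with a toy bigram model. The probabilities below are invented stand-ins for corpus statistics; a real model would estimate them from training data.

```python
# Assumed bigram probabilities P(next character | previous character); "<s>" marks sentence start.
bigram = {
    ("<s>", "水"): 0.3, ("水", "饺"): 0.5,   # 水饺 "dumpling"
    ("<s>", "睡"): 0.4, ("睡", "觉"): 0.8,   # 睡觉 "sleep"
}

def score(chars, floor=1e-6):
    """Chain-rule probability of a candidate character string under the bigram model."""
    prob, prev = 1.0, "<s>"
    for ch in chars:
        prob *= bigram.get((prev, ch), floor)  # unseen bigrams get a small floor probability
        prev = ch
    return prob

# Both candidates are valid readings of the pinyin "shuijiao"; the model ranks them,
# and the highest-scoring string is chosen with no manual selection by the user.
candidates = ["水饺", "睡觉"]
best = max(candidates, key=score)
print(best)  # 睡觉 (score 0.4 * 0.8 = 0.32, beating 0.3 * 0.5 = 0.15)
```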
S125, decoding the phrase sequence to obtain the representation text.
That is, the phrase sequence is decoded into text, which is used as the representation text of the speech as the input text for the subsequent full-text search query.
S130, performing full-text search according to the representation text to obtain a plurality of government affair data;
that is, the search is performed in the database of government affair data using the presentation text as a search term, and a plurality of pieces of related government affair data are obtained.
S140, calculating the similarity between the representation text and each piece of government affair data;
it can be understood that, in order to output a piece of government affair data with the highest correlation, it is necessary to select from the pieces of government affair data searched in S130, specifically, select a piece of government affair data with the highest correlation based on the similarity.
In one embodiment, S140 may specifically include S141 to S143:
s141, preprocessing the representation text;
the preprocessing may include word segmentation, error correction, normalization, and the like.
S142, identifying the named entities in the preprocessed representation text, and forming a keyword array by the identified named entities;
where named entities refer to core words, phrases, etc. that represent text.
That is, after keywords such as core words and phrases are identified from the preprocessed representation text, the keywords are formed into a keyword array.
In one embodiment, S142 may specifically include:
and identifying named entities in the preprocessed representation text by adopting a two-way long-short term memory neural network model and/or a conditional random field deep learning model.
The bidirectional long short-term memory neural network model is based on the Long Short-Term Memory (LSTM) architecture and is referred to below as the LSTM neural network model for short.
The conditional random field (Conditional Random Field, CRF) deep learning model is a basic model in natural language processing.
The LSTM neural network model and the CRF deep learning model are trained in advance; their input information is the representation text, and their output information is the keywords in the representation text.
S143, calculating the text similarity between the keyword array and each piece of government affair data, and taking the text similarity as the similarity between the representation text and the piece of government affair data.
After the keyword array is obtained in S142, the similarity between the keyword array and each piece of government affair item data searched in S130 is calculated, for example by means of cosine similarity, Euclidean distance, or the like. The larger the cosine similarity, the larger the text similarity between the keyword array and the government affair item data; conversely, the larger the Euclidean distance, the smaller the text similarity.
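The cosine-similarity ranking can be sketched as follows; the keyword vocabulary and bag-of-words vectors are assumed toy values, not data from the patent.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; for non-negative term-count vectors it lies in [0, 1], larger meaning more similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Assumed shared keyword vocabulary giving the order of vector components.
vocab = ["户口", "迁移", "办理", "护照"]
query_vec = np.array([1.0, 0.0, 1.0, 0.0])           # keywords extracted from the representation text
item_vecs = {                                         # bag-of-words vectors of candidate items
    "户口迁移办理流程": np.array([1.0, 1.0, 1.0, 0.0]),
    "护照办理预约":     np.array([0.0, 0.0, 1.0, 1.0]),
}

# Rank candidates by similarity in descending order; the top item becomes the search result.
ranked = sorted(item_vecs, key=lambda k: cosine_similarity(query_vec, item_vecs[k]), reverse=True)
print(ranked[0])  # the government affair item with the highest similarity
```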
And S150, outputting government affair item data corresponding to the highest similarity as a search result.
For example, after the similarity between the representation text and each piece of government affair item data is calculated, the pieces of government affair item data are sorted in descending order of similarity, so that the piece corresponding to the highest similarity is ranked first. That piece is output as the search result, which the user can then see.
It can be understood that S120 is in fact automatic speech recognition (Automatic Speech Recognition, ASR), a technology for converting human speech into text: feature extraction is performed on the audio signal to be analyzed to provide a suitable feature vector sequence for the acoustic model; the feature vector sequence is converted into a pinyin sequence according to the acoustic characteristics in the acoustic model; the pinyin sequence is converted into a phrase sequence by the language model; and finally the phrase sequence is decoded against an existing dictionary to obtain the representation text.
S140-S150 employ natural language processing (Natural Language Processing), i.e., communicating with the computer in natural language. The input representation text is preprocessed, named entities in the preprocessed representation text are identified to determine its core words or phrases, the similarity is calculated, and finally the government affair item data corresponding to the highest similarity is selected by sorting and output.
It can be understood that, for a query function over massive government affair items, the government affair item to be queried and its implementation list need to be located precisely, which also makes querying data convenient for elderly users. Therefore, the embodiment of the present invention provides a government affair item searching method based on voice recognition, which optimizes the human-computer interaction process through voice recognition technology and improves the convenience of the query operation; full-text retrieval followed by text similarity calculation on the input representation text then improves query efficiency and accuracy.
In a second aspect, an embodiment of the present invention provides a government affair item searching device based on voice recognition, including:
the voice acquisition module is used for acquiring voice for searching government affair data;
the text representation module is used for carrying out recognition processing on the voice to obtain a corresponding representation text;
the item searching module is used for carrying out full-text searching according to the representation text to obtain a plurality of government affair item data;
the similarity calculation module is used for calculating the similarity between the representation text and each piece of government affair data;
and the result output module is used for outputting government affair item data corresponding to the highest similarity as a search result.
In one embodiment, the text representation module includes:
the first preprocessing unit is used for preprocessing the voice;
the feature extraction unit is used for extracting features of the preprocessed voice to obtain a feature vector sequence;
the first conversion unit is used for converting the characteristic vector sequence into a pinyin sequence;
the second conversion unit is used for converting the Pinyin sequence into a phrase sequence;
and the sequence decoding unit is used for decoding the phrase sequence to obtain the representation text.
Further, the first preprocessing unit is specifically configured to: converting the voice into a spectrogram; correspondingly, the feature extraction unit is specifically configured to: and carrying out feature extraction on the spectrogram according to the Mel cepstrum coefficient to obtain the feature vector sequence.
In one embodiment, the first conversion unit is specifically configured to: converting the feature vector sequence into the pinyin sequence through an acoustic model; wherein the acoustic model comprises a hidden Markov model.
In one embodiment, the second conversion unit is specifically configured to: convert the pinyin sequence into the phrase sequence through a language model; wherein the language model comprises an N-gram model.
In one embodiment, the similarity calculation module includes:
a second preprocessing unit, configured to preprocess the representation text;
the entity identification unit is used for identifying the named entities in the preprocessed representation text and forming a keyword array from the identified named entities;
and the similarity calculation unit is used for calculating the text similarity between the keyword array and each piece of government affair data, and taking the text similarity as the similarity between the representation text and the piece of government affair data.
Further, the entity identification unit is specifically configured to: and identifying named entities in the preprocessed representation text by adopting a two-way long-short term memory neural network model and/or a conditional random field deep learning model.
It may be understood that, for explanation, specific implementation, beneficial effects, examples, etc. of the content in the apparatus provided by the embodiment of the present invention, reference may be made to corresponding parts in the method provided in the first aspect, which are not repeated herein.
In a third aspect, embodiments of the present invention provide a computer readable medium having stored thereon computer instructions which, when executed by a processor, cause the processor to perform the method provided in the first aspect.
Specifically, a system or apparatus provided with a storage medium on which a software program code realizing the functions of any of the above embodiments is stored, and a computer (or CPU or MPU) of the system or apparatus may be caused to read out and execute the program code stored in the storage medium.
In this case, the program code itself read from the storage medium may realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code form part of the present invention.
Examples of the storage medium for providing the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer by a communication network.
Further, it should be apparent that the functions of any of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform part or all of the actual operations based on the instructions of the program code.
Further, it can be understood that the program code read from the storage medium may be written into a memory provided in an expansion board inserted into the computer or in an expansion module connected to the computer, and a CPU or the like mounted on the expansion board or expansion module may then perform part or all of the actual operations based on the instructions of the program code, thereby realizing the functions of any of the above embodiments.
It may be appreciated that, for explanation, specific implementation, beneficial effects, examples, etc. of the content in the computer readable medium provided by the embodiment of the present invention, reference may be made to corresponding parts in the method provided in the first aspect, and details are not repeated herein.
In a fourth aspect, an embodiment of the present invention provides a computing device comprising a memory and a processor, the memory storing executable code; when the processor executes the executable code, the method provided by the first aspect is implemented.
It may be appreciated that, for explanation, specific implementation, beneficial effects, examples, etc. of the content in the computing device provided by the embodiment of the present invention, reference may be made to corresponding parts in the method provided in the first aspect, which are not repeated herein.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments in part.
Those skilled in the art will appreciate that, in one or more of the examples described above, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on, or transmitted as one or more instructions or code over, a computer-readable medium.
The foregoing embodiments further describe the principles of the present invention in detail and are not to be construed as limiting its scope; any modification, equivalent replacement, improvement, etc. made on the basis of the teachings of the invention falls within its scope of protection.

Claims (10)

1. A government affair item searching method based on voice recognition is characterized by comprising the following steps:
acquiring voice for searching government affair data;
performing recognition processing on the voice to obtain a corresponding representation text;
performing full-text search according to the representation text to obtain a plurality of government affair data;
calculating the similarity between the representation text and each piece of government affair data;
and outputting government affair item data corresponding to the highest similarity as a search result.
2. The method of claim 1, wherein the performing recognition processing on the speech to obtain corresponding representation text includes:
preprocessing the voice;
extracting features of the preprocessed voice to obtain a feature vector sequence;
converting the characteristic vector sequence into a pinyin sequence;
converting the pinyin sequence into a phrase sequence;
and decoding the phrase sequence to obtain the representation text.
3. The method of claim 2, wherein the preprocessing the speech comprises: converting the voice into a spectrogram;
correspondingly, the feature extraction is performed on the preprocessed voice to obtain a feature vector sequence, which comprises the following steps: and carrying out feature extraction on the spectrogram according to the Mel cepstrum coefficient to obtain the feature vector sequence.
4. The method of claim 2, wherein said converting the sequence of feature vectors into a sequence of pinyin comprises:
converting the feature vector sequence into the pinyin sequence through an acoustic model; wherein the acoustic model comprises a hidden Markov model.
5. The method of claim 2, wherein said converting the pinyin sequence to a phrase sequence comprises:
converting the pinyin sequence into the phrase sequence through a language model; wherein the language model comprises an N-gram model.
6. The method of claim 1, wherein said calculating the similarity between the representative text and each piece of government matter data comprises:
preprocessing the representation text;
identifying the named entities in the preprocessed representation text, and forming a keyword array from the identified named entities;
and calculating the text similarity between the keyword array and each piece of government affair data, and taking the text similarity as the similarity between the representation text and the piece of government affair data.
7. The method of claim 6, wherein the identifying named entities in the preprocessed representation text comprises:
and identifying named entities in the preprocessed representation text by adopting a two-way long-short term memory neural network model and/or a conditional random field deep learning model.
8. A government affair item searching device based on voice recognition, comprising:
the voice acquisition module is used for acquiring voice for searching government affair data;
the text representation module is used for carrying out recognition processing on the voice to obtain a corresponding representation text;
the item searching module is used for carrying out full-text searching according to the representation text to obtain a plurality of government affair item data;
the similarity calculation module is used for calculating the similarity between the representation text and each piece of government affair data;
and the result output module is used for outputting government affair item data corresponding to the highest similarity as a search result.
9. A computer readable storage medium, having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any one of claims 1 to 7.
10. A computing device comprising a memory and a processor, the memory having executable code stored therein, the processor, when executing the executable code, implementing the method of any one of claims 1-7.
CN202310148316.4A 2023-02-21 2023-02-21 Government affair item searching method and device based on voice recognition, medium and equipment Pending CN116186201A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310148316.4A CN116186201A (en) 2023-02-21 2023-02-21 Government affair item searching method and device based on voice recognition, medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310148316.4A CN116186201A (en) 2023-02-21 2023-02-21 Government affair item searching method and device based on voice recognition, medium and equipment

Publications (1)

Publication Number Publication Date
CN116186201A (en) 2023-05-30

Family

ID=86447271

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310148316.4A Pending CN116186201A (en) 2023-02-21 2023-02-21 Government affair item searching method and device based on voice recognition, medium and equipment

Country Status (1)

Country Link
CN (1) CN116186201A (en)

Similar Documents

Publication Publication Date Title
CN107590135B (en) Automatic translation method, device and system
CN111710333B (en) Method and system for generating speech transcription
US10210862B1 (en) Lattice decoding and result confirmation using recurrent neural networks
KR101143030B1 (en) Discriminative training of language models for text and speech classification
CN106782560B (en) Method and device for determining target recognition text
US6934683B2 (en) Disambiguation language model
Schuster et al. Japanese and korean voice search
US6910012B2 (en) Method and system for speech recognition using phonetically similar word alternatives
KR20080069990A (en) Speech index pruning
US20220262352A1 (en) Improving custom keyword spotting system accuracy with text-to-speech-based data augmentation
US10482876B2 (en) Hierarchical speech recognition decoder
JP2004005600A (en) Method and system for indexing and retrieving document stored in database
JP2005165272A (en) Speech recognition utilizing multitude of speech features
JPH08328585A (en) Method and device for natural language processing and method and device for voice recognition
JP2004133880A (en) Method for constructing dynamic vocabulary for speech recognizer used in database for indexed document
CN110019741B (en) Question-answering system answer matching method, device, equipment and readable storage medium
US11030999B1 (en) Word embeddings for natural language processing
US9135911B2 (en) Automated generation of phonemic lexicon for voice activated cockpit management systems
Garg et al. Hierarchical Multi-Stage Word-to-Grapheme Named Entity Corrector for Automatic Speech Recognition.
KR20130126570A (en) Apparatus for discriminative training acoustic model considering error of phonemes in keyword and computer recordable medium storing the method thereof
KR100480790B1 (en) Method and apparatus for continous speech recognition using bi-directional n-gram language model
Imperl et al. Clustering of triphones using phoneme similarity estimation for the definition of a multilingual set of triphones
CN116186201A (en) Government affair item searching method and device based on voice recognition, medium and equipment
KR20050101695A (en) A system for statistical speech recognition using recognition results, and method thereof
KR20050101694A (en) A system for statistical speech recognition with grammatical constraints, and method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination