CN114613361A - Voice feedback and interaction system and method - Google Patents
- Publication number
- CN114613361A (application number CN202210134308.XA)
- Authority
- CN
- China
- Prior art keywords
- voice
- matching
- value
- point
- target image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS; G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/063—Training (under G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice)
- G10L2015/0631—Creating reference templates; Clustering
- G10L2015/225—Feedback of the input speech
Abstract
The invention relates to a voice feedback and interaction system and method that solve the technical problems of low efficiency and low accuracy. The voice feedback and interaction system comprises a voice acquisition device, a voice recognition device connected to the voice acquisition device, and a voice feedback and interaction result output device connected to the voice recognition device. The voice recognition device comprises a voice feature library storage unit and a processing unit; the processing unit establishes a historical sentence voice feature map library, acquires a target sentence voice feature map of the interacting person in real time, extracts the effective area of the target sentence voice feature map, and then performs consistency comparison and matching.
Description
Technical Field
The invention relates to the field of voice feedback and interaction, in particular to a voice feedback and interaction system and method.
Background
With the popularization of computer technology, daily life has gradually entered the intelligent era. Beyond computers, mobile phones and tablets, smart technology is appearing in every aspect of people's lives, from clothing and food to housing and travel: smart televisions, intelligent navigation, smart homes and more. Smart technology provides convenient and fast services in every aspect of daily life.
Intelligent voice interaction is a new generation of interaction mode based on voice input: the user speaks and receives a feedback result. A typical application scenario is the voice assistant. Since the introduction of Siri with the iPhone 4S, intelligent voice interaction applications have developed rapidly. Typical Chinese intelligent voice interaction applications, such as the Worm Hole voice assistant and the iFlytek voice assistant, have gained increasing user acceptance.
Existing voice feedback and interaction systems and methods suffer from low efficiency and low accuracy. The present invention provides a voice feedback and interaction system and method that solve this problem.
Disclosure of Invention
The invention aims to solve the technical problems of low efficiency and low accuracy in the prior art by providing a novel voice feedback and interaction system characterized by high efficiency and high accuracy.
In order to solve the technical problems, the technical scheme is as follows:
A voice feedback and interaction system, comprising: a voice acquisition device; a voice recognition device connected to the voice acquisition device; and a voice feedback and interaction result output device connected to the voice recognition device;
the voice recognition device comprises a voice characteristic database storage unit and a processing unit, wherein the processing unit executes the following steps:
step one, establishing a historical sentence voice feature map library, wherein each historical voice feature map is a sentence voice feature map drawn by extracting features from previously entered or historically recorded sentence voice; the sentence voice feature maps include character, word and sentence feature maps;
step two, extracting features from the sentence voice acquired by the voice acquisition device in real time and drawing a target sentence voice feature map; any selected sentence voice feature map in the historical sentence voice feature map library is defined as a reference image, and the target sentence voice feature map is defined as the target image;
step three, subjecting the target image I_C to binarization, where a value of 1 is defined as having voice features and a value of 0 as having no voice features; dividing the binarized feature map into a grid map using unit cells, defining the initial point (x1, y1) of the grid map as the origin and the search-matching step length as L; searching from the origin along the x direction, and if the searched value is 1, recording the position and value of the point and labeling it in sequence, otherwise continuing the search and matching;
step four, updating the point (x1, y1 + N·L) as the new origin and returning to step three until the search and matching in both the x and y directions are finished, thereby completing the preliminary positioning search and matching, where N is an integer and L is a constant;
step five, taking out the 1-valued points in sequence, updating the currently taken 1-valued point as the origin, and updating the search-matching step length to L/2; performing search and matching in sequence along the x direction, skipping points that have already been searched and matched; when the search and matching exceeds the range, automatically halving the step length and continuing until the step length reaches its minimum; if a new 1-valued point appears during the search and matching, defining it as a new point requiring y-direction search and matching and executing step six, otherwise executing step seven;
step six, keeping the search-matching step length at L/2 and performing search and matching in sequence along the y direction, skipping points that have already been searched and matched; if the search and matching exceeds the range, automatically halving the step length and continuing until the step length reaches its minimum; if a new 1-valued point appears during the search and matching, defining it as a new point requiring x-direction search and matching and executing step five, otherwise executing step seven;
step seven, when no new point needs to be searched and matched, ending the search and matching; the area covered by the collected 1-valued points constitutes the effective target image;
step eight, performing consistency matching and recognition of the effective target image against the historical sentence voice feature map library;
step nine, outputting the recognition result through the voice feedback and interaction result output device to complete the interaction.
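The coarse-to-fine grid search of steps three through seven can be sketched as follows. This is a minimal illustration assuming the binarized feature map is a NumPy 0/1 array; the function name, the boundary handling and the exact refinement order are illustrative stand-ins, not the patent's precise procedure.

```python
import numpy as np

def extract_effective_region(feature_map, L=8, min_step=1):
    """Sketch of steps three-seven: coarse grid scan with stride L,
    then a halving-step refinement around each 1-valued hit."""
    h, w = feature_map.shape
    found = set()

    # Steps three-four: coarse scan of the grid map with step length L,
    # recording the position of every 1-valued point encountered.
    for y in range(0, h, L):
        for x in range(0, w, L):
            if feature_map[y, x] == 1:
                found.add((y, x))

    # Steps five-six: refine around each hit with the step halved to L/2
    # and halved again until it reaches the minimum, probing both the
    # x and y directions and skipping points already matched.
    frontier = list(found)
    while frontier:
        y, x = frontier.pop()
        step = L // 2
        while step >= min_step:
            for dy, dx in ((0, step), (0, -step), (step, 0), (-step, 0)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in found:
                    if feature_map[ny, nx] == 1:
                        found.add((ny, nx))
                        frontier.append((ny, nx))  # new point to refine
            step //= 2

    # Step seven: the collected 1-valued points form the effective region.
    return sorted(found)
```

Used on a feature map containing a small block of 1s, the coarse scan finds one grid point and the refinement recovers the neighbouring 1-valued points around it.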
The working principle of the invention is as follows: the speech feature recognition of the invention is an overall recognition of the feature map, and can therefore achieve higher recognition efficiency. Meanwhile, the portions where the interacting person is not speaking are excluded from the voice feature map acquired over the time sequence, which further improves efficiency.
To increase the accuracy, step eight further includes an image proofreading process, comprising:
step 1, defining the effective target image as the target image, and defining any selected reference image in the historical sentence voice feature map library as I_C;
step 2, expressing the relationship between the reference image I_C and the polar-coordinate-transformed target image in terms of the scale offset parameter α_z and the rotation offset parameter φ_z;
step 3, calculating the projection K_C(i) of the reference image I_C in the radial direction of the polar coordinate system and the corresponding radial projection of the target image; taking logarithms to obtain LK_C(i) and its target-image counterpart, and taking their offset as the scale offset parameter α_z;
here K_i = K_max; ce() denotes the smallest integer greater than or equal to the value in parentheses and fl() denotes the largest integer less than or equal to the value in parentheses; the target image is of size 2K_max × 2K_max; n_r = K_max is the number of samples in the radial direction and n_φ = 8K_i is the number of samples in the angular direction;
step 4, calculating, from the scale offset parameter of step 3, the projections of the reference image I_C and the target image in the radial and angular directions; normalizing these projections, calculating the translation amount of the highest point, and from it calculating the rotation offset parameter φ_z;
step 5, correcting the target image with the rotation offset parameter φ_z and the scale offset parameter α_z according to step A, calculating the position point corresponding to the minimum value of e_z, and taking it as the center point of the target image, thereby finishing the image proofreading process.
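As a rough illustration of the projection-based proofreading in steps 2 through 4, the sketch below resamples an image to polar coordinates, forms radial and angular projections, and estimates the rotation offset from the peak of the circular cross-correlation of the angular projections. This is a simplified stand-in under stated assumptions: it uses plain (not logarithmic) polar sampling, omits the scale estimation via LK_C(i), and the function names are invented for illustration.

```python
import numpy as np

def polar_projections(img, n_r=None, n_phi=None):
    """Radial and angular projections of an image resampled to polar
    coordinates (nearest-neighbour sampling), an illustrative analogue
    of the patent's K_C(i) and angular projections."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)
    n_r = n_r or int(r_max)
    n_phi = n_phi or 8 * n_r          # mirrors n_phi = 8 * K_i
    rs = np.linspace(0, r_max, n_r, endpoint=False)
    phis = np.linspace(0, 2 * np.pi, n_phi, endpoint=False)
    rr, pp = np.meshgrid(rs, phis, indexing="ij")
    ys = np.clip(np.round(cy + rr * np.sin(pp)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + rr * np.cos(pp)).astype(int), 0, w - 1)
    polar = img[ys, xs]               # shape (n_r, n_phi)
    radial = polar.sum(axis=1)        # projection along the angle axis
    angular = polar.sum(axis=0)       # projection along the radius axis
    return radial, angular

def estimate_rotation(ref, tgt):
    """Rotation offset (degrees): the shift of the circular
    cross-correlation peak of the two angular projections, in the
    spirit of the 'translation of the highest point' in step 4."""
    _, a_ref = polar_projections(ref)
    _, a_tgt = polar_projections(tgt)
    # circular cross-correlation via FFT: corr[k] = sum_n a_ref[n+k]*a_tgt[n]
    corr = np.fft.ifft(np.fft.fft(a_ref) * np.conj(np.fft.fft(a_tgt))).real
    shift = int(np.argmax(corr))
    return 360.0 * shift / len(a_ref)
```

For an asymmetric test image, rotating the array by 90 degrees shifts the angular projection by a quarter turn, and the correlation peak recovers that offset.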
Further, the consistency matching recognition of step eight comprises:
step A, taking the center point of the target image as the center and drawing concentric circles, dividing the image into B annular areas and then dividing each annular area into K sector areas, where K and B are predefined constants;
step B, calculating for each sector area S_sq the sector feature value V_sqθ as Code1;
where F_sqθ(x, y) is the value of each pixel in the sector area S_sq, P_sqθ denotes the average gray value of the pixels in the sector area S_sq, n_sq is the number of pixels in the sector area S_sq, 0 ≤ sq ≤ B×K−1, and θ ∈ {0°, 360°/K, 2·(360°/K), 3·(360°/K), …, 180°};
step C, after rotating the image by 180°/K, repeating step B and extracting each sector area S_sq's feature value V_sqθ as Code2;
step E, rotating Code1 and Code2 by R·(360°/K) for R = 0, 1, 2, …, K−1 respectively to obtain Code1′ and Code2′;
step F, inputting Code1 and Code2, together with the Code1′ and Code2′ obtained in step E, into the historical sentence voice feature map library for matching.
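The ring-and-sector codes of steps A through E can be sketched as follows. Since the exact formula for V_sqθ is not fully recoverable from the source, this sketch uses a mean-absolute-deviation-from-the-sector-mean feature, a common choice for such sector codes; the function names and the B and K defaults are assumptions.

```python
import numpy as np

def sector_codes(img, B=4, K=8):
    """Sketch of steps A-B: partition the image into B rings x K sectors
    around its centre and compute a per-sector feature (mean absolute
    deviation of pixel grey values from the sector mean)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    r = np.hypot(ys - cy, xs - cx)
    phi = np.degrees(np.arctan2(ys - cy, xs - cx)) % 360.0
    r_max = min(cy, cx)
    ring = np.minimum((r / (r_max / B)).astype(int), B - 1)
    sector = np.minimum((phi / (360.0 / K)).astype(int), K - 1)
    codes = np.zeros((B, K))
    for b in range(B):
        for k in range(K):
            mask = (ring == b) & (sector == k) & (r <= r_max)
            if mask.any():
                pix = img[mask].astype(float)
                codes[b, k] = np.abs(pix - pix.mean()).mean()
    return codes.ravel()            # Code1, length B*K

def rotate_code(code, B, K, steps):
    """Step E: rotating the image by steps*(360/K) degrees permutes the
    K sectors within each ring; rotate the code accordingly."""
    return np.roll(code.reshape(B, K), -steps, axis=1).ravel()
```

A uniform image yields an all-zero code (no deviation within any sector), and rotating a code forward and back by the same number of sector steps is an identity.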
The invention also provides a voice feedback and interaction method based on the above voice feedback and interaction system, the voice feedback and interaction method comprising:
step one, the voice interacting person outputs real-time sentence voice, and the voice acquisition device acquires the interacting person's real-time sentence voice; a target sentence voice feature map is drawn; any selected sentence voice feature map in the historical sentence voice feature map library is defined as a reference image, and the target sentence voice feature map is defined as the target image;
step two, the processing unit calls the preset historical sentence voice feature map library stored in the storage unit, wherein each historical voice feature map is a sentence voice feature map drawn by extracting features from previously entered or historically recorded sentence voice; the sentence voice feature maps include character, word and sentence feature maps;
step three, subjecting the target image I_C to binarization, where a value of 1 is defined as having voice features and a value of 0 as having no voice features; dividing the binarized feature map into a grid map using unit cells, defining the initial point (x1, y1) of the grid map as the origin and the search-matching step length as L; searching from the origin along the x direction, and if the searched value is 1, recording the position and value of the point and labeling it in sequence, otherwise continuing the search and matching;
step four, updating the point (x1, y1 + N·L) as the new origin and returning to step three until the search and matching in both the x and y directions are finished, thereby completing the preliminary positioning search and matching, where N is an integer and L is a constant;
step five, taking out the 1-valued points in sequence, updating the currently taken 1-valued point as the origin, and updating the search-matching step length to L/2; performing search and matching in sequence along the x direction, skipping points that have already been searched and matched; when the search and matching exceeds the range, automatically halving the step length and continuing until the step length reaches its minimum; if a new 1-valued point appears during the search and matching, defining it as a new point requiring y-direction search and matching and executing step six, otherwise executing step seven;
step six, keeping the search-matching step length at L/2 and performing search and matching in sequence along the y direction, skipping points that have already been searched and matched; if the search and matching exceeds the range, automatically halving the step length and continuing until the step length reaches its minimum; if a new 1-valued point appears during the search and matching, defining it as a new point requiring x-direction search and matching and executing step five, otherwise executing step seven;
step seven, when no new point needs to be searched and matched, ending the search and matching; the area covered by the collected 1-valued points constitutes the effective target image;
step eight, performing consistency matching and recognition of the effective target image against the historical sentence voice feature map library;
step nine, outputting the recognition result through the voice feedback and interaction result output device to complete the voice interaction and feedback.
Further, step eight further includes an image proofreading process, comprising:
step 1, defining the effective target image as the target image, and defining any selected reference image in the historical sentence voice feature map library as I_C;
step 2, expressing the relationship between the reference image I_C and the polar-coordinate-transformed target image in terms of the scale offset parameter α_z and the rotation offset parameter φ_z;
step 3, calculating the projection K_C(i) of the reference image I_C in the radial direction of the polar coordinate system and the corresponding radial projection of the target image; taking logarithms to obtain LK_C(i) and its target-image counterpart, and taking their offset as the scale offset parameter α_z;
here K_i = K_max; ce() denotes the smallest integer greater than or equal to the value in parentheses and fl() denotes the largest integer less than or equal to the value in parentheses; the target image is of size 2K_max × 2K_max; n_r = K_max is the number of samples in the radial direction and n_φ = 8K_i is the number of samples in the angular direction;
step 4, calculating, from the scale offset parameter of step 3, the projections of the reference image I_C and the target image in the radial and angular directions; normalizing these projections, calculating the translation amount of the highest point, and from it calculating the rotation offset parameter φ_z;
step 5, correcting the target image with the rotation offset parameter φ_z and the scale offset parameter α_z according to step A, calculating the position point corresponding to the minimum value of e_z, and taking it as the center point of the target image, thereby finishing the image proofreading process.
Further, the consistency matching recognition of step eight comprises:
step A, taking the center point of the target image as the center and drawing concentric circles, dividing the image into B annular areas and then dividing each annular area into K sector areas, where K and B are predefined constants;
step B, calculating for each sector area S_sq the sector feature value V_sqθ as Code1;
where F_sqθ(x, y) is the value of each pixel in the sector area S_sq, P_sqθ denotes the average gray value of the pixels in the sector area S_sq, n_sq is the number of pixels in the sector area S_sq, 0 ≤ sq ≤ B×K−1, and θ ∈ {0°, 360°/K, 2·(360°/K), 3·(360°/K), …, 180°};
step C, after rotating the image by 180°/K, repeating step B and extracting each sector area S_sq's feature value V_sqθ as Code2;
step E, rotating Code1 and Code2 by R·(360°/K) for R = 0, 1, 2, …, K−1 respectively to obtain Code1′ and Code2′;
step F, inputting Code1 and Code2, together with the Code1′ and Code2′ obtained in step E, into the historical sentence voice feature map library for matching.
The invention has the following beneficial effects: the invention converts speech feature recognition into overall recognition of the feature map, giving higher recognition efficiency. Meanwhile, the portions where the interacting person is not speaking are excluded from the voice feature map acquired over the time sequence, which further improves efficiency. In addition, the accuracy of feedback and interaction is improved by the pre-matching proofreading and positioning of the feature images.
Drawings
The invention is further illustrated by the following examples in conjunction with the drawings.
FIG. 1, a schematic diagram of a voice feedback and interaction system.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
The present embodiment provides a voice feedback and interaction system, as shown in fig. 1, comprising: a voice acquisition device; a voice recognition device connected to the voice acquisition device; and a voice feedback and interaction result output device connected to the voice recognition device;
the voice recognition device comprises a voice characteristic database storage unit and a processing unit, wherein the processing unit executes the following steps:
step one, establishing a historical sentence voice feature map library, wherein each historical voice feature map is a sentence voice feature map drawn by extracting features from previously entered or historically recorded sentence voice; the sentence voice feature maps include character, word and sentence feature maps;
step two, extracting features from the sentence voice acquired by the voice acquisition device in real time and drawing a target sentence voice feature map; any selected sentence voice feature map in the historical sentence voice feature map library is defined as a reference image, and the target sentence voice feature map is defined as the target image;
step three, subjecting the target image I_C to binarization, where a value of 1 is defined as having voice features and a value of 0 as having no voice features; dividing the binarized feature map into a grid map using unit cells, defining the initial point (x1, y1) of the grid map as the origin and the search-matching step length as L; searching from the origin along the x direction, and if the searched value is 1, recording the position and value of the point and labeling it in sequence, otherwise continuing the search and matching;
step four, updating the point (x1, y1 + N·L) as the new origin and returning to step three until the search and matching in both the x and y directions are finished, thereby completing the preliminary positioning search and matching, where N is an integer and L is a constant;
step five, taking out the 1-valued points in sequence, updating the currently taken 1-valued point as the origin, and updating the search-matching step length to L/2; performing search and matching in sequence along the x direction, skipping points that have already been searched and matched; when the search and matching exceeds the range, automatically halving the step length and continuing until the step length reaches its minimum; if a new 1-valued point appears during the search and matching, defining it as a new point requiring y-direction search and matching and executing step six, otherwise executing step seven;
step six, keeping the search-matching step length at L/2 and performing search and matching in sequence along the y direction, skipping points that have already been searched and matched; if the search and matching exceeds the range, automatically halving the step length and continuing until the step length reaches its minimum; if a new 1-valued point appears during the search and matching, defining it as a new point requiring x-direction search and matching and executing step five, otherwise executing step seven;
step seven, when no new point needs to be searched and matched, ending the search and matching; the area covered by the collected 1-valued points constitutes the effective target image;
step eight, performing consistency matching and recognition of the effective target image against the historical sentence voice feature map library;
step nine, outputting the recognition result through the voice feedback and interaction result output device to complete the interaction.
The working principle of the invention is as follows: the speech feature recognition of the invention is an overall recognition of the feature map, and can therefore achieve higher recognition efficiency. Meanwhile, the portions where the interacting person is not speaking are excluded from the voice feature map acquired over the time sequence, which further improves efficiency.
To increase the accuracy, step eight preferably further includes an image proofreading process, comprising:
step 1, defining the effective target image as the target image, and defining any selected reference image in the historical sentence voice feature map library as I_C;
step 2, expressing the relationship between the reference image I_C and the polar-coordinate-transformed target image in terms of the scale offset parameter α_z and the rotation offset parameter φ_z;
step 3, calculating the projection K_C(i) of the reference image I_C in the radial direction of the polar coordinate system and the corresponding radial projection of the target image; taking logarithms to obtain LK_C(i) and its target-image counterpart, and taking their offset as the scale offset parameter α_z;
here K_i = K_max; ce() denotes the smallest integer greater than or equal to the value in parentheses and fl() denotes the largest integer less than or equal to the value in parentheses; the target image is of size 2K_max × 2K_max; n_r = K_max is the number of samples in the radial direction and n_φ = 8K_i is the number of samples in the angular direction;
step 4, calculating, from the scale offset parameter of step 3, the projections of the reference image I_C and the target image in the radial and angular directions; normalizing these projections, calculating the translation amount of the highest point, and from it calculating the rotation offset parameter φ_z;
step 5, correcting the target image with the rotation offset parameter φ_z and the scale offset parameter α_z according to step A, calculating the position point corresponding to the minimum value of e_z, and taking it as the center point of the target image, thereby finishing the image proofreading process.
Preferably, the consistency matching recognition of step eight further comprises:
step A, taking the center point of the target image as the center and drawing concentric circles, dividing the image into B annular areas and then dividing each annular area into K sector areas, where K and B are predefined constants;
step B, calculating for each sector area S_sq the sector feature value V_sqθ as Code1;
where F_sqθ(x, y) is the value of each pixel in the sector area S_sq, P_sqθ denotes the average gray value of the pixels in the sector area S_sq, n_sq is the number of pixels in the sector area S_sq, 0 ≤ sq ≤ B×K−1, and θ ∈ {0°, 360°/K, 2·(360°/K), 3·(360°/K), …, 180°};
step C, after rotating the image by 180°/K, repeating step B and extracting each sector area S_sq's feature value V_sqθ as Code2;
step E, rotating Code1 and Code2 by R·(360°/K) for R = 0, 1, 2, …, K−1 respectively to obtain Code1′ and Code2′;
step F, inputting Code1 and Code2, together with the Code1′ and Code2′ obtained in step E, into the historical sentence voice feature map library for matching.
This embodiment also provides a voice feedback and interaction method based on the above voice feedback and interaction system, the voice feedback and interaction method comprising:
step one, the voice interacting person outputs real-time sentence voice, and the voice acquisition device acquires the interacting person's real-time sentence voice; a target sentence voice feature map is drawn; any selected sentence voice feature map in the historical sentence voice feature map library is defined as a reference image, and the target sentence voice feature map is defined as the target image;
step two, the processing unit calls the preset historical sentence voice feature map library stored in the storage unit, wherein each historical voice feature map is a sentence voice feature map drawn by extracting features from previously entered or historically recorded sentence voice; the sentence voice feature maps include character, word and sentence feature maps;
step three, subjecting the target image I_C to binarization, where a value of 1 is defined as having voice features and a value of 0 as having no voice features; dividing the binarized feature map into a grid map using unit cells, defining the initial point (x1, y1) of the grid map as the origin and the search-matching step length as L; searching from the origin along the x direction, and if the searched value is 1, recording the position and value of the point and labeling it in sequence, otherwise continuing the search and matching;
step four, updating the point (x1, y1 + N·L) as the new origin and returning to step three until the search and matching in both the x and y directions are finished, thereby completing the preliminary positioning search and matching, where N is an integer and L is a constant;
step five, taking out the 1-valued points in sequence, updating the currently taken 1-valued point as the origin, and updating the search-matching step length to L/2; performing search and matching in sequence along the x direction, skipping points that have already been searched and matched; when the search and matching exceeds the range, automatically halving the step length and continuing until the step length reaches its minimum; if a new 1-valued point appears during the search and matching, defining it as a new point requiring y-direction search and matching and executing step six, otherwise executing step seven;
step six, keeping the search-matching step length at L/2 and performing search and matching in sequence along the y direction, skipping points that have already been searched and matched; if the search and matching exceeds the range, automatically halving the step length and continuing until the step length reaches its minimum; if a new 1-valued point appears during the search and matching, defining it as a new point requiring x-direction search and matching and executing step five, otherwise executing step seven;
step seven, when no new point needs to be searched and matched, ending the search and matching; the area covered by the collected 1-valued points constitutes the effective target image;
step eight, performing consistency matching and recognition of the effective target image against the historical sentence voice feature map library;
step nine, outputting the recognition result through the voice feedback and interaction result output device to complete the voice interaction and feedback.
Preferably, the eighth step further includes an image proofreading process comprising:
step 1, defining the effective target image, and defining an arbitrarily selected reference image in the historical sentence voice feature map library as I_C;
step 2, defining the relationship between the reference image I_C and the polar-coordinate-transformed target image as follows:
step 3, calculating the radial projection K_C(i) of the reference image I_C in the polar coordinate system and the radial projection of the target image; taking logarithms to obtain LK_C(i) and its counterpart for the target image, and from these deriving the scale deviation parameter α_z;
wherein K_i = K_max is the number of samples in the angular direction; ce() denotes the smallest integer greater than or equal to the value in parentheses, and fl() denotes the largest integer less than or equal to the value in parentheses; the size of the target image is 2K_max × 2K_max; n_r = K_max is the number of samples in the radial direction, and n_φ = 8K_i is the number of samples in the angular direction;
step 4, calculating, according to the scale deviation parameter from step 3, the projections of the reference image I_C and the target image in the radial and angular directions:
performing a normalized calculation on the two projections, calculating the translation amount of the highest point, and from it calculating the rotation offset parameter φ_z;
step 5, correcting the target image according to the rotation offset parameter φ_z and the scale deviation parameter α_z, calculating the position point corresponding to the minimum value of e_z, taking it as the center point of the target image, and completing the image proofreading processing.
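The scale and rotation estimation of steps 1 through 5 can be illustrated with a conventional log-polar projection correlation. The patent's own projection and e_z formulas were rendered as images and are not recoverable from this text, so every formula below (log-spaced radii, FFT cross-correlation for the lag) is a standard stand-in, not the claimed method, and all names are assumptions.

```python
import numpy as np

def polar_projections(img, n_r=64, n_phi=256):
    """Radial and angular projections of an image sampled on a log-polar grid."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)
    rs = np.exp(np.linspace(0.0, np.log(r_max), n_r))      # log-spaced radii
    phis = np.linspace(0.0, 2.0 * np.pi, n_phi, endpoint=False)
    ys = np.clip(np.round(cy + rs[:, None] * np.sin(phis)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + rs[:, None] * np.cos(phis)).astype(int), 0, w - 1)
    polar = img[ys, xs]                                    # shape (n_r, n_phi)
    return polar.sum(axis=1), polar.sum(axis=0)            # radial, angular

def circular_shift(a, b):
    """Lag at which b best aligns with a (peak of circular cross-correlation)."""
    c = np.real(np.fft.ifft(np.fft.fft(a) * np.conj(np.fft.fft(b))))
    return int(np.argmax(c))

def estimate_offsets(ref, tgt, n_r=64, n_phi=256):
    """Scale deviation alpha_z and rotation offset phi_z between ref and tgt."""
    kr_ref, ka_ref = polar_projections(ref, n_r, n_phi)
    kr_tgt, ka_tgt = polar_projections(tgt, n_r, n_phi)
    # scale: a scale change shifts the log-radial projection, so alpha_z is
    # recovered from the lag between the two log-radius profiles
    r_max = (min(ref.shape) - 1) / 2.0
    d_log = np.log(r_max) / (n_r - 1)                      # log-radius step
    s = circular_shift(kr_ref, kr_tgt)
    s = s - n_r if s > n_r // 2 else s                     # signed lag
    alpha_z = float(np.exp(s * d_log))
    # rotation: a rotation circularly shifts the angular projection
    t = circular_shift(ka_ref, ka_tgt)
    phi_z = 2.0 * np.pi * t / n_phi
    return alpha_z, phi_z
```

For identical reference and target images the estimated offsets are α_z = 1 (no scale change) and φ_z = 0 (no rotation), matching the intent of the proofreading step.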
Preferably, the consistency matching identification of step eight further comprises:
step A, taking the center point of the target image as the center, drawing concentric circles to divide the image into B annular regions, and dividing each annular region into K sector regions, wherein K and B are predefined constants;
step B, calculating for each sector S_sq the sector fingerprint feature value V_sqθ as Code 1;
wherein F_sqθ(x, y) is the gray value of each pixel in the sector region S_sq, P_sqθ denotes the average pixel gray value within the sector region S_sq, n_sq is the number of columns of the region S_sq, 0 < sq ≤ B×K−1, and θ ∈ {0°, (360°/K), 2·(360°/K), 3·(360°/K), …, 180°};
step C, after rotating the image by (180°/K), repeating step B and extracting each sector's fingerprint feature value V_sqθ as Code 2;
step E, rotating Code 1 and Code 2 respectively by R × (360°/K), R = 0, 1, 2, …, K−1, to obtain Code 1′ and Code 2′;
step F, inputting Code 1 and Code 2, together with Code 1′ and Code 2′ from step E, into the historical sentence voice feature map library for matching.
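The ring-and-sector coding of steps A through F can be sketched as follows. The patent's exact feature formula for V_sqθ is not legible in this text, so the sketch substitutes a common FingerCode-style statistic (mean absolute deviation of pixel grays from the sector mean); `sector_codes` and `best_rotation_match` are illustrative names, not the claimed implementation.

```python
import numpy as np

def sector_codes(img, center, B=4, K=16, r_max=None):
    """B rings x K sectors around `center`; each feature is the mean absolute
    deviation of pixel grays from the sector mean (an assumed stand-in for
    the patent's V_sq formula). Returns a flat code of length B*K."""
    h, w = img.shape
    cy, cx = center
    ys, xs = np.mgrid[0:h, 0:w]
    r = np.hypot(ys - cy, xs - cx)
    theta = np.mod(np.arctan2(ys - cy, xs - cx), 2.0 * np.pi)
    if r_max is None:
        r_max = r.max() + 1e-9
    ring = np.minimum((r / r_max * B).astype(int), B - 1)
    sector = np.minimum((theta / (2.0 * np.pi) * K).astype(int), K - 1)
    code = np.zeros(B * K)
    for b in range(B):
        for k in range(K):
            mask = (ring == b) & (sector == k)
            if mask.any():
                pix = img[mask].astype(float)
                code[b * K + k] = np.abs(pix - pix.mean()).mean()
    return code

def best_rotation_match(code_a, code_b, B, K):
    """Minimum distance over the K cyclic sector rotations of code_b,
    mirroring the R x (360/K) rotations of step E."""
    a = code_a.reshape(B, K)
    b = code_b.reshape(B, K)
    dists = [np.linalg.norm(a - np.roll(b, r, axis=1)) for r in range(K)]
    return int(np.argmin(dists)), float(min(dists))
```

Matching a code against its own cyclically rotated copy recovers the rotation index with zero distance, which is the property the rotated Code 1′/Code 2′ variants exploit.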
In this embodiment, speech feature recognition is performed as whole-image recognition of the feature map, which yields higher recognition efficiency. Moreover, the portions of the time-sequence speech feature map during which the interactor is not speaking are excluded, further improving efficiency. In addition, proofreading and positioning the feature images before matching improves the accuracy of feedback and interaction.
While illustrative embodiments of the present invention have been described above to enable those skilled in the art to understand it, the present invention is not limited to the scope of those embodiments. To those of ordinary skill in the art, all variations that remain within the spirit and scope of the invention as defined by the appended claims shall fall within its protection.
Claims (6)
1. A voice feedback and interaction system, characterized in that the voice feedback and interaction system comprises: a voice acquisition device; a voice recognition device connected with the voice acquisition device; and a voice feedback and interaction result output device connected with the voice recognition device;
the voice recognition device comprises a voice characteristic database storage unit and a processing unit, wherein the processing unit executes the following steps:
step one, establishing a historical sentence voice feature map library, wherein each historical voice feature map is a sentence voice feature map drawn by extracting features from previously input or historically recorded sentence voice, and the sentence voice feature maps comprise character, word and sentence feature maps;
step two, performing feature extraction on the sentence voice acquired by the voice acquisition device in real time and drawing a target sentence voice feature map; an arbitrarily selected sentence voice feature map in the historical sentence voice feature map library is defined as a reference image, and the target sentence voice feature map is defined as a target image;
step three, subjecting the target image I_C to binarization processing, wherein a value of 1 is defined as having voice features and a value of 0 is defined as having no voice features; dividing the binarized feature map into a grid map using unit cells; defining the starting point (x1, y1) of the grid map as the origin and the search-matching step length as L; searching along the x direction from the origin; if a searched value is 1, recording the position and value of the point and labeling it in sequence; otherwise, continuing the search matching;
step four, updating the point (x1, y1 + N·L) as the origin and returning to step three until search matching in the x and y directions is finished, thereby completing the preliminary positioning search matching, wherein N is an integer and L is a constant;
step five, taking out the points with value 1 in sequence; updating the currently taken 1-value point as the origin and the search-matching step length as L/2; performing search matching sequentially along the x direction, skipping points already searched and matched; when the search matching goes out of range, automatically halving the step length and continuing until the step length reaches its minimum; defining any new 1-value point found during this search as a new point requiring y-direction search matching and executing step six; otherwise executing step seven;
step six, with the search-matching step length held at L/2, performing search matching sequentially along the y direction, skipping points already searched and matched; if the search matching goes out of range, automatically halving the step length and continuing until the step length reaches its minimum; defining any new 1-value point found during this search as a new point requiring x-direction search matching and executing step five; otherwise executing step seven;
step seven, when no new point requires search matching, ending the search matching and taking the set of regions of the searched-and-matched 1-value points as the effective target image;
step eight, performing consistency matching identification on the effective target image against the historical sentence voice feature map library;
step nine, outputting the recognition result through the voice feedback and interaction result output device to complete the interaction.
2. The voice feedback and interaction system of claim 1, wherein step eight further includes an image proofreading process comprising:
step 1, defining the effective target image, and defining an arbitrarily selected reference image in the historical sentence voice feature map library as I_C;
step 2, defining the relationship between the reference image I_C and the polar-coordinate-transformed target image as follows:
step 3, calculating the radial projection K_C(i) of the reference image I_C in the polar coordinate system and the radial projection of the target image; taking logarithms to obtain LK_C(i) and its counterpart for the target image, and from these deriving the scale deviation parameter α_z;
wherein K_i = K_max is the number of samples in the angular direction; ce() denotes the smallest integer greater than or equal to the value in parentheses, and fl() denotes the largest integer less than or equal to the value in parentheses; the size of the target image is 2K_max × 2K_max; n_r = K_max is the number of samples in the radial direction, and n_φ = 8K_i is the number of samples in the angular direction;
step 4, calculating, according to the scale deviation parameter from step 3, the projections of the reference image I_C and the target image in the radial and angular directions:
performing a normalized calculation on the two projections, calculating the translation amount of the highest point, and from it calculating the rotation offset parameter φ_z.
3. The voice feedback and interaction system of claim 1, wherein the consistency matching identification of step eight further comprises:
step A, taking the center point of the target image as the center, drawing concentric circles to divide the image into B annular regions, and dividing each annular region into K sector regions, wherein K and B are both predefined constants;
step B, calculating for each sector S_sq the sector fingerprint feature value V_sqθ as Code 1;
wherein F_sqθ(x, y) is the gray value of each pixel in the sector region S_sq, P_sqθ denotes the average pixel gray value within the sector region S_sq, n_sq is the number of columns of the region S_sq, 0 < sq ≤ B×K−1, and θ ∈ {0°, (360°/K), 2·(360°/K), 3·(360°/K), …, 180°};
step C, after rotating the image by (180°/K), repeating step B and extracting each sector's fingerprint feature value V_sqθ as Code 2;
step E, rotating Code 1 and Code 2 respectively by R × (360°/K), R = 0, 1, 2, …, K−1, to obtain Code 1′ and Code 2′;
step F, inputting Code 1 and Code 2, together with Code 1′ and Code 2′ from step E, into the historical sentence voice feature map library for matching.
4. A voice feedback and interaction method, characterized in that the voice feedback and interaction method is based on the voice feedback and interaction system of any one of claims 1 to 3 and comprises the following steps:
step one, a voice interactor outputs real-time sentence voice; the voice acquisition device acquires the interactor's real-time sentence voice and a target sentence voice feature map is drawn; an arbitrarily selected sentence voice feature map in the historical sentence voice feature map library is defined as a reference image, and the target sentence voice feature map is defined as a target image;
step two, the processing unit calls the preset historical sentence voice feature map library stored in the storage unit, wherein each historical voice feature map is a sentence voice feature map drawn by extracting features from previously input or historically recorded sentence voice, and the sentence voice feature maps comprise character, word and sentence feature maps;
step three, subjecting the target image I_C to binarization processing, wherein a value of 1 is defined as having voice features and a value of 0 is defined as having no voice features; dividing the binarized feature map into a grid map using unit cells; defining the starting point (x1, y1) of the grid map as the origin and the search-matching step length as L; searching along the x direction from the origin; if a searched value is 1, recording the position and value of the point and labeling it in sequence; otherwise, continuing the search matching;
step four, updating the point (x1, y1 + N·L) as the origin and returning to step three until search matching in the x and y directions is finished, thereby completing the preliminary positioning search matching, wherein N is an integer and L is a constant;
step five, taking out the points with value 1 in sequence; updating the currently taken 1-value point as the origin and the search-matching step length as L/2; performing search matching sequentially along the x direction, skipping points already searched and matched; when the search matching goes out of range, automatically halving the step length and continuing until the step length reaches its minimum; defining any new 1-value point found during this search as a new point requiring y-direction search matching and executing step six; otherwise executing step seven;
step six, with the search-matching step length held at L/2, performing search matching sequentially along the y direction, skipping points already searched and matched; if the search matching goes out of range, automatically halving the step length and continuing until the step length reaches its minimum; defining any new 1-value point found during this search as a new point requiring x-direction search matching and executing step five; otherwise executing step seven;
step seven, when no new point requires search matching, ending the search matching and taking the set of regions of the searched-and-matched 1-value points as the effective target image;
step eight, performing consistency matching identification on the effective target image against the historical sentence voice feature map library;
step nine, outputting the recognition result through the voice feedback and interaction result output device to complete voice interaction and feedback.
5. The voice feedback and interaction method of claim 4, wherein step eight further includes an image proofreading process comprising:
step 1, defining the effective target image, and defining an arbitrarily selected reference image in the historical sentence voice feature map library as I_C;
step 2, defining the relationship between the reference image I_C and the polar-coordinate-transformed target image as follows:
step 3, calculating the radial projection K_C(i) of the reference image I_C in the polar coordinate system and the radial projection of the target image; taking logarithms to obtain LK_C(i) and its counterpart for the target image, and from these deriving the scale deviation parameter α_z;
wherein K_i = K_max is the number of samples in the angular direction; ce() denotes the smallest integer greater than or equal to the value in parentheses, and fl() denotes the largest integer less than or equal to the value in parentheses; the size of the target image is 2K_max × 2K_max; n_r = K_max is the number of samples in the radial direction, and n_φ = 8K_i is the number of samples in the angular direction;
step 4, calculating, according to the scale deviation parameter from step 3, the projections of the reference image I_C and the target image in the radial and angular directions:
performing a normalized calculation on the two projections, calculating the translation amount of the highest point, and from it calculating the rotation offset parameter φ_z.
6. The voice feedback and interaction method of claim 4, wherein the consistency matching identification of step eight further comprises:
step A, taking the center point of the target image as the center, drawing concentric circles to divide the image into B annular regions, and dividing each annular region into K sector regions, wherein K and B are predefined constants;
step B, calculating for each sector S_sq the sector fingerprint feature value V_sqθ as Code 1;
wherein F_sqθ(x, y) is the gray value of each pixel in the sector region S_sq, P_sqθ denotes the average pixel gray value within the sector region S_sq, n_sq is the number of columns of the region S_sq, 0 < sq ≤ B×K−1, and θ ∈ {0°, (360°/K), 2·(360°/K), 3·(360°/K), …, 180°};
step C, after rotating the image by (180°/K), repeating step B and extracting each sector's fingerprint feature value V_sqθ as Code 2;
step E, rotating Code 1 and Code 2 respectively by R × (360°/K), R = 0, 1, 2, …, K−1, to obtain Code 1′ and Code 2′;
step F, inputting Code 1 and Code 2, together with Code 1′ and Code 2′ from step E, into the historical sentence voice feature map library for matching.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210134308.XA CN114613361B (en) | 2022-02-14 | 2022-02-14 | Voice feedback and interaction system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114613361A true CN114613361A (en) | 2022-06-10 |
CN114613361B CN114613361B (en) | 2024-05-28 |
Family
ID=81858839
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210134308.XA Active CN114613361B (en) | 2022-02-14 | 2022-02-14 | Voice feedback and interaction system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114613361B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11657803B1 (en) | 2022-11-02 | 2023-05-23 | Actionpower Corp. | Method for speech recognition by using feedback information |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH08314494A (en) * | 1995-05-19 | 1996-11-29 | Matsushita Electric Ind Co Ltd | Information retrieving device |
CN104505091A (en) * | 2014-12-26 | 2015-04-08 | 湖南华凯文化创意股份有限公司 | Human-machine voice interaction method and human-machine voice interaction system |
CN107133612A (en) * | 2017-06-06 | 2017-09-05 | 河海大学常州校区 | Based on image procossing and the intelligent ward of speech recognition technology and its operation method |
CN108073875A (en) * | 2016-11-14 | 2018-05-25 | 广东技术师范学院 | A kind of band noisy speech identifying system and method based on monocular cam |
Non-Patent Citations (1)
Title |
---|
Zhong Chen; Xiao Nanfeng: "Design and Implementation of an Automatic Fingerprint and Voice Recognition System", Computing Technology and Automation, no. 02, 30 June 2006 (2006-06-30) *
Also Published As
Publication number | Publication date |
---|---|
CN114613361B (en) | 2024-05-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||