CN114613361A - Voice feedback and interaction system and method - Google Patents
- Publication number
- CN114613361A (application number CN202210134308.XA)
- Authority
- CN
- China
- Prior art keywords
- voice
- matching
- value
- point
- target image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS; G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/063—Training (under G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice)
- G10L2015/0631—Creating reference templates; Clustering
- G10L2015/225—Feedback of the input speech
Abstract
The invention relates to a voice feedback and interaction system and method that solve the technical problems of low efficiency and low accuracy. The voice feedback and interaction system comprises a voice acquisition device, a voice recognition device connected to the voice acquisition device, and a voice feedback and interaction result output device connected to the voice recognition device. The voice recognition device comprises a voice feature library storage unit and a processing unit; the processing unit establishes a historical sentence voice feature map library, acquires a target sentence voice feature map of the interacting person in real time, extracts the effective area of the target sentence voice feature map, and then performs consistency comparison and matching.
Description
Technical Field
The invention relates to the field of voice feedback and interaction, in particular to a voice feedback and interaction system and method.
Background
With the popularization of computer technology, daily life has gradually entered the intelligent era. Beyond computers, mobile phones and tablets, smart technology is appearing in every aspect of people's lives, from clothing and food to housing and travel: smart televisions, intelligent navigation, smart homes and more. Smart technology provides convenient and fast services in every aspect of daily life.
Intelligent voice interaction is a new generation of interaction mode based on voice input: the user speaks and receives a feedback result. A typical application scenario is the voice assistant. Since the introduction of Siri with the iPhone 4S, intelligent voice interaction applications have developed rapidly. Typical Chinese intelligent voice interaction applications, such as the Worm Hole voice assistant and the iFlytek voice assistant, have gained increasing user acceptance.
Existing voice feedback and interaction systems and methods suffer from low efficiency and low accuracy. The present invention provides a voice feedback and interaction system and method that solve this problem.
Disclosure of Invention
The invention aims to solve the technical problems of low efficiency and low accuracy in the prior art by providing a novel voice feedback and interaction system characterized by high efficiency and high accuracy.
In order to solve the technical problems, the technical scheme is as follows:
A voice feedback and interaction system, comprising: a voice acquisition device; a voice recognition device connected to the voice acquisition device; and a voice feedback and interaction result output device connected to the voice recognition device;
the voice recognition device comprises a voice characteristic database storage unit and a processing unit, wherein the processing unit executes the following steps:
step one, establishing a historical sentence voice feature map library, wherein each historical voice feature map is a sentence voice feature map drawn by extracting features from previously entered or historically recorded sentence voice; the sentence voice feature maps include character, word and sentence feature maps;
step two, extracting features from the sentence voice acquired by the voice acquisition device in real time and drawing a target sentence voice feature map; any selected sentence voice feature map in the historical sentence voice feature map library is defined as a reference image, and the target sentence voice feature map is defined as the target image;
step three, subjecting the target image I_C to binarization, where a value of 1 is defined as having voice features and a value of 0 as having no voice features; dividing the binarized feature map into a grid map using unit cells, defining the initial point (x1, y1) of the grid map as the origin and the search-matching step length as L; searching from the origin along the x direction, and if the searched value is 1, recording the position and value of the point and labeling it in sequence, otherwise continuing the search and matching;
step four, updating the point (x1, y1 + N·L) as the new origin and returning to step three until the search and matching in both the x and y directions are finished, thereby completing the preliminary positioning search and matching, where N is an integer and L is a constant;
step five, taking out the 1-valued points in sequence, updating the currently taken 1-valued point as the origin, and updating the search-matching step length to L/2; performing search and matching in sequence along the x direction, skipping points that have already been searched and matched; when the search and matching exceeds the range, automatically halving the step length and continuing until the step length reaches its minimum; if a new 1-valued point appears during the search and matching, defining it as a new point requiring y-direction search and matching and executing step six, otherwise executing step seven;
step six, keeping the search-matching step length at L/2 and performing search and matching in sequence along the y direction, skipping points that have already been searched and matched; if the search and matching exceeds the range, automatically halving the step length and continuing until the step length reaches its minimum; if a new 1-valued point appears during the search and matching, defining it as a new point requiring x-direction search and matching and executing step five, otherwise executing step seven;
step seven, when no new point needs to be searched and matched, ending the search and matching; the area covered by the collected 1-valued points constitutes the effective target image;
step eight, performing consistency matching and recognition of the effective target image against the historical sentence voice feature map library;
step nine, outputting the recognition result through the voice feedback and interaction result output device to complete the interaction.
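The coarse-to-fine grid search of steps three through seven can be sketched as follows. This is a minimal illustration assuming the binarized feature map is a NumPy 0/1 array; the function name, the boundary handling and the exact refinement order are illustrative stand-ins, not the patent's precise procedure.

```python
import numpy as np

def extract_effective_region(feature_map, L=8, min_step=1):
    """Sketch of steps three-seven: coarse grid scan with stride L,
    then a halving-step refinement around each 1-valued hit."""
    h, w = feature_map.shape
    found = set()

    # Steps three-four: coarse scan of the grid map with step length L,
    # recording the position of every 1-valued point encountered.
    for y in range(0, h, L):
        for x in range(0, w, L):
            if feature_map[y, x] == 1:
                found.add((y, x))

    # Steps five-six: refine around each hit with the step halved to L/2
    # and halved again until it reaches the minimum, probing both the
    # x and y directions and skipping points already matched.
    frontier = list(found)
    while frontier:
        y, x = frontier.pop()
        step = L // 2
        while step >= min_step:
            for dy, dx in ((0, step), (0, -step), (step, 0), (-step, 0)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in found:
                    if feature_map[ny, nx] == 1:
                        found.add((ny, nx))
                        frontier.append((ny, nx))  # new point to refine
            step //= 2

    # Step seven: the collected 1-valued points form the effective region.
    return sorted(found)
```

Used on a feature map containing a small block of 1s, the coarse scan finds one grid point and the refinement recovers the neighbouring 1-valued points around it.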
The working principle of the invention is as follows: the speech feature recognition of the invention is an overall recognition of the feature map, and can therefore achieve higher recognition efficiency. Meanwhile, the portions where the interacting person is not speaking are excluded from the voice feature map acquired over the time sequence, which further improves efficiency.
To increase the accuracy, step eight further includes an image proofreading process, comprising:
step 1, defining the effective target image as the target image, and defining any selected reference image in the historical sentence voice feature map library as I_C;
step 2, expressing the relationship between the reference image I_C and the polar-coordinate-transformed target image in terms of the scale offset parameter α_z and the rotation offset parameter φ_z;
step 3, calculating the projection K_C(i) of the reference image I_C in the radial direction of the polar coordinate system and the corresponding radial projection of the target image; taking logarithms to obtain LK_C(i) and its target-image counterpart, and taking their offset as the scale offset parameter α_z;
here K_i = K_max; ce() denotes the smallest integer greater than or equal to the value in parentheses and fl() denotes the largest integer less than or equal to the value in parentheses; the target image is of size 2K_max × 2K_max; n_r = K_max is the number of samples in the radial direction and n_φ = 8K_i is the number of samples in the angular direction;
step 4, calculating, from the scale offset parameter of step 3, the projections of the reference image I_C and the target image in the radial and angular directions; normalizing these projections, calculating the translation amount of the highest point, and from it calculating the rotation offset parameter φ_z;
step 5, correcting the target image with the rotation offset parameter φ_z and the scale offset parameter α_z according to step A, calculating the position point corresponding to the minimum value of e_z, and taking it as the center point of the target image, thereby finishing the image proofreading process.
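As a rough illustration of the projection-based proofreading in steps 2 through 4, the sketch below resamples an image to polar coordinates, forms radial and angular projections, and estimates the rotation offset from the peak of the circular cross-correlation of the angular projections. This is a simplified stand-in under stated assumptions: it uses plain (not logarithmic) polar sampling, omits the scale estimation via LK_C(i), and the function names are invented for illustration.

```python
import numpy as np

def polar_projections(img, n_r=None, n_phi=None):
    """Radial and angular projections of an image resampled to polar
    coordinates (nearest-neighbour sampling), an illustrative analogue
    of the patent's K_C(i) and angular projections."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)
    n_r = n_r or int(r_max)
    n_phi = n_phi or 8 * n_r          # mirrors n_phi = 8 * K_i
    rs = np.linspace(0, r_max, n_r, endpoint=False)
    phis = np.linspace(0, 2 * np.pi, n_phi, endpoint=False)
    rr, pp = np.meshgrid(rs, phis, indexing="ij")
    ys = np.clip(np.round(cy + rr * np.sin(pp)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + rr * np.cos(pp)).astype(int), 0, w - 1)
    polar = img[ys, xs]               # shape (n_r, n_phi)
    radial = polar.sum(axis=1)        # projection along the angle axis
    angular = polar.sum(axis=0)       # projection along the radius axis
    return radial, angular

def estimate_rotation(ref, tgt):
    """Rotation offset (degrees): the shift of the circular
    cross-correlation peak of the two angular projections, in the
    spirit of the 'translation of the highest point' in step 4."""
    _, a_ref = polar_projections(ref)
    _, a_tgt = polar_projections(tgt)
    # circular cross-correlation via FFT: corr[k] = sum_n a_ref[n+k]*a_tgt[n]
    corr = np.fft.ifft(np.fft.fft(a_ref) * np.conj(np.fft.fft(a_tgt))).real
    shift = int(np.argmax(corr))
    return 360.0 * shift / len(a_ref)
```

For an asymmetric test image, rotating the array by 90 degrees shifts the angular projection by a quarter turn, and the correlation peak recovers that offset.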
Further, the consistency matching recognition of step eight comprises:
step A, taking the center point of the target image as the center and drawing concentric circles, dividing the image into B annular areas and then dividing each annular area into K sector areas, where K and B are predefined constants;
step B, calculating for each sector area S_sq the sector feature value V_sqθ as Code1;
where F_sqθ(x, y) is the value of each pixel in the sector area S_sq, P_sqθ denotes the average gray value of the pixels in the sector area S_sq, n_sq is the number of pixels in the sector area S_sq, 0 ≤ sq ≤ B×K−1, and θ ∈ {0°, 360°/K, 2·(360°/K), 3·(360°/K), …, 180°};
step C, after rotating the image by 180°/K, repeating step B and extracting each sector area S_sq's feature value V_sqθ as Code2;
step E, rotating Code1 and Code2 by R·(360°/K) for R = 0, 1, 2, …, K−1 respectively to obtain Code1′ and Code2′;
step F, inputting Code1 and Code2, together with the Code1′ and Code2′ obtained in step E, into the historical sentence voice feature map library for matching.
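The ring-and-sector codes of steps A through E can be sketched as follows. Since the exact formula for V_sqθ is not fully recoverable from the source, this sketch uses a mean-absolute-deviation-from-the-sector-mean feature, a common choice for such sector codes; the function names and the B and K defaults are assumptions.

```python
import numpy as np

def sector_codes(img, B=4, K=8):
    """Sketch of steps A-B: partition the image into B rings x K sectors
    around its centre and compute a per-sector feature (mean absolute
    deviation of pixel grey values from the sector mean)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    r = np.hypot(ys - cy, xs - cx)
    phi = np.degrees(np.arctan2(ys - cy, xs - cx)) % 360.0
    r_max = min(cy, cx)
    ring = np.minimum((r / (r_max / B)).astype(int), B - 1)
    sector = np.minimum((phi / (360.0 / K)).astype(int), K - 1)
    codes = np.zeros((B, K))
    for b in range(B):
        for k in range(K):
            mask = (ring == b) & (sector == k) & (r <= r_max)
            if mask.any():
                pix = img[mask].astype(float)
                codes[b, k] = np.abs(pix - pix.mean()).mean()
    return codes.ravel()            # Code1, length B*K

def rotate_code(code, B, K, steps):
    """Step E: rotating the image by steps*(360/K) degrees permutes the
    K sectors within each ring; rotate the code accordingly."""
    return np.roll(code.reshape(B, K), -steps, axis=1).ravel()
```

A uniform image yields an all-zero code (no deviation within any sector), and rotating a code forward and back by the same number of sector steps is an identity.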
The invention also provides a voice feedback and interaction method based on the above voice feedback and interaction system, the voice feedback and interaction method comprising:
step one, the voice interacting person outputs real-time sentence voice, and the voice acquisition device acquires the interacting person's real-time sentence voice; a target sentence voice feature map is drawn; any selected sentence voice feature map in the historical sentence voice feature map library is defined as a reference image, and the target sentence voice feature map is defined as the target image;
step two, the processing unit calls the preset historical sentence voice feature map library stored in the storage unit, wherein each historical voice feature map is a sentence voice feature map drawn by extracting features from previously entered or historically recorded sentence voice; the sentence voice feature maps include character, word and sentence feature maps;
step three, subjecting the target image I_C to binarization, where a value of 1 is defined as having voice features and a value of 0 as having no voice features; dividing the binarized feature map into a grid map using unit cells, defining the initial point (x1, y1) of the grid map as the origin and the search-matching step length as L; searching from the origin along the x direction, and if the searched value is 1, recording the position and value of the point and labeling it in sequence, otherwise continuing the search and matching;
step four, updating the point (x1, y1 + N·L) as the new origin and returning to step three until the search and matching in both the x and y directions are finished, thereby completing the preliminary positioning search and matching, where N is an integer and L is a constant;
step five, taking out the 1-valued points in sequence, updating the currently taken 1-valued point as the origin, and updating the search-matching step length to L/2; performing search and matching in sequence along the x direction, skipping points that have already been searched and matched; when the search and matching exceeds the range, automatically halving the step length and continuing until the step length reaches its minimum; if a new 1-valued point appears during the search and matching, defining it as a new point requiring y-direction search and matching and executing step six, otherwise executing step seven;
step six, keeping the search-matching step length at L/2 and performing search and matching in sequence along the y direction, skipping points that have already been searched and matched; if the search and matching exceeds the range, automatically halving the step length and continuing until the step length reaches its minimum; if a new 1-valued point appears during the search and matching, defining it as a new point requiring x-direction search and matching and executing step five, otherwise executing step seven;
step seven, when no new point needs to be searched and matched, ending the search and matching; the area covered by the collected 1-valued points constitutes the effective target image;
step eight, performing consistency matching and recognition of the effective target image against the historical sentence voice feature map library;
step nine, outputting the recognition result through the voice feedback and interaction result output device to complete the voice interaction and feedback.
Further, step eight further includes an image proofreading process, comprising:
step 1, defining the effective target image as the target image, and defining any selected reference image in the historical sentence voice feature map library as I_C;
step 2, expressing the relationship between the reference image I_C and the polar-coordinate-transformed target image in terms of the scale offset parameter α_z and the rotation offset parameter φ_z;
step 3, calculating the projection K_C(i) of the reference image I_C in the radial direction of the polar coordinate system and the corresponding radial projection of the target image; taking logarithms to obtain LK_C(i) and its target-image counterpart, and taking their offset as the scale offset parameter α_z;
here K_i = K_max; ce() denotes the smallest integer greater than or equal to the value in parentheses and fl() denotes the largest integer less than or equal to the value in parentheses; the target image is of size 2K_max × 2K_max; n_r = K_max is the number of samples in the radial direction and n_φ = 8K_i is the number of samples in the angular direction;
step 4, calculating, from the scale offset parameter of step 3, the projections of the reference image I_C and the target image in the radial and angular directions; normalizing these projections, calculating the translation amount of the highest point, and from it calculating the rotation offset parameter φ_z;
step 5, correcting the target image with the rotation offset parameter φ_z and the scale offset parameter α_z according to step A, calculating the position point corresponding to the minimum value of e_z, and taking it as the center point of the target image, thereby finishing the image proofreading process.
Further, the consistency matching recognition of step eight comprises:
step A, taking the center point of the target image as the center and drawing concentric circles, dividing the image into B annular areas and then dividing each annular area into K sector areas, where K and B are predefined constants;
step B, calculating for each sector area S_sq the sector feature value V_sqθ as Code1;
where F_sqθ(x, y) is the value of each pixel in the sector area S_sq, P_sqθ denotes the average gray value of the pixels in the sector area S_sq, n_sq is the number of pixels in the sector area S_sq, 0 ≤ sq ≤ B×K−1, and θ ∈ {0°, 360°/K, 2·(360°/K), 3·(360°/K), …, 180°};
step C, after rotating the image by 180°/K, repeating step B and extracting each sector area S_sq's feature value V_sqθ as Code2;
step E, rotating Code1 and Code2 by R·(360°/K) for R = 0, 1, 2, …, K−1 respectively to obtain Code1′ and Code2′;
step F, inputting Code1 and Code2, together with the Code1′ and Code2′ obtained in step E, into the historical sentence voice feature map library for matching.
The invention has the following beneficial effects: the invention converts speech feature recognition into overall recognition of the feature map, giving higher recognition efficiency. Meanwhile, the portions where the interacting person is not speaking are excluded from the voice feature map acquired over the time sequence, which further improves efficiency. In addition, the accuracy of feedback and interaction is improved by the pre-matching proofreading and positioning of the feature images.
Drawings
The invention is further illustrated by the following examples in conjunction with the drawings.
FIG. 1, a schematic diagram of a voice feedback and interaction system.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
The present embodiment provides a voice feedback and interaction system, as shown in fig. 1, comprising: a voice acquisition device; a voice recognition device connected to the voice acquisition device; and a voice feedback and interaction result output device connected to the voice recognition device;
the voice recognition device comprises a voice characteristic database storage unit and a processing unit, wherein the processing unit executes the following steps:
step one, establishing a historical sentence voice feature map library, wherein each historical voice feature map is a sentence voice feature map drawn by extracting features from previously entered or historically recorded sentence voice; the sentence voice feature maps include character, word and sentence feature maps;
step two, extracting features from the sentence voice acquired by the voice acquisition device in real time and drawing a target sentence voice feature map; any selected sentence voice feature map in the historical sentence voice feature map library is defined as a reference image, and the target sentence voice feature map is defined as the target image;
step three, subjecting the target image I_C to binarization, where a value of 1 is defined as having voice features and a value of 0 as having no voice features; dividing the binarized feature map into a grid map using unit cells, defining the initial point (x1, y1) of the grid map as the origin and the search-matching step length as L; searching from the origin along the x direction, and if the searched value is 1, recording the position and value of the point and labeling it in sequence, otherwise continuing the search and matching;
step four, updating the point (x1, y1 + N·L) as the new origin and returning to step three until the search and matching in both the x and y directions are finished, thereby completing the preliminary positioning search and matching, where N is an integer and L is a constant;
step five, taking out the 1-valued points in sequence, updating the currently taken 1-valued point as the origin, and updating the search-matching step length to L/2; performing search and matching in sequence along the x direction, skipping points that have already been searched and matched; when the search and matching exceeds the range, automatically halving the step length and continuing until the step length reaches its minimum; if a new 1-valued point appears during the search and matching, defining it as a new point requiring y-direction search and matching and executing step six, otherwise executing step seven;
step six, keeping the search-matching step length at L/2 and performing search and matching in sequence along the y direction, skipping points that have already been searched and matched; if the search and matching exceeds the range, automatically halving the step length and continuing until the step length reaches its minimum; if a new 1-valued point appears during the search and matching, defining it as a new point requiring x-direction search and matching and executing step five, otherwise executing step seven;
step seven, when no new point needs to be searched and matched, ending the search and matching; the area covered by the collected 1-valued points constitutes the effective target image;
step eight, performing consistency matching and recognition of the effective target image against the historical sentence voice feature map library;
step nine, outputting the recognition result through the voice feedback and interaction result output device to complete the interaction.
The working principle of the invention is as follows: the speech feature recognition of the invention is an overall recognition of the feature map, and can therefore achieve higher recognition efficiency. Meanwhile, the portions where the interacting person is not speaking are excluded from the voice feature map acquired over the time sequence, which further improves efficiency.
To increase the accuracy, step eight preferably further includes an image proofreading process, comprising:
step 1, defining the effective target image as the target image, and defining any selected reference image in the historical sentence voice feature map library as I_C;
step 2, expressing the relationship between the reference image I_C and the polar-coordinate-transformed target image in terms of the scale offset parameter α_z and the rotation offset parameter φ_z;
step 3, calculating the projection K_C(i) of the reference image I_C in the radial direction of the polar coordinate system and the corresponding radial projection of the target image; taking logarithms to obtain LK_C(i) and its target-image counterpart, and taking their offset as the scale offset parameter α_z;
here K_i = K_max; ce() denotes the smallest integer greater than or equal to the value in parentheses and fl() denotes the largest integer less than or equal to the value in parentheses; the target image is of size 2K_max × 2K_max; n_r = K_max is the number of samples in the radial direction and n_φ = 8K_i is the number of samples in the angular direction;
step 4, calculating, from the scale offset parameter of step 3, the projections of the reference image I_C and the target image in the radial and angular directions; normalizing these projections, calculating the translation amount of the highest point, and from it calculating the rotation offset parameter φ_z;
step 5, correcting the target image with the rotation offset parameter φ_z and the scale offset parameter α_z according to step A, calculating the position point corresponding to the minimum value of e_z, and taking it as the center point of the target image, thereby finishing the image proofreading process.
Preferably, the consistency matching recognition of step eight further comprises:
step A, taking the center point of the target image as the center and drawing concentric circles, dividing the image into B annular areas and then dividing each annular area into K sector areas, where K and B are predefined constants;
step B, calculating for each sector area S_sq the sector feature value V_sqθ as Code1;
where F_sqθ(x, y) is the value of each pixel in the sector area S_sq, P_sqθ denotes the average gray value of the pixels in the sector area S_sq, n_sq is the number of pixels in the sector area S_sq, 0 ≤ sq ≤ B×K−1, and θ ∈ {0°, 360°/K, 2·(360°/K), 3·(360°/K), …, 180°};
step C, after rotating the image by 180°/K, repeating step B and extracting each sector area S_sq's feature value V_sqθ as Code2;
step E, rotating Code1 and Code2 by R·(360°/K) for R = 0, 1, 2, …, K−1 respectively to obtain Code1′ and Code2′;
step F, inputting Code1 and Code2, together with the Code1′ and Code2′ obtained in step E, into the historical sentence voice feature map library for matching.
This embodiment also provides a voice feedback and interaction method based on the above voice feedback and interaction system, the voice feedback and interaction method comprising:
step one, the voice interacting person outputs real-time sentence voice, and the voice acquisition device acquires the interacting person's real-time sentence voice; a target sentence voice feature map is drawn; any selected sentence voice feature map in the historical sentence voice feature map library is defined as a reference image, and the target sentence voice feature map is defined as the target image;
step two, the processing unit calls the preset historical sentence voice feature map library stored in the storage unit, wherein each historical voice feature map is a sentence voice feature map drawn by extracting features from previously entered or historically recorded sentence voice; the sentence voice feature maps include character, word and sentence feature maps;
step three, subjecting the target image I_C to binarization, where a value of 1 is defined as having voice features and a value of 0 as having no voice features; dividing the binarized feature map into a grid map using unit cells, defining the initial point (x1, y1) of the grid map as the origin and the search-matching step length as L; searching from the origin along the x direction, and if the searched value is 1, recording the position and value of the point and labeling it in sequence, otherwise continuing the search and matching;
step four, updating the point (x1, y1 + N·L) as the new origin and returning to step three until the search and matching in both the x and y directions are finished, thereby completing the preliminary positioning search and matching, where N is an integer and L is a constant;
step five, taking out the 1-valued points in sequence, updating the currently taken 1-valued point as the origin, and updating the search-matching step length to L/2; performing search and matching in sequence along the x direction, skipping points that have already been searched and matched; when the search and matching exceeds the range, automatically halving the step length and continuing until the step length reaches its minimum; if a new 1-valued point appears during the search and matching, defining it as a new point requiring y-direction search and matching and executing step six, otherwise executing step seven;
step six, keeping the search-matching step length at L/2 and performing search and matching in sequence along the y direction, skipping points that have already been searched and matched; if the search and matching exceeds the range, automatically halving the step length and continuing until the step length reaches its minimum; if a new 1-valued point appears during the search and matching, defining it as a new point requiring x-direction search and matching and executing step five, otherwise executing step seven;
step seven, when no new point needs to be searched and matched, ending the search and matching; the area covered by the collected 1-valued points constitutes the effective target image;
step eight, performing consistency matching and recognition of the effective target image against the historical sentence voice feature map library;
step nine, outputting the recognition result through the voice feedback and interaction result output device to complete the voice interaction and feedback.
Preferably, the eighth step further includes an image proofreading process comprising:
step 1, defining the effective target image, and defining an arbitrarily selected reference image in the historical sentence voice feature map library as I_C;
step 2, defining the relationship between the reference image I_C and the polar-coordinate-transformed target image as follows:
step 3, calculating the radial projection K_C(i) of the reference image I_C in the polar coordinate system and the radial projection of the target image; taking logarithms to obtain LK_C(i) and its counterpart for the target image, and from these deriving the scale deviation parameter α_z;
wherein K_i = K_max is the number of samples in the angular direction; ce() denotes the smallest integer greater than or equal to the value in parentheses, and fl() denotes the largest integer less than or equal to the value in parentheses; the size of the target image is 2K_max × 2K_max; n_r = K_max is the number of samples in the radial direction, and n_φ = 8K_i is the number of samples in the angular direction;
step 4, calculating, according to the scale deviation parameter from step 3, the projections of the reference image I_C and the target image in the radial and angular directions:
performing a normalized calculation on the two projections, calculating the translation amount of the highest point, and from it calculating the rotation offset parameter φ_z;
step 5, correcting the target image according to the rotation offset parameter φ_z and the scale deviation parameter α_z, calculating the position point corresponding to the minimum value of e_z, taking it as the center point of the target image, and completing the image proofreading processing.
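The scale and rotation estimation of steps 1 through 5 can be illustrated with a conventional log-polar projection correlation. The patent's own projection and e_z formulas were rendered as images and are not recoverable from this text, so every formula below (log-spaced radii, FFT cross-correlation for the lag) is a standard stand-in, not the claimed method, and all names are assumptions.

```python
import numpy as np

def polar_projections(img, n_r=64, n_phi=256):
    """Radial and angular projections of an image sampled on a log-polar grid."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)
    rs = np.exp(np.linspace(0.0, np.log(r_max), n_r))      # log-spaced radii
    phis = np.linspace(0.0, 2.0 * np.pi, n_phi, endpoint=False)
    ys = np.clip(np.round(cy + rs[:, None] * np.sin(phis)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + rs[:, None] * np.cos(phis)).astype(int), 0, w - 1)
    polar = img[ys, xs]                                    # shape (n_r, n_phi)
    return polar.sum(axis=1), polar.sum(axis=0)            # radial, angular

def circular_shift(a, b):
    """Lag at which b best aligns with a (peak of circular cross-correlation)."""
    c = np.real(np.fft.ifft(np.fft.fft(a) * np.conj(np.fft.fft(b))))
    return int(np.argmax(c))

def estimate_offsets(ref, tgt, n_r=64, n_phi=256):
    """Scale deviation alpha_z and rotation offset phi_z between ref and tgt."""
    kr_ref, ka_ref = polar_projections(ref, n_r, n_phi)
    kr_tgt, ka_tgt = polar_projections(tgt, n_r, n_phi)
    # scale: a scale change shifts the log-radial projection, so alpha_z is
    # recovered from the lag between the two log-radius profiles
    r_max = (min(ref.shape) - 1) / 2.0
    d_log = np.log(r_max) / (n_r - 1)                      # log-radius step
    s = circular_shift(kr_ref, kr_tgt)
    s = s - n_r if s > n_r // 2 else s                     # signed lag
    alpha_z = float(np.exp(s * d_log))
    # rotation: a rotation circularly shifts the angular projection
    t = circular_shift(ka_ref, ka_tgt)
    phi_z = 2.0 * np.pi * t / n_phi
    return alpha_z, phi_z
```

For identical reference and target images the estimated offsets are α_z = 1 (no scale change) and φ_z = 0 (no rotation), matching the intent of the proofreading step.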
Preferably, the consistency matching identification of step eight further comprises:
step A, taking the center point of the target image as the center, drawing concentric circles to divide the image into B annular regions, and dividing each annular region into K sector regions, wherein K and B are predefined constants;
step B, calculating for each sector S_sq the sector fingerprint feature value V_sqθ as Code 1;
wherein F_sqθ(x, y) is the gray value of each pixel in the sector region S_sq, P_sqθ denotes the average pixel gray value within the sector region S_sq, n_sq is the number of columns of the region S_sq, 0 < sq ≤ B×K−1, and θ ∈ {0°, (360°/K), 2·(360°/K), 3·(360°/K), …, 180°};
step C, after rotating the image by (180°/K), repeating step B and extracting each sector's fingerprint feature value V_sqθ as Code 2;
step E, rotating Code 1 and Code 2 respectively by R × (360°/K), R = 0, 1, 2, …, K−1, to obtain Code 1′ and Code 2′;
step F, inputting Code 1 and Code 2, together with Code 1′ and Code 2′ from step E, into the historical sentence voice feature map library for matching.
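The ring-and-sector coding of steps A through F can be sketched as follows. The patent's exact feature formula for V_sqθ is not legible in this text, so the sketch substitutes a common FingerCode-style statistic (mean absolute deviation of pixel grays from the sector mean); `sector_codes` and `best_rotation_match` are illustrative names, not the claimed implementation.

```python
import numpy as np

def sector_codes(img, center, B=4, K=16, r_max=None):
    """B rings x K sectors around `center`; each feature is the mean absolute
    deviation of pixel grays from the sector mean (an assumed stand-in for
    the patent's V_sq formula). Returns a flat code of length B*K."""
    h, w = img.shape
    cy, cx = center
    ys, xs = np.mgrid[0:h, 0:w]
    r = np.hypot(ys - cy, xs - cx)
    theta = np.mod(np.arctan2(ys - cy, xs - cx), 2.0 * np.pi)
    if r_max is None:
        r_max = r.max() + 1e-9
    ring = np.minimum((r / r_max * B).astype(int), B - 1)
    sector = np.minimum((theta / (2.0 * np.pi) * K).astype(int), K - 1)
    code = np.zeros(B * K)
    for b in range(B):
        for k in range(K):
            mask = (ring == b) & (sector == k)
            if mask.any():
                pix = img[mask].astype(float)
                code[b * K + k] = np.abs(pix - pix.mean()).mean()
    return code

def best_rotation_match(code_a, code_b, B, K):
    """Minimum distance over the K cyclic sector rotations of code_b,
    mirroring the R x (360/K) rotations of step E."""
    a = code_a.reshape(B, K)
    b = code_b.reshape(B, K)
    dists = [np.linalg.norm(a - np.roll(b, r, axis=1)) for r in range(K)]
    return int(np.argmin(dists)), float(min(dists))
```

Matching a code against its own cyclically rotated copy recovers the rotation index with zero distance, which is the property the rotated Code 1′/Code 2′ variants exploit.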
In this embodiment, speech feature recognition is performed as whole-image recognition of the feature map, which yields higher recognition efficiency. Moreover, the portions of the time-sequence speech feature map during which the interactor is not speaking are excluded, further improving efficiency. In addition, proofreading and positioning the feature images before matching improves the accuracy of feedback and interaction.
While illustrative embodiments of the present invention have been described above to enable those skilled in the art to understand it, the present invention is not limited to the scope of those embodiments. To those of ordinary skill in the art, all variations that remain within the spirit and scope of the invention as defined by the appended claims shall fall within its protection.
Claims (6)
1. A voice feedback and interaction system, characterized in that the voice feedback and interaction system comprises: a voice acquisition device; a voice recognition device connected with the voice acquisition device; and a voice feedback and interaction result output device connected with the voice recognition device;
the voice recognition device comprises a voice characteristic database storage unit and a processing unit, wherein the processing unit executes the following steps:
step one, establishing a historical sentence voice feature map library, wherein each historical voice feature map is a sentence voice feature map drawn by extracting features from previously input or historically recorded sentence voice, and the sentence voice feature maps comprise character, word and sentence feature maps;
step two, performing feature extraction on the sentence voice acquired by the voice acquisition device in real time and drawing a target sentence voice feature map; an arbitrarily selected sentence voice feature map in the historical sentence voice feature map library is defined as a reference image, and the target sentence voice feature map is defined as a target image;
step three, subjecting the target image I_C to binarization processing, wherein a value of 1 is defined as having voice features and a value of 0 is defined as having no voice features; dividing the binarized feature map into a grid map using unit cells; defining the starting point (x1, y1) of the grid map as the origin and the search-matching step length as L; searching along the x direction from the origin; if a searched value is 1, recording the position and value of the point and labeling it in sequence; otherwise, continuing the search matching;
step four, updating the point (x1, y1 + N·L) as the origin and returning to step three until search matching in the x and y directions is finished, thereby completing the preliminary positioning search matching, wherein N is an integer and L is a constant;
step five, taking out the points with value 1 in sequence; updating the currently taken 1-value point as the origin and the search-matching step length as L/2; performing search matching sequentially along the x direction, skipping points already searched and matched; when the search matching goes out of range, automatically halving the step length and continuing until the step length reaches its minimum; defining any new 1-value point found during this search as a new point requiring y-direction search matching and executing step six; otherwise executing step seven;
step six, with the search-matching step length held at L/2, performing search matching sequentially along the y direction, skipping points already searched and matched; if the search matching goes out of range, automatically halving the step length and continuing until the step length reaches its minimum; defining any new 1-value point found during this search as a new point requiring x-direction search matching and executing step five; otherwise executing step seven;
step seven, when no new point requires search matching, ending the search matching and taking the set of regions of the searched-and-matched 1-value points as the effective target image;
step eight, performing consistency matching identification on the effective target image against the historical sentence voice feature map library;
step nine, outputting the recognition result through the voice feedback and interaction result output device to complete the interaction.
2. The voice feedback and interaction system of claim 1, wherein step eight further includes an image proofreading process comprising:
step 1, defining the effective target image, and defining an arbitrarily selected reference image in the historical sentence voice feature map library as I_C;
step 2, defining the relationship between the reference image I_C and the polar-coordinate-transformed target image as follows:
step 3, calculating the radial projection K_C(i) of the reference image I_C in the polar coordinate system and the radial projection of the target image; taking logarithms to obtain LK_C(i) and its counterpart for the target image, and from these deriving the scale deviation parameter α_z;
wherein K_i = K_max is the number of samples in the angular direction; ce() denotes the smallest integer greater than or equal to the value in parentheses, and fl() denotes the largest integer less than or equal to the value in parentheses; the size of the target image is 2K_max × 2K_max; n_r = K_max is the number of samples in the radial direction, and n_φ = 8K_i is the number of samples in the angular direction;
step 4, calculating, according to the scale deviation parameter from step 3, the projections of the reference image I_C and the target image in the radial and angular directions:
performing a normalized calculation on the two projections, calculating the translation amount of the highest point, and from it calculating the rotation offset parameter φ_z.
3. The voice feedback and interaction system of claim 1, wherein the consistency matching identification of step eight further comprises:
step A, taking the center point of the target image as the center, drawing concentric circles to divide the image into B annular regions, and dividing each annular region into K sector regions, wherein K and B are both predefined constants;
step B, calculating for each sector S_sq the sector fingerprint feature value V_sqθ as Code 1;
wherein F_sqθ(x, y) is the gray value of each pixel in the sector region S_sq, P_sqθ denotes the average pixel gray value within the sector region S_sq, n_sq is the number of columns of the region S_sq, 0 < sq ≤ B×K−1, and θ ∈ {0°, (360°/K), 2·(360°/K), 3·(360°/K), …, 180°};
step C, after rotating the image by (180°/K), repeating step B and extracting each sector's fingerprint feature value V_sqθ as Code 2;
step E, rotating Code 1 and Code 2 respectively by R × (360°/K), R = 0, 1, 2, …, K−1, to obtain Code 1′ and Code 2′;
step F, inputting Code 1 and Code 2, together with Code 1′ and Code 2′ from step E, into the historical sentence voice feature map library for matching.
4. A voice feedback and interaction method, characterized in that the voice feedback and interaction method is based on the voice feedback and interaction system of any one of claims 1 to 3 and comprises the following steps:
step one, a voice interactor outputs real-time sentence voice; the voice acquisition device acquires the interactor's real-time sentence voice and a target sentence voice feature map is drawn; an arbitrarily selected sentence voice feature map in the historical sentence voice feature map library is defined as a reference image, and the target sentence voice feature map is defined as a target image;
step two, the processing unit calls the preset historical sentence voice feature map library stored in the storage unit, wherein each historical voice feature map is a sentence voice feature map drawn by extracting features from previously input or historically recorded sentence voice, and the sentence voice feature maps comprise character, word and sentence feature maps;
step three, subjecting the target image I_C to binarization processing, wherein a value of 1 is defined as having voice features and a value of 0 is defined as having no voice features; dividing the binarized feature map into a grid map using unit cells; defining the starting point (x1, y1) of the grid map as the origin and the search-matching step length as L; searching along the x direction from the origin; if a searched value is 1, recording the position and value of the point and labeling it in sequence; otherwise, continuing the search matching;
step four, updating the point (x1, y1 + N·L) as the origin and returning to step three until search matching in the x and y directions is finished, thereby completing the preliminary positioning search matching, wherein N is an integer and L is a constant;
step five, taking out the points with value 1 in sequence; updating the currently taken 1-value point as the origin and the search-matching step length as L/2; performing search matching sequentially along the x direction, skipping points already searched and matched; when the search matching goes out of range, automatically halving the step length and continuing until the step length reaches its minimum; defining any new 1-value point found during this search as a new point requiring y-direction search matching and executing step six; otherwise executing step seven;
step six, with the search-matching step length held at L/2, performing search matching sequentially along the y direction, skipping points already searched and matched; if the search matching goes out of range, automatically halving the step length and continuing until the step length reaches its minimum; defining any new 1-value point found during this search as a new point requiring x-direction search matching and executing step five; otherwise executing step seven;
step seven, when no new point requires search matching, ending the search matching and taking the set of regions of the searched-and-matched 1-value points as the effective target image;
step eight, performing consistency matching identification on the effective target image against the historical sentence voice feature map library;
step nine, outputting the recognition result through the voice feedback and interaction result output device to complete voice interaction and feedback.
5. The voice feedback and interaction method of claim 4, wherein step eight further includes an image proofreading process comprising:
step 1, defining the effective target image, and defining an arbitrarily selected reference image in the historical sentence voice feature map library as I_C;
step 2, defining the relationship between the reference image I_C and the polar-coordinate-transformed target image as follows:
step 3, calculating the radial projection K_C(i) of the reference image I_C in the polar coordinate system and the radial projection of the target image; taking logarithms to obtain LK_C(i) and its counterpart for the target image, and from these deriving the scale deviation parameter α_z;
wherein K_i = K_max is the number of samples in the angular direction; ce() denotes the smallest integer greater than or equal to the value in parentheses, and fl() denotes the largest integer less than or equal to the value in parentheses; the size of the target image is 2K_max × 2K_max; n_r = K_max is the number of samples in the radial direction, and n_φ = 8K_i is the number of samples in the angular direction;
step 4, calculating, according to the scale deviation parameter from step 3, the projections of the reference image I_C and the target image in the radial and angular directions:
performing a normalized calculation on the two projections, calculating the translation amount of the highest point, and from it calculating the rotation offset parameter φ_z.
6. The voice feedback and interaction method of claim 4, wherein the consistency matching identification of step eight further comprises:
step A, taking the center point of the target image as the center, drawing concentric circles to divide the image into B annular regions, and dividing each annular region into K sector regions, wherein K and B are predefined constants;
step B, calculating for each sector S_sq the sector fingerprint feature value V_sqθ as Code 1;
wherein F_sqθ(x, y) is the gray value of each pixel in the sector region S_sq, P_sqθ denotes the average pixel gray value within the sector region S_sq, n_sq is the number of columns of the region S_sq, 0 < sq ≤ B×K−1, and θ ∈ {0°, (360°/K), 2·(360°/K), 3·(360°/K), …, 180°};
step C, after rotating the image by (180°/K), repeating step B and extracting each sector's fingerprint feature value V_sqθ as Code 2;
step E, rotating Code 1 and Code 2 respectively by R × (360°/K), R = 0, 1, 2, …, K−1, to obtain Code 1′ and Code 2′;
step F, inputting Code 1 and Code 2, together with Code 1′ and Code 2′ from step E, into the historical sentence voice feature map library for matching.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210134308.XA CN114613361B (en) | 2022-02-14 | 2022-02-14 | Voice feedback and interaction system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114613361A true CN114613361A (en) | 2022-06-10 |
CN114613361B CN114613361B (en) | 2024-05-28 |
Family
ID=81858839
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210134308.XA Active CN114613361B (en) | 2022-02-14 | 2022-02-14 | Voice feedback and interaction system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114613361B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11657803B1 (en) | 2022-11-02 | 2023-05-23 | Actionpower Corp. | Method for speech recognition by using feedback information |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH08314494A (en) * | 1995-05-19 | 1996-11-29 | Matsushita Electric Ind Co Ltd | Information retrieving device |
CN104505091A (en) * | 2014-12-26 | 2015-04-08 | 湖南华凯文化创意股份有限公司 | Human-machine voice interaction method and human-machine voice interaction system |
CN107133612A (en) * | 2017-06-06 | 2017-09-05 | 河海大学常州校区 | Based on image procossing and the intelligent ward of speech recognition technology and its operation method |
CN108073875A (en) * | 2016-11-14 | 2018-05-25 | 广东技术师范学院 | A kind of band noisy speech identifying system and method based on monocular cam |
Non-Patent Citations (1)
Title |
---|
Zhong Chen; Xiao Nanfeng: "Design and Implementation of an Automatic Fingerprint and Voice Recognition System", Computing Technology and Automation, no. 02, 30 June 2006 (2006-06-30) *
Also Published As
Publication number | Publication date |
---|---|
CN114613361B (en) | 2024-05-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||