CN115063895A - Ticket selling method and system based on voice recognition - Google Patents

Info

Publication number
CN115063895A
CN115063895A (application CN202210655359.7A)
Authority
CN
China
Prior art keywords
user
voice
waveform
voice signal
time
Prior art date
Legal status
Pending
Application number
CN202210655359.7A
Other languages
Chinese (zh)
Inventor
王晨光
周帅
杨国荣
晏承彬
Current Assignee
Shenzhen Zhiyuanlian Technology Co ltd
Original Assignee
Shenzhen Zhiyuanlian Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Zhiyuanlian Technology Co ltd filed Critical Shenzhen Zhiyuanlian Technology Co ltd
Priority to CN202210655359.7A
Publication of CN115063895A

Classifications

    • GPHYSICS
    • G07CHECKING-DEVICES
    • G07BTICKET-ISSUING APPARATUS; FARE-REGISTERING APPARATUS; FRANKING APPARATUS
    • G07B5/00Details of, or auxiliary devices for, ticket-issuing machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L2021/02082Noise filtering the noise being echo, reverberation of the speech

Abstract

The invention provides a ticket selling method and system based on voice recognition. A camera detects whether a face image whose size, orientation and retention time meet preset conditions exists in the captured picture, the face image including a mouth-shape image; a voice signal input by the user is acquired through a microphone; the start point and end point of the voice signal are determined from changes in the mouth-shape image; the voice signal is denoised; the user's sound data are extracted from the denoised voice signal; voice recognition is performed on the sound data to obtain the corresponding text content; and ticket information is generated and displayed according to the text content, thereby solving the problem of the low speech recognition success rate caused by environmental noise in public places.

Description

Ticket selling method and system based on voice recognition
Technical Field
The invention relates to the technical field of self-service ticketing, in particular to a ticketing method and a ticketing system based on voice recognition.
Background
In order to relieve the pressure on manned ticket windows, reduce crowding and queuing, and improve ticketing efficiency, a large number of self-service ticket vending machines have been installed in public service places such as movie theaters, subway stations, railway stations and long-distance bus stations for people to purchase tickets by themselves. Self-service machines have the advantage that tickets can be bought on the spot as needed, avoiding the refunds caused by last-minute changes of itinerary, while the user can still ask on-site staff for consultation and assistance at any time. However, entering text on a self-service ticket machine is difficult and the operation must proceed step by step, which greatly reduces convenience; for older or less literate users, purchasing tickets on a self-service machine is hard. Self-service ticket machines with a voice recognition function, which have appeared in recent years, address this pain point: key information about the desired ticket, such as the admission or departure time and the movie or destination station, can be entered by voice to quickly locate the ticket to be purchased, making the operation simple and shortening the time needed to buy a ticket. However, because self-service ticket machines are generally installed in public places with heavy foot traffic, environmental noise is severe and greatly degrades voice recognition; the user sometimes has to repeat an utterance several times before it is recognized successfully, so the advantage of efficient input is lost.
Disclosure of Invention
To address these problems, the invention provides a ticket selling method and a ticket selling system based on voice recognition, which solve the problem of the low speech recognition success rate caused by environmental noise in public places.
In view of this, a first aspect of the present invention provides a method for ticketing based on voice recognition, including:
detecting whether a face image with the size, the orientation and the retention time meeting preset conditions exists in a shot picture through a camera, wherein the face image comprises a mouth-shaped image;
acquiring a voice signal input by a user through a microphone;
determining a starting point and an end point of a voice signal input by a user according to the change of the mouth-shaped image;
performing denoising on the voice signal;
extracting sound data of a user from the denoised voice signal;
performing voice recognition on the voice data to obtain text contents corresponding to the voice data;
and generating and displaying ticket information according to the text content.
Further, in the above ticket selling method based on voice recognition, before the step of acquiring the voice signal input by the user through the microphone, the method further includes:
continuously monitoring environmental sound information through the microphone;
analyzing whether periodic noise and echo noise corresponding to the periodic noise exist in the environmental sound information;
if the periodic noise exists, extracting the time characteristic and the spectrum characteristic of the periodic noise;
and if the echo noise corresponding to the periodic noise exists, extracting the time characteristic and the spectrum characteristic of the echo noise.
Further, in the ticket selling method based on speech recognition, the step of performing denoising on the speech signal specifically includes:
if the periodic noise exists, generating a first inverse waveform corresponding to the periodic noise according to the time characteristic and the spectrum characteristic of the periodic noise;
if the echo noise corresponding to the periodic noise exists, generating a second anti-phase waveform corresponding to the echo noise according to the time characteristic and the spectrum characteristic of the echo noise;
inputting the voice signal, the first inverse waveform and/or the second inverse waveform into an inverse noise reduction function to cancel the periodic noise and/or the echo noise.
Further, in the above ticket selling method based on voice recognition, the step of determining the start point and the end point of the voice signal input by the user according to the change of the mouth shape image specifically includes:
extracting a mouth-shaped image from each frame of picture image with a face image shot by the camera;
comparing the mouth-shaped image in each frame of picture image with the mouth-shaped image in the picture image with the face image at the previous interval frame number of n, wherein n is a positive integer greater than or equal to 1;
judging whether the mouth shape of the user is changed from a closed state to an open state;
taking a time point corresponding to the picture image of the user's mouth shape changed from the closed state to the open state for the first time as a starting point of the voice signal;
judging whether the mouth shape of the user is changed from an open state to a closed state and the time for keeping the closed state exceeds a preset first threshold value;
and taking the time point of the last time the mouth shape of the user changes from the open state to the closed state as the end point of the voice signal.
Further, in the ticket selling method based on speech recognition, the step of denoising the speech signal further includes:
acquiring time points of the mouth shape of the user changing from a closed state to an open state and from the open state to the closed state every time so as to acquire time periods corresponding to the open state and the closed state of the mouth shape of the user;
and eliminating, from the voice signal, sound segments whose energy exceeds a preset second threshold value during short time periods in which the user's mouth is in the closed state.
Further, in the ticket selling method based on speech recognition, the step of extracting the user's voice data from the denoised speech signal specifically includes:
inputting the denoised voice signal into a clipping function to obtain a quasi-fundamental period of user voice;
inputting the quasi-fundamental period and the voice signal into a voice waveform fitting function to fit a voice waveform of a user;
and extracting the sound data of the user from the voice signal according to the fitting result.
Further, in the ticket selling method based on voice recognition, the quasi-pitch period is composed of a pitch period value and an error range value thereof.
Further, in the ticket selling method based on speech recognition, the step of inputting the denoised speech signal into the clipping function to obtain the quasi-fundamental period of the user's voice specifically includes:
determining a maximum peak in a sound waveform of the user's sound data;
taking the average of the maximum peak value and the peak value average value of the sound waveform in the time period when the mouth shape of the user is in an open state as a clipping level;
inputting the sound waveform and the clipping level into a clipping function to obtain a peak waveform;
determining whether the peak waveform contains any segment that fails to conform to the periodic pattern within an error tolerance not greater than a third threshold;
if so, fine-tuning the clipping level upwards and returning to the step of inputting the sound waveform and the clipping level into the clipping function to obtain a peak waveform;
and if the determination result is negative, calculating the pitch period value and the error range value thereof according to the peak waveform.
Further, in the ticket selling method based on voice recognition, the step of inputting the quasi-fundamental pitch period and the voice signal into a voice waveform fitting function to fit the voice waveform of the user specifically includes:
inputting the pitch period value to an autocorrelation function;
adjusting coefficients of the autocorrelation function and fine tuning the pitch period value within the error value range to generate corresponding harmonic waveforms;
matching the harmonic waveform with a sound waveform of the speech signal;
and determining the sound waveform of the user according to the matching result.
A second aspect of the present invention provides a ticketing system, including a camera for acquiring a face image and a mouth image of a user, a microphone for monitoring environmental noise and a user voice signal, a touch screen for receiving a touch input from the user, a memory for storing a computer program, and a processor, where the processor is configured to execute the computer program to implement the method for ticketing based on voice recognition in any one of the first aspect.
The invention provides a ticket selling method and system based on voice recognition. A camera detects whether a face image whose size, orientation and retention time meet preset conditions exists in the captured picture, the face image including a mouth-shape image; a voice signal input by the user is acquired through a microphone; the start point and end point of the voice signal are determined from changes in the mouth-shape image; the voice signal is denoised; the user's sound data are extracted from the denoised voice signal; voice recognition is performed on the sound data to obtain the corresponding text content; and ticket information is generated and displayed according to the text content, thereby solving the problem of the low speech recognition success rate caused by environmental noise in public places.
Drawings
Fig. 1 is a schematic flow chart of a ticket selling method based on voice recognition according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart diagram of a periodic noise reduction method according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart diagram of a method for determining a start point and an end point of a speech signal according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of a speech signal denoising method according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart diagram of a user voice data extraction method according to an embodiment of the present invention;
FIG. 6 is a schematic flow chart diagram of a pitch period calculation method according to an embodiment of the present invention;
fig. 7 is a schematic flow chart of a user sound waveform fitting method according to an embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced otherwise than as specifically described herein, and thus the scope of the present invention is not limited by the specific embodiments disclosed below.
In the description of the present invention, the terms "plurality" or "a plurality" refer to two or more, and unless otherwise specifically limited, the terms "upper", "lower", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are merely for convenience in describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and thus should not be construed as limiting the present invention. The terms "connected," "mounted," "secured," and the like are to be construed broadly and include, for example, fixed connections, removable connections, or integral connections; may be directly connected or indirectly connected through an intermediate. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations. Furthermore, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," etc. may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless otherwise specified.
In the description herein, reference to the term "one embodiment," "some embodiments," "specific examples," or the like, means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The technical scheme of the invention is applied to a self-service ticket machine, in particular one installed in a public place. A ticketing system running on the self-service ticket machine implements the ticket selling method based on voice recognition: the machine detects, through a camera, whether a face image whose size, orientation and retention time meet preset conditions exists in the captured picture, the face image including a mouth-shape image; acquires a voice signal input by the user through a microphone; determines the start point and end point of the voice signal according to changes in the mouth-shape image; denoises the voice signal; extracts the user's sound data from the denoised voice signal; performs voice recognition on the sound data to obtain the corresponding text content; and generates and displays ticket information according to the text content, thereby solving the problem of the low speech recognition success rate caused by environmental noise in public places. Specifically, a second aspect of the invention provides a ticketing system comprising a camera for acquiring a face image and mouth image of a user, a microphone for monitoring environmental noise and the user's voice signal, a touch screen for receiving touch input from the user, a memory for storing a computer program, and a processor configured to execute the computer program to implement the ticket selling method based on voice recognition of any one of the implementations of the first aspect.
A ticket selling method and a ticket selling system based on voice recognition according to some embodiments of the present invention are described below with reference to fig. 1 to 7.
As shown in fig. 1, a first aspect of the present invention provides a method for ticketing based on voice recognition, including:
S100: detecting, through the camera, whether a face image whose size, orientation and retention time meet preset conditions exists in the captured picture, the face image including a mouth-shape image. In this technical scheme, the camera arranged on the self-service ticket vending machine detects the user's face image in order to determine whether the user intends to purchase tickets through the machine. Since the machine is installed in a public place where people come and go, a large number of face images of passing pedestrians appear in the camera's picture, and whether a captured face image belongs to a user with ticket-purchasing intent or to a pedestrian who is merely passing by must be judged according to the size, orientation and retention time of the face in the picture. When the size, orientation and retention time meet the preset conditions, for example when the face image occupies more than 40% of the picture area, the deflection angle of the face relative to the self-service ticket vending machine is not more than 30 degrees, and the time during which both conditions are met exceeds 5 seconds, the user corresponding to the face image can be considered a user with ticket-purchasing intent.
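For illustration only, the following is a minimal sketch of this gating check in Python. The patent does not specify a face-detection implementation, so the per-frame detector output, its field names and the default thresholds (taken from the example values above) are assumptions.

```python
from dataclasses import dataclass

@dataclass
class FaceObservation:
    """Hypothetical per-frame output of a face detector (not specified by the patent)."""
    timestamp: float         # capture time in seconds
    face_area_ratio: float   # face bounding-box area divided by full picture area
    yaw_deg: float           # deflection angle of the face relative to the machine

def has_ticketing_intent(observations, min_ratio=0.40, max_yaw=30.0, min_dwell=5.0):
    """Return True once the face meets the size and orientation conditions
    continuously for at least min_dwell seconds (example thresholds from the text)."""
    dwell_start = None
    for obs in observations:                  # observations ordered by timestamp
        if obs.face_area_ratio > min_ratio and abs(obs.yaw_deg) <= max_yaw:
            if dwell_start is None:
                dwell_start = obs.timestamp   # conditions first satisfied
            if obs.timestamp - dwell_start >= min_dwell:
                return True
        else:
            dwell_start = None                # conditions broken, restart the timer
    return False
```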
S200: and acquiring a voice signal input by a user through a microphone. According to the technical scheme, a voice signal for inquiring or purchasing tickets sent by a user is acquired through a microphone arranged on a self-service ticket vending machine. The microphone collects voice signals of a user, and simultaneously collects some environmental sounds such as broadcast sounds, sounds of pedestrians, sounds of vehicles and sounds generated by collision or friction when other objects move into the voice signals, so that the voice signals also contain a large amount of noise signals.
S300: and determining the starting point and the end point of the voice signal input by the user according to the change of the mouth shape image. When the face image meeting the conditions exists in the shooting picture of the camera, the mouth shape image is extracted from the face image so as to determine the mouth shape change condition corresponding to the voice signal sent by the user according to the change of the mouth shape image, and therefore the starting time and the ending time of the voice signal input by the user can be determined according to the time corresponding to the mouth shape change of the user.
S400: performing denoising on the speech signal. Removing more significant noise, such as some periodic noise, echo noise, from the speech signal. In public places such as movie theaters and subway stations, periodic noise is one of the more dominant noise forms, such as periodic broadcast sound, or sound of subway station entering and exiting. Meanwhile, because movie theaters, subway stations and other occasions are generally closed, echo noise is one of the more main noise forms.
S500: and extracting the sound data of the user from the denoised voice signal. Since the user is located at a position close to the self-service ticket vending machine when inputting the voice signal, that is, the position where the user makes a sound is close to the microphone on the self-service ticket vending machine, after removing the periodic noise and the echo noise, the voice signal does not have the sound information with the average short-time energy value larger than the persistence of the user sound, that is, in the sound waveform of the voice signal after denoising, most of the peak waveform can be confirmed as the sound waveform of the user, so that the possibility of extracting the sound data of the user from the speech signal after denoising is provided. However, the denoised speech signal still contains other possible noises, such as the voice of pedestrian talking, etc., and it is necessary to recognize the voice data of the user from the denoised speech signal and extract the voice data to improve the accuracy of speech recognition. Specifically, the voice data of the user is voice waveform information of the user extracted from the voice signal.
S600: and performing voice recognition on the voice data to obtain the text content corresponding to the voice data. And inputting the extracted voice data of the user into a local or cloud pre-trained voice recognition model so as to recognize the voice data of the user and convert the voice data into corresponding text.
S700: and generating and displaying bill information according to the text content. For example, the text identified from the user's voice data is "search for tickets going to XXX station within one hour", or "buy an earliest ticket going to XXX", etc., and the self-service ticket machine performs corresponding actions of generating and displaying ticket information according to the text, so that the user can perform subsequent operations of paying and printing tickets according to the ticket information.
As shown in fig. 2, in the ticket selling method based on voice recognition, before the step of acquiring the voice signal input by the user through the microphone, the method further includes:
s110: and continuously monitoring the environmental sound information through the microphone. In this embodiment, the microphone provided in the self-service ticket machine continuously monitors the ambient sound even when the face image of the user meeting the condition is not detected.
S120: and analyzing whether periodic noise and echo noise corresponding to the periodic noise exist in the environmental sound information. According to the analysis of the environmental sound for a long time, the time characteristic and the spectrum characteristic of the periodic noise and the echo noise corresponding to the periodic noise can be found and extracted.
S130: and if the periodic noise exists, extracting the time characteristic and the spectrum characteristic of the periodic noise. The time characteristic refers to the time when the periodic noise appears and the time period thereof, for example, a certain periodically played sound prompting the pedestrians to pay attention to the safe broadcast, the time of the first playing every day, the time interval of the playing, and the like. The frequency spectrum characteristic refers to a sound waveform which is extracted by the ticketing system and corresponds to the periodic noise according to comparison of the sound waveforms extracted from a plurality of time periods.
S140: and if the echo noise corresponding to the periodic noise exists, extracting the time characteristic and the spectrum characteristic of the echo noise. When the self-service ticket vending machine is installed in an outdoor environment, the volume of echoes generated indoors by the periodic noise is very high, and the echoes corresponding to the periodic noise are periodic and can be easily identified and extracted. Especially for some building sizes or indoor environments with interior article furnishings without special de-noising design, the periodic noise and echoes thereof of a plurality of periods are even likely to generate standing wave guide to greatly enhance the sound waveform amplitude. When the self-service ticket vending machine is installed in a very open outdoor environment, no echo can be generated or the volume of the echo is small and is covered by other sound, so that the recognition of the voice signal cannot be adversely affected.
With reference to fig. 2, in the ticket selling method based on speech recognition, the step of performing denoising on the speech signal specifically includes:
s410: if the periodic noise exists, generating a first inverse waveform corresponding to the periodic noise according to the time characteristic and the spectrum characteristic of the periodic noise;
s420: if the echo noise corresponding to the periodic noise exists, generating a second anti-phase waveform corresponding to the echo noise according to the time characteristic and the spectrum characteristic of the echo noise;
s430: inputting the voice signal, the first inverse waveform and/or the second inverse waveform into an inverse noise reduction function to cancel the periodic noise and/or the echo noise.
In this embodiment, when the periodic noise and its echo exist, the corresponding inverse waveforms are generated according to their spectrum characteristics. When the camera detects a user inquiring about or purchasing a ticket, the inverse waveforms are superposed, within the inverse noise reduction function and according to the time characteristics of the periodic noise and its echo, onto the portions of the voice signal in the corresponding time periods, so as to cancel the periodic noise and its echo in those periods.
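A minimal sketch of such an inverse (anti-phase) cancellation step is given below, assuming the noise template and the time points of its occurrences have already been extracted during long-term monitoring. The exact form of the patent's inverse noise reduction function, including alignment and amplitude scaling, is not disclosed, so those details are assumptions.

```python
import numpy as np

def cancel_periodic_noise(speech, sr, noise_template, occurrence_times):
    """Superpose the inverted noise template (the first or second inverse waveform)
    onto the captured signal at the time points predicted by the noise's time
    characteristic, so the periodic noise or its echo is cancelled."""
    cleaned = np.asarray(speech, dtype=np.float64).copy()
    inverse = -np.asarray(noise_template, dtype=np.float64)
    for t in occurrence_times:                 # occurrence times in seconds
        start = int(round(t * sr))
        if start >= len(cleaned):
            continue
        end = min(start + len(inverse), len(cleaned))
        cleaned[start:end] += inverse[:end - start]   # anti-phase superposition
    return cleaned
```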
As shown in fig. 3, in the ticket selling method based on voice recognition, the step of determining the start point and the end point of the voice signal input by the user according to the change of the mouth-shaped image specifically includes:
s310: and extracting a mouth-shaped image from each frame of picture image with the face image shot by the camera.
S320: and comparing the mouth-shaped image in each frame of picture image with the mouth-shaped image in the picture image with the face image at the previous interval frame number of n, wherein n is a positive integer greater than or equal to 1. Preferably, the mouth image comparison is performed at an interval of n-m frame numbers, where m is a sampling frame rate of the imaging device, that is, the imaging device performs image acquisition at a frame rate of m frames/second, and m is a positive integer. In this embodiment, after the qualified face image is detected, from the m-th frame of the detected face image, the mouth-shaped image of the user in each frame of image shot by the camera is compared with the mouth-shaped image in the previous frame or another frame of image at the preset interval, that is, the mouth-shaped image of each frame of image is compared with the mouth-shaped image in the shot picture at the previous interval of 1 second.
S330: it is determined whether the mouth shape of the user changes from the closed state to the open state. By comparing the mouth shape images in the front and back shooting pictures, whether the mouth shape of the user changes or not can be judged, whether the mouth shape changes from the closed state to the open state or from the open state to the closed state is judged according to the change state of the mouth shape, and the time corresponding to the change process is determined. The user mouth is in a closed state, namely the upper lip and the lower lip of the user are in a completely closed state, and the user mouth is in an open state, namely the upper lip and the lower lip of the user are in a non-completely closed state.
S340: and taking the time point corresponding to the picture image of which the mouth shape of the user is changed from the closed state to the open state for the first time as the starting point of the voice signal. And when the mouth shape of the user is changed from a closed state to an open state for the first time from the time when the face image meeting the condition is detected in the shooting picture, taking the time point corresponding to the first frame shooting picture image when the mouth shape of the user begins to appear the open characteristic as the starting point of the voice signal.
S350: and judging whether the mouth shape of the user is changed from the open state to the closed state and the time for keeping the closed state exceeds a preset first threshold value. Detecting a change of the user's mouth shape from the open state to the closed state from a time point corresponding to a first change of the user's mouth shape from the closed state to the open state. That is, after the user starts speaking, if the time that the user's mouth is in the closed state exceeds the preset first threshold, for example, the user's mouth is opened again for more than 3 seconds or 5 seconds, it means that the user has stopped inputting the voice signal.
S360: and taking the time point of the last time the mouth shape of the user changes from the open state to the closed state as the end point of the voice signal. And when the mouth shape of the user changes to the closed state and the duration of the closed state is greater than the preset first threshold, namely the user stops inputting the voice signal, taking the time point of the last time the mouth shape of the user changes from the open state to the closed state before the time point as the end point of the voice signal.
As shown in fig. 4, in the ticket selling method based on voice recognition, the step of performing denoising on the voice signal further includes:
s410: and acquiring the time point of changing the mouth shape of the user from the closed state to the open state and from the open state to the closed state every time so as to acquire the time period corresponding to the open state and the closed state of the mouth shape of the user. Dividing the time of inputting the voice signal by the user according to the mouth shape state of the user, wherein the time can be divided into a time period of the mouth shape opening state and a time period of the mouth shape closing state, and dividing the time of inputting the voice signal by the user into a time period corresponding to the mouth shape of the user being in the opening state and a time period corresponding to the mouth shape of the user being in the closing state according to the mouth shape image in the shooting picture in the period of inputting the voice signal by the user.
S420: and eliminating sound signals with energy exceeding a preset second threshold value within a short time period when the mouth of the user is in a closed state from the voice signals. While the user is speaking, there may be partial pronunciations that are uttered while the user's mouth is in the closed state, but it is determined that the average short-time energy of the speech signal input during a period of time corresponding to the user's mouth being in the open state is greater than the average short-time energy of the speech signal input during a period of time corresponding to the user's mouth being in the closed state. When a sound signal with average short-term energy exceeding a preset second threshold value appears in a time period when the mouth of the user is in a closed state, the sound signal can be determined to be an abnormal signal or a noise signal, and needs to be removed from the voice signal.
As shown in fig. 5, in the ticket selling method based on voice recognition, the step of extracting the voice data of the user from the denoised voice signal specifically includes:
s510: and inputting the denoised voice signal into a clipping function to obtain a quasi-pitch period of the user voice. Further, in the ticket selling method based on voice recognition, the quasi-pitch period is composed of a pitch period value and an error range value thereof. The pitch periods in the user's voice are not exactly the same time period in the strict sense, and may change within a certain error range, and by inputting the denoised voice signal into the clipping function, the quasi-pitch period in the voice signal of the user may be obtained, and the quasi-pitch period is composed of a pitch period value and an error range value thereof.
S520: and inputting the quasi-pitch period and the voice signal into a voice waveform fitting function to fit the voice waveform of the user.
S530: and extracting the sound data of the user from the voice signal according to the fitting result.
As shown in fig. 6, in the ticket selling method based on speech recognition, the step of inputting the denoised speech signal into the clipping function to obtain the quasi-fundamental pitch period of the user's voice specifically includes:
s511: determining a maximum peak in a sound waveform of the user's sound data;
s512: taking the average of the maximum peak value and the peak value average value of the sound waveform in the time period when the mouth shape of the user is in an open state as a clipping level;
s513: inputting the sound waveform and the clipping level into a clipping function to obtain a peak waveform;
S514: determining whether the peak waveform contains any segment that fails to conform to the periodic pattern within an error tolerance not greater than a third threshold;
S515: if so, fine-tuning the clipping level upwards and returning to the step of inputting the sound waveform and the clipping level into the clipping function to obtain a peak waveform;
S516: if not, calculating the pitch period value and its error range value according to the peak waveform.
As described above, the step of denoising the voice signal has already removed the noise with high short-time energy, and among the remaining sound data the user's sound data has the higher short-time energy, so most of the peak waveform in the voice signal can be assumed to belong to the user's sound waveform. When the peak waveform obtained with a given clipping level still contains segments that do not conform to the periodic pattern within the error tolerance, the peak waveform still contains noise, and the clipping level must be raised before the peak waveform is obtained again.
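As an illustration of steps S511 to S516, the sketch below keeps only the waveform peaks above a clipping level that starts at the mean of the maximum peak and the average peak height inside mouth-open intervals, and raises the level until the surviving peaks are nearly periodic; the gaps between them then give the pitch period value and its error range. The peak picking and the periodicity test are simplified assumptions rather than the patent's exact clipping function.

```python
import numpy as np

def quasi_pitch_period(speech, sr, open_intervals, third_threshold=0.10,
                       step=1.05, max_iters=20):
    """Return (pitch_period_seconds, error_range_seconds) or None.
    open_intervals is a list of (start, end) times in seconds during which the
    user's mouth is open."""
    idx = np.concatenate([np.arange(int(a * sr), int(b * sr)) for a, b in open_intervals])
    voiced = np.asarray(speech, dtype=np.float64)[idx]
    # local maxima of the sound waveform
    peaks = np.where((voiced[1:-1] > voiced[:-2]) & (voiced[1:-1] > voiced[2:]))[0] + 1
    max_peak = voiced[peaks].max()
    level = 0.5 * (max_peak + voiced[peaks].mean())   # initial clipping level (S512)
    for _ in range(max_iters):
        surviving = peaks[voiced[peaks] >= level]     # clipping: keep peaks above the level
        gaps = np.diff(surviving) / sr                # candidate pitch periods
        if len(gaps) >= 2:
            period, spread = np.median(gaps), np.ptp(gaps)
            if spread <= third_threshold * period:    # gaps conform to the periodic pattern
                return period, spread                 # pitch period value and error range
        level *= step                                 # fine-tune the clipping level upwards
    return None
```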
As shown in fig. 7, in the ticket selling method based on voice recognition, the step of inputting the quasi-fundamental pitch period and the voice signal into a voice waveform fitting function to fit the voice waveform of the user specifically includes:
s521: inputting the pitch period value to an autocorrelation function;
s522: adjusting coefficients of the autocorrelation function and fine-tuning the pitch period value within the error value range to generate corresponding harmonic waveforms;
s523: matching the harmonic waveform with a sound waveform of the speech signal;
s524: and determining the sound waveform of the user according to the matching result.
In the technical solution of the above embodiment, the pitch period value is dynamically adjusted within its error range and the coefficients of different autocorrelation functions are adapted, so that the corresponding harmonic waveforms are generated on the basis of a fundamental waveform in the speech signal. When the degree of matching between a harmonic waveform and the waveform on both sides of the corresponding fundamental waveform in the speech signal meets a preset condition, for example when the offset between the peak time points at corresponding positions is not greater than a preset fourth threshold and the matching degree is greater than a preset fifth threshold, the harmonic waveform is determined to belong to the user's sound waveform, so that the user's sound data can be extracted from the speech signal on the basis of the fundamental waveform and the harmonic waveforms.
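The sketch below illustrates one possible reading of steps S521 to S524. The patent routes the pitch period through an autocorrelation function whose coefficients are adjusted; the sketch instead synthesizes a harmonic waveform directly for each candidate period inside the error range and uses normalized cross-correlation as the matching degree, which is a simplification and an assumption, as are the harmonic count and the threshold value.

```python
import numpy as np

def fit_user_waveform(speech, sr, period, err, n_harmonics=5, fifth_threshold=0.6):
    """Keep the segments of the speech signal that match a harmonic waveform built
    from the quasi-pitch period (period +/- err, both in seconds)."""
    speech = np.asarray(speech, dtype=np.float64)
    seg_len = int(4 * period * sr)                    # analyze roughly 4 pitch periods
    t = np.arange(seg_len) / sr
    keep = np.zeros(len(speech), dtype=bool)
    for cand in np.linspace(period - err, period + err, 7):   # sweep within the error range
        f0 = 1.0 / cand
        # harmonic waveform: fundamental plus harmonics with 1/k amplitudes
        harmonic = sum(np.cos(2 * np.pi * f0 * (k + 1) * t) / (k + 1)
                       for k in range(n_harmonics))
        harmonic /= np.linalg.norm(harmonic)
        for start in range(0, len(speech) - seg_len, seg_len):
            seg = speech[start:start + seg_len]
            norm = np.linalg.norm(seg)
            if norm == 0.0:
                continue
            # matching degree: best normalized cross-correlation over all lags
            score = np.max(np.abs(np.correlate(seg / norm, harmonic, mode="same")))
            if score >= fifth_threshold:
                keep[start:start + seg_len] = True    # segment attributed to the user
    return speech * keep                              # extracted user sound data
```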
A second aspect of the present invention provides a ticketing system, including a camera for obtaining a face image and a mouth image of a user, a microphone for monitoring environmental noise and a user voice signal, a touch screen for receiving a touch input from the user, a memory for storing a computer program, and a processor, where the processor is configured to execute the computer program to implement the method for ticketing based on voice recognition in any one of the first aspect.
The invention provides a ticket selling method and system based on voice recognition. A camera detects whether a face image whose size, orientation and retention time meet preset conditions exists in the captured picture, the face image including a mouth-shape image; a voice signal input by the user is acquired through a microphone; the start point and end point of the voice signal are determined from changes in the mouth-shape image; the voice signal is denoised; the user's sound data are extracted from the denoised voice signal; voice recognition is performed on the sound data to obtain the corresponding text content; and ticket information is generated and displayed according to the text content, thereby solving the problem of the low speech recognition success rate caused by environmental noise in public places.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The embodiments of the present invention described above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to make the best use of the invention and of various embodiments with modifications suited to the particular use contemplated. The invention is limited only by the claims and their full scope and equivalents.

Claims (10)

1. A ticket selling method based on voice recognition is characterized by comprising the following steps:
detecting whether a face image with the size, the orientation and the retention time meeting preset conditions exists in a shot picture through a camera, wherein the face image comprises a mouth-shaped image;
acquiring a voice signal input by a user through a microphone;
determining a starting point and an end point of a voice signal input by a user according to the change of the mouth-shaped image;
performing denoising on the voice signal;
extracting sound data of a user from the denoised voice signal;
performing voice recognition on the voice data to obtain text contents corresponding to the voice data;
and generating and displaying ticket information according to the text content.
2. The method for selling tickets based on voice recognition according to claim 1, characterized in that before the step of acquiring the voice signal input by the user through the microphone, it further comprises:
continuously monitoring environmental sound information through the microphone;
analyzing whether periodic noise and echo noise corresponding to the periodic noise exist in the environmental sound information;
if the periodic noise exists, extracting the time characteristic and the spectrum characteristic of the periodic noise;
and if the echo noise corresponding to the periodic noise exists, extracting the time characteristic and the spectrum characteristic of the echo noise.
3. The ticket selling method based on voice recognition as claimed in claim 2, wherein the step of denoising the voice signal specifically comprises:
if the periodic noise exists, generating a first inverse waveform corresponding to the periodic noise according to the time characteristic and the spectrum characteristic of the periodic noise;
if the echo noise corresponding to the periodic noise exists, generating a second anti-phase waveform corresponding to the echo noise according to the time characteristic and the spectrum characteristic of the echo noise;
inputting the voice signal, the first inverse waveform and/or the second inverse waveform into an inverse noise reduction function to cancel the periodic noise and/or the echo noise.
4. A method for selling tickets based on speech recognition according to claim 3, wherein the step of determining the start and end points of the speech signal input by the user according to the variation of the mouth-shaped image comprises:
extracting a mouth-shaped image from each frame of picture image with a face image shot by the camera;
comparing the mouth-shaped image in each frame of picture image with the mouth-shaped image in the picture image which has the face image and has the previous interval frame number of n, wherein n is a positive integer greater than or equal to 1;
judging whether the mouth shape of the user is changed from a closed state to an open state;
taking a time point corresponding to the picture image of the user's mouth shape changed from the closed state to the open state for the first time as a starting point of the voice signal;
judging whether the mouth shape of the user is changed from an open state to a closed state and the time for keeping the closed state exceeds a preset first threshold value;
and taking the time point of the last time the mouth shape of the user changes from the open state to the closed state as the end point of the voice signal.
5. The speech recognition-based ticketing method of claim 4, wherein the step of performing denoising on said speech signal further comprises:
acquiring time points of the mouth shape of the user changing from a closed state to an open state and from the open state to the closed state every time so as to acquire time periods corresponding to the open state and the closed state of the mouth shape of the user;
and eliminating, from the voice signal, sound segments whose energy exceeds a preset second threshold value during short time periods in which the user's mouth is in the closed state.
6. The ticket selling method based on voice recognition according to any one of claims 1 to 5, wherein the step of extracting the voice data of the user from the denoised voice signal comprises:
inputting the denoised voice signal into a clipping function to obtain a quasi-fundamental period of user voice;
inputting the quasi-fundamental period and the voice signal into a voice waveform fitting function to fit a voice waveform of a user;
and extracting the sound data of the user from the voice signal according to the fitting result.
7. The method of claim 6, wherein the quasi-pitch period is composed of a pitch period value and an error range value thereof.
8. The ticket selling method based on speech recognition of claim 7, wherein the step of inputting the denoised speech signal into the clipping function to obtain the quasi-fundamental period of the user's voice includes:
determining a maximum peak in a sound waveform of the user's sound data;
taking the average of the maximum peak value and the peak value average value of the sound waveform in the time period when the mouth shape of the user is in an open state as a clipping level;
inputting the sound waveform and the clipping level into a clipping function to obtain a peak waveform;
determining whether the peak waveform contains any segment that fails to conform to the periodic pattern within an error tolerance not greater than a third threshold;
if so, fine-tuning the clipping level upwards and returning to the step of inputting the sound waveform and the clipping level into the clipping function to obtain a peak waveform;
and if not, calculating the pitch period value and its error range value according to the peak waveform.
9. The ticket selling method based on speech recognition according to any one of claims 6 to 8, wherein the step of inputting the quasi-pitch period and the speech signal into a sound waveform fitting function to fit the sound waveform of the user specifically comprises:
inputting the pitch period value to an autocorrelation function;
adjusting coefficients of the autocorrelation function and fine-tuning the pitch period value within the error value range to generate corresponding harmonic waveforms;
matching the harmonic waveform with a sound waveform of the speech signal;
and determining the sound waveform of the user according to the matching result.
10. A ticketing system comprising a camera for capturing facial images and mouth images of a user, a microphone for listening to ambient noise and user speech signals, a touch screen for receiving touch input from the user, a memory for storing a computer program, and a processor for executing the computer program to implement a method of ticketing based on speech recognition as claimed in any one of claims 1 to 9.
CN202210655359.7A, priority date 2022-06-10, filing date 2022-06-10: Ticket selling method and system based on voice recognition, published as CN115063895A (pending)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210655359.7A CN115063895A (en) 2022-06-10 2022-06-10 Ticket selling method and system based on voice recognition


Publications (1)

Publication Number: CN115063895A
Publication Date: 2022-09-16

Family

ID=83199833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210655359.7A Pending CN115063895A (en) 2022-06-10 2022-06-10 Ticket selling method and system based on voice recognition

Country Status (1)

Country Link
CN (1) CN115063895A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440872A (en) * 2013-08-15 2013-12-11 大连理工大学 Transient state noise removing method
CN105427387A (en) * 2015-11-09 2016-03-23 上海语知义信息技术有限公司 System and method for controlling ticket vending machine by voice
CN107705802A (en) * 2017-09-11 2018-02-16 厦门美图之家科技有限公司 Phonetics transfer method, device, electronic equipment and readable storage medium storing program for executing
CN109036381A (en) * 2018-08-08 2018-12-18 平安科技(深圳)有限公司 Method of speech processing and device, computer installation and readable storage medium storing program for executing
CN110322891A (en) * 2019-07-03 2019-10-11 南方科技大学 A kind of processing method of voice signal, device, terminal and storage medium
CN111048066A (en) * 2019-11-18 2020-04-21 云知声智能科技股份有限公司 Voice endpoint detection system assisted by images on child robot
CN111833554A (en) * 2019-04-17 2020-10-27 阿里巴巴集团控股有限公司 Ticket selling machine, ticket selling machine system, ticket selling method and ticket selling device
CN114463903A (en) * 2021-12-17 2022-05-10 广州新科佳都科技有限公司 Ticket machine interaction method and device, ticket selling terminal and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination