CN113241078A - Attendance machine-based voice recognition method and system - Google Patents

Publication number
CN113241078A
Authority
CN
China
Prior art keywords
name
voice
user
voice signal
database
Legal status
Pending
Application number
CN202110505558.5A
Other languages
Chinese (zh)
Inventor
郭迦
龙华伟
Current Assignee
Hangzhou Moredian Technology Co ltd
Original Assignee
Hangzhou Moredian Technology Co ltd

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07C TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C1/00 Registering, indicating or recording the time of events or elapsed time, e.g. time-recorders for work people
    • G07C1/10 Registering, indicating or recording the time of events or elapsed time, e.g. time-recorders for work people together with the recording, indicating or registering of other data, e.g. of signs of identity
    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07C TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C9/00 Individual registration on entry or exit
    • G07C9/30 Individual registration on entry or exit not involving the use of a pass
    • G07C9/32 Individual registration on entry or exit not involving the use of a pass in combination with an identity check
    • G07C9/37 Individual registration on entry or exit not involving the use of a pass in combination with an identity check using biometric data, e.g. fingerprints, iris scans or voice recognition
    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07C TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C9/00 Individual registration on entry or exit
    • G07C9/30 Individual registration on entry or exit not involving the use of a pass
    • G07C9/38 Individual registration on entry or exit not involving the use of a pass with central registration
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0324 Details of processing therefor
    • G10L21/034 Automatic adjustment

Abstract

The application relates to a voice recognition method and system based on an attendance machine. A name hot word library and a name database are constructed and used as training corpora to train a model, yielding a voice recognition model. A face recognition module acquires face information of a user; if the face information is matched successfully, the mode is switched to an access control mode, and if not, the mode is switched to a visitor mode. In the access control mode, the access-controlled door is opened for the user. In the visitor mode, a voice acquisition module acquires a voice signal of the user, a second text of the voice signal is obtained according to the voice recognition model, and a target name is extracted from the second text. The attendance machine searches the name database for the target name and places a call after the user confirms the search result. This solves the problems that the attendance machine does not support voice recognition, that the front desk must manually look up the person a visitor wants to see, which is inefficient, and that names are recognized with low accuracy because the characters in a name can be combined with a high degree of freedom.

Description

Attendance machine-based voice recognition method and system
Technical Field
The application relates to the technical field of voice recognition, in particular to a voice recognition method and system based on an attendance machine.
Background
Speech is the acoustic expression of language. It is the most natural, effective and convenient means for humans to exchange information, and it also supports human thinking. Automatic Speech Recognition (ASR) generally refers to the process by which a computer or other device recognizes and understands human speech and converts it into corresponding text or commands. In the related art, the attendance machine scenario usually does not support voice recognition and cannot carry out human-computer interaction, so the front desk has to look up manually the person a visitor wants to see, which consumes manpower and is inefficient. In addition, because the characters in a personal name can be combined with a very high degree of freedom, names are recognized with low accuracy.
At present, no effective solution has been proposed for the problems in the related art that the attendance machine scenario does not support human-computer interaction, that efficiency is low, and that the accuracy of name recognition is low.
Disclosure of Invention
The embodiments of the application provide a voice recognition method and system based on an attendance machine, which aim to solve at least the problems in the related art that the attendance machine scenario does not support human-computer interaction, that efficiency is low, and that the accuracy of name recognition is low.
In a first aspect, an embodiment of the present application provides an attendance machine-based voice recognition method, applied to an attendance machine scenario, the method including:
a local server constructs a name hot word library and a name database, and a cloud server trains a model with the name hot word library and the name database as training corpora to obtain a voice recognition model;
a face recognition module acquires face information of a user; if the face information is matched successfully, the mode is switched to an access control mode, and if not, the mode is switched to a visitor mode;
in the access control mode, the access-controlled door is opened for the user; in the visitor mode, a voice acquisition module acquires a first voice signal of the user, obtains the position of the first voice signal by TDOA positioning, amplifies the audio signal within the area where that position is located, suppresses the audio signal outside the area to obtain a second voice signal, and sends the second voice signal to the cloud server;
the cloud server obtains a second text of the second voice signal according to the voice recognition model and extracts a target name from the second text;
the local server searches the name database according to the target name and places a call after the user confirms the search result, wherein the search result is at least one name.
In some embodiments, after the target name is extracted from the second text, the method further includes:
creating a pinyin tone database from the name database, scoring the names in the name database against the pinyin tone database with the target name as the scoring standard, and taking the names whose score is greater than a preset value as target names.
In some embodiments, after the local server constructs the name hot word library and the name database, the method further includes:
the local server updates the name hot words and the name data, and the cloud server trains the voice recognition model with the updated name hot words and the updated name data.
In some embodiments, after the second voice signal is sent to the cloud server, the method further includes:
the local server sends a voice recognition token to the cloud server together with the second voice signal, wherein the voice recognition token is used to obtain permission for voice recognition.
In some embodiments, before the voice acquisition module acquires the first voice signal of the user, the method further includes:
acquiring verification information of the user and verifying the identity of the user according to the verification information; if the verification is passed, the voice acquisition module acquires the first voice signal of the user.
In a second aspect, an embodiment of the application provides an attendance machine-based voice recognition system, applied to an attendance machine scenario. The system includes a cloud server, a local server and an attendance machine, and the attendance machine includes a face recognition module, a voice acquisition module and a control module, wherein:
the local server is used for constructing a name hot word library and a name database;
the cloud server trains a model with the name hot word library and the name database as training corpora to obtain a voice recognition model;
the face recognition module is used for recognizing face information of a user; if the face information is matched successfully, the mode is switched to an access control mode, and if not, the mode is switched to a visitor mode;
in the access control mode, the control module opens the access-controlled door for the user;
in the visitor mode, the voice acquisition module acquires a first voice signal of the user, obtains the position of the first voice signal by TDOA positioning, amplifies the audio signal within the area where that position is located, and suppresses the audio signal outside the area to obtain a second voice signal;
the cloud server obtains a second text of the second voice signal according to the voice recognition model and extracts a target name from the second text;
the local server searches the name database according to the target name and places a call after the user confirms the search result, wherein the search result is at least one name.
In some embodiments, after the cloud server extracts the target name from the second text, the local server is further configured to create a pinyin tone database from the name database, score the names in the name database against the pinyin tone database with the target name as the scoring standard, and take the names whose score is greater than a preset value as target names.
In some embodiments, after the local server constructs the name hot word library and the name database, the local server is further configured to update the name hot words and the name data, and the cloud server trains the voice recognition model with the updated name hot words and the updated name data.
In some embodiments, after the second voice signal is sent to the cloud server, the local server is further configured to send a voice recognition token to the cloud server, where the voice recognition token is used to obtain permission for voice recognition.
In some embodiments, the attendance machine further includes a verification module. Before the voice acquisition module acquires the first voice signal of the user, the verification module acquires verification information of the user and verifies the identity of the user according to the verification information; if the verification is passed, the voice acquisition module acquires the first voice signal of the user.
Compared with the related art, in the attendance machine-based voice recognition method provided by the embodiments of the application, the local server constructs a name hot word library and a name database, and the cloud server trains a model with them as training corpora to obtain a voice recognition model. The face recognition module acquires face information of a user; if the face information is matched successfully, the mode is switched to the access control mode, and if not, the mode is switched to the visitor mode. In the access control mode, the access-controlled door is opened for the user. In the visitor mode, the voice acquisition module acquires a first voice signal of the user, obtains the position of the first voice signal by TDOA positioning, amplifies the audio signal within the area where that position is located, suppresses the audio signal outside the area to obtain a second voice signal, and sends the second voice signal to the cloud server. The cloud server obtains a second text of the second voice signal according to the voice recognition model and extracts a target name from the second text. The local server searches the name database according to the target name and places a call after the user confirms the search result, wherein the search result is at least one name. This solves the problems that the attendance machine scenario does not support voice recognition or human-computer interaction, that the front desk must manually look up the person a visitor wants to see, which consumes manpower and is inefficient, and that names are recognized with low accuracy because their characters can be combined with a very high degree of freedom.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
Fig. 1 is a flowchart of a method of attendance machine-based voice recognition according to an embodiment of the present application;
Fig. 2 is a block diagram of a system for attendance machine-based voice recognition according to an embodiment of the present application;
Fig. 3 is a block diagram of another system for attendance machine-based voice recognition according to an embodiment of the present application;
Fig. 4 is a block diagram of a hardware architecture according to an embodiment of the present application;
Fig. 5 is a block diagram of a software architecture according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will appreciate, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments where no conflict arises.
Unless defined otherwise, technical or scientific terms used herein have the ordinary meaning understood by those of ordinary skill in the art to which this application belongs. The words "a," "an," "the," and similar words used throughout this application do not limit number and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The words "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "A plurality" herein means two or more. "And/or" describes an association between associated objects and covers three cases; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The terms "first," "second," "third," and the like merely distinguish similar objects and do not denote a particular ordering of the objects.
The present embodiment provides an attendance machine-based voice recognition method. Fig. 1 is a flowchart of the method according to an embodiment of the present application. As shown in Fig. 1, the method includes the following steps:
Step S101: a local server constructs a name hot word library and a name database, and a cloud server trains a model with the name hot word library and the name database as training corpora to obtain a voice recognition model. In this embodiment, the local server builds the name database from the company OA system and builds the name hot words according to the person-finding scenario; for example, the name hot words in that scenario include phrases such as "I'm looking for" and "I want to find". Because the characters in a name can be combined with a very high degree of freedom, a model trained only on name data recognizes names with low accuracy. The model is therefore trained on both the name hot word library and the name database, so that the voice recognition model learns where in a sentence a name is to be extracted, which improves the accuracy of name recognition. Illustratively, if the name hot word is "I want to find" and the name is "Zhang San", the model can judge that "Zhang San", the words following "I want to find" in the sentence "I want to find Zhang San", is a name. Likewise, if the name hot word is "I want to find … and …", the model can judge that in the sentence "I want to find Zhang San and Li Si" both "Zhang San", which follows "I want to find", and "Li Si", which follows "and", are names. The model trained on the name hot word library and the name database is the voice recognition model, and it can accurately recognize several names in one sentence;
Step S102: a face recognition module acquires face information of a user; if the face information is matched successfully, the mode is switched to an access control mode, and if not, the mode is switched to a visitor mode. In the related art, the attendance machine is mainly used to record the time at which employees clock in and to open the access-controlled door for them. A person who is not a company employee cannot interact with the attendance machine, so the front desk has to look up manually the person the visitor wants to see, which consumes manpower and is inefficient;
Step S103: in the access control mode, the access-controlled door is opened for the user; in the visitor mode, a voice acquisition module acquires a first voice signal of the user, obtains the position of the first voice signal by TDOA positioning, amplifies the audio signal within the area where that position is located, suppresses the audio signal outside the area to obtain a second voice signal, and sends the second voice signal to the cloud server. In this embodiment, a successful face match indicates that the user is a company employee, so the door can be opened for the user; in the visitor mode, the attendance machine interacts with the user by acquiring the user's first voice signal. Because the environment of an attendance machine usually contains various kinds of noise, the acquired first voice signal has a low signal-to-noise ratio and the speech-to-text recognition rate is low. The position of the first voice signal is therefore obtained by TDOA positioning, the audio signal within the area where that position is located is amplified, and the audio signal outside the area is suppressed, which improves the quality of the first voice signal. A filtering algorithm can bring the first voice signal closer to the original signal, and raising the signal-to-noise ratio of the individual voice signal improves the positioning accuracy. It is also judged whether the first voice signal is a near-field signal; if so, the optimal path for locating the audio signal is found through multiple rounds of searching and screening, which solves the problem that a near-field sound source is difficult to locate. TDOA positioning is a method of locating a source from time differences of arrival;
Step S104: the cloud server obtains a second text of the second voice signal according to the voice recognition model and extracts a target name from the second text. For example, if the acquired second voice signal is "wo(3) yao(4) zhao(3) jiang(1) ge(1)", where the numbers in parentheses represent tones and "(3)" represents the third tone, the voice recognition model obtains the second text "wo(3) yao(4) zhao(3) jiang(1) ge(1)" and extracts the target name "jiang(1) ge(1)" from it;
Step S105: the local server searches the name database according to the target name and places a call after the user confirms the search result, wherein the search result is at least one name. Illustratively, the target name "jiang(1) ge(1)" is searched for in the name database. If two names in the name database match "jiang(1) ge(1)", the user is asked which of the two is meant, and the call is placed according to the user's confirmation. If only one matching name is found in the name database, the user is asked whether that person should be called, and if so, the call is started. Because the target name is searched for within the name database, the search is fast and more accurate.
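The lookup and confirmation in step S105 can be pictured with a minimal sketch. The application does not disclose a data structure for the name database, so the dictionary layout, the function names `build_index`, `lookup` and `prompt_for`, and the placeholder employee names below are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical name database: display name -> toned pinyin syllables.
# The entries are placeholders, not names taken from the application.
NAME_DB = {
    "Employee A": ("jiang1", "ge1"),
    "Employee B": ("jiang1", "ge1"),
    "Employee C": ("li3", "si4"),
}


def build_index(name_db):
    """Group database names by their toned-pinyin key."""
    index = defaultdict(list)
    for name, syllables in name_db.items():
        index[tuple(syllables)].append(name)
    return index


def lookup(index, target):
    """Return the database names whose toned pinyin equals the target."""
    return sorted(index.get(tuple(target), []))


def prompt_for(candidates):
    """Build the confirmation question put to the visitor before calling."""
    if not candidates:
        return "No matching name was found."
    if len(candidates) == 1:
        return f"Do you want to call {candidates[0]}?"
    return "Which person do you want to call: " + ", ".join(candidates) + "?"
```

Because the search is restricted to names that actually exist in the database, a target such as `("jiang1", "ge1")` yields at most the handful of employees with that pronunciation, and the visitor only has to choose among them.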
In the related art, attendance machine scenarios usually do not support voice recognition and cannot carry out human-computer interaction, so the front desk has to look up manually the person a visitor wants to see, which consumes manpower and is inefficient. Because the characters in a name can be combined with a very high degree of freedom, names are hard to recognize accurately, and when several names occur in one sentence, some of them may fail to be extracted. In the present scheme, the attendance machine is provided with an access control mode and a visitor mode, so that a visitor can interact with the attendance machine and find the required person through it, which improves efficiency. A name hot word library and a name database are constructed and used to train the voice recognition model, so that the model learns where in a sentence a name is to be extracted. This improves the accuracy of name recognition and allows several names in one sentence to be recognized at once, so the person the user is looking for can be determined without multiple rounds of dialogue, which improves the human-computer interaction. The position of the first voice signal is obtained by TDOA positioning, the audio signal within the area where that position is located is amplified, and the audio signal outside the area is suppressed, which improves the quality of the first voice signal and makes voice recognition more accurate. After the target name is recognized, it is not matched directly against every character combination that shares its toned pinyin. For example, after the target name is recognized as "li(3) si(4)", the names corresponding to "li(3) si(4)" are searched for in the name database; if only "Li Si" exists in the name database, only Li Si is matched, so the range and number of name matches are effectively reduced.
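The application does not disclose how the time difference is computed or how the in-area signal is amplified. As a rough illustration only, the sketch below assumes two microphones, estimates the time difference of arrival by brute-force cross-correlation, and then applies delay-and-sum averaging, a common technique substituted here for the unspecified amplification step. The function names and parameters are hypothetical.

```python
def estimate_tdoa(sig_a, sig_b, sample_rate, max_lag):
    """Estimate by how many seconds sig_b lags sig_a.

    Tries every integer lag in [-max_lag, max_lag] and keeps the one
    that maximizes the cross-correlation of the two channels.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += a * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


def delay_and_sum(sig_a, sig_b, lag_samples):
    """Align sig_b to sig_a by the given lag and average the channels.

    Sound arriving with the estimated delay adds coherently, while sound
    from other directions is misaligned and partially cancels.
    """
    out = []
    for i, a in enumerate(sig_a):
        j = i + lag_samples
        b = sig_b[j] if 0 <= j < len(sig_b) else 0.0
        out.append(0.5 * (a + b))
    return out
```

Multiplying the estimated delay by the speed of sound gives the difference in distance from the source to the two microphones; with more microphone pairs, several such differences constrain the source position.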
In some embodiments, after the target name is extracted from the second text, a pinyin tone database is created from the name database, the names in the name database are scored against the pinyin tone database with the target name as the scoring standard, and the names whose score is greater than a preset value are taken as target names.
Illustratively, suppose the pinyin tone database created from the name database contains "zhang(1) san(1)", "zhang(1) shan(1)", "li(3) si(4)" and "li(3) shi(2)", where the numbers in parentheses represent tones, and the target name is recognized as "zhang(1) san(1)". With the target name as the scoring standard, the names in the name database are scored according to the pinyin tone database. Illustratively, a complete pinyin match is worth 50 points and a complete tone match is worth 50 points. "zhang(1) san(1)" in the pinyin tone database matches the target name completely and scores 100. "zhang(1) shan(1)" differs from the target name only in the flat-tongue versus retroflex part of the pinyin while its tones match completely, so it scores 90. "li(3) si(4)" and "li(3) shi(2)" do not match "zhang(1) san(1)" and score lower. If the preset value is set to 100, Zhang San, the name corresponding to "zhang(1) san(1)" in the name database, is taken as the target name. If the preset value is 90, Zhang San, corresponding to "zhang(1) san(1)", and Zhang Shan, corresponding to "zhang(1) shan(1)", are both taken as target names, and the user is asked whether the person to be called is Zhang San or Zhang Shan; if the user confirms Zhang Shan, the call to Zhang Shan is started. This addresses users who do not distinguish flat-tongue and retroflex sounds: even when "zhang(1) shan(1)" is pronounced as "zhang(1) san(1)", the name of the person the user wants to find can still be identified accurately.
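The scoring example can be sketched as follows. The application gives only the 50/50 split and the resulting scores; how a partial pinyin match is scored is not disclosed. The sketch assumes the points are divided evenly across syllables and that a flat-tongue versus retroflex confusion (z/zh, c/ch, s/sh) costs a fixed 10 points, a value chosen here purely so that the example reproduces the score of 90. The threshold comparison is inclusive, which is what the worked example with a preset value of 100 implies. All function names are hypothetical.

```python
import re

# Retroflex initials and their flat-tongue counterparts.
FLAT_OF = {"zh": "z", "ch": "c", "sh": "s"}
# Assumed deduction for one flat/retroflex confusion (not from the source).
CONFUSION_PENALTY = 10.0


def split_syllable(syllable):
    """Split a toned syllable such as 'zhang1' into ('zhang', 1)."""
    match = re.fullmatch(r"([a-z]+)([1-5])", syllable)
    if match is None:
        raise ValueError(f"not a toned pinyin syllable: {syllable!r}")
    return match.group(1), int(match.group(2))


def flatten(pinyin):
    """Replace a retroflex initial with its flat-tongue counterpart."""
    for retroflex, flat in FLAT_OF.items():
        if pinyin.startswith(retroflex):
            return flat + pinyin[len(retroflex):]
    return pinyin


def score(target, candidate):
    """Score a candidate against the target: 50 for pinyin, 50 for tones."""
    if len(target) != len(candidate):
        return 0.0
    share = 50.0 / len(target)
    total = 0.0
    for t_syl, c_syl in zip(target, candidate):
        t_py, t_tone = split_syllable(t_syl)
        c_py, c_tone = split_syllable(c_syl)
        if t_py == c_py:
            total += share
        elif flatten(t_py) == flatten(c_py):
            total += max(share - CONFUSION_PENALTY, 0.0)
        if t_tone == c_tone:
            total += share
    return total


def select_targets(target, pinyin_db, preset_value):
    """Return the names whose score reaches the preset value."""
    return sorted(
        name for name, syllables in pinyin_db.items()
        if score(target, syllables) >= preset_value
    )
```

With a preset value of 90 the selection contains both Zhang San and Zhang Shan, so the visitor is asked to choose; with 100 only the exact match remains.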
In some embodiments, after the local server constructs the name hot word library and the name database, the local server updates the name hot words and the name data, and the cloud server trains the voice recognition model with the updated name hot words and the updated name data. In this embodiment, staff turnover in an organization is high, so the names and the name hot words may change at any time. After the local server updates the name hot words and the name data, the cloud server trains the voice recognition model with the updated data, which prevents the accuracy of voice recognition from dropping because new name hot words and new name data were not incorporated in time.
In some embodiments, after the second voice signal is sent to the cloud server, the local server sends a voice recognition token to the cloud server together with it, wherein the voice recognition token is used to obtain permission for voice recognition. In this embodiment, the cloud server determines from the voice recognition token whether the device requesting voice recognition is an authorized device. Only authorized devices can use the voice recognition service of the cloud server, which prevents unauthorized devices from issuing voice recognition requests.
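A minimal sketch of such a token check on the cloud side is given below. The application does not describe the token format or how tokens are issued, so the in-memory registry, the device identifier, and the response fields are assumptions; only `hmac.compare_digest` is a real standard-library call, used here to compare tokens in constant time.

```python
import hmac

# Hypothetical registry of tokens issued to authorized devices.
AUTHORIZED_TOKENS = {"attendance-machine-01": "example-token"}


def is_authorized(device_id, token):
    """Check the presented token against the one issued to the device."""
    expected = AUTHORIZED_TOKENS.get(device_id)
    if expected is None:
        return False
    return hmac.compare_digest(expected.encode(), token.encode())


def handle_recognition_request(device_id, token, audio):
    """Accept the audio only when the token identifies an authorized device."""
    if not is_authorized(device_id, token):
        return {"status": "rejected"}
    return {"status": "accepted", "audio_bytes": len(audio)}
```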
In some embodiments, before the voice acquisition module acquires the first voice signal of the user, verification information of the user is acquired and the identity of the user is verified according to it; if the verification is passed, the voice acquisition module acquires the first voice signal of the user. In this embodiment, the identity of the visitor is verified before the voice acquisition module acquires the visitor's first voice signal. Illustratively, the voice interaction interface of the attendance machine asks whether the visitor has an appointment. If so, the visitor is prompted to enter an appointment code, and the verification is passed if the code is correct. If the visitor has no appointment, the visitor contacts the front desk, and front desk staff verify the visitor's identity. This prevents unrelated persons from contacting company employees through the attendance machine and disrupting their work.
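The appointment check reduces to a small decision function, sketched below. The return labels and the handling of an incorrect appointment code are assumptions, since the application only describes the two outcomes of a correct code and of no appointment.

```python
def verify_visitor(has_appointment, entered_code, valid_codes):
    """Decide the next step for a visitor before any voice is captured."""
    if not has_appointment:
        # No appointment: front desk staff verify the visitor instead.
        return "contact_front_desk"
    if entered_code in valid_codes:
        # Verification passed: the voice acquisition module may start.
        return "start_voice_capture"
    # Assumed behavior for a wrong code; not specified in the application.
    return "verification_failed"
```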
It should be noted that the steps shown in the flow described above or in the flowcharts of the figures may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
This embodiment also provides an attendance machine-based voice recognition system that implements the above embodiments and preferred implementations; descriptions already given are not repeated. As used below, the terms "module," "unit," "subunit," and the like may refer to a combination of software and/or hardware that implements a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 2 is a structural block diagram of an attendance machine-based voice recognition system according to an embodiment of the present application, applied to an attendance machine scenario. As shown in Fig. 2, the system includes a cloud server 23, a local server 22 and an attendance machine 21, and the attendance machine 21 includes a face recognition module 210, a voice acquisition module 211 and a control module 212. The local server 22 constructs a name hot word library and a name database, and the cloud server 23 trains a model using the name hot word library and the name database as training corpora to obtain a voice recognition model 230. The face recognition module 210 recognizes the user's face information and switches to an access control mode if the face information is matched successfully, or to a visitor mode if it is not. In the access control mode, the control module 212 opens the access control for the user. In the visitor mode, the voice acquisition module 211 acquires a first voice signal of the user, obtains the position of the first voice signal by TDOA positioning, amplifies the audio signal in the area where that position lies, and suppresses audio signals outside the area to obtain a second voice signal. The cloud server 23 obtains a second text from the second voice signal using the voice recognition model 230 and extracts a target name from the second text. The local server 22 searches the name database for the target name, and a call is placed after the user confirms the search result, which contains at least one name. This addresses the problems that the attendance machine scenario does not support human-computer interaction, is inefficient, and recognizes names with low accuracy.
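The application does not state which algorithm performs the time delay estimation or the amplification and suppression. As a hedged illustration, the sketch below estimates the inter-microphone delay by plain cross-correlation and then applies delay-and-sum beamforming, a common technique that reinforces sound arriving with the estimated delays and attenuates sound arriving with other delays. The function names are hypothetical, and the geometric step that converts delays into a position is omitted.

```python
def estimate_delay(ref, sig, max_lag):
    """Return the lag in samples by which `sig` trails `ref` (cross-correlation peak)."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(ref)):
            j = i + lag
            if 0 <= j < len(sig):
                score += ref[i] * sig[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, delays):
    """Align each channel by its delay and average the aligned samples."""
    n = len(channels[0])
    out = []
    for i in range(n):
        total = 0.0
        for ch, d in zip(channels, delays):
            j = i + d
            if 0 <= j < n:
                total += ch[j]
        out.append(total / len(channels))
    return out


# Toy example: the second microphone hears the same pulse two samples later.
mic1 = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
mic2 = [0, 0, 0, 0, 1, 2, 1, 0, 0, 0]
lag = estimate_delay(mic1, mic2, max_lag=4)
enhanced = delay_and_sum([mic1, mic2], [0, lag])
```

With real audio the delay would be estimated per microphone pair on short frames, and a more robust estimator such as GCC-PHAT is often preferred over raw cross-correlation.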
Fig. 3 is a block diagram of another attendance machine-based voice recognition system according to an embodiment of the present application. As shown in Fig. 3, the processing proceeds as follows:
Step S301: a microphone array acquires the voice signal, and the sound source is located by time difference of arrival (TDOA) estimation. A TDOA-based method generally has two steps: first, the time differences with which the sound source signal arrives at the microphones of the array are computed (time delay estimation); then a sound source localization model is built from the geometry of the microphone array and solved to obtain the position (location estimation);
Step S302: the ALSA driver, the standard audio architecture on Linux, provides operations on audio devices and audio data;
Step S303: Tinycap_svc is a background service that starts recording for the upper-layer application over a socket. A socket connects to the application process above it and to the network protocol stack below it; it is the interface through which an application communicates over a network protocol and interacts with the network protocol stack;
Step S304: the Damo multi-modal SDK provides the algorithms for image and voice AI processing. "4+1ch PCM" denotes 4 microphone channels plus 1 echo cancellation reference channel, 5 channels in total; the voice front-end processing algorithm in the Damo multi-modal SDK reduces these to 1 channel of audio data for voice recognition;
Step S305: the Damo voice recognition SDK provides the voice recognition capability;
Step S306: voice recognition is performed in the cloud. The Damo voice recognition SDK sends the single-channel audio data (1ch PCM) to the cloud server over the uplink (UL) for cloud voice recognition, and the cloud server returns the raw text (Raw Text) over the downlink (DL);
Step S307: the semantic processing module processes the raw text (Raw Text) to obtain the person name (Key Word);
Step S308: the voice interaction interface is implemented as an APP UI and mainly prompts the user through the stages of the interaction: start speaking, recognition in progress, confirm the name, start the call, and so on;
Step S309: the local server obtains the terminal's voice recognition token and the model ID corresponding to each sentence in order to obtain permission for voice recognition; after the target name is recognized, the local server searches for it and returns the search result to the voice interaction interface.
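To make the "4+1ch PCM" layout of step S304 concrete, the sketch below splits an interleaved 5-channel buffer into four microphone channels and one echo cancellation reference channel. The sample format (16-bit signed, little-endian), the interleaved layout, and the position of the reference channel as the last channel are all assumptions made for illustration; the application does not specify them, and `deinterleave` is a hypothetical helper rather than part of any SDK named above.

```python
import struct

NUM_CHANNELS = 5  # 4 microphone channels + 1 echo cancellation reference


def deinterleave(pcm: bytes, num_channels: int = NUM_CHANNELS):
    """Split interleaved 16-bit little-endian PCM into per-channel sample lists."""
    frame_size = 2 * num_channels                 # bytes per multi-channel frame
    usable = len(pcm) - len(pcm) % frame_size     # drop any trailing partial frame
    samples = struct.unpack(f"<{usable // 2}h", pcm[:usable])
    return [list(samples[c::num_channels]) for c in range(num_channels)]


# Two frames: microphones carry 1..4 and 5..8, the reference channel carries 9.
pcm = struct.pack("<10h", 1, 2, 3, 4, 9, 5, 6, 7, 8, 9)
channels = deinterleave(pcm)
mics, ref = channels[:4], channels[4]
```

The front-end algorithm would then use the reference channel for echo cancellation and combine the four microphone channels into the single channel that is sent for recognition.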
Fig. 4 is a block diagram of a hardware architecture according to an embodiment of the present application. As shown in Fig. 4, four microphones MIC1, MIC2, MIC3 and MIC4 form a microphone array; the MT8183 is the voice signal processor; the AC108 is a high-performance four-channel analog-to-digital converter (ADC) that converts the analog voice signal (Analog) into a digital signal for voice signal processing. The digital signal is passed over the IIS (I2S) bus to a power amplifier (PA) and finally output by the speaker (Speaker). The Ref signal is the echo cancellation reference channel.
Fig. 5 is a block diagram of a software architecture according to an embodiment of the present application. As shown in Fig. 5, audio from the microphone array (MIC Array) is captured by calling tinyalsa, the JNI-wrapped Moredian Voice SDK is then called, and the result is finally presented in the voice interaction application.
The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.
The present embodiment also provides an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.
In addition, in combination with the attendance machine-based voice recognition method of the above embodiments, an embodiment of the present application may provide a storage medium. A computer program is stored on the storage medium; when executed by a processor, the computer program implements any of the attendance machine-based voice recognition methods of the above embodiments.
In one embodiment, a computer device is provided, which may be a terminal. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method for attendance machine based speech recognition. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be understood by those skilled in the art that various features of the above-described embodiments can be combined in any combination, and for the sake of brevity, all possible combinations of features in the above-described embodiments are not described in detail, but rather, all combinations of features which are not inconsistent with each other should be construed as being within the scope of the present disclosure.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (10)

1. An attendance machine-based voice recognition method, applied to an attendance machine scenario, the method comprising:
a local server constructs a name hot word library and a name database, and a cloud server trains a model using the name hot word library and the name database as training corpora to obtain a voice recognition model;
a face recognition module acquires face information of a user, and switches to an access control mode if the face information is matched successfully, or to a visitor mode if it is not;
in the access control mode, the access control is opened for the user; in the visitor mode, a voice acquisition module acquires a first voice signal of the user, obtains the position of the first voice signal by TDOA positioning, amplifies the audio signal in the area where the position of the first voice signal is located, suppresses audio signals outside the area to obtain a second voice signal, and sends the second voice signal to the cloud server;
the cloud server obtains a second text of the second voice signal according to the voice recognition model, and extracts a target name from the second text;
and the local server searches the name database according to the target name, and a call is placed after the user confirms the search result, wherein the search result is at least one name.
2. The method of claim 1, wherein after extracting the target name from the second text, the method further comprises:
establishing a pinyin tone database according to the name database, scoring the names in the name database according to the pinyin tone database with the target name as the scoring reference, and taking the names whose scores are greater than a preset value as the target names.
3. The method of claim 1, wherein after the local server constructs the name hot word library and the name database, the method further comprises:
the local server updates the name hot words and the name data, and the cloud server trains the voice recognition model according to the updated name hot words and the updated name data.
4. The method of claim 1, wherein after sending the second voice signal to the cloud server, the method further comprises:
the local server simultaneously sends a voice recognition token to the cloud server, wherein the voice recognition token is used to obtain permission for voice recognition.
5. The method of claim 1, wherein before the voice acquisition module acquires the first voice signal of the user, the method further comprises:
acquiring verification information of the user, authenticating the identity of the user according to the verification information, and, if the verification passes, acquiring the first voice signal of the user by the voice acquisition module.
6. An attendance machine-based voice recognition system, applied to an attendance machine scenario, comprising a cloud server, a local server and an attendance machine, wherein the attendance machine comprises a face recognition module, a voice acquisition module and a control module,
the local server is configured to construct a name hot word library and a name database;
the cloud server trains a model using the name hot word library and the name database as training corpora to obtain a voice recognition model;
the face recognition module is configured to recognize face information of a user, and to switch to an access control mode if the face information is matched successfully, or to a visitor mode if it is not;
in the access control mode, the control module opens the access control for the user;
in the visitor mode, the voice acquisition module acquires a first voice signal of the user, obtains the position of the first voice signal by TDOA positioning, amplifies the audio signal in the area where the position of the first voice signal is located, and suppresses audio signals outside the area to obtain a second voice signal;
the cloud server obtains a second text of the second voice signal according to the voice recognition model, and extracts a target name from the second text;
and the local server searches the name database according to the target name, and a call is placed after the user confirms the search result, wherein the search result is at least one name.
7. The system according to claim 6, wherein after the cloud server extracts the target name in the second text, the local server is further configured to create a pinyin tone database according to the name database, score the names in the name database according to the pinyin tone database with the target name as a scoring standard, and take the name with the scoring result larger than a preset value as the target name.
8. The system according to claim 6, wherein after the local server constructs the name hot word library and the name database, the local server is further configured to update the name hot words and the name data, and the cloud server trains the voice recognition model according to the updated name hot words and the updated name data.
9. The system of claim 6, wherein after the sending of the second voice signal to the cloud server, the local server is further configured to send a voice recognition token to the cloud server, wherein the voice recognition token is configured to obtain the authority of voice recognition.
10. The system of claim 6, wherein the attendance machine further comprises an authentication module, before the voice acquisition module acquires the first voice signal of the user, the authentication module is configured to acquire authentication information of the user, perform identity authentication on the user according to the authentication information, and if the authentication is passed, the voice acquisition module acquires the first voice signal of the user.
CN202110505558.5A 2021-05-10 2021-05-10 Attendance machine-based voice recognition method and system Pending CN113241078A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110505558.5A CN113241078A (en) 2021-05-10 2021-05-10 Attendance machine-based voice recognition method and system

Publications (1)

Publication Number Publication Date
CN113241078A true CN113241078A (en) 2021-08-10

Family

ID=77132853

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110505558.5A Pending CN113241078A (en) 2021-05-10 2021-05-10 Attendance machine-based voice recognition method and system

Country Status (1)

Country Link
CN (1) CN113241078A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210810