CN116935859A - Voiceprint recognition processing method and system - Google Patents

Voiceprint recognition processing method and system

Info

Publication number
CN116935859A
CN116935859A
Authority
CN
China
Prior art keywords
voiceprint
user
information
cloud system
wearable device
Prior art date
Legal status
Pending
Application number
CN202310900334.3A
Other languages
Chinese (zh)
Inventor
崔晓飞
石磊
刘岁成
于海波
尹学海
石科峰
Current Assignee
Hebei Huawang Computer Technology Co ltd
Original Assignee
Hebei Huawang Computer Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hebei Huawang Computer Technology Co ltd filed Critical Hebei Huawang Computer Technology Co ltd
Priority to CN202310900334.3A
Publication of CN116935859A


Classifications

    • G10L17/02 Preprocessing operations, e.g. segment selection; pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; feature selection or extraction
    • G10L13/02 Methods for producing synthetic speech; speech synthesisers
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L13/10 Prosody rules derived from text; stress or intonation
    • G10L17/04 Training, enrolment or model building
    • G10L17/06 Decision making techniques; pattern matching strategies
    • G10L17/18 Artificial neural networks; connectionist approaches
    • G10L25/30 Speech or voice analysis using neural networks
    • G10L25/51 Speech or voice analysis specially adapted for comparison or discrimination
    • H04L65/1108 Web based protocols, e.g. WebRTC

Abstract

The application relates to a voiceprint recognition processing method comprising the following steps: establishing a communication connection between a wearable device and a cloud system; voice-unlocking the wearable device, in which the device collects the user's voice to form user voiceprint information and uploads it to the cloud system, the cloud system extracts the voiceprint features from the user voiceprint information and compares them one by one with all voiceprint features in its voiceprint feature library, and the device is unlocked if the extracted features exist in the library; and, once the device is unlocked, collecting environmental sound in the current environment to form environmental voiceprint information and uploading it to the cloud system, which analyzes the information and issues corresponding instructions. By having construction workers carry the wearable device, connecting the device to the cloud system, and using the cloud system to distinguish sounds in the construction environment, the method effectively improves worker safety.

Description

Voiceprint recognition processing method and system
Technical Field
The application relates to the field of voiceprint recognition and processing, and in particular to a voiceprint recognition processing method and system.
Background
In today's globalized digital information age, emerging technologies such as the internet, big data, and artificial intelligence have become an integral part of everyday life, and their development has created new creativity and opportunities. Among them, smart wearable devices are a fast-developing class of intelligent mobile devices that combine the functions of traditional wearables with intelligent technology to provide a more convenient, efficient, and intelligent user experience. Voiceprint recognition is one of the most important and active research directions in artificial intelligence in recent years. It grew out of speech recognition but is more complex, because a voiceprint contains not only the speaker's accent but also attributes of pronunciation, intonation, and mood. These attributes carry cultural, individual, and regional information, making voiceprints an effective means of determining the identity of a person or group. Voiceprint recognition nevertheless faces a number of technical challenges.
First, voiceprint quality is often affected by noise, the quality of network transmission, and the quality of the recording device, all of which can degrade the voiceprint recognition model. Second, voiceprints vary enormously, which makes them very difficult to classify and label quantitatively. Finally, because a voiceprint contains multiple attributes, extracting only a single feature cannot comprehensively capture its characteristics.
Despite these challenges, progress has been made in several directions, including feature extraction, model construction, and algorithm optimization. In feature extraction, the voiceprint is now treated as a time series, and its features are extracted using methods such as spectral analysis, sound velocity analysis, and intonation analysis; because a voiceprint contains multiple attributes, multiple features must be combined during extraction. In model construction, voiceprint recognition models are built using deep learning, support vector machines, and similar approaches, and their performance is optimized through training. In algorithm optimization, improved algorithms raise the accuracy and robustness of the voiceprint recognition model.
In the future, voiceprint recognition will mature further and its application scenarios will expand. Beyond the currently widespread fields of speech recognition and emotion analysis, voiceprint recognition can be applied to identity authentication, smart homes, security systems, and many other scenarios, for example verifying a user's identity on a smart wearable device.
In recent years, with the development of artificial intelligence, voice-interaction applications on wearable devices such as smart watches and smart earphones have become increasingly widespread. These devices use speech recognition and speech synthesis to interact by voice with a computer or other equipment.
Wearable-device voice interaction has broad application in daily life, and its most distinctive feature is voice control. Through a smart watch or smart earphones, a user can control related devices by voice commands, such as adjusting volume, holding real-time voice calls, navigating, or sending positioning data. Voice control is very convenient: it avoids manual operation of the device, improves the user's efficiency, and allows remote reporting even when the user is seriously injured or unable to move, saving precious time for rescue. For example, with a smart watch a user can check today's schedule or send files by voice command. Beyond voice control, wearable voice interaction can also act as a voice assistant. Through a smart watch or smart earphones, the user can obtain the local time, weather information, health data, and more. These devices can also provide customized services, such as recommending certain foods or sports according to the user's preferences and needs, helping the user at any time. The user can also voice-chat with others without typing text, expressing ideas more freely and strengthening social connections; for example, a smart watch allows remote voice chat with family and friends. The user can likewise name a file or annotate content by voice command, reducing manual operation and improving efficiency; for example, a smart watch lets the user name a photo by voice.
In summary, wearable-device voice interaction is a very convenient and efficient mode of interaction that is already widely used in daily life, and as the technology continues to develop its application prospects will keep broadening.
Various safety hazards exist during engineering construction, especially tunnel construction, yet during the work itself construction workers can hardly tell whether the sounds in the surrounding environment are abnormal, so their safety cannot be effectively guaranteed.
Disclosure of Invention
To improve the safety of construction workers during construction, the application provides a voiceprint recognition processing method and system: construction workers carry a wearable device, the device is connected to a cloud system, and the cloud system is used to distinguish sounds in the construction environment, effectively improving worker safety.
In a first aspect, the present application provides a voiceprint recognition processing method, which adopts the following technical scheme:
a voiceprint recognition processing method comprises the following steps:
establishing a communication connection: starting the wearable device and establishing a communication connection between the wearable device and a cloud system;
voice-unlocking the wearable device: the device collects the user's voice to form user voiceprint information and uploads it to the cloud system; the cloud system extracts the voiceprint features from the user voiceprint information and compares them one by one with all voiceprint features in its voiceprint feature library; if the extracted features exist in the library, the cloud system sends an unlocking code to the wearable device, and if they do not, it sends a retry code;
if the wearable device receives the unlocking code, the device is unlocked;
if the wearable device receives the retry code, the device issues a voice broadcast prompting the user to attempt voice unlocking again;
if the wearable device is unlocked, the device collects environmental sound in the current environment to form environmental voiceprint information and uploads it to the cloud system, which analyzes the information and issues corresponding instructions.
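The claimed unlock exchange can be sketched as follows. The code values (`UNLOCK_CODE`, `RETRY_CODE`) and the set-based feature-library lookup are illustrative assumptions for demonstration only, not details specified in the application.

```python
# Illustrative sketch of the claimed unlock flow; code values and the
# set-based library lookup are assumptions, not from the application.

UNLOCK_CODE = "unlock"
RETRY_CODE = "retry"

def cloud_decide(user_feature, feature_library):
    """Cloud side: compare the extracted feature against the library."""
    return UNLOCK_CODE if user_feature in feature_library else RETRY_CODE

def device_react(code):
    """Device side: unlock, or broadcast a retry prompt."""
    if code == UNLOCK_CODE:
        return "unlocked"
    return "please retry voice unlock"
```

In practice the "feature library" would hold voiceprint embeddings compared by distance rather than exact membership; the exact-match lookup here only mirrors the claim's "exists in the feature library" wording.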
Preferably, a voiceprint feature library is built: the cloud system builds the library in advance according to user requirements; the wearable device collects the user's voice information in advance, extracts it as voiceprint feature information, and uploads it to the cloud system, which builds the voiceprint feature library from the voiceprint features in the uploaded information.
Preferably, an upper limit M on the number of voice-unlock attempts is set on the cloud system. When the number of failed voice-unlock attempts on the wearable device reaches or exceeds M, the cloud system sends a warning code to the device; on receiving the warning code, the device issues a warning voice broadcast indicating that the user is unauthorized.
Preferably, a usage permission is established: when the wearable device indicates that the user is unauthorized, the user can apply to the cloud system for usage permission through the device. Granting permission includes storing the voiceprint features from the user's voiceprint information in the voiceprint feature library; if the cloud system agrees to establish the permission, the features are stored in the library, and if it does not, the user's voiceprint information is deleted.
Preferably, the cloud system analyzes the environmental voiceprint information to form an analysis result comprising an abnormal sound classification and a user sound classification. Based on the abnormal sound classification, a cloud system operator sends a corresponding abnormal instruction to the wearable device through the cloud system; the device receives the instruction, converts it into a voice broadcast, and relays it to the wearer.
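The step from an abnormal sound classification to a device broadcast might look like the following sketch. The class keys and instruction strings are hypothetical; the application does not specify them.

```python
# Hypothetical mapping from abnormal sound classes to the instruction
# broadcast on the device; all names and strings are illustrative only.
ABNORMAL_INSTRUCTIONS = {
    "noise_interference": "noise interference detected",
    "harmful_sound_wave": "harmful sound detected, please evacuate",
    "quarrying_sound_wave": "quarrying activity detected nearby",
    "preset_sound_wave": "preset alert sound detected",
}

def dispatch_instruction(classification):
    """Return the voice-broadcast text for a classified environmental sound."""
    return ABNORMAL_INSTRUCTIONS.get(classification, "no action required")
```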
Preferably, the abnormal sound classification includes noise interference, harmful sound waves, quarrying sound waves, and preset sound waves.
In a second aspect, the present application provides a voiceprint recognition processing system, which adopts the following technical scheme:
a voiceprint recognition processing system comprising:
a cloud system;
the wearable device, worn on the user's body, which establishes a communication connection with the cloud system, collects sound, uploads it to the cloud system, and responds according to instructions received from the cloud system;
the voiceprint feature extraction module is used for extracting voiceprint features in the voiceprint information of the user and voiceprint features in the environmental voiceprint information;
the comparison judging module is used for comparing the voiceprint characteristics in the voiceprint information of the user with the voiceprint characteristics in the voiceprint characteristic library and judging whether the voiceprint characteristics in the voiceprint information of the user exist in the voiceprint characteristic library;
and the classification module is used for classifying the voiceprint characteristics in the environmental voiceprint information extracted by the voiceprint characteristic extraction module.
Preferably, the cloud system includes:
the cloud receiving module is used for receiving the electric signals sent by the wearable equipment;
the cloud sending module is used for sending the electric signals to the wearable equipment.
Preferably, the wearable device includes:
the wearable device receiving module is used for receiving the electric signals sent by the cloud sending module;
the wearable device sending module is used for sending an electric signal to the cloud system.
Preferably, the wearable device further comprises:
the voice acquisition module is used for acquiring voice information of a user and voice information in an environment where the user is located;
and the voice broadcasting module is used for broadcasting corresponding prompt broadcasting information.
In summary, the application has the following beneficial technical effects:
1. By having construction workers carry the wearable device, connecting the device to the cloud system, and using the cloud system to distinguish sounds in the construction environment, the method effectively improves worker safety.
2. The wearable device is convenient to operate, saves time and energy, suits a variety of application scenarios, and has a user-centred design. Acting as a controller, it takes over functions that previously required manual operation, so the user can complete a task simply by issuing instructions through the device without touching the object directly. This greatly reduces the causes of human operating error while improving working efficiency.
3. The wearable device improves working efficiency and saves time and energy. For example, it can reduce the number of times personnel must operate in dangerous areas, protecting them from working in humid, high-temperature, or low-temperature environments. Its application fields are broad, including construction engineering, military operations, rescue operations, and traffic management. In rescue operations, for instance, the device can be used in natural disasters such as earthquakes, fires, and floods, where it improves efficiency, saves time and energy, and, most importantly, reduces how often personnel must enter hazardous areas, protecting their safety. In addition, the device's design pays close attention to usability, making it more convenient to use.
Drawings
FIG. 1 is a flow chart of an embodiment of the present application.
FIG. 2 is a flow chart of voiceprint feature identification in an embodiment of the present application.
FIG. 3 is another flow chart of voiceprint feature identification in an embodiment of the present application.
Fig. 4 is a flowchart of speech information synthesis in an embodiment of the application.
FIG. 5 is a flow chart of the WebRTC architecture in an embodiment of the application.
Detailed Description
The present application will be described in further detail with reference to the accompanying drawings.
After reading this specification, those skilled in the art may make modifications to the embodiments that involve no creative contribution to the application; such modifications remain protected by patent law within the scope of the claims of the present application.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In addition, the term "and/or" herein merely describes an association between objects and indicates that three relationships may exist; for example, "A and/or B" can mean: A alone, both A and B, or B alone. Unless otherwise specified, the character "/" herein generally indicates an "or" relationship between the associated objects.
The embodiment of the application provides a voiceprint recognition processing method executed by a wearable device and a cloud system. The cloud system can be an independent physical server, a server cluster or a distributed system of multiple physical servers, or a cloud server providing cloud computing services. The wearable terminal device may be a smart helmet, a smart watch, a smart miner's lamp, or the like, but is not limited to these, and the terminal device and the server may be connected directly or indirectly through wired or wireless communication.
S1, establishing communication connection;
the user starts the wearable device, communication connection between the wearable device and the cloud system is established, and specifically, the wearable device is preferably connected with the cloud system through wireless communication, including but not limited to WIFI connection and Bluetooth connection. The wearable device can manually establish communication connection with the cloud system through a user, and can automatically search and establish communication connection with the cloud system after being started.
S2, voice unlocking the wearable device;
after the wearable device is started, a user inputs voice information to the wearable device, the wearable device collects the voice information of the user and forms user voiceprint information, then the wearable device uploads the user voiceprint information to a cloud system, the cloud system extracts voiceprint features in the user voiceprint information, and then the voiceprint features in the extracted user voiceprint information are compared with voiceprint features in a voiceprint feature library. If voiceprint features in the voiceprint information of the user exist in the voiceprint feature library, the cloud system can further verify the use permission of the current user, if the user has corresponding use permission, the cloud system can send an unlocking code to the wearable device, and after the wearable device receives the unlocking code, the wearable device automatically completes unlocking. If the voiceprint characteristics in the voiceprint information of the current user do not exist in the voiceprint characteristic library, or the current user does not have the corresponding use permission of the wearable device, the cloud system sends a retry code to the wearable device, after the wearable device receives the retry code, the wearable device sends out a voice broadcast to prompt the current user to unlock again, and the voice broadcast can be prompt messages such as "please retry", "the current user does not have the use permission", "the current user is not registered", and the like.
The voiceprint feature library is registered in advance by users: the wearable device collects a user's voice and forms user voiceprint information, then uploads it to the cloud system; the cloud system extracts the voiceprint features and assembles the features of all collected users into the voiceprint feature library. In use, the cloud system compares the voiceprint features of the device's current user against all features in the library and then issues the corresponding instruction.
Setting an upper limit M on the number of voice-unlock attempts on the cloud system;
when the current user cannot open the wearable device, the number of unlock attempts the user may make is capped at M. Once the user has made M unsuccessful attempts, the device automatically enters a locked state and the cloud system sends a warning code to it; on receiving the warning code, the device issues a voice broadcast telling the current user that no further attempts are allowed, for example that the current user is unauthorized or does not have usage permission.
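A minimal sketch of this upper-limit logic, assuming the cloud tracks failures per device; the returned code values are illustrative, not from the application:

```python
class UnlockAttemptGuard:
    """Track failed voice-unlock attempts against an upper limit M."""

    def __init__(self, upper_limit_m):
        self.upper_limit_m = upper_limit_m
        self.failures = 0

    def record_failure(self):
        """Return the code the cloud would send after a failed attempt."""
        self.failures += 1
        if self.failures >= self.upper_limit_m:
            return "warning"   # device locks and broadcasts a warning
        return "retry"         # device prompts the user to try again
```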
establishing a use authority;
when the user can not open the wearing equipment, the user can apply for the use permission to the cloud system through the wearing equipment, and specifically, the situation that the wearing equipment can not be opened includes, but is not limited to, the user using the wearing equipment for the first time, the wearing equipment reminding the user to open the wearing equipment again, the wearing equipment reminding the user that the wearing equipment can not be opened, the current user is an illegal user, and the like.
The usage permission may consist of storing the voiceprint features from the user's voiceprint information in the voiceprint feature library: if the cloud system agrees to establish the permission, the features are stored in the library; if it does not, the user's voiceprint information is deleted.
The usage permission may also denote whether the current user is allowed to use the wearable device at all, or to achieve a specific function or purpose with it.
The usage permission can also cover other rights or applications, such as additional voice broadcasting, voice collection, voice conversations with other wearable devices, managing the cloud system, or opening other application permissions through the wearable device.
When the current user lacks a given usage permission, the user can apply through the device to the cloud system to establish it. If the cloud system agrees that the user should have the permission, it opens the corresponding permission for the user; if it does not, it rejects the user's request.
This design keeps operation simple and convenient while improving security. In today's era of rapid digital and information technology development, identity authentication has become an indispensable part of daily life, and as people pay ever more attention to personal information security, traditional authentication methods can no longer meet modern requirements. Developing voiceprint recognition, a contactless, efficient, and secure authentication method based on feature extraction and machine learning, is therefore a promising research direction. With the rise of neural networks and natural language processing, deep-learning-based voiceprint recognition has received wide attention: compared with traditional voice feature extraction, it extracts voiceprint features more effectively and supports automatic feature selection and structured modelling, and it has already been applied across many fields. For a wearable smart device, such identity verification is naturally necessary.
The voiceprint feature library holds the wearable-device records stored in the cloud and the voiceprint features registered for each device. Through the device's initial setup, the user registers a reserved voiceprint feature with the cloud service, which then either confirms authorization or rejects it. If authorization is confirmed, the user's voiceprint feature is registered in the library and can be used to verify the user's permission for the device; if it is rejected, the user has no permission to use the device. Registered user voiceprint features can also be maintained in the cloud, for example by deleting expired users or configuring a permission validity period.
This design ensures the security of data storage and of the system. In this embodiment the voiceprint features are stored in the cloud. Cloud storage provides a reliable data storage solution, so the user need not worry about data loss or damage. Cloud data storage is also more secure, because it uses highly secure techniques to protect the user's data; data stored in the cloud is not easily stolen or tampered with from outside. Cloud storage additionally provides real-time data access and sharing, making identity verification and data sharing safer and more convenient, and it is generally more cost-effective than local storage because it does not require significant hardware and software resources.
The comparison between the voiceprint features extracted from the user's voiceprint information and the voiceprint features in the voiceprint feature library can be realized through a voiceprint recognition algorithm model. The distance between the user's voiceprint features and the reserved voiceprint features registered for the corresponding wearable device in the library is calculated. If the distance is smaller than a set threshold, the two are considered similar and the user verification is judged successful; if the distance is larger than the set threshold, the user verification is judged to have failed.
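A minimal sketch of this threshold test, using cosine distance as one possible distance measure (the text does not fix a particular metric, and the threshold value below is illustrative):

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def verify(user_features, enrolled_features, threshold=0.2):
    """Verification succeeds when the distance falls below the set threshold."""
    return cosine_distance(user_features, enrolled_features) < threshold

# Identical vectors have distance 0 and pass; an orthogonal vector fails.
assert verify([1.0, 0.0], [1.0, 0.0])
assert not verify([1.0, 0.0], [0.0, 1.0])
```

In practice the threshold is tuned on held-out data to trade off false accepts against false rejects.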
By this design, the application of the algorithm model improves identity verification security. The audio is preprocessed before voiceprint recognition, including noise removal and extraction of voiceprint information. The latter converts the audio into a time-frequency representation in which each point represents the energy of the sound at one moment and frequency. Useful voiceprint features are then extracted from the raw voiceprint information to facilitate modeling and classification; these features include the energy, phase and speed of the voiceprint. The voiceprint recognition model applied in this example is a neural-network-based algorithm model. It achieves high precision and can accurately recognize voiceprint information; it also supports very high model complexity and can handle complex voiceprint information. Such voiceprint recognition models are an advanced technology with high precision and complexity, and can be widely applied in many fields.
After the wearable device is unlocked, it collects ambient sound in the current environment to form environmental voiceprint information and uploads it to the cloud system; the cloud system analyzes the environmental voiceprint information and issues corresponding instructions.
After being successfully unlocked, the wearable device enters a working state and is automatically set to working mode. In working mode the device automatically collects external audio information and reports real-time data to the cloud; the cloud service receives the reported data and performs intelligent analysis on the audio information sent by the device. If the analysis result contains an abnormal sound classification, that classification is analyzed further, for example noise interference, harmful sound waves or designated sounds, and the cloud service sends a corresponding voice prompt to the user of the wearable device to guide the user in completing the corresponding instruction. If the analysis result contains the user's voiceprint features, the cloud service automatically analyzes the reported audio, performs speech recognition on it, understands the user's intention, and responds in a corresponding way, for example forwarding the voice information to an assistant robot or an operator; the user of the wearable device is then guided by the voice instruction sent by the cloud system to complete the corresponding task.
By this design, the efficiency of the wearable device user can be significantly improved, and data are collected and accumulated. With the rapid development of artificial intelligence, voiceprint analysis and sound classification are increasingly widely applied across industries; in speech recognition in particular, artificial intelligence has become a very powerful tool. The speech model used in this embodiment can analyze a large amount of speech data, enabling more efficient and accurate speech analysis. With the development of modern technology, remote skill guidance is also widely used in many industries: through a smart wearable device, expertise and skills can be transferred from one person or group to people elsewhere, improving work efficiency. Remote technical guidance saves time and cost; work that usually takes a great deal of time and money can be completed in much less time through remote guidance, significantly reducing cost. In this embodiment, remote technical guidance through the wearable device can improve the user's work efficiency.
According to the audio information reported by the wearable device, the cloud service performs a first intelligent analysis: sound classification. An integrated sound classification model can identify abnormal sounds such as noise interference, calls for help, quarreling and cursing; if an abnormal classification is found, the cloud service processes it immediately, using voice to instruct the user of the wearable device, or contacting a cloud service operator or robot assistant to raise an alarm. After the first analysis is completed, a second intelligent analysis can be performed: speech recognition and semantic understanding. The cloud service integrates speech recognition and semantic understanding models and analyzes the audio reported by the wearable device; if the user's voiceprint is recognized, the user's intention is understood from what the user expresses. If the user seeks guidance, or encounters an emergency that cannot be resolved and asks the cloud service for help, the request is forwarded to an assistant robot or a cloud service operator according to the user's needs, and the user of the wearable device can handle the emergency under the cloud service's guidance. WebRTC real-time video and voice calls can report the on-site situation in real time for analysis, gaining valuable time for rescuing the injured or repairing equipment.
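The two-stage analysis above can be sketched as a simple dispatcher; the labels and handler names are invented for illustration and stand in for the real classification and recognition models:

```python
ABNORMAL_CLASSES = {"noise", "help_call", "quarrel", "curse"}

def first_analysis(label):
    """Stage 1: sound classification. Returns an instruction for abnormal sounds."""
    if label in ABNORMAL_CLASSES:
        return {"action": "voice_prompt", "alert_operator": label == "help_call"}
    return None  # nothing abnormal: fall through to stage 2

def second_analysis(has_user_voiceprint, intent):
    """Stage 2: speech recognition / semantic understanding on the user's speech."""
    if not has_user_voiceprint:
        return None
    if intent in {"seek_guidance", "emergency"}:
        return {"action": "forward", "target": "operator"}
    return {"action": "forward", "target": "assistant_robot"}

# An abnormal sound triggers an immediate prompt; a help call also alerts an operator.
assert first_analysis("quarrel") == {"action": "voice_prompt", "alert_operator": False}
assert first_analysis("help_call")["alert_operator"] is True
# A recognized user asking for guidance is forwarded to an operator.
assert second_analysis(True, "seek_guidance") == {"action": "forward", "target": "operator"}
assert second_analysis(False, "seek_guidance") is None
```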
This design comprises a cloud system and a wearable device. The cloud system offers essentially unlimited scalability: because the data are stored in the cloud, large-scale expansion only requires adjusting resources according to business needs. The cloud system ensures the security and reliability of data and the continuous operation of the service. Because data in the cloud system are managed centrally, operation becomes very simple; everything can be done through a web interface or mobile application without considering technical details. The cloud system supports data sharing among multiple users, reducing the cost of repeatedly selecting and building infrastructure; it can implement version control of data, so that users can trace back and view historical data; it can protect user data with dedicated encryption algorithms; and it can perform regular data backups to prevent data loss.

The smart wearable device at the edge can route and switch network traffic, realizing information exchange between transmission nodes. It can increase processing speed by using efficient processor and memory components, achieve configuration flexibility by adjusting hardware and software parameters, and improve fault tolerance. Through miniaturization and standardization it is easy to install, and by using standardized components and open-source software it reduces cost.
As a novel network management solution, the smart wearable device at the edge has broad application prospects and importance. It offers high autonomous control, high customizability, high security, high flexibility, high scalability, high usability, strong data support, high performance and low cost, and can therefore meet requirements of different types and scales.
The wearable device can be a portable device worn by personnel, such as a smart safety helmet, a smart watch or a smart miner's lamp. In working mode, the wearable device needs to automatically detect and identify external information, or to seek guidance or help from the cloud service.
By this design, user operations are reduced and operational convenience is improved. The wearable edge device places a smart controller and other electronics around the human body, allowing the user to control it by touch or voice commands. It gives the user a convenient, efficient and safe capability, and can realize automatic control and better data protection. The wearable edge device in this example is a forward-looking technology that can bring the user many benefits in convenience, safety and health; as wearable edge devices mature and become more intelligent, they create a better life experience for users.
Voiceprint feature recognition can be achieved through the flow shown in fig. 2, and the specific steps are as follows:
1) After voice information is collected by the wearable device to form voiceprint information and uploaded to the cloud system, a preprocessing stage is carried out first. Preprocessing includes endpoint detection and noise elimination. The endpoint detection step analyzes the input audio stream and automatically removes invalid parts such as silence and non-human sounds, retaining the valid voiceprint information. The noise elimination step filters background noise, so that the system meets users' needs in different environments.
2) The preprocessed voiceprint information enters the voiceprint feature extraction stage, in which spectral feature parameters characterizing the speaker's specific vocal-organ structure or behavioral habits are extracted from the speaker's voiceprint information. These feature parameters are relatively stable for the same speaker: they do not change with time or environment, are consistent across different utterances of the same speaker, are hard to imitate, and are robust to noise.
3) The extracted individual voiceprint feature parameters are trained through the learning of the voiceprint recognition system to generate a voiceprint feature model specific to the user, which is stored in a voiceprint feature model database in one-to-one correspondence with the user ID.
4) When voiceprint recognition is needed, the voiceprint recognition system preprocesses the collected voiceprint information and extracts its features to obtain the voiceprint feature parameters to be recognized. These are matched for similarity against the voiceprint feature model of a particular user, or against all models in the voiceprint feature model database, yielding a similarity distance measure between voiceprint feature patterns; by choosing an appropriate distance threshold, the recognition result is obtained and output.
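The endpoint-detection step in the flow above can be sketched with a simple per-frame energy threshold; real systems use more robust voice activity detectors, and the frame size and threshold below are illustrative assumptions:

```python
def frame_energy(frame):
    """Average energy of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)

def trim_silence(samples, frame_len=4, threshold=0.01):
    """Drop leading and trailing frames whose energy is below the threshold."""
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    voiced = [frame_energy(f) >= threshold for f in frames]
    if not any(voiced):
        return []  # nothing but silence
    first = voiced.index(True)
    last = len(voiced) - 1 - voiced[::-1].index(True)
    out = []
    for f in frames[first:last + 1]:
        out.extend(f)
    return out

# Silence at the head and tail is removed; the voiced middle is kept.
signal = [0.0, 0.0, 0.0, 0.0, 0.5, -0.5, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0]
assert trim_silence(signal) == [0.5, -0.5, 0.5, -0.5]
```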
The voiceprint feature recognition implementation is shown in fig. 3, and the steps are as follows:
1) And (5) pretreatment.
Silence at the head and tail of the recording is cut to reduce interference; this operation is commonly referred to as VAD (voice activity detection). Sound framing then cuts the sound into small segments, each called a frame; this is achieved with a moving window function rather than a simple cut, so that adjacent frames overlap.
2) And extracting voiceprint features.
The main algorithms are linear prediction cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC); the aim is to turn each frame of the waveform into a multidimensional vector containing the sound information;
3) Acoustic model.
The acoustic model is obtained by training on speech data; its input is a feature vector, and its output is phoneme information;
4) A dictionary.
The dictionary maps characters or words to phonemes; in short, for Chinese it is the correspondence between pinyin and characters, and for English the correspondence between phonetic symbols and words;
5) A language model.
The language model is obtained by training on a large amount of text, yielding the probabilities with which individual characters or words are associated with one another;
6) Decoding.
The speech data from which voiceprint features have been extracted are decoded into text output through the acoustic model, the dictionary, and the language model.
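The framing operation in step 1), overlapping frames multiplied by a moving window function, can be sketched as follows; the frame length, hop size, and Hamming window are common choices but are assumptions here:

```python
import math

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def frame_signal(samples, frame_len=400, hop=160):
    """Cut the signal into overlapping frames and apply the window to each."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames

# With a 400-sample frame and a 160-sample hop (25 ms / 10 ms at 16 kHz),
# a 1-second signal of 16000 samples yields 1 + (16000 - 400) // 160 = 98 frames.
frames = frame_signal([0.0] * 16000)
assert len(frames) == 98
assert len(frames[0]) == 400
```

Each windowed frame would then be turned into an LPCC or MFCC vector by the feature extraction step.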
The implementation of the voice information synthesis is shown in fig. 4, and the steps are as follows:
1) Text regularization.
Text regularization disambiguates the pronunciation of non-standard words, converting written text such as numbers, abbreviations, symbols and web addresses into spoken words.
2) Word segmentation and part of speech.
Word segmentation and part-of-speech prediction are also common tasks in natural language processing. In a speech synthesis system, although their results are not fed directly into the final output, they are very important inputs for the phonetic transcription and prosody prediction problems at the front end.
3) And (5) phonetic notation.
Thanks to advances in end-to-end algorithms, the phonetic transcription problem has become less important for English. For Chinese, however, phonetic transcription is still a very important step in speech synthesis.
4) And predicting prosody.
Prosody is a relatively abstract concept covering information such as sentence intonation, stress, focus, and prosodic boundaries. Chinese prosody prediction here completes the prediction of a three-level prosodic tree, whose levels are the prosodic word, the prosodic phrase and the intonational phrase; different levels differ in stress, pause duration, and so on.
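As a toy illustration of the text regularization step (step 1 above), here is a minimal normalizer that expands a few non-standard word types into spoken form; the rules and dictionary entries are invented examples, far simpler than a production front end:

```python
import re

ABBREVIATIONS = {"Dr.": "Doctor", "etc.": "et cetera"}  # example entries only
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

def spell_number(token):
    """Read a digit string out digit by digit (one of several spoken styles)."""
    return " ".join(DIGITS[int(d)] for d in token)

def normalize(text):
    """Expand abbreviations and digit runs into spoken words."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    return re.sub(r"\d+", lambda m: spell_number(m.group()), text)

assert normalize("Dr. Li dialed 110") == "Doctor Li dialed one one zero"
```

A real system must also decide between readings ("110" as a phone number versus "one hundred ten"), which is exactly the disambiguation the step is responsible for.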
The WebRTC architecture described above is shown in fig. 5, where the techniques and principles used are as follows:
1) WebRTC c++ API layer.
The green part wraps the light purple WebRTC C++ API (PeerConnection) part. This part is mainly a C++ interface layer that provides the C++ API, which browsers call in order to support the WebRTC specification; for example, to implement WebRTC functionality on Android, a JNI call layer over this API must be written.
The main function of this layer is to expose the core capabilities of WebRTC, such as device management and audio/video stream data acquisition, so that software vendors (for example browser vendors) can conveniently integrate them into their own applications.
PeerConnection is the core module of this layer, namely the peer-to-peer connection module. Many functions are implemented in it, such as P2P NAT traversal (hole punching), establishment and selection of communication links, streaming data transmission, non-audio/video data transmission, and transmission quality reporting and statistics.
2) Session management layer.
The green part labeled Session management/Abstract signaling (Session) is the session management layer.
This layer provides session management functions such as creating sessions and managing context environments. It also involves various protocols, such as the SDP protocol used with the signaling server, and is mainly used for signaling interaction and for managing the connection state of the RTCPeerConnection.
3) Engine layer.
This layer is the heaviest and most complex of the WebRTC core layers. It is divided into three modules: Voice Engine, Video Engine, and Transport.
The first module, voice Engine, is a framework that includes a series of audio processing functions, such as audio acquisition, audio codec, audio optimization (including noise reduction, echo cancellation, etc.), etc.
The second module, video Engine, is a framework that includes a series of Video processing functions, such as Video acquisition, video encoding and decoding, dynamically modifying Video transmission quality according to network jitter, image processing, etc.
The third module, Transport, handles data transmission: in WebRTC, besides streaming media data such as audio and video, other binary data such as files, text and pictures can be transmitted; these capabilities are provided by this module.
The embodiment of the application also discloses a voiceprint recognition processing system.
The voiceprint recognition processing system includes:
a cloud system;
the wearable device, worn on the user's body, which establishes a communication connection with the cloud system, collects sound and uploads it to the cloud system, and produces corresponding information according to the instructions received from the cloud system;
the voiceprint feature extraction module is used for extracting voiceprint features in the voiceprint information of the user and voiceprint features in the environmental voiceprint information;
the comparison judging module is used for comparing the voiceprint characteristics in the voiceprint information of the user with the voiceprint characteristics in the voiceprint characteristic library and judging whether the voiceprint characteristics in the voiceprint information of the user exist in the voiceprint characteristic library;
and the classification module is used for classifying the voiceprint characteristics in the environmental voiceprint information extracted by the voiceprint characteristic extraction module.
The voiceprint feature extraction module, the comparison judging module and the classification module are all installed on the cloud system, and the wearable equipment is in wireless communication connection with the cloud system.
The cloud system is also provided with a cloud receiving module and a cloud sending module, wherein the cloud receiving module is used for receiving an electric signal sent by the wearable equipment; the cloud sending module is used for sending an electric signal to the wearable device.
The cloud system further comprises a voiceprint feature database module, and the voiceprint feature database module is used for storing voiceprint features of users.
The wearable device is also provided with a wearable device receiving module and a wearable device sending module, wherein the wearable device receiving module is used for receiving the electric signals sent by the cloud sending module; the wearable device sending module is used for sending an electric signal to the cloud system.
The wearable device is also provided with a voice acquisition module and a voice broadcasting module; the voice acquisition module is used for acquiring the user's voice information and voice information in the user's environment, and the voice broadcasting module is used for broadcasting corresponding voice prompt information.
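The modular decomposition described above can be sketched as plain classes wired together; the class and method names are illustrative only and the feature extraction is a stand-in for a real model:

```python
class VoiceprintFeatureExtractor:
    def extract(self, voiceprint_info):
        # Stand-in for real feature extraction: pass the raw values through.
        return list(voiceprint_info)

class ComparisonJudge:
    def __init__(self, feature_library):
        self.feature_library = feature_library  # enrolled feature vectors

    def exists(self, features):
        return features in self.feature_library

class CloudSystem:
    """Hosts the extraction and comparison modules, as in the described system."""

    def __init__(self, feature_library):
        self.extractor = VoiceprintFeatureExtractor()
        self.judge = ComparisonJudge(feature_library)

    def handle_unlock_request(self, voiceprint_info):
        features = self.extractor.extract(voiceprint_info)
        return "unlock" if self.judge.exists(features) else "retry"

# A voiceprint present in the library unlocks; an unknown one gets a retry code.
cloud = CloudSystem(feature_library=[[0.1, 0.2]])
assert cloud.handle_unlock_request([0.1, 0.2]) == "unlock"
assert cloud.handle_unlock_request([0.9, 0.9]) == "retry"
```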
The above embodiments are not intended to limit the scope of protection of the present application; therefore, all equivalent changes in structure, shape and principle of the application shall be covered by the scope of protection of the application.

Claims (10)

1. A voiceprint recognition processing method, comprising:
establishing a communication connection: the wearable device is started, and a communication connection is established between the wearable device and a cloud system;
voice unlocking: the wearable device collects the user's voice information to form user voiceprint information and uploads it to the cloud system; the cloud system extracts the voiceprint features in the user voiceprint information and compares them one by one with all voiceprint features in a voiceprint feature library of the cloud system; if the voiceprint features in the user voiceprint information exist in the voiceprint feature library, the cloud system sends a wearable device unlocking code to the wearable device, and if they do not exist in the voiceprint feature library, the cloud system sends a retry code to the wearable device;
if the wearable device receives the unlocking code, the wearable device is unlocked;
if the wearable device receives the retry code, the wearable device broadcasts a retry voice prompt asking the user to perform voice unlocking again;
if the wearable device is unlocked, the wearable device collects ambient sound in the current environment to form environmental voiceprint information and uploads it to the cloud system, and the cloud system analyzes the environmental voiceprint information and issues corresponding instructions.
2. The voiceprint recognition processing method according to claim 1, wherein a voiceprint feature library is established: the cloud system establishes the voiceprint feature library in advance according to user requirements; the wearable device collects the user's voice information in advance, extracts it as voiceprint feature information and uploads it to the cloud system; and the cloud system establishes the voiceprint feature library according to the voiceprint features extracted from the voiceprint feature information uploaded by the wearable device.
3. The voiceprint recognition processing method according to claim 1, wherein an upper limit M on the number of voice unlocking attempts is set on the cloud system; when the number of voice unlocking attempts on the wearable device is greater than or equal to the upper limit M, the cloud system sends a warning code to the wearable device, and if the wearable device receives the warning code, it plays a warning voice indicating that the user is an illegitimate user.
4. The voiceprint recognition processing method according to claim 3, wherein a use right is established: when the wearable device indicates that the user is an illegitimate user, the user can apply to the cloud system for the use right through the wearable device, wherein the use right comprises storing the voiceprint features in the user's voiceprint feature information in the voiceprint feature library; if the cloud system agrees to establish the use right, the voiceprint features in the user's voiceprint information are stored in the voiceprint feature library, and if the cloud system does not agree to establish the use right, the user's voiceprint information is deleted.
5. The voiceprint recognition processing method according to claim 1, wherein the cloud system analyzes the environmental voiceprint information to form an analysis result comprising an abnormal sound classification and a user sound classification; a cloud system operator sends a corresponding abnormal instruction to the wearable device through the cloud system according to the abnormal sound classification, and the wearable device receives the abnormal instruction, converts it into a voice broadcast and delivers it to the wearable device user.
6. The method according to claim 5, wherein the abnormal sound classification includes noise interference, harmful sound waves, quarreling sounds, and preset sounds.
7. A voiceprint recognition processing system comprising:
a cloud system;
the wearable device, worn on the user's body, which establishes a communication connection with the cloud system, collects sound and uploads it to the cloud system, and produces corresponding information according to the instructions received from the cloud system;
the voiceprint feature extraction module is used for extracting voiceprint features in the voiceprint information of the user and voiceprint features in the environmental voiceprint information;
the comparison judging module is used for comparing the voiceprint characteristics in the voiceprint information of the user with the voiceprint characteristics in the voiceprint characteristic library and judging whether the voiceprint characteristics in the voiceprint information of the user exist in the voiceprint characteristic library;
and the classification module is used for classifying the voiceprint characteristics in the environmental voiceprint information extracted by the voiceprint characteristic extraction module.
8. The voiceprint recognition processing system of claim 7, wherein the cloud system comprises:
the cloud receiving module is used for receiving the electric signals sent by the wearable equipment;
the cloud sending module is used for sending the electric signals to the wearable equipment.
9. The voiceprint recognition processing system of claim 7, wherein the wearable device comprises:
the wearable device receiving module is used for receiving the electric signals sent by the cloud sending module;
the wearable device sending module is used for sending an electric signal to the cloud system.
10. The voiceprint recognition processing system of claim 9, wherein the wearable device further comprises:
the voice acquisition module is used for acquiring voice information of a user and voice information in an environment where the user is located;
and the voice broadcasting module is used for broadcasting corresponding voice prompt information.
CN202310900334.3A 2023-07-21 2023-07-21 Voiceprint recognition processing method and system Pending CN116935859A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310900334.3A CN116935859A (en) 2023-07-21 2023-07-21 Voiceprint recognition processing method and system


Publications (1)

Publication Number Publication Date
CN116935859A true CN116935859A (en) 2023-10-24

Family

ID=88389159

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310900334.3A Pending CN116935859A (en) 2023-07-21 2023-07-21 Voiceprint recognition processing method and system

Country Status (1)

Country Link
CN (1) CN116935859A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101529929A (en) * 2006-09-05 2009-09-09 Gn瑞声达A/S A hearing aid with histogram based sound environment classification
CN105893554A (en) * 2016-03-31 2016-08-24 广东小天才科技有限公司 Wearable device friend making method and system
CN207264779U (en) * 2017-08-30 2018-04-20 深圳金康特智能科技有限公司 A kind of intelligent wearable device with vocal print arousal function
CN111243603A (en) * 2020-01-09 2020-06-05 厦门快商通科技股份有限公司 Voiceprint recognition method, system, mobile terminal and storage medium
CN112542156A (en) * 2020-12-08 2021-03-23 山东航空股份有限公司 Civil aviation maintenance worker card system based on voiceprint recognition and voice instruction control
CN113077803A (en) * 2021-03-16 2021-07-06 联想(北京)有限公司 Voice processing method and device, readable storage medium and electronic equipment
US20210390959A1 (en) * 2020-06-15 2021-12-16 Samsung Electronics Co., Ltd. Electronic apparatus and controlling method thereof
CN114724566A (en) * 2022-04-18 2022-07-08 中国第一汽车股份有限公司 Voice processing method, device, storage medium and electronic equipment
CN114842843A (en) * 2022-03-29 2022-08-02 青岛海尔空调器有限总公司 Terminal device control method and device, electronic device and storage medium
CN114974255A (en) * 2022-05-16 2022-08-30 上海华客信息科技有限公司 Hotel scene-based voiceprint recognition method, system, equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination