CN110263653A - Scene analysis system and method based on deep learning technology - Google Patents

Scene analysis system and method based on deep learning technology Download PDF

Info

Publication number
CN110263653A
CN110263653A
Authority
CN
China
Prior art keywords
deep learning
recognition
module
image
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910433837.8A
Other languages
Chinese (zh)
Inventor
王志宇
杨嘉欣
杨嘉烨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Dingyi Interconnection Technology Co ltd
Original Assignee
Guangdong Dingyi Interconnection Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Dingyi Interconnection Technology Co ltd filed Critical Guangdong Dingyi Interconnection Technology Co ltd
Priority to CN201910433837.8A priority Critical patent/CN110263653A/en
Publication of CN110263653A publication Critical patent/CN110263653A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 - Facial expression recognition
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/18 - Speech classification or search using natural language modelling
    • G10L15/1822 - Parsing for meaning understanding
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems

Abstract

The invention discloses a scene analysis system and method based on deep learning technology. The system comprises a data acquisition subsystem and a cloud AI platform. The data acquisition subsystem acquires images and voice. On the cloud AI platform, a face recognition module performs face recognition on the image under test using deep learning; a facial expression analysis module analyzes and judges the facial expression; a speech recognition module performs speech recognition on the audio under test using deep learning; a speech analysis module analyzes and judges the semantics and intonation of the audio; and a comprehensive analysis module jointly analyzes the results obtained by the facial expression analysis module and the speech analysis module. The invention can recognize faces and voice simultaneously and, using deep learning, obtains recognition results for facial expression and for the semantics and intonation of speech, which not only makes the recognition results more accurate but also guarantees recognition speed, further enriching scene analysis technology.

Description

Scene analysis system and method based on deep learning technology
Technical field
The present invention relates to the field of deep learning technology, and more particularly to a scene analysis system and method based on deep learning technology.
Background technique
With the continuous progress of modern science and technology, the era of intelligence has arrived, and natural language processing and facial expression recognition have long been important research topics for those skilled in the art.
However, on the one hand, owing to the limitations of traditional shallow models, traditional natural language processing models require a large amount of linguistic knowledge to construct features by hand. These features are generally guided by the specific application and therefore lack broad applicability: if the specific task changes, new features must be constructed by hand all over again.
On the other hand, current face recognition technology is still mainly implemented with hand-designed feature extraction algorithms. In real, complex environments, face data are affected by many factors, such as illumination, occlusion, and pose variation. Under such conditions, existing face recognition methods based on hand-designed feature extraction have poor robustness and weak resistance to these kinds of interference; such uncontrollable factors cause the recognition performance of existing methods to decline sharply, making it difficult to guarantee the recognition effect and resulting in low face recognition accuracy.
Moreover, although image recognition, speech recognition, and semantic analysis have each been explored in different fields, applications that combine natural language processing, face recognition, and facial expression recognition for scene analysis remain rare; they are still at an early stage of development and cannot recognize accurately.
Therefore, developing an accurate scene analysis system and method based on deep-learning natural language processing and facial expression recognition is a problem that those skilled in the art urgently need to solve.
Summary of the invention
In view of this, the present invention provides a scene analysis system and method based on deep learning technology. Faces and voice are recognized through deep learning, and the facial expression and the semantics and intonation of the voice are further analyzed, which effectively guarantees the accuracy of recognition and analysis.
To achieve the goals above, the present invention adopts the following technical scheme:
A scene analysis system based on deep learning technology, comprising: a data acquisition subsystem, a database, and a cloud AI platform; wherein,
the data acquisition subsystem is used for acquiring images and voice;
the database is used for storing data;
the cloud AI platform comprises a data preprocessing module, a face recognition module, a facial expression analysis module, a speech recognition module, a speech analysis module, and a comprehensive analysis module;
the data preprocessing module is used for preprocessing the images and voice acquired by the data acquisition subsystem;
the face recognition module is used for performing face recognition on the image under test using deep learning technology, judging from the data in the database whether the face in the image under test already exists, and continuously performing face recognition deep learning;
the facial expression analysis module is used for analyzing and judging the facial expression in the image under test, and continuously performing facial expression analysis deep learning;
the speech recognition module is used for performing speech recognition on the audio under test, converting the voice content into text, performing semantic analysis on the voice content, and continuously performing speech recognition deep learning;
the speech analysis module is used for analyzing and judging the semantics and intonation of the audio under test;
the comprehensive analysis module is used for jointly analyzing the results obtained by the facial expression analysis module and the speech analysis module.
Preferably, the preprocessing includes: performing dimension reduction on the images, and performing noise reduction and text output on the audio.
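The preprocessing described above (image dimension reduction, audio noise reduction) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the 2x2 average pooling and moving-average filter are assumed stand-ins for whatever dimension-reduction and denoising methods an actual system would use.

```python
def downsample_image(pixels):
    """Reduce image dimensionality by averaging 2x2 blocks of grayscale pixels."""
    h, w = len(pixels), len(pixels[0])
    return [
        [(pixels[r][c] + pixels[r][c + 1] +
          pixels[r + 1][c] + pixels[r + 1][c + 1]) / 4.0
         for c in range(0, w - 1, 2)]
        for r in range(0, h - 1, 2)
    ]

def denoise_audio(samples, window=3):
    """Suppress noise with a simple moving-average filter over the samples."""
    half = window // 2
    out = []
    for i in range(len(samples)):
        lo, hi = max(0, i - half), min(len(samples), i + half + 1)
        out.append(sum(samples[lo:hi]) / (hi - lo))
    return out
```

A real system would more likely use learned encoders or signal-processing filters, but the data flow (raw capture in, compact cleaned data out to the recognition modules) is the same.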
Preferably, the data acquisition subsystem includes an image acquisition module and an audio acquisition module,
which are respectively used to acquire images and audio and to send the acquired images and audio to the data preprocessing module.
Preferably, the face recognition module includes a first feature extraction unit, a first deep learning model, and a first matching and recognition unit;
the first feature extraction unit is used to extract a face image feature vector from the preprocessed image according to the first deep learning model;
the first matching and recognition unit is used to match the extracted face image feature vector against the face images in the database to obtain a first recognition result, and to send the first recognition result to the database for storage; the first deep learning model is continuously updated as the database is updated.
Preferably, the speech recognition module includes a second feature extraction unit, a second deep learning model, and a second matching and recognition unit;
the second feature extraction unit is used to extract an audio feature vector from the preprocessed audio according to the second deep learning model;
the second matching and recognition unit is used to match the extracted audio feature vector against the audio data in the database to obtain a second recognition result, and to send the second recognition result to the database for storage; the second deep learning model is continuously updated as the database is updated.
Preferably, the speech analysis module includes a semantic analysis unit and an intonation analysis unit;
the semantic analysis unit and the intonation analysis unit respectively perform semantic and intonation analysis on the voice recognized by the speech recognition unit.
A scene analysis method based on deep learning technology, comprising the following steps:
(1) acquiring images and voice;
(2) preprocessing the acquired images and voice;
(3) performing face recognition on the image under test using deep learning technology, judging whether the face in the image under test exists in the database, and analyzing and judging the recognized facial expression;
(4) performing speech recognition on the audio under test using deep learning technology, converting the voice into text, and analyzing and judging the semantics and intonation of the recognized voice;
(5) jointly analyzing the judgment results of steps (3) and (4).
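The five steps above can be orchestrated as in the following sketch. Every recognizer here is a caller-supplied placeholder; the function names (`recognize_face`, `analyze_expression`, `recognize_speech`, `analyze_speech`) are illustrative and not taken from the patent.

```python
def analyze_scene(image, audio,
                  recognize_face, analyze_expression,
                  recognize_speech, analyze_speech):
    """Run the method's steps with caller-supplied models.

    Steps (1)-(2), acquisition and preprocessing, are assumed done upstream;
    step (3) covers face and expression, step (4) speech and its
    semantics/intonation, and step (5) collects everything for joint analysis."""
    face_id = recognize_face(image)                      # step (3): is the face known?
    expression = analyze_expression(image)               # step (3): expression judgment
    text = recognize_speech(audio)                       # step (4): speech-to-text
    semantics, intonation = analyze_speech(text, audio)  # step (4): semantic/intonation
    return {                                             # step (5): comprehensive result
        "face_id": face_id,
        "expression": expression,
        "text": text,
        "semantics": semantics,
        "intonation": intonation,
    }
```

Passing the models in as parameters mirrors the patent's modular structure: each module can be retrained or replaced without changing the pipeline.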
Preferably, the detailed process of face recognition is:
extracting a face image feature vector from the preprocessed image according to the first deep learning model;
matching the extracted face image feature vector against the face images in the database to obtain a first recognition result, and sending the first recognition result to the database for storage; the first deep learning model is continuously updated as the database is updated.
Preferably, the detailed process of speech recognition is:
extracting an audio feature vector from the preprocessed audio according to the second deep learning model;
matching the extracted audio feature vector against the audio data in the database to obtain a second recognition result, and sending the second recognition result to the database for storage; the second deep learning model is continuously updated as the database is updated.
It can be seen from the above technical solutions that, compared with the prior art, the present invention provides a scene analysis system and method based on deep learning technology. First, the system can recognize faces and voice simultaneously and, using deep learning, obtains recognition results for facial expression and for the semantics and intonation of the voice, which not only makes the recognition results more accurate but also guarantees recognition speed, further enriching scene analysis technology. Second, the deep learning models can be iteratively updated during use, which further guarantees the accuracy of the recognition results. The invention can be used in fields such as the service industry and smart cities, and has the advantage of providing timely insight into customer mood so that customer needs can be better met.
Detailed description of the invention
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from them without creative effort.
Fig. 1 is a schematic diagram of the structure provided by the invention;
Fig. 2 is a schematic diagram of the internal structure of the cloud AI platform provided by the invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present invention.
The embodiment of the invention discloses a scene analysis system based on deep learning technology. As shown in Fig. 1, it comprises: a data acquisition subsystem, a database, and a cloud AI platform; wherein,
the data acquisition subsystem is used for acquiring images and voice;
the database is used for storing data;
as shown in Fig. 2, the cloud AI platform includes a data preprocessing module, a face recognition module, a facial expression analysis module, a speech recognition module, a speech analysis module, and a comprehensive analysis module;
the data preprocessing module is used for preprocessing the images and voice acquired by the data acquisition subsystem;
the face recognition module is used for performing face recognition on the image under test using deep learning technology, judging from the data in the database whether the face in the image under test already exists, and continuously performing face recognition deep learning;
the facial expression analysis module is used for analyzing and judging the facial expression in the image under test, and continuously performing facial expression analysis deep learning;
the speech recognition module is used for performing speech recognition on the audio under test, converting the voice content into text, performing semantic analysis on the voice content, and continuously performing speech recognition deep learning;
the speech analysis module is used for analyzing and judging the semantics and intonation of the audio under test;
the comprehensive analysis module is used for jointly analyzing the results obtained by the facial expression analysis module and the speech analysis module.
Preferably, the preprocessing includes: performing dimension reduction on the images, and performing noise reduction and text output on the audio.
Further, the data acquisition subsystem includes an image acquisition module and an audio acquisition module,
which are respectively used to acquire images and audio and to send the acquired images and audio to the data preprocessing module.
Further, the face recognition module includes a first feature extraction unit, a first deep learning model, and a first matching and recognition unit;
the first feature extraction unit is used to extract a face image feature vector from the preprocessed image according to the first deep learning model;
the first matching and recognition unit is used to match the extracted face image feature vector against the face images in the database to obtain a first recognition result, and to send the first recognition result to the database for storage; the first deep learning model is continuously updated as the database is updated.
Further, the speech recognition module includes a second feature extraction unit, a second deep learning model, and a second matching and recognition unit;
the second feature extraction unit is used to extract an audio feature vector from the preprocessed audio according to the second deep learning model;
the second matching and recognition unit is used to match the extracted audio feature vector against the audio data in the database to obtain a second recognition result, and to send the second recognition result to the database for storage; the second deep learning model is continuously updated as the database is updated.
Further, the speech analysis module includes a semantic analysis unit and an intonation analysis unit;
the semantic analysis unit and the intonation analysis unit respectively perform semantic and intonation analysis on the voice recognized by the speech recognition unit.
The working principle of the present invention is as follows:
The image acquisition module and the audio acquisition module send the acquired images and voice to the data preprocessing module, which performs processing such as dimension reduction on the images and noise reduction and text output on the voice. The data preprocessing module then sends the preprocessed image data and voice data to the face recognition module and the speech recognition module respectively. The face recognition module uses the first matching and recognition unit to match the extracted face image feature vector against the data in the database, judges whether the face exists, and obtains a face recognition result; facial expression analysis is then performed on the basis of the face recognition result. The speech recognition module uses the second matching and recognition unit to match the extracted audio feature vector against the data in the database, and then performs semantic and intonation analysis.
The comprehensive analysis module combines the facial expression analysis results with the semantic and intonation analysis results to obtain results such as the mood of the identified person in the current scene, completing the scene analysis. According to the scene analysis result, customer mood can be obtained in real time and customer satisfaction determined, and early warning can be given for emergencies; in addition, for smart-city services, dynamic early warning can help prevent social incidents.
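The comprehensive analysis described here, fusing expression with semantics and intonation into an overall mood estimate and an early-warning signal, might look like the following rule-based sketch. The three-way label set, the majority vote, and the keyword trigger are all assumptions for illustration; a deployed system would presumably learn this fusion rather than hard-code it.

```python
def fuse_mood(expression, semantic_sentiment, intonation):
    """Combine three per-channel judgments ('positive'/'negative'/'neutral')
    into one mood label by majority vote, with 'neutral' as the tiebreaker."""
    votes = [expression, semantic_sentiment, intonation]
    pos = votes.count("positive")
    neg = votes.count("negative")
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

def needs_alert(mood, keywords, text):
    """Raise an early warning when the fused mood is negative and the
    recognized text contains any alert keyword (e.g. for emergencies)."""
    return mood == "negative" and any(k in text for k in keywords)
```

The point of the sketch is the structure: expression and speech channels are judged independently, then reconciled in one place, which is where scene-level decisions such as early warning belong.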
A scene analysis method based on deep learning technology, comprising the following steps:
(1) acquiring images and voice;
(2) preprocessing the acquired images and voice;
(3) performing face recognition on the image under test using deep learning technology, judging whether the face in the image under test exists in the database, and analyzing and judging the recognized facial expression;
(4) performing speech recognition on the audio under test using deep learning technology, converting the voice into text, and analyzing and judging the semantics and intonation of the recognized voice;
(5) jointly analyzing the judgment results of steps (3) and (4).
It should be understood that the order of steps (3) and (4) is not fixed: they can be carried out simultaneously, step (3) can precede step (4) or vice versa, or only one of the two steps can be carried out, as needed.
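The observation that steps (3) and (4) are independent and can run in either order or in parallel can be demonstrated with Python threads. The worker functions here are trivial stand-ins for the face and speech analysis steps.

```python
import threading

def run_steps_concurrently(step3, step4, image, audio):
    """Execute face analysis (step 3) and speech analysis (step 4) in
    parallel threads, since neither depends on the other's output."""
    results = {}
    t3 = threading.Thread(target=lambda: results.update(face=step3(image)))
    t4 = threading.Thread(target=lambda: results.update(speech=step4(audio)))
    t3.start()
    t4.start()
    t3.join()   # wait for both channels before comprehensive analysis (step 5)
    t4.join()
    return results
```

Because step (5) only needs both results to be present, the join points are the single synchronization barrier; running only one step corresponds to launching only one thread.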
Further, the detailed process of face recognition is:
extracting a face image feature vector from the preprocessed image according to the first deep learning model;
matching the extracted face image feature vector against the face images in the database to obtain a first recognition result, and sending the first recognition result to the database for storage; the first deep learning model is continuously updated as the database is updated.
Further, the detailed process of speech recognition is:
extracting an audio feature vector from the preprocessed audio according to the second deep learning model;
matching the extracted audio feature vector against the audio data in the database to obtain a second recognition result, and sending the second recognition result to the database for storage; the second deep learning model is continuously updated as the database is updated.
Each embodiment in this specification is described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A scene analysis system based on deep learning technology, characterized by comprising: a data acquisition subsystem, a database, and a cloud AI platform; wherein,
the data acquisition subsystem is used for acquiring images and voice;
the database is used for storing data;
the cloud AI platform comprises a data preprocessing module, a face recognition module, a facial expression analysis module, a speech recognition module, a speech analysis module, and a comprehensive analysis module;
the data preprocessing module is used for preprocessing the images and voice acquired by the data acquisition subsystem;
the face recognition module is used for performing face recognition on the image under test using deep learning technology, judging from the data in the database whether the face in the image under test already exists, and continuously performing face recognition deep learning;
the facial expression analysis module is used for analyzing and judging the facial expression in the image under test, and continuously performing facial expression analysis deep learning;
the speech recognition module is used for performing speech recognition on the audio under test, converting the voice content into text, performing semantic analysis on the voice content, and continuously performing speech recognition deep learning;
the speech analysis module is used for analyzing and judging the semantics and intonation of the audio under test;
the comprehensive analysis module is used for jointly analyzing the results obtained by the facial expression analysis module and the speech analysis module.
2. The scene analysis system based on deep learning technology according to claim 1, characterized in that the preprocessing includes: performing dimension reduction on the images, and performing noise reduction and text output on the audio.
3. The scene analysis system based on deep learning technology according to claim 1, characterized in that the data acquisition subsystem includes an image acquisition module and an audio acquisition module,
which are respectively used to acquire images and audio and to send the acquired images and audio to the data preprocessing module.
4. The scene analysis system based on deep learning technology according to claim 1, characterized in that the face recognition module includes a first feature extraction unit, a first deep learning model, and a first matching and recognition unit;
the first feature extraction unit is used to extract a face image feature vector from the preprocessed image according to the first deep learning model;
the first matching and recognition unit is used to match the extracted face image feature vector against the face images in the database to obtain a first recognition result, and to send the first recognition result to the database for storage; the first deep learning model is continuously updated as the database is updated.
5. The scene analysis system based on deep learning technology according to claim 1, characterized in that the speech recognition module includes a second feature extraction unit, a second deep learning model, and a second matching and recognition unit;
the second feature extraction unit is used to extract an audio feature vector from the preprocessed audio according to the second deep learning model;
the second matching and recognition unit is used to match the extracted audio feature vector against the audio data in the database to obtain a second recognition result, and to send the second recognition result to the database for storage; the second deep learning model is continuously updated as the database is updated.
6. The scene analysis system based on deep learning technology according to claim 1, characterized in that the speech analysis module includes a semantic analysis unit and an intonation analysis unit;
the semantic analysis unit and the intonation analysis unit respectively perform semantic and intonation analysis on the voice recognized by the speech recognition unit.
7. A scene analysis method based on deep learning technology, characterized by comprising the following steps:
(1) acquiring images and voice;
(2) preprocessing the acquired images and voice;
(3) performing face recognition on the image under test using deep learning technology, judging whether the face in the image under test exists in the database, and analyzing and judging the recognized facial expression;
(4) performing speech recognition on the audio under test using deep learning technology, converting the voice into text, and analyzing and judging the semantics and intonation of the recognized voice;
(5) jointly analyzing the judgment results of steps (3) and (4).
8. The scene analysis method based on deep learning technology according to claim 7, characterized in that the detailed process of face recognition is:
extracting a face image feature vector from the preprocessed image according to the first deep learning model;
matching the extracted face image feature vector against the face images in the database to obtain a first recognition result, and sending the first recognition result to the database for storage; the first deep learning model is continuously updated as the database is updated.
9. The scene analysis method based on deep learning technology according to claim 8, characterized in that the detailed process of speech recognition is:
extracting an audio feature vector from the preprocessed audio according to the second deep learning model;
matching the extracted audio feature vector against the audio data in the database to obtain a second recognition result, and sending the second recognition result to the database for storage; the second deep learning model is continuously updated as the database is updated.
CN201910433837.8A 2019-05-23 2019-05-23 Scene analysis system and method based on deep learning technology Pending CN110263653A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910433837.8A CN110263653A (en) 2019-05-23 2019-05-23 Scene analysis system and method based on deep learning technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910433837.8A CN110263653A (en) 2019-05-23 2019-05-23 Scene analysis system and method based on deep learning technology

Publications (1)

Publication Number Publication Date
CN110263653A 2019-09-20

Family

ID=67915131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910433837.8A Pending CN110263653A (en) 2019-05-23 2019-05-23 Scene analysis system and method based on deep learning technology

Country Status (1)

Country Link
CN (1) CN110263653A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991329A (en) * 2019-11-29 2020-04-10 上海商汤智能科技有限公司 Semantic analysis method and device, electronic equipment and storage medium
CN112001275A (en) * 2020-08-09 2020-11-27 成都未至科技有限公司 Robot for collecting student information
WO2021134459A1 (en) * 2019-12-31 2021-07-08 Asiainfo Technologies (China) , Inc. Ai intelligentialization based on signaling interaction
CN115328661A (en) * 2022-09-09 2022-11-11 中诚华隆计算机技术有限公司 Computing power balance execution method and chip based on voice and image characteristics
CN115440000A (en) * 2021-06-01 2022-12-06 广东艾檬电子科技有限公司 Campus early warning protection method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104902345A (en) * 2015-05-26 2015-09-09 多维新创(北京)技术有限公司 Method and system for realizing interactive advertising and marketing of products
CN106095903A (en) * 2016-06-08 2016-11-09 成都三零凯天通信实业有限公司 A kind of radio and television the analysis of public opinion method and system based on degree of depth learning art
CN106709804A (en) * 2015-11-16 2017-05-24 优化科技(苏州)有限公司 Interactive wealth planning consulting robot system
WO2018052561A1 (en) * 2016-09-13 2018-03-22 Intel Corporation Speaker segmentation and clustering for video summarization
CN109558935A (en) * 2018-11-28 2019-04-02 黄欢 Emotion recognition and exchange method and system based on deep learning


Similar Documents

Publication Publication Date Title
CN110263653A (en) Scene analysis system and method based on deep learning technology
CN107945805B (en) Intelligent cross-language speech recognition and conversion method
CN111461176B (en) Multi-mode fusion method, device, medium and equipment based on normalized mutual information
CN108197115A (en) Intelligent interactive method, device, computer equipment and computer readable storage medium
CN102509547B (en) Method and system for voiceprint recognition based on vector quantization
CN103366618B (en) Scene device for Chinese learning training based on artificial intelligence and virtual reality
CN107731233A (en) Voiceprint recognition method based on RNN
CN110289003A (en) Voiceprint recognition method, model training method, and server
CN108428446A (en) Speech recognition method and device
CN107972028B (en) Man-machine interaction method and device and electronic equipment
CN107492382A (en) Voiceprint extraction method and device based on neural network
CN104700843A (en) Method and device for identifying ages
CN107622797A (en) Sound-based health determination system and method
CN109036437A (en) Accent recognition method, apparatus, computer device and computer-readable storage medium
CN106776832B (en) Processing method, apparatus and system for question and answer interactive log
CN109036395A (en) Personalized speaker control method, system, smart speaker and storage medium
CN110085220A (en) Intelligent interaction device
CN107871499A (en) Speech recognition method, system, computer device and computer-readable storage medium
CN106512393A (en) Application voice control method and system suitable for virtual reality environment
CN112863529B (en) Speaker voice conversion method based on adversarial learning and related equipment
CN109872713A (en) Voice wake-up method and device
CN110428853A (en) Voice activity detection method, voice activity detection device and electronic equipment
CN109887510A (en) Voiceprint recognition method and device based on empirical mode decomposition and MFCC
CN110136726A (en) Voice gender estimation method, device, system and storage medium
Zhang et al. Voice biometric identity authentication system based on android smart phone

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 519000 office 618, No. 2202, xinxiangjiang Road, Hengqin new area, Zhuhai, Guangzhou, Guangdong

Applicant after: Guangdong Dingyi Interconnection Technology Co.,Ltd.

Address before: 519000 unit 1 and unit 3, 10th floor, convention and Exhibition Center, No. 1, Software Park Road, Zhuhai, Guangdong

Applicant before: Guangdong Dingyi Interconnection Technology Co.,Ltd.