WO2017219991A1 - Optimization method, apparatus, and terminal device for a model suitable for pattern recognition - Google Patents

Optimization method, apparatus, and terminal device for a model suitable for pattern recognition

Info

Publication number
WO2017219991A1
WO2017219991A1 · Application PCT/CN2017/089417 (CN2017089417W)
Authority
WO
WIPO (PCT)
Prior art keywords
algorithm
model
terminal device
samples
feature parameter
Prior art date
Application number
PCT/CN2017/089417
Other languages
English (en)
French (fr)
Inventor
王细勇
蒋洪睿
曹华俊
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to BR112018076645-3A priority Critical patent/BR112018076645A2/pt
Priority to MYPI2018002664A priority patent/MY193115A/en
Priority to JP2018566575A priority patent/JP6806412B2/ja
Priority to US16/313,044 priority patent/US10825447B2/en
Priority to EP17814729.4A priority patent/EP3460792B1/en
Publication of WO2017219991A1 publication Critical patent/WO2017219991A1/zh

Classifications

    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA], or self-organising maps [SOM]; blind source separation
    • G06V 10/776: Validation; performance evaluation
    • G06V 10/98: Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; evaluation of the quality of the acquired patterns
    • G06F 18/00: Pattern recognition
    • G06F 18/21: Design or setup of recognition systems or techniques; extraction of features in feature space; blind source separation
    • G06F 18/217: Validation; performance evaluation; active pattern learning techniques
    • G06F 18/285: Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • G10L 15/06: Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063: Training
    • G10L 15/065: Adaptation
    • G10L 15/14: Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L 15/142: Hidden Markov Models [HMMs]
    • G10L 15/144: Training of HMMs
    • G10L 15/16: Speech classification or search using artificial neural networks
    • G10L 15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Definitions

  • the present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, and a terminal device for optimizing a model suitable for pattern recognition.
  • Existing terminal devices, e.g. smart phones (SPs)
  • SP: smart phone
  • RISC: Reduced Instruction Set Computer
  • ARM: Advanced RISC Machines
  • CPU: Central Processing Unit
  • model training, such as acoustic model training
  • in the prior art, model training is generally performed in the cloud, and the model is then pushed to the terminal device to recognize voice, pictures, or video.
  • when model training for pattern recognition is performed in the cloud, it is usually based on samples uploaded from at least one terminal device (e.g., voice files, face pictures, or video files), so the resulting model is generic. In speech recognition, for example, the model can recognize the voices of all users rather than the voice of one particular user; that is, the model has no personalized features. A user, however, wants the terminal device to recognize only his or her own voice, and not (or not well) recognize other users' voices; that is, the user wishes to train a more personalized model. There is therefore a need to optimize the above model suitable for pattern recognition.
  • embodiments of the present invention provide a method, an apparatus, and a terminal device for optimizing a model suitable for pattern recognition, which can obtain a more personalized model and reduce the computation load of the server.
  • in a first aspect, an optimization method for a model suitable for pattern recognition is provided, the method comprising:
  • the terminal device receives a general model delivered by the server, where the general model is obtained by the server from samples uploaded by at least one terminal device and includes original feature parameters;
  • identifying target information using the general model, and collecting a plurality of local samples;
  • when the model optimization condition is met, new feature parameters are obtained from the plurality of local samples, the original feature parameters, and a first training algorithm, where the first training algorithm is a machine learning algorithm that corrects the original feature parameters according to the local samples to derive new feature parameters;
  • to obtain a more personalized model, the terminal device can continue to collect local samples while recognizing target information through the general model. Because the local samples are used by the terminal device itself when optimizing the general model, they only need to be stored locally after collection and need not be uploaded to the server. This reduces the traffic the terminal device consumes uploading samples to the server. In addition, because the general model is obtained by the server from hundreds of millions or billions of samples uploaded by at least one terminal device, its recognition accuracy is relatively high; optimizing this general model then yields a more personalized model, which both reduces the computation load of the terminal device and improves the accuracy of recognizing a specific user's information.
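The client-side flow just described (receive the general model, recognize with it while accumulating local samples, then optimize once a condition triggers) can be sketched as below. All class and method names (`SimpleModel`, `recognize`, `train_locally`, `combine`) and the toy arithmetic are hypothetical; the patent does not prescribe a concrete implementation.

```python
class SimpleModel:
    """Stand-in for the server-delivered general model (hypothetical)."""
    def __init__(self, params):
        self.params = params  # original feature parameters

    def recognize(self, sample):
        # Toy "recognition": a dot product of parameters and sample features.
        return sum(p * s for p, s in zip(self.params, sample))

    def train_locally(self, samples):
        # Illustrative "first training algorithm": average the local samples.
        n = len(samples)
        return [sum(col) / n for col in zip(*samples)]

    def combine(self, new_params):
        # Illustrative "second training algorithm": element-wise addition.
        self.params = [p + q for p, q in zip(self.params, new_params)]


class OnDeviceOptimizer:
    """Recognize with the general model while storing local samples on the
    device (never uploading them); optimize once the condition triggers."""
    def __init__(self, model, sample_limit=5000):
        self.model = model
        self.local_samples = []          # stored locally only
        self.sample_limit = sample_limit

    def handle_input(self, sample):
        result = self.model.recognize(sample)
        self.local_samples.append(sample)
        if len(self.local_samples) >= self.sample_limit:  # optimization condition
            new_params = self.model.train_locally(self.local_samples)
            self.model.combine(new_params)
            self.local_samples.clear()
        return result
```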
  • model optimization conditions may include one or more of the following:
  • the number of local samples reaches a preset number
  • the current time reaches a preset time
  • the terminal device is in a preset state
  • the attribute value of the terminal device reaches a preset threshold.
  • the first training algorithm may include one or more of the following:
  • the hidden Markov model (HMM) training algorithm, the forward algorithm, the Viterbi algorithm, the forward-backward algorithm, the expectation-maximization (EM) algorithm, the deep neural network (DNN) algorithm, the convolutional neural network (CNN) algorithm, and the recurrent neural network (RNN) algorithm.
  • in a second aspect, an apparatus for optimizing a model suitable for pattern recognition is provided, having functions that implement the behavior of the terminal device in the foregoing method.
  • these functions can be implemented in hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the functions described above.
  • a third aspect provides a terminal device that includes a transceiver and a processing circuit. The transceiver is configured to receive a general model delivered by the server, where the general model is obtained by the server from samples uploaded by at least one terminal device and includes original feature parameters. The processing circuit is configured to: identify target information using the general model, and collect a plurality of local samples; when the model optimization condition is met, obtain new feature parameters from the plurality of local samples, the original feature parameters, and a first training algorithm, where the first training algorithm is a machine learning algorithm that corrects the original feature parameters according to the local samples to derive new feature parameters; and optimize the general model according to a second training algorithm and the new feature parameters, to obtain an optimized general model.
  • in a fourth aspect, a computer storage medium is provided for storing computer software instructions for use by the terminal device described above, including a program designed to perform the above aspects.
  • with the method, apparatus, and terminal device for optimizing a model suitable for pattern recognition provided by the embodiments of the present invention, the terminal device receives a general model delivered by the server, the general model including original feature parameters; identifies target information using the general model and collects multiple local samples; when the model optimization condition is met, corrects the original feature parameters with the first training algorithm to obtain new feature parameters; and then optimizes the general model according to the second training algorithm and the new feature parameters to obtain an optimized general model.
  • in other words, the terminal device further optimizes the general model received from the server according to the collected local samples, so as to obtain a more personalized model suitable for pattern recognition. This improves the user experience and avoids the heavy server-side computation that would arise if the server itself optimized the general model.
  • FIG. 1 is a schematic structural diagram of a network provided by the present invention.
  • FIG. 2 is a flowchart of an optimization method for a model for pattern recognition according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of a training process of a first training algorithm of the present invention.
  • FIG. 4 is a schematic diagram of an apparatus for optimizing a model suitable for pattern recognition according to another embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a terminal device according to still another embodiment of the present invention.
  • the server can perform data communication with the terminal devices; specifically, it can receive samples uploaded by at least one terminal device, including but not limited to voice files, picture files, and video files.
  • the terminal device may collect voice files in the following ways: recorded by the terminal device's recording software during the user's calls, recorded while the user voice-chats through instant messaging software, or recorded in other scenarios in which the user's voice signal can be received.
  • the terminal device may collect picture files and videos as follows: recorded while the user takes photos or videos, or obtained from applications on the terminal device (e.g., Weibo, WeChat Moments, and QQ Space). Alternatively, the server can collect samples by itself.
  • after receiving enough samples (e.g., hundreds of millions or billions) sent by at least one terminal device, the server pre-processes the samples, where pre-processing may include adding annotation information and other processing; a general model can then be obtained from the samples.
  • specifically, the general model can be obtained by training on the samples with a training algorithm.
  • when the samples are voice files, the general model obtained may be a speech recognition model, which can be used to recognize voice information (also called a voice signal); when the samples are picture files, the general model obtained may be a picture recognition model, which can be used to recognize picture information; and when the samples are video files, the general model obtained may be a video recognition model, which can be used to recognize video information.
  • the above training algorithms include, but are not limited to, the Hidden Markov Model (HMM) training algorithm, the forward algorithm, the Viterbi algorithm, the forward-backward algorithm, the Expectation-Maximization (EM) algorithm, the Deep Neural Network (DNN) learning algorithm, the Convolutional Neural Network (CNN) learning algorithm, and the Recurrent Neural Network (RNN) learning algorithm.
  • the generic model obtained above may contain one or more original feature parameters.
  • when the general model is a speech recognition model, the original feature parameters are speech parameters, also called acoustic model parameters, which may include, but are not limited to, Mel-frequency cepstral coefficients (MFCCs) and the fundamental frequency.
  • when the general model is a picture recognition model, the original feature parameters are picture parameters (also called picture model parameters), which may include but are not limited to parameters such as color, texture, and shape.
  • the terminal device in FIG. 1 may have a dedicated digital signal processing (DSP) chip or a neural processing unit (NPU) chip that can meet the heavy computation requirements of a neural network; alternatively, the present invention applies to any terminal device with sufficient computing capability (e.g., able to perform matrix multiplication or addition), including but not limited to mobile phones, mobile computers, tablets, personal digital assistants (PDAs), media players, smart TVs, smart watches, smart glasses, smart bracelets, and the like.
  • it is assumed that each terminal device is used by a fixed user; that is, each terminal device corresponds to a specific user, so the samples collected by one terminal device often carry that specific user's personal characteristics.
  • the server obtains a general model from samples uploaded by at least one terminal device; this general model can recognize all users' information (including voice, picture, and video information), that is, it generalizes well. However, once the general model is delivered to a terminal device and the terminal device uses it to recognize the corresponding specific user's information, the accuracy of recognition for that user does not improve, no matter how many times or for how long the model recognizes that user's information; in other words, the general model obtained by the server is not personalized.
  • FIG. 2 is a flowchart of an optimization method for a model for pattern recognition according to an embodiment of the present invention.
  • the executor of the method may be a terminal device. As shown in FIG. 2, the method may specifically include:
  • Step 210 The terminal device receives a general model delivered by the server.
  • the general model is obtained by the server from samples uploaded by at least one terminal device and may contain one or more original feature parameters.
  • when there are multiple original feature parameters, they may be stored in a first matrix for ease of management.
  • Step 220: Identify target information using the general model, and collect a plurality of local samples.
  • the target information includes, but is not limited to, voice information, picture information, and video information.
  • when the general model here is a speech recognition model, it may be used to recognize the voice information input by the user and obtain the text corresponding to that voice information; when the general model is a picture recognition model, the picture information may be recognized by it, where picture information includes but is not limited to face pictures and pictures containing objects; and when the general model is a video recognition model, video information (which consists of picture information) may be recognized by it.
  • to obtain a more personalized model, the terminal device may continue to collect local samples while recognizing target information through the general model. Because the local samples are used by the terminal device itself when optimizing the general model, they only need to be stored locally after collection and need not be uploaded to the server, which reduces the traffic the terminal device consumes uploading samples. Local samples may include, but are not limited to, voice files, picture files, and video files.
  • Step 230: When the model optimization condition is met, obtain new feature parameters from the multiple local samples, the original feature parameters, and the first training algorithm, where the first training algorithm is a machine learning algorithm that corrects the original feature parameters according to the local samples to derive new feature parameters.
  • model optimization condition may include one or more of the following:
  • the number of local samples reaches a preset number
  • the current time reaches a preset time
  • the terminal device is in a preset state
  • the attribute value of the terminal device reaches a preset threshold.
  • the number of local samples reaches a preset number: for example, the number of collected voice files, picture files, or video files exceeds 5,000.
  • the current time reaches a preset time: for example, the current time is later than 12:00.
  • the terminal device is in a preset state: for example, the terminal device is charging or on standby.
  • the attribute value of the terminal device reaches a preset threshold: for example, the battery level of the terminal device exceeds 80%, or the temperature of the terminal device is below 25 degrees.
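Taken together, the example conditions above might be checked as in the sketch below. The thresholds (5,000 samples, 12:00, charging/standby, 80% battery, 25 degrees) come from the text; the function name and the use of `or` to combine the conditions are assumptions, since the text says "one or more" without fixing a combination rule.

```python
import datetime

def optimization_condition_met(num_samples, now, state, battery_pct, temp_c):
    """Hypothetical check of the example model-optimization conditions."""
    return (
        num_samples >= 5000                     # enough local samples
        or now.time() >= datetime.time(12, 0)   # past the preset time
        or state in ("charging", "standby")     # device in a preset state
        or battery_pct > 80                     # attribute threshold: battery
        or temp_c < 25                          # attribute threshold: temperature
    )
```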
  • the first training algorithm may be the same as the training algorithm the server used to obtain the general model. For example, if the server obtained the general model with a deep learning algorithm, the terminal device may also use a deep learning algorithm to correct the original feature parameters of the general model.
  • even when the terminal device uses the same training algorithm as the server, the server's computation is far larger than the terminal device's, because the terminal device trains on only a few thousand local samples while the server trains on the hundreds of millions or billions of samples uploaded by at least one terminal device.
  • in FIG. 3, the samples fed to the input layer are local samples.
  • in speech recognition, for example, the local samples fed to the input layer are voice files of the specific user corresponding to the terminal device, and no voice files of other users are input; the new feature parameters produced by the output layer therefore have personalized features. That is, when the general model is optimized with these new feature parameters, it recognizes the specific user's voice information better, while recognizing other users' voice information poorly or not at all.
  • similarly, in picture recognition the local samples fed to the input layer may be face pictures of the specific user, with no face pictures of other users; the new feature parameters output by the output layer are therefore personalized, and the general model optimized with them can better recognize the specific user's picture information.
  • of course, the terminal device can also correct the original feature parameters of the general model according to the HMM training algorithm, the forward algorithm, the Viterbi algorithm, the forward-backward algorithm, the EM algorithm, the DNN learning algorithm, the CNN learning algorithm, or the RNN learning algorithm.
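As a data-flow illustration only (not any of the named HMM/DNN/CNN/RNN algorithms), a stand-in "first training algorithm" that shifts the original feature parameters toward statistics of the user's local samples might look like the following; the function name and the update rule are assumptions.

```python
import numpy as np

def correct_parameters(original, local_samples, lr=0.1, epochs=5):
    """Toy stand-in for the 'first training algorithm': move the original
    feature parameters toward a statistic of the specific user's local
    samples, so the corrected parameters carry personalized features."""
    params = np.asarray(original, dtype=float).copy()
    target = np.mean(local_samples, axis=0)   # user-specific statistic
    for _ in range(epochs):
        params += lr * (target - params)      # step toward the local statistic
    return params
```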
  • the general model can be optimized according to the above new feature parameters.
  • Step 240 Optimize the universal model according to the second training algorithm and the new feature parameter to obtain an optimized general model.
  • the general model received from the server can be optimized.
  • the second training algorithm may include, but is not limited to, Bayesian statistical modeling algorithms, support vector machine modeling algorithms, and the like.
  • as described above, the original feature parameters of the general model may be stored in a first matrix, and the new feature parameters obtained may be stored in a second matrix. The process of optimizing the general model according to the second training algorithm and the new feature parameters may then be: add or multiply the first matrix and the second matrix to obtain a target matrix, where the target matrix contains the feature parameters of the optimized general model; then replace the original feature parameters in the general model with the feature parameters of the optimized general model, yielding the optimized general model.
  • when the dimensions of the first matrix and the second matrix differ, they can be unified by padding with "0" before the matrices are added or multiplied. For example, as in the foregoing example, a matrix can be extended to 9000 dimensions by padding with "0", after which the first matrix and the second matrix are added or multiplied. Of course, the dimensions of the two matrices may also be unified in other ways, which is not limited in this application.
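The zero-padding step described above can be sketched with NumPy. Padding both matrices to a common shape follows the text; the element-wise reading of "multiply" and the function name are assumptions.

```python
import numpy as np

def combine_feature_matrices(first, second, mode="add"):
    """Sketch of step 240's matrix combination: zero-pad the parameter
    matrices to a common shape, then add (or multiply element-wise)."""
    rows = max(first.shape[0], second.shape[0])
    cols = max(first.shape[1], second.shape[1])

    def pad(m):
        out = np.zeros((rows, cols))
        out[:m.shape[0], :m.shape[1]] = m   # original values, zeros elsewhere
        return out

    a, b = pad(first), pad(second)
    return a + b if mode == "add" else a * b
```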
  • it should be noted that steps 210-230 describe only one optimization pass over the general model; those skilled in the art can repeat the above steps continually, that is, keep incorporating the personalized information of the specific user, thereby improving the accuracy of recognizing that user's information.
  • in summary, the terminal device of the present invention first receives a general model delivered by a server. Because the general model is obtained by the server from hundreds of millions or billions of samples uploaded by at least one terminal device, its recognition accuracy is relatively high; the general model is then optimized to obtain a more personalized model, which reduces the computation load of the terminal device and improves the accuracy of recognizing a specific user's information.
  • corresponding to the above method, an embodiment of the present application also provides an apparatus for optimizing a model suitable for pattern recognition. As shown in FIG. 4, the apparatus includes a receiving unit 401, a processing unit 402, an obtaining unit 403, and an optimization unit 404.
  • the receiving unit 401 is configured to receive a general model delivered by the server, where the universal model is obtained by the server according to samples uploaded by at least one terminal device, where the universal model includes original feature parameters.
  • the processing unit 402 is configured to identify the target information by using the universal model received by the receiving unit 401, and collect a plurality of local samples.
  • the obtaining unit 403 is configured to obtain a new feature parameter according to the plurality of local samples, the original feature parameter, and the first training algorithm when the model optimization condition is met, wherein the first training algorithm is based on a local sample pair A machine learning algorithm that modifies the original feature parameters to derive new feature parameters.
  • the model optimization condition includes one or more of the following:
  • the number of local samples reaches a preset number
  • the current time reaches a preset time
  • the terminal device is in a preset state
  • the attribute value of the terminal device reaches a preset threshold.
  • the first training algorithm includes one or more of the following:
  • the hidden Markov model (HMM) training algorithm, the forward algorithm, the Viterbi algorithm, the forward-backward algorithm, the expectation-maximization (EM) algorithm, the deep neural network (DNN) algorithm, the convolutional neural network (CNN) algorithm, and the recurrent neural network (RNN) algorithm.
  • the optimization unit 404 is configured to optimize the general model according to the second training algorithm and the new feature parameter to obtain an optimized general model.
  • in the apparatus provided by this embodiment of the present application, the receiving unit 401 receives the general model delivered by the server; the processing unit 402 identifies target information using the general model and collects a plurality of local samples; when the model optimization condition is met, the obtaining unit 403 obtains new feature parameters from the plurality of local samples, the original feature parameters, and the first training algorithm; and the optimization unit 404 optimizes the general model according to the second training algorithm and the new feature parameters to obtain an optimized general model. This improves the user experience and avoids the heavy computation the server would incur if it optimized the general model itself.
  • an embodiment of the present application further provides a terminal device. As shown in FIG. 5, the terminal device includes a transceiver 510 and a processing circuit 520, and optionally a memory 530.
  • Processing circuit 520 can include processor 521, radio frequency circuit 522, and baseband 523.
  • Processor 521 may comprise an NPU, a dedicated DSP, a combination of NPUs and hardware chips, or a combination of dedicated DSP and hardware chips.
  • the NPU or dedicated DSP provides computing power, for example, matrix multiplication or addition can be implemented.
  • the foregoing hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • the above PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the processor 521 may further include a graphics processing unit (GPU).
  • the memory 530 may include volatile memory, such as random access memory (RAM); it may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); and it may also include a combination of the above types of memory.
  • the transceiver 510 is configured to receive a general model delivered by the server, where the universal model is obtained by the server according to samples uploaded by at least one terminal device, where the universal model includes original feature parameters.
  • the processing circuit 520 is configured to identify the target information by the universal model and collect a plurality of local samples.
  • the processing circuit 520 is further configured to: when the model optimization condition is met, obtain a new feature parameter according to the multiple local samples, the original feature parameter, and the first training algorithm, where the first training algorithm is based on the local sample A machine learning algorithm that modifies the original feature parameters to derive new feature parameters.
  • model optimization condition is one or more of the following:
  • the number of local samples reaches a preset number
  • the current time reaches a preset time
  • the terminal device is in a preset state
  • the attribute value of the terminal device reaches a preset threshold.
  • the first training algorithm includes one or more of the following:
  • Hidden Markov model HMM training algorithm forward algorithm, Viterbi algorithm, forward-backward algorithm, maximum expectation EM algorithm, deep neural network DNN algorithm, convolutional neural network CNN algorithm and recurrent neural network RNN algorithm.
  • the processing circuit 520 is further configured to optimize the general model according to the second training algorithm and the new feature parameter to obtain an optimized general model.
  • the terminal device provided by this embodiment of the present invention first receives a general model delivered by the server. Because the general model is obtained by the server from hundreds of millions or billions of samples uploaded by at least one terminal device, its recognition accuracy is relatively high; the general model is then optimized to obtain a more personalized model, which reduces the computation load of the terminal device and improves the accuracy of recognizing a specific user's information.
  • the steps of a method or algorithm described in connection with the embodiments disclosed herein can be implemented in hardware, a software module executed by a processor, or a combination of both.
  • the software module can reside in random access memory (RAM), flash memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.


Abstract

An optimization method and apparatus for a model applicable to pattern recognition, and a terminal device. The method includes: a terminal device receives a general model delivered by a server (S210); recognizes target information by using the general model, and collects multiple local samples (S220); when a model optimization condition is met, obtains new feature parameters according to the multiple local samples, the original feature parameters, and a first training algorithm (S230); and optimizes the general model according to a second training algorithm and the new feature parameters, to obtain an optimized general model (S240).

Description

Optimization method and apparatus for a model applicable to pattern recognition, and terminal device

Technical Field
The present invention relates to the field of computer technology, and in particular, to an optimization method and apparatus for a model applicable to pattern recognition, and a terminal device.
Background
Existing terminal devices (for example, smartphones (Smart Phone, SP)) generally run various algorithm instructions by using a general-purpose computing unit, which usually adopts the Advanced RISC Machines (ARM) architecture of Reduced Instruction Set Computer (RISC) design, for example, a central processing unit (CPU). Under this architecture, if various algorithm instructions are run in parallel on multiple threads, power consumption is very high, which is unacceptable for a terminal device that relies on battery power; if the algorithm instructions are run on a single thread, the computing capability cannot meet the requirement of a large computation amount. For example, in fields such as speech recognition and computer vision, training a model applicable to pattern recognition (for example, acoustic model training) requires a very large amount of computation. Because of the power-consumption limitation of terminal devices, or because they cannot meet the large computation requirement, in the prior art models are generally trained in the cloud, and the model is then pushed to the terminal device to recognize speech, pictures, videos, and the like.
However, when a model applicable to pattern recognition is trained in the cloud, the training is usually based on samples (for example, speech files, face pictures, or video files) uploaded from at least one terminal device, so the obtained model is generic. For example, in speech recognition, the model can recognize the speech of all users rather than recognizing the speech of a particular user; that is, the model has no personalized features. A user, however, expects the terminal device to recognize only the user's own speech and not to recognize, or not to recognize well, the speech of other users; that is, the user expects a relatively personalized model to be trained. There is therefore a need to optimize the foregoing model applicable to pattern recognition.
Summary of the Invention
Embodiments of the present invention provide an optimization method and apparatus for a model applicable to pattern recognition, and a terminal device, so that a relatively personalized model can be obtained and the computation load of the server can be reduced.
According to a first aspect, an optimization method for a model applicable to pattern recognition is provided, and the method includes:
receiving, by a terminal device, a general model delivered by a server, where the general model is obtained by the server according to samples uploaded by at least one terminal device, and the general model includes original feature parameters;
recognizing target information by using the general model, and collecting multiple local samples;
when a model optimization condition is met, obtaining new feature parameters according to the multiple local samples, the original feature parameters, and a first training algorithm, where the first training algorithm is a machine learning algorithm that corrects the original feature parameters according to the local samples to obtain the new feature parameters; and
optimizing the general model according to a second training algorithm and the new feature parameters, to obtain an optimized general model.
To obtain a relatively personalized model, the terminal device may continue to collect local samples while recognizing target information by using the general model. Because the local samples are used by the terminal device in optimizing the general model, after being collected they only need to be stored locally and do not need to be uploaded to the server. This reduces the traffic the terminal device consumes in uploading samples to the server. In addition, because the general model is obtained by the server according to hundreds of millions or billions of samples uploaded by at least one terminal device, the accuracy of the general model in recognizing information is relatively high; the general model is then optimized to obtain a relatively personalized model, which both reduces the computation load of the terminal device and improves the accuracy of recognizing the information of a specific user.
In an optional implementation, the model optimization condition may include one or more of the following:
the number of local samples reaches a preset quantity, the current time reaches a preset time, the terminal device is in a preset state, and an attribute value of the terminal device reaches a preset threshold.
In an optional implementation, the first training algorithm may include one or more of the following:
the hidden Markov model (HMM) training algorithm, the forward algorithm, the Viterbi algorithm, the forward-backward algorithm, the expectation-maximization (EM) algorithm, the deep neural network (DNN) algorithm, the convolutional neural network (CNN) algorithm, and the recurrent neural network (RNN) algorithm.
According to a second aspect, an optimization apparatus for a model applicable to pattern recognition is provided. The apparatus has the function of implementing the behavior of the terminal device in the foregoing method. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the foregoing function.
According to a third aspect, a terminal device is provided. The terminal device includes a transceiver and a processing circuit. The transceiver is configured to receive a general model delivered by a server, where the general model is obtained by the server according to samples uploaded by at least one terminal device, and the general model includes original feature parameters. The processing circuit is configured to: recognize target information by using the general model, and collect multiple local samples; when a model optimization condition is met, obtain new feature parameters according to the multiple local samples, the original feature parameters, and a first training algorithm, where the first training algorithm is a machine learning algorithm that corrects the original feature parameters according to the local samples to obtain the new feature parameters; and optimize the general model according to a second training algorithm and the new feature parameters, to obtain an optimized general model.
According to still another aspect, a computer storage medium is provided, configured to store computer software instructions used by the foregoing terminal device, where the instructions include a program designed to execute the foregoing aspects.
According to the optimization method and apparatus for a model applicable to pattern recognition and the terminal device provided in the embodiments of the present invention, the terminal device receives a general model delivered by a server, where the general model includes original feature parameters; recognizes target information by using the general model, and collects multiple local samples; when a model optimization condition is met, corrects the original feature parameters by using a first training algorithm to obtain new feature parameters; and then optimizes the general model according to a second training algorithm and the new feature parameters, to obtain an optimized general model. That is, in the present invention, the terminal device further optimizes the general model received from the server according to the collected local samples to obtain a relatively personalized model applicable to pattern recognition, which both improves user experience and solves the problem of the server's heavy computation load when the server optimizes the general model.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of a network structure according to the present invention;
FIG. 2 is a flowchart of an optimization method for a model applicable to pattern recognition according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the training process of the first training algorithm of the present invention;
FIG. 4 is a schematic diagram of an optimization apparatus for a model applicable to pattern recognition according to another embodiment of the present invention;
FIG. 5 is a schematic diagram of a terminal device according to still another embodiment of the present invention.
Detailed Description
The technical solutions of the present invention are further described in detail below with reference to the accompanying drawings and embodiments.
The optimization method for a model applicable to pattern recognition provided in the embodiments of the present invention is applied to the network topology shown in FIG. 1. In FIG. 1, the server can perform data communication with terminal devices; specifically, it can receive samples uploaded by at least one terminal device, where the samples include but are not limited to speech files, picture files, and video files. In an example, a terminal device may collect speech files in the following ways: they may be recorded by the recording software of the terminal device while the user is on a call, recorded by the terminal device while the user is voice-chatting with instant messaging software, or recorded in other scenarios in which the user's speech signal can be received. In another example, a terminal device may collect picture files and videos in the following ways: they may be recorded by the terminal device while the user is taking photos or videos, or obtained by the terminal device from application software (for example, Weibo, WeChat Moments, and Qzone). Alternatively, the server may also collect samples by itself.
In FIG. 1, after receiving enough samples (for example, hundreds of millions or billions) sent by at least one terminal device, the server first preprocesses the samples, where the preprocessing may include classification and the addition of annotation information; the general model can then be obtained according to the samples. Specifically, the general model may be obtained by training a training algorithm with the samples. Here, when the samples are speech files, the obtained general model may be a speech recognition model, which can be used to recognize speech information (also called speech signals); when the samples are picture files, the obtained general model may be a picture recognition model, which can be used to recognize picture information; or, when the samples are video files, the obtained general model may be a video recognition model, which can be used to recognize video information.
The foregoing training algorithm includes but is not limited to: the hidden Markov model (HMM) training algorithm, the forward algorithm, the Viterbi algorithm, the forward-backward algorithm, the expectation-maximization (EM) algorithm, the deep neural network (DNN) learning algorithm, the convolutional neural network (CNN) learning algorithm, and the recurrent neural network (RNN) learning algorithm. In addition, the general model obtained above may include one or more original feature parameters. Taking a speech recognition model as an example, the original feature parameters are speech parameters, also called acoustic model parameters, which may include but are not limited to Mel-frequency cepstral coefficient (MFCC) parameters and fundamental frequency parameters; taking a picture recognition model as an example, the original feature parameters are picture parameters (also called picture model parameters), which may include but are not limited to parameters such as color, texture, and shape.
The terminal device in FIG. 1 may have a dedicated digital signal processing (DSP) chip or a neural processing unit (NPU) chip, which can meet the large computation requirement of neural networks; in other words, the terminal device of the present invention has large-scale computing capability (for example, the ability to perform matrix multiplication or addition). The terminal device includes but is not limited to a mobile phone, a mobile computer, a tablet computer, a personal digital assistant (PDA), a media player, a smart TV, a smart watch, smart glasses, a smart band, and the like.
It can be understood that, in general, each terminal device is used by one fixed user; that is, each terminal device can correspond to one specific user, so the samples collected by a terminal device are often associated with the personal characteristics of that specific user. When the server obtains the general model according to samples uploaded by at least one terminal device, the general model can recognize the information of all users (including speech information, picture information, and video information); that is, its generality is good. However, after the general model is delivered to a terminal device and the terminal device uses it to recognize the information of its corresponding specific user, no matter how many times or for how long the general model recognizes the specific user's information, the accuracy of recognizing the specific user's information cannot be improved; that is, the general model obtained by the server is not personalized. To improve user experience, it is often desirable to improve the accuracy with which the terminal device recognizes the information of the specific user, while the information of other users need not be recognized. Therefore, the general model delivered by the server needs to be optimized.
FIG. 2 is a flowchart of an optimization method for a model applicable to pattern recognition according to an embodiment of the present invention. The method may be executed by a terminal device. As shown in FIG. 2, the method may specifically include:
Step 210: The terminal device receives a general model delivered by the server.
As described above, the general model is obtained by the server according to samples uploaded by at least one terminal device, and it may include one or more original feature parameters. In an example, when there are multiple original feature parameters, the multiple original feature parameters may be stored in a first matrix for ease of management.
Step 220: Recognize target information by using the general model, and collect multiple local samples.
Here, the target information includes but is not limited to speech information, picture information, and video information. Specifically, when the general model is a speech recognition model, the speech information input by the user can be recognized by the speech recognition model to obtain the text corresponding to the speech information; when the general model is a picture recognition model, picture information can be recognized by the picture recognition model, where the picture information includes but is not limited to face pictures and pictures containing objects; and when the general model is a video recognition model, video information, which consists of picture information, can be recognized by the video recognition model.
It should be noted that, to obtain a relatively personalized model, the terminal device may continue to collect local samples while recognizing target information by using the general model. Because the local samples are used by the terminal device in optimizing the general model, after they are collected they only need to be stored locally and do not need to be uploaded to the server, which reduces the traffic the terminal device consumes in uploading samples to the server. The local samples may include but are not limited to speech files, picture files, and video files.
Step 230: When a model optimization condition is met, obtain new feature parameters according to the multiple local samples, the original feature parameters, and a first training algorithm, where the first training algorithm is a machine learning algorithm that corrects the original feature parameters according to the local samples to obtain the new feature parameters.
Here, the model optimization condition may include one or more of the following:
the number of local samples reaches a preset quantity, the current time reaches a preset time, the terminal device is in a preset state, and an attribute value of the terminal device reaches a preset threshold.
For example, the number of local samples reaching a preset quantity may be that the number of collected speech files, picture files, or video files exceeds 5,000; the current time reaching a preset time may be that the current time is past 12:00 midnight; the terminal device being in a preset state may be that the terminal device is charging or in standby; and an attribute value of the terminal device reaching a preset threshold may be that the battery level of the terminal device exceeds 80% or the temperature of the terminal device is below 25 degrees.
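As an illustration, the example conditions above (more than 5,000 samples, charging or standby, battery above 80%, temperature below 25 degrees) can be checked with a short predicate. This is a minimal sketch under the text's example thresholds, not the patented implementation; the function and parameter names are hypothetical.

```python
# Example thresholds taken from the text; the exact values are illustrative.
PRESET_SAMPLE_COUNT = 5000
PRESET_BATTERY_PCT = 80.0
PRESET_MAX_TEMP_C = 25.0

def optimization_condition_met(sample_count: int,
                               past_preset_time: bool,
                               charging_or_standby: bool,
                               battery_pct: float,
                               temp_c: float) -> bool:
    """Any one of the conditions is sufficient to trigger optimization."""
    return (sample_count > PRESET_SAMPLE_COUNT
            or past_preset_time
            or charging_or_standby
            or battery_pct > PRESET_BATTERY_PCT
            or temp_c < PRESET_MAX_TEMP_C)
```

Because the conditions are combined with "or", the terminal device can trigger optimization opportunistically, for example overnight while charging.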
In addition, the first training algorithm may be consistent with the training algorithm used by the server to obtain the general model; for example, when the server obtains the general model according to a deep learning algorithm and samples, the terminal device may also correct the original feature parameters of the general model according to a deep learning algorithm.
For example, when the first training algorithm is a deep learning algorithm, the method for correcting the original feature parameters of the general model may be as shown in FIG. 3. In FIG. 3, the deep learning algorithm includes three layers: an input layer, hidden layers, and an output layer. The input layer is used to input the input data, which here includes the original feature parameters and the local samples; it may consist of a column of "○" (called nodes), where each node is used to input one piece of input data. Taking the original feature parameters a1, a2, …, an as an example, where n >= 1, that is, the number of original feature parameters is n, the n original feature parameters and the local samples can be input at the input layer of the deep learning algorithm. The hidden layers are used to correct the n original feature parameters input at the input layer according to the local samples, which is specifically implemented by performing corresponding computations on the input data. In FIG. 3, the nodes in the k-th column of the hidden layers represent the results of the k-th computation on the input data, where each node in any column is determined by a linear combination of all nodes in the previous column; for example, the nodes in the first column of the hidden layers are the results of the first computation on the input data. Suppose the j-th node in the first column of the hidden layers is denoted y2j; then y2j may be computed as y2j = f(∑ x1i·ω1i + b), where x1i is the i-th piece of input data, ω and b may be set according to empirical values, and f may be chosen freely. The output layer is used to output the computed input data; it may consist of a column of "○", where each node outputs one piece of computed input data, that is, one new feature parameter. Suppose all the output new feature parameters are b1, b2, …, bm, where m >= n; that is, the number of new feature parameters is greater than or equal to the number of original feature parameters. In an example, the new feature parameters output by the output layer may be recorded in a second matrix.
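The per-node computation y2j = f(∑ x1i·ω1i + b) described above can be sketched in a few lines. This is a generic fully connected layer, not the patent's implementation; the weight values, the bias, and the choice of a sigmoid for the freely chosen f are all illustrative assumptions.

```python
import math

def sigmoid(z: float) -> float:
    # One common choice for the freely chosen activation f.
    return 1.0 / (1.0 + math.exp(-z))

def hidden_layer(x, weights, b, f=sigmoid):
    """Compute one hidden-layer column: y_j = f(sum_i x_i * w_ji + b).

    x       -- input data (original feature parameters plus local samples)
    weights -- weights[j][i] connects input node i to hidden node j
    b       -- bias term, set from empirical values per the text
    """
    return [f(sum(xi * wji for xi, wji in zip(x, row)) + b) for row in weights]

# Tiny example: n = 2 inputs feeding 3 hidden nodes.
y = hidden_layer([1.0, 2.0], [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], b=0.0)
```

Stacking several such columns, each consuming the previous column's outputs, gives the multi-column hidden layer of FIG. 3.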
It can be understood that, although the training algorithm used by the terminal device to correct the original feature parameters is consistent with the training algorithm used by the server to obtain the general model, the computation amount of the latter is far greater than that of the former, because the former collects only several thousand local samples, while the latter receives hundreds of millions or billions of samples uploaded by at least one terminal device.
In addition, in FIG. 3, because the samples input at the input layer are local samples, for example, when the original feature parameters of a speech recognition model are corrected, the local samples input at the input layer are speech files of the specific user corresponding to the terminal device, and no speech files of other users are input, the new feature parameters output by the output layer have personalized features; that is, the general model optimized according to the new feature parameters can better recognize the speech information of the specific user, while it cannot recognize, or cannot well recognize, the speech information of other users. As another example, when the original feature parameters of a picture recognition model are corrected, the local samples input at the input layer may be face pictures of the specific user, with no face pictures of other users input, so the new feature parameters output by the output layer have personalized features; that is, the general model optimized according to the new feature parameters can better recognize the picture information of the specific user.
In addition, the terminal device may also correct the original feature parameters of the general model according to the HMM training algorithm, the forward algorithm, the Viterbi algorithm, the forward-backward algorithm, the EM algorithm, the DNN learning algorithm, the CNN learning algorithm, or the RNN learning algorithm.
In the present invention, to obtain a highly accurate personalized model, that is, to improve the accuracy of recognizing the information of a specific user, the general model can be optimized according to the foregoing new feature parameters.
Step 240: Optimize the general model according to the second training algorithm and the new feature parameters, to obtain the optimized general model.
To obtain a highly accurate personalized model, the general model received from the server can be optimized. Here, the second training algorithm may include but is not limited to Bayesian statistical modeling algorithms and vector machine modeling algorithms.
As described above, the original feature parameters of the general model may be stored in a first matrix, and the obtained new feature parameters may be stored in a second matrix. In an example, the first matrix may be a 3000-dimensional (that is, n = 3000) matrix, and the second matrix may be a 9000-dimensional (that is, m = 9000) matrix. The process of optimizing the general model according to the second training algorithm and the new feature parameters may specifically include: adding or multiplying the first matrix and the second matrix to obtain a target matrix, where the target matrix contains the feature parameters of the optimized general model; the original feature parameters in the general model are then replaced with the feature parameters of the optimized general model, whereby the optimized general model is obtained. It can be understood that, because the first matrix and the second matrix have different numbers of dimensions, before the first matrix and the second matrix are added or multiplied, their dimensions may be unified by padding with "0"; as in the foregoing example, the first matrix may be expanded into a 9000-dimensional matrix by padding with "0", after which the first matrix and the second matrix are added or multiplied.
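As a sketch of the dimension-unification step, assuming plain NumPy arrays stand in for the first and second matrices (the 3000/9000 dimensions of the example are scaled down to 3/6 here), the zero-padding followed by element-wise addition might look like this:

```python
import numpy as np

def pad_to(vec: np.ndarray, size: int) -> np.ndarray:
    """Zero-pad a 1-D parameter vector up to `size` entries."""
    out = np.zeros(size, dtype=vec.dtype)
    out[:vec.shape[0]] = vec
    return out

# First matrix: n = 3 original feature parameters (stand-in for n = 3000).
original = np.array([0.5, 1.0, 1.5])
# Second matrix: m = 6 new feature parameters (stand-in for m = 9000).
new = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])

# Unify dimensions by zero-padding, then combine by addition to get the
# target matrix holding the optimized model's feature parameters.
target = pad_to(original, new.shape[0]) + new
```

Multiplication instead of addition would follow the same pattern once the shapes agree; which combination is used depends on the second training algorithm adopted.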
Of course, in practical applications, the dimensions of the two matrices may also be unified in other ways, which is not limited in this application. For example, in MATLAB, padding with "NaN" values may also be used, where a "NaN" value represents a value with no practical meaning; specifically, when processing a "NaN" value, MATLAB skips the "NaN" value without performing any processing on it.
Of course, the foregoing merely illustrates one way of optimizing the general model and does not constitute a limitation on the present invention; the way of optimizing the general model may be determined according to the second training algorithm adopted, and the present invention does not enumerate the ways one by one.
It should be noted that the foregoing Step 210 to Step 230 constitute only one optimization pass of the general model. Those skilled in the art may repeatedly perform the foregoing Step 210 to Step 230; that is, by continuously incorporating the personalized information of the specific user into the general model, the accuracy of recognizing the information of the specific user can be improved.
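One such optimization pass can be sketched as a small skeleton. Everything here is a hypothetical stand-in, not an API from the patent: the model is a plain dict, and the two toy training functions merely illustrate where steps 230 and 240 plug in.

```python
def optimization_pass(model, samples, first_train, second_train, condition_met):
    """One pass of steps 230-240 once the optimization condition holds."""
    if not condition_met(samples):
        return model                                     # keep using the model as-is
    new_params = first_train(model["params"], samples)   # step 230: correct parameters
    return second_train(model, new_params)               # step 240: fold them back in

# Toy stand-ins for the two training algorithms (assumptions, not the patent's).
toy_first = lambda params, samples: [p + len(samples) / 1000 for p in params]
toy_second = lambda model, new_params: {**model, "params": new_params}

model = {"params": [1.0, 2.0]}
model = optimization_pass(model, list(range(6000)), toy_first, toy_second,
                          condition_met=lambda s: len(s) > 5000)
```

Calling `optimization_pass` repeatedly as new local samples accumulate corresponds to the repeated execution described above.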
In summary, in the present invention, the terminal device first receives the general model delivered by the server. Here, because the general model is obtained by the server according to hundreds of millions or billions of samples uploaded by at least one terminal device, the accuracy of the general model in recognizing information is relatively high; the general model is then optimized to obtain a relatively personalized model, which both reduces the computation load of the terminal device and improves the accuracy of recognizing the information of the specific user.
Corresponding to the foregoing optimization method for a model applicable to pattern recognition, an embodiment of this application further provides an optimization apparatus for a model applicable to pattern recognition. As shown in FIG. 4, the apparatus includes: a receiving unit 401, a processing unit 402, an obtaining unit 403, and an optimization unit 404.
The receiving unit 401 is configured to receive a general model delivered by a server, where the general model is obtained by the server according to samples uploaded by at least one terminal device, and the general model includes original feature parameters.
The processing unit 402 is configured to recognize target information by using the general model received by the receiving unit 401, and collect multiple local samples.
The obtaining unit 403 is configured to, when a model optimization condition is met, obtain new feature parameters according to the multiple local samples, the original feature parameters, and a first training algorithm, where the first training algorithm is a machine learning algorithm that corrects the original feature parameters according to the local samples to obtain the new feature parameters.
The model optimization condition includes one or more of the following:
the number of local samples reaches a preset quantity, the current time reaches a preset time, the terminal device is in a preset state, and an attribute value of the terminal device reaches a preset threshold.
In addition, the first training algorithm includes one or more of the following:
the hidden Markov model (HMM) training algorithm, the forward algorithm, the Viterbi algorithm, the forward-backward algorithm, the expectation-maximization (EM) algorithm, the deep neural network (DNN) algorithm, the convolutional neural network (CNN) algorithm, and the recurrent neural network (RNN) algorithm.
The optimization unit 404 is configured to optimize the general model according to a second training algorithm and the new feature parameters, to obtain an optimized general model.
In the optimization apparatus for a model applicable to pattern recognition provided in this embodiment of the present invention, the receiving unit 401 receives a general model delivered by the server; the processing unit 402 recognizes target information by using the general model and collects multiple local samples; when a model optimization condition is met, the obtaining unit 403 obtains new feature parameters according to the multiple local samples, the original feature parameters, and the first training algorithm; and the optimization unit 404 optimizes the general model according to the second training algorithm and the new feature parameters, to obtain an optimized general model. This both improves user experience and solves the problem of the server's heavy computation load when the server optimizes the general model.
Corresponding to the foregoing optimization method for a model applicable to pattern recognition, an embodiment of this application further provides a terminal device. As shown in FIG. 5, the terminal device includes a transceiver 510 and a processing circuit 520, and optionally may further include a memory 530. The processing circuit 520 may include a processor 521, a radio frequency circuit 522, and a baseband 523.
The processor 521 may include an NPU, a dedicated DSP, a combination of an NPU and a hardware chip, or a combination of a dedicated DSP and a hardware chip. The NPU or the dedicated DSP provides computing capability; for example, it can perform matrix multiplication or addition operations. In addition, the hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The processor 521 may further include a graphics processing unit (GPU).
The memory 530 may include a volatile memory, for example, a random-access memory (RAM); the memory 530 may also include a non-volatile memory, for example, a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD). The memory 530 may also include a combination of the foregoing kinds of memory.
The transceiver 510 is configured to receive a general model delivered by a server, where the general model is obtained by the server according to samples uploaded by at least one terminal device, and the general model includes original feature parameters.
The processing circuit 520 is configured to recognize target information by using the general model, and collect multiple local samples.
The processing circuit 520 is further configured to, when a model optimization condition is met, obtain new feature parameters according to the multiple local samples, the original feature parameters, and a first training algorithm, where the first training algorithm is a machine learning algorithm that corrects the original feature parameters according to the local samples to obtain the new feature parameters.
The model optimization condition includes one or more of the following:
the number of local samples reaches a preset quantity, the current time reaches a preset time, the terminal device is in a preset state, and an attribute value of the terminal device reaches a preset threshold.
In addition, the first training algorithm includes one or more of the following:
the hidden Markov model (HMM) training algorithm, the forward algorithm, the Viterbi algorithm, the forward-backward algorithm, the expectation-maximization (EM) algorithm, the deep neural network (DNN) algorithm, the convolutional neural network (CNN) algorithm, and the recurrent neural network (RNN) algorithm.
The processing circuit 520 is further configured to optimize the general model according to a second training algorithm and the new feature parameters, to obtain an optimized general model.
The terminal device provided in this embodiment of the present invention first receives the general model delivered by the server. Here, because the general model is obtained by the server according to hundreds of millions or billions of samples uploaded by at least one terminal device, the accuracy of the general model in recognizing information is relatively high; the general model is then optimized to obtain a relatively personalized model, which both reduces the computation load of the terminal device and improves the accuracy of recognizing the information of the specific user.
Those skilled in the art should further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described generally in terms of function in the foregoing description. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functions by different methods for each particular application, but such implementation should not be considered beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The foregoing specific implementations further describe in detail the objectives, technical solutions, and beneficial effects of the present invention. It should be understood that the foregoing descriptions are merely specific implementations of the present invention and are not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

  1. An optimization method for a model applicable to pattern recognition, wherein the method comprises:
    receiving, by a terminal device, a general model delivered by a server, wherein the general model is obtained by the server according to samples uploaded by at least one terminal device, and the general model comprises original feature parameters;
    recognizing target information by using the general model, and collecting multiple local samples;
    when a model optimization condition is met, obtaining new feature parameters according to the multiple local samples, the original feature parameters, and a first training algorithm, wherein the first training algorithm is a machine learning algorithm that corrects the original feature parameters according to the local samples to obtain the new feature parameters; and
    optimizing the general model according to a second training algorithm and the new feature parameters, to obtain an optimized general model.
  2. The method according to claim 1, wherein the model optimization condition comprises one or more of the following:
    the number of local samples reaches a preset quantity, the current time reaches a preset time, the terminal device is in a preset state, and an attribute value of the terminal device reaches a preset threshold.
  3. The method according to claim 1 or 2, wherein the first training algorithm comprises one or more of the following:
    the hidden Markov model (HMM) training algorithm, the forward algorithm, the Viterbi algorithm, the forward-backward algorithm, the expectation-maximization (EM) algorithm, the deep neural network (DNN) algorithm, the convolutional neural network (CNN) algorithm, and the recurrent neural network (RNN) algorithm.
  4. An optimization apparatus for a model applicable to pattern recognition, wherein the apparatus comprises: a receiving unit, a processing unit, an obtaining unit, and an optimization unit;
    the receiving unit is configured to receive a general model delivered by a server, wherein the general model is obtained by the server according to samples uploaded by at least one terminal device, and the general model comprises original feature parameters;
    the processing unit is configured to recognize target information by using the general model received by the receiving unit, and collect multiple local samples;
    the obtaining unit is configured to, when a model optimization condition is met, obtain new feature parameters according to the multiple local samples, the original feature parameters, and a first training algorithm, wherein the first training algorithm is a machine learning algorithm that corrects the original feature parameters according to the local samples to obtain the new feature parameters; and
    the optimization unit is configured to optimize the general model according to a second training algorithm and the new feature parameters, to obtain an optimized general model.
  5. The apparatus according to claim 4, wherein the model optimization condition comprises one or more of the following:
    the number of local samples reaches a preset quantity, the current time reaches a preset time, the terminal device is in a preset state, and an attribute value of the terminal device reaches a preset threshold.
  6. The apparatus according to claim 4 or 5, wherein the first training algorithm comprises one or more of the following:
    the hidden Markov model (HMM) training algorithm, the forward algorithm, the Viterbi algorithm, the forward-backward algorithm, the expectation-maximization (EM) algorithm, the deep neural network (DNN) algorithm, the convolutional neural network (CNN) algorithm, and the recurrent neural network (RNN) algorithm.
  7. A terminal device, wherein the terminal device comprises a transceiver and a processing circuit;
    the transceiver is configured to receive a general model delivered by a server, wherein the general model is obtained by the server according to samples uploaded by at least one terminal device, and the general model comprises original feature parameters;
    the processing circuit is configured to:
    recognize target information by using the general model, and collect multiple local samples;
    when a model optimization condition is met, obtain new feature parameters according to the multiple local samples, the original feature parameters, and a first training algorithm, wherein the first training algorithm is a machine learning algorithm that corrects the original feature parameters according to the local samples to obtain the new feature parameters; and
    optimize the general model according to a second training algorithm and the new feature parameters, to obtain an optimized general model.
  8. The terminal device according to claim 7, wherein the model optimization condition comprises one or more of the following:
    the number of local samples reaches a preset quantity, the current time reaches a preset time, the terminal device is in a preset state, and an attribute value of the terminal device reaches a preset threshold.
  9. The terminal device according to claim 7 or 8, wherein the first training algorithm comprises one or more of the following:
    the hidden Markov model (HMM) training algorithm, the forward algorithm, the Viterbi algorithm, the forward-backward algorithm, the expectation-maximization (EM) algorithm, the deep neural network (DNN) algorithm, the convolutional neural network (CNN) algorithm, and the recurrent neural network (RNN) algorithm.
  10. The terminal device according to any one of claims 7 to 9, wherein the processing circuit comprises a neural processing unit (NPU) or a dedicated digital signal processor (DSP).
PCT/CN2017/089417 2016-06-23 2017-06-21 Optimization method and apparatus for a model applicable to pattern recognition, and terminal device WO2017219991A1 (zh)

Priority Applications (5)

Application Number Priority Date Filing Date Title
BR112018076645-3A BR112018076645A2 (pt) 2016-06-23 2017-06-21 método e aparelho para otimizar modelo aplicável para reconhecimento de padrão, e dispositivo do terminal
MYPI2018002664A MY193115A (en) 2016-06-23 2017-06-21 Method and apparatus for optimizing model applicable to pattern recognition, and terminal device
JP2018566575A JP6806412B2 (ja) 2016-06-23 2017-06-21 パターン認識に適用可能なモデルを最適化するための方法および装置ならびに端末デバイス
US16/313,044 US10825447B2 (en) 2016-06-23 2017-06-21 Method and apparatus for optimizing model applicable to pattern recognition, and terminal device
EP17814729.4A EP3460792B1 (en) 2016-06-23 2017-06-21 Optimization method and terminal device suitable for model of pattern recognition

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610472755.0 2016-06-23
CN201610472755.0A CN107545889B (zh) 2016-06-23 2016-06-23 Optimization method and apparatus for a model applicable to pattern recognition, and terminal device

Publications (1)

Publication Number Publication Date
WO2017219991A1 true WO2017219991A1 (zh) 2017-12-28

Family

ID=60784235

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/089417 WO2017219991A1 (zh) 2016-06-23 2017-06-21 Optimization method and apparatus for a model applicable to pattern recognition, and terminal device

Country Status (7)

Country Link
US (1) US10825447B2 (zh)
EP (1) EP3460792B1 (zh)
JP (1) JP6806412B2 (zh)
CN (1) CN107545889B (zh)
BR (1) BR112018076645A2 (zh)
MY (1) MY193115A (zh)
WO (1) WO2017219991A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241745A (zh) * 2020-01-09 2020-06-05 深圳前海微众银行股份有限公司 Stepwise model selection method, device, and readable storage medium
CN111382403A (zh) * 2020-03-17 2020-07-07 同盾控股有限公司 Training method, apparatus, device, and storage medium for a user behavior recognition model
JP2020144775A (ja) * 2019-03-08 2020-09-10 トヨタ自動車株式会社 モデル集約装置及びモデル集約システム
EP3907662A4 (en) * 2019-02-27 2022-01-19 Huawei Technologies Co., Ltd. METHOD AND DEVICE FOR PROCESSING A NEURAL NETWORK MODEL
JP7440420B2 (ja) 2018-05-07 2024-02-28 グーグル エルエルシー 包括的機械学習サービスを提供するアプリケーション開発プラットフォームおよびソフトウェア開発キット

Families Citing this family (37)

Publication number Priority date Publication date Assignee Title
US11450319B2 (en) * 2017-09-29 2022-09-20 Cambricon (Xi'an) Semiconductor Co., Ltd. Image processing apparatus and method
US11437032B2 (en) * 2017-09-29 2022-09-06 Shanghai Cambricon Information Technology Co., Ltd Image processing apparatus and method
EP3667488B1 (en) * 2017-09-29 2023-06-28 Shanghai Cambricon Information Technology Co., Ltd Image processing apparatus and method
US11630666B2 (en) 2018-02-13 2023-04-18 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
US11704125B2 (en) 2018-02-13 2023-07-18 Cambricon (Xi'an) Semiconductor Co., Ltd. Computing device and method
EP3651078B1 (en) 2018-02-13 2021-10-27 Shanghai Cambricon Information Technology Co., Ltd Computation device and method
CN116991226A (zh) 2018-02-14 2023-11-03 上海寒武纪信息科技有限公司 处理器的控制装置、方法及设备
CN108682416B (zh) * 2018-04-11 2021-01-01 深圳市卓翼科技股份有限公司 Local adaptive speech training method and system
EP3624020A4 (en) 2018-05-18 2021-05-05 Shanghai Cambricon Information Technology Co., Ltd CALCULATION PROCEDURES AND RELATED PRODUCTS
CN108446687B (zh) * 2018-05-28 2022-02-01 唯思电子商务(深圳)有限公司 Adaptive face visual authentication method based on interconnection between a mobile terminal and a back end
CN108833784B (zh) * 2018-06-26 2021-01-29 Oppo(重庆)智能科技有限公司 Adaptive composition method, mobile terminal, and computer-readable storage medium
EP3825841A1 (en) 2018-06-27 2021-05-26 Shanghai Cambricon Information Technology Co., Ltd Method and device for parallel computation of a network model
EP3757896B1 (en) 2018-08-28 2023-01-11 Cambricon Technologies Corporation Limited Method and device for pre-processing data in a neural network
US11703939B2 (en) 2018-09-28 2023-07-18 Shanghai Cambricon Information Technology Co., Ltd Signal processing device and related products
CN111276138B (zh) * 2018-12-05 2023-07-18 北京嘀嘀无限科技发展有限公司 Method and apparatus for processing speech signals in a voice wake-up system
CN111415653B (zh) * 2018-12-18 2023-08-01 百度在线网络技术(北京)有限公司 Method and apparatus for recognizing speech
CN109683938B (zh) * 2018-12-26 2022-08-02 思必驰科技股份有限公司 Voiceprint model upgrade method and apparatus for a mobile terminal
CN111383637A (zh) 2018-12-28 2020-07-07 上海寒武纪信息科技有限公司 Signal processing apparatus, signal processing method, and related products
CN111832737B (zh) 2019-04-18 2024-01-09 中科寒武纪科技股份有限公司 Data processing method and related products
US11934940B2 (en) 2019-04-18 2024-03-19 Cambricon Technologies Corporation Limited AI processor simulation
CN111862945A (zh) * 2019-05-17 2020-10-30 北京嘀嘀无限科技发展有限公司 Speech recognition method and apparatus, electronic device, and storage medium
CN111859977B (zh) * 2019-06-06 2024-06-07 北京嘀嘀无限科技发展有限公司 Semantic analysis method and apparatus, electronic device, and storage medium
US11676028B2 (en) 2019-06-12 2023-06-13 Shanghai Cambricon Information Technology Co., Ltd Neural network quantization parameter determination method and related products
CN112085192B (zh) 2019-06-12 2024-03-29 上海寒武纪信息科技有限公司 Method for determining quantization parameters of a neural network, and related products
US12001955B2 (en) 2019-08-23 2024-06-04 Anhui Cambricon Information Technology Co., Ltd. Data processing method, device, computer equipment and storage medium
US11599799B1 (en) * 2019-09-17 2023-03-07 Rockwell Collins, Inc. Digital signal processing with neural networks
CN112907309A (zh) * 2019-11-19 2021-06-04 阿里巴巴集团控股有限公司 Model update method, resource recommendation method, apparatus, device, and system
CN111404833B (zh) * 2020-02-28 2022-04-12 华为技术有限公司 Data flow type recognition model update method and related device
CN111522570B (zh) * 2020-06-19 2023-09-05 杭州海康威视数字技术股份有限公司 Target library update method and apparatus, electronic device, and machine-readable storage medium
CN112070086B (zh) * 2020-09-09 2024-05-07 平安科技(深圳)有限公司 Optimization method for a text recognition system, computer device, and storage medium
CN112735381B (zh) * 2020-12-29 2022-09-27 四川虹微技术有限公司 Model update method and apparatus
CN112820302B (zh) * 2021-01-28 2024-04-12 Oppo广东移动通信有限公司 Voiceprint recognition method and apparatus, electronic device, and readable storage medium
CN112992174A (zh) * 2021-02-03 2021-06-18 深圳壹秘科技有限公司 Speech analysis method and speech recording apparatus thereof
CN113780737A (zh) * 2021-08-10 2021-12-10 武汉飞恩微电子有限公司 Machine-learning-based job scheduling optimization method, apparatus, device, and medium
CN115472167A (zh) * 2022-08-17 2022-12-13 南京龙垣信息科技有限公司 Voiceprint recognition model training method and system based on big-data self-supervision
CN115600177B (zh) * 2022-10-09 2024-04-16 北京金和网络股份有限公司 Identity authentication method and apparatus, storage medium, and electronic device
CN115938353B (zh) * 2022-11-24 2023-06-27 北京数美时代科技有限公司 Distributed speech-sample sampling method and system, storage medium, and electronic device

Citations (4)

Publication number Priority date Publication date Assignee Title
WO2006137246A1 (ja) * 2005-06-21 2006-12-28 Pioneer Corporation Speech recognition apparatus, speech recognition method, speech recognition program, and recording medium
CN103632667A (zh) * 2013-11-25 2014-03-12 华为技术有限公司 Acoustic model optimization method and apparatus, and voice wake-up method, apparatus, and terminal
CN105096941A (zh) * 2015-09-02 2015-11-25 百度在线网络技术(北京)有限公司 Speech recognition method and apparatus
CN105206258A (zh) * 2015-10-19 2015-12-30 百度在线网络技术(北京)有限公司 Acoustic model generation method and apparatus, and speech synthesis method and apparatus

Family Cites Families (16)

Publication number Priority date Publication date Assignee Title
JP2002331723A (ja) 2001-05-08 2002-11-19 Canon Inc Image processing apparatus, image processing system, control method of image processing apparatus, recording medium, and control program
GB2409560B (en) 2003-12-23 2007-07-25 Ibm Interactive speech recognition model
US9111540B2 (en) 2009-06-09 2015-08-18 Microsoft Technology Licensing, Llc Local and remote aggregation of feedback data for speech recognition
KR101154011B1 (ko) * 2010-06-07 2012-06-08 주식회사 서비전자 Multi-model adaptation and speech recognition apparatus and method
WO2014096506A1 (en) 2012-12-21 2014-06-26 Nokia Corporation Method, apparatus, and computer program product for personalizing speech recognition
US9208777B2 (en) * 2013-01-25 2015-12-08 Microsoft Technology Licensing, Llc Feature space transformation for personalization using generalized i-vector clustering
US9582716B2 (en) * 2013-09-09 2017-02-28 Delta ID Inc. Apparatuses and methods for iris based biometric recognition
US20150170053A1 (en) * 2013-12-13 2015-06-18 Microsoft Corporation Personalized machine learning models
JP2015132877A (ja) 2014-01-09 2015-07-23 株式会社Nttドコモ Motion recognition system and motion recognition method
US10824958B2 (en) 2014-08-26 2020-11-03 Google Llc Localized learning from a global model
EP2990999A1 (en) * 2014-08-29 2016-03-02 Accenture Global Services Limited A machine-learning system to optimise the performance of a biometric system
CN105489221B (zh) * 2015-12-02 2019-06-14 北京云知声信息技术有限公司 Speech recognition method and apparatus
US20180358003A1 (en) * 2017-06-09 2018-12-13 Qualcomm Incorporated Methods and apparatus for improving speech communication and speech interface quality using neural networks
KR101936188B1 (ko) * 2017-07-12 2019-01-08 이민정 Object identification method and apparatus
CN108830211A (zh) * 2018-06-11 2018-11-16 厦门中控智慧信息技术有限公司 Deep-learning-based face recognition method and related products
US11144748B2 (en) * 2018-12-07 2021-10-12 IOT Technology, LLC. Classification system

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
WO2006137246A1 (ja) * 2005-06-21 2006-12-28 Pioneer Corporation Speech recognition apparatus, speech recognition method, speech recognition program, and recording medium
CN103632667A (zh) * 2013-11-25 2014-03-12 华为技术有限公司 Acoustic model optimization method and apparatus, and voice wake-up method, apparatus, and terminal
CN105096941A (zh) * 2015-09-02 2015-11-25 百度在线网络技术(北京)有限公司 Speech recognition method and apparatus
CN105206258A (zh) * 2015-10-19 2015-12-30 百度在线网络技术(北京)有限公司 Acoustic model generation method and apparatus, and speech synthesis method and apparatus

Non-Patent Citations (2)

Title
FANG, BIN: "Research on Adaptive Methods in Speech Recognition" (语音识别中自适应方法的研究), Chinese Doctor's & Master's Theses Full-Text Database (Master), Information Technology, 15 February 2007 (2007-02-15), pages 1-64, XP009511600 *
See also references of EP3460792A4 *

Cited By (8)

Publication number Priority date Publication date Assignee Title
JP7440420B2 (ja) 2018-05-07 2024-02-28 グーグル エルエルシー 包括的機械学習サービスを提供するアプリケーション開発プラットフォームおよびソフトウェア開発キット
EP3907662A4 (en) * 2019-02-27 2022-01-19 Huawei Technologies Co., Ltd. METHOD AND DEVICE FOR PROCESSING A NEURAL NETWORK MODEL
JP2020144775A (ja) * 2019-03-08 2020-09-10 トヨタ自動車株式会社 モデル集約装置及びモデル集約システム
US11531350B2 (en) 2019-03-08 2022-12-20 Toyota Jidosha Kabushiki Kaisha Model aggregation device and model aggregation system
CN111241745A (zh) * 2020-01-09 2020-06-05 深圳前海微众银行股份有限公司 逐步模型选择方法、设备及可读存储介质
WO2021139462A1 (zh) * 2020-01-09 2021-07-15 深圳前海微众银行股份有限公司 逐步模型选择方法、设备及可读存储介质
CN111241745B (zh) * 2020-01-09 2024-05-24 深圳前海微众银行股份有限公司 逐步模型选择方法、设备及可读存储介质
CN111382403A (zh) * 2020-03-17 2020-07-07 同盾控股有限公司 用户行为识别模型的训练方法、装置、设备及存储介质

Also Published As

Publication number Publication date
CN107545889B (zh) 2020-10-23
EP3460792A1 (en) 2019-03-27
MY193115A (en) 2022-09-26
US20190228762A1 (en) 2019-07-25
CN107545889A (zh) 2018-01-05
JP2019528502A (ja) 2019-10-10
EP3460792B1 (en) 2021-10-27
EP3460792A4 (en) 2019-06-12
JP6806412B2 (ja) 2021-01-06
BR112018076645A2 (pt) 2019-03-26
US10825447B2 (en) 2020-11-03

Similar Documents

Publication Publication Date Title
WO2017219991A1 (zh) Optimization method and apparatus for a model applicable to pattern recognition, and terminal device
JP6741357B2 (ja) Method and system for generating multi-related labels
KR102048390B1 (ko) Recognition apparatus and training apparatus based on deep neural networks, and methods thereof
TWI794157B (zh) Automatic multi-threshold feature filtering method and apparatus
WO2019114344A1 (zh) Graph-structure-model-based method, apparatus, and device for preventing and controlling abnormal accounts
US20180158449A1 (en) Method and device for waking up via speech based on artificial intelligence
WO2022121180A1 (zh) Model training method and apparatus, voice conversion method, device, and storage medium
CN111583911B (zh) Label-smoothing-based speech recognition method, apparatus, terminal, and medium
CN112687266B (zh) Speech recognition method and apparatus, computer device, and storage medium
WO2018059302A1 (zh) Text recognition method, apparatus, and storage medium
WO2019238125A1 (zh) Information processing method, related device, and computer storage medium
CN111144457A (zh) Image processing method, apparatus, device, and storage medium
CN113626610A (zh) Knowledge graph embedding method, apparatus, computer device, and storage medium
JP2024035052A (ja) Lightweight model training method, image processing method, lightweight model training apparatus, image processing apparatus, electronic device, storage medium, and computer program
WO2021253938A1 (zh) Neural network training method, and video recognition method and apparatus
CN113743277A (zh) Short-video classification method and system, device, and storage medium
CN114155388B (zh) Image recognition method and apparatus, computer device, and storage medium
WO2023010701A1 (en) Image generation method, apparatus, and electronic device
JP6633556B2 (ja) Acoustic model learning apparatus, speech recognition apparatus, acoustic model learning method, speech recognition method, and program
CN114333786A (zh) Speech emotion recognition method, related apparatus, electronic device, and storage medium
CN112825152A (zh) Compression method, apparatus, device, and storage medium for deep learning models
CN115481285B (zh) Cross-modal video-text matching method and apparatus, electronic device, and storage medium
KR102418887B1 (ko) Acoustic model training apparatus for speech recognition and training method thereof
CN110728625B (zh) Image inference method and apparatus
CN117076747A (zh) Robot-based data crawling method, apparatus, and computer device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17814729

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2018566575

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112018076645

Country of ref document: BR

ENP Entry into the national phase

Ref document number: 2017814729

Country of ref document: EP

Effective date: 20181221

ENP Entry into the national phase

Ref document number: 112018076645

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20181220