CN110704711A

CN110704711A - Object automatic identification system for lifetime learning

Info

Publication number: CN110704711A
Application number: CN201910856599.1A
Authority: CN
Inventors: 仲国强; 李涛; 刘文雪
Original assignee: Ocean University of China
Current assignee: Ocean University of China
Priority date: 2019-09-11
Filing date: 2019-09-11
Publication date: 2020-01-17

Abstract

The invention discloses a lifetime learning-oriented object automatic identification system, which works in four steps: acquiring a data set; identifying an object; filtering by picture verification code; and voice recognition. The invention has the advantages of learning continuously over its lifetime, identifying objects automatically, and achieving high accuracy.

Description

Object automatic identification system for lifetime learning

Technical Field

The invention belongs to the technical field of automatic object identification, and relates to an automatic object identification system for lifetime learning.

Background

As an important branch of artificial intelligence, automatic object identification has been applied in many fields, such as security and autonomous driving, and has great development potential. In an object recognition task, the size of the data set and the capacity of the model determine the overall system performance. Given a sufficient amount of data, deep learning and machine learning methods can achieve high accuracy.

Disclosure of Invention

The invention aims to provide an automatic object identification system for lifetime learning. Its advantages are that it learns continuously over its lifetime, identifies objects automatically, and achieves high accuracy.

The technical scheme adopted by the invention is carried out according to the following steps:

step 1: acquiring a data set;

step 2: identifying an object;

step 3: picture verification code;

step 4: voice recognition.

Further, in step 1 the data set is acquired by collecting pictures with a focused crawler. A focused crawler can acquire data without interruption, and because the crawled data is diverse, the deep learning model used in the invention can learn pictures of any category.

The focused crawler works as follows:

1) begin running at the initial URL;

2) acquiring a webpage;

3) capturing a new URL and putting the new URL into a URL queue;

4) evaluating the webpage and the URL according to an analysis algorithm;

5) if the stopping condition is met, ending; otherwise, going to the next step;

6) selecting a URL according to the search strategy, and jumping to step 3.
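The crawl loop listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the invention: `fetch`, `extract_links` and `relevance` are hypothetical caller-supplied functions, the search strategy is assumed to be first-in-first-out, the stopping condition is assumed to be a page budget, and links are only followed from pages judged relevant.

```python
from collections import deque


def focused_crawl(seed_url, fetch, extract_links, relevance,
                  threshold=0.5, max_pages=100):
    """Return the URLs of pages judged relevant, in visiting order.

    fetch(url) -> page, extract_links(page) -> list of URLs and
    relevance(url, page) -> score are hypothetical callables supplied
    by the caller, so no network access is hard-coded here.
    """
    queue = deque([seed_url])        # 1) start from the initial URL
    seen = {seed_url}
    relevant = []
    fetched = 0
    while queue and fetched < max_pages:   # 5) assumed stopping condition
        url = queue.popleft()              # 6) assumed FIFO search strategy
        page = fetch(url)                  # 2) acquire the web page
        fetched += 1
        # 4) evaluate the page and URL with the analysis function
        if relevance(url, page) < threshold:
            continue                       # off-topic page: ignore its links
        relevant.append(url)
        for link in extract_links(page):   # 3) put new URLs into the queue
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return relevant
```

Passing the three functions in keeps the loop independent of any particular site or picture search service, and lets it be exercised on an in-memory set of pages before it is pointed at the Internet.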

Further, step 2: identifying an object;

The deep learning model employed is a Deep Residual Network (ResNet). Increasing the depth of a network generally improves its performance, which suggests that more layers are always better. This is not the case: once the number of layers reaches a certain point, performance saturates, and adding further depth degrades it. The degradation is not caused by overfitting, because both the training accuracy and the test accuracy fall; it indicates that a network becomes difficult to train once it is very deep. ResNet solves this problem of performance degradation in deep networks. The present invention uses ResNet-50, which has a 50-layer network structure.
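The idea behind a residual network can be illustrated with a toy example. The sketch below is not ResNet-50 itself, which uses convolutional bottleneck blocks with batch normalization; it is a simplified, fully connected residual block in plain Python, and every function name in it is hypothetical. It shows that a block computing relu(F(x) + x) reduces to the identity for non-negative inputs when F outputs zero, so stacking more such blocks need not make the output worse.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def linear(weights, bias, v):
    """Fully connected layer: one output per row of the weight matrix."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def residual_block(v, w1, b1, w2, b2):
    """Compute relu(F(v) + v), where F is linear -> relu -> linear."""
    hidden = relu(linear(w1, b1, v))
    f = linear(w2, b2, hidden)
    # Shortcut connection: add the block input to the output of F.
    return relu([fx + x for fx, x in zip(f, v)])


# With all-zero weights F(v) is zero, so every block passes its
# (non-negative) input through unchanged, however many are stacked.
zeros = [[0.0, 0.0], [0.0, 0.0]]
y = [1.5, 2.0]
for _ in range(10):   # arbitrary number of stacked blocks
    y = residual_block(y, zeros, [0.0, 0.0], zeros, [0.0, 0.0])
```

Because each block only has to learn a correction to its input rather than a full mapping, extra layers can fall back to the identity, which is the property that counters the degradation described above.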

Further, step 3: a picture verification code;

The data crawled by the web crawler is filtered by means of a picture verification code. When pictures of a specific category a are crawled, there is no guarantee that every picture obtained belongs to category a. Pictures that do not belong to category a are called wrong pictures; they introduce noise during model training and reduce the performance of the model, so they need to be eliminated. The invention filters the data with a picture verification code because a user clicking on a verification code is, in effect, labeling pictures, and the invention uses this user operation to complete the labeling. Specifically, taking 'camera' as an example, 6 pictures are shown each time as the verification code: some of them are selected from the 'non-camera' category of the auxiliary training set, and the rest are sampled from the crawler data set. In the user operation stage, if the user correctly selects the pictures from the auxiliary training set, the user passes the verification; if the user also selects a picture crawled by the crawler, that picture is considered, with high probability, not to belong to the camera class and can be deleted directly from the crawler data set. If the user does not select all the pictures from the auxiliary training set (i.e. all the 'non-camera' pictures), the verification code is refreshed.

Through the picture verification code, the invention uses the operations of users to filter the data and delete the wrong pictures from the crawler data set. Because the data is acquired in real time and labeled by users in real time, the training set grows over time. This improves the accuracy of the model and greatly improves its generalization to new samples, achieving the purpose of lifelong learning.
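The verification code logic described above can be sketched as follows. This is an illustrative sketch only: the 3/3 split between auxiliary and crawled pictures is an assumption (the text says only that the 6 pictures come partly from each source), and the names `build_captcha` and `process_selection` are hypothetical.

```python
import random


def build_captcha(aux_negatives, crawled, n_total=6, n_negatives=3, rng=random):
    """Pick known 'non-camera' pictures plus crawled pictures and shuffle them.

    Returns a list of (picture_id, source) pairs, where source is "aux"
    for the auxiliary training set and "crawl" for the crawler data set.
    """
    negatives = rng.sample(aux_negatives, n_negatives)
    candidates = rng.sample(crawled, n_total - n_negatives)
    tiles = [(p, "aux") for p in negatives] + [(p, "crawl") for p in candidates]
    rng.shuffle(tiles)
    return tiles


def process_selection(tiles, selected, crawl_set):
    """Apply the user's clicks on the 'non-camera' pictures.

    Returns False (refresh) unless every auxiliary picture was selected.
    On success, crawled pictures the user also selected are treated as
    wrong pictures and removed from crawl_set.
    """
    aux_ids = {p for p, source in tiles if source == "aux"}
    if not aux_ids <= set(selected):
        return False
    for p, source in tiles:
        if source == "crawl" and p in selected:
            crawl_set.discard(p)
    return True
```

The auxiliary pictures act as a check on the user: a selection is trusted, and allowed to delete crawled pictures, only when the user has also identified every picture whose label is already known.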

Further, step 4: performing voice recognition;

Voice recognition provides a further means of filtering: the user rejects wrong pictures in the data set by voice, which purifies the data set. Speech recognition is performed by calling the Baidu speech recognition REST API.

Drawings

FIG. 1 is a data acquisition interface;

FIG. 2 is a model training interface;

FIG. 3 is a captcha interface.

Detailed Description

The present invention will be described in detail with reference to the following embodiments.

1. Test protocol

First, 200 pictures are crawled and the wrong pictures among them are removed; the result is used as the test set. Then a further 1200 pictures are crawled as the training set. Specifically, one model is trained on 200 pictures and another model is trained on all 1200 pictures, and both models are evaluated on the test set to obtain their accuracy.
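The accuracy used to compare the two models is the fraction of test pictures that a model labels correctly. A minimal sketch follows; the function name and the example labels are hypothetical and are not taken from the experiments.

```python
def accuracy(predicted, actual):
    """Fraction of test pictures whose predicted label matches the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical labels for four test pictures, for illustration only.
true_labels = ["camera", "phone", "phone", "tape"]
model_small = ["camera", "phone", "camera", "tape"]   # e.g. trained on fewer pictures
model_large = ["camera", "phone", "phone", "tape"]    # e.g. trained on more pictures
small_acc = accuracy(model_small, true_labels)
large_acc = accuracy(model_large, true_labels)
```

Evaluating both models on the same cleaned test set means any difference in accuracy can be attributed to the amount of training data rather than to the test pictures.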

2. Test environment set-up

(1) The whole testing process

Test environment: Windows

Runtime environment: Python 3.5

(2) Object identification process

Test environment: Windows with a GPU

Runtime environment: TensorFlow, Keras

(3) Speech recognition process

Test environment: Windows

Required packages: Baidu AIP SDK, PyAudio

3. Test procedure

The system mainly comprises a data acquisition module, a model training module, a verification code module and a voice recognition module. The test results for each module are as follows.

(1) Data acquisition phase

In fig. 1, the item name is entered and the label of the item is filled in according to the prompts. The crawler parameters are then set by choosing a start page number and a stop page number (each page contains 60 pictures). After the OK button is clicked, crawling begins; the crawled data is stored in a preset folder, and the display box at the bottom of the page shows the crawling progress.

(2) Model training phase

In fig. 2, after the name and the label of the item are entered according to the prompts, batch_size and epochs are selected. After the train button is clicked, the model is trained on the data in the folder produced in the data acquisition stage. The display box at the bottom shows the training progress.

(3) Verification code phase

Fig. 3 shows the verification code pictures presented to the user for selection. The user ticks the check boxes of the chosen pictures and then clicks the OK button to submit. Alternatively, the user can click the voice control and select the pictures that match the prompt by voice.

(4) Speech control phase

In the voice control stage, the user clicks the voice control and then speaks a command, such as 'select pictures 3, 5'. If the spoken command selects the correct pictures, a dialog box pops up indicating that the verification has passed.
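Once the speech recognition service has returned the text of the spoken command, the picture numbers still have to be extracted and checked. The sketch below covers only that step and makes several assumptions: the transcript contains the numbers as digits, pictures are numbered from 1 to 6, and the names `parse_selection` and `verify_by_voice` are hypothetical. The call to the speech recognition API itself is not shown.

```python
import re


def parse_selection(transcript, n_pictures=6):
    """Extract the 1-based picture numbers mentioned in a recognized command.

    Numbers outside 1..n_pictures are ignored; duplicates are collapsed.
    """
    numbers = [int(token) for token in re.findall(r"\d+", transcript)]
    return sorted({n for n in numbers if 1 <= n <= n_pictures})


def verify_by_voice(transcript, expected):
    """Return True if the command selects exactly the expected pictures."""
    return set(parse_selection(transcript)) == set(expected)


# Example command; assumes the recognizer returns digits in its output.
chosen = parse_selection("select 3, 5")
```

If the recognizer returns number words instead of digits, a mapping from words to numbers would have to be applied before this parsing step.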

4. Analysis of results

Table 1 shows the test accuracy of the model trained on 200 samples, and Table 2 shows the test accuracy of the model trained on 1200 samples. The results show that adding training data improves the performance of the model, and that the deep learning model used in the invention achieves high test accuracy.

Table 1. Accuracy after learning 200 samples

Object	Camera	Mobile phone	Tape recorder	Sound equipment	Adhesive tape
Accuracy	87.5%	89.75%	88.75%	90%	96.75%

Table 2. Accuracy after learning 1200 samples

Object	Camera	Mobile phone	Tape recorder	Sound equipment	Adhesive tape
Accuracy	97.25%	97.75%	97.75%	98.5%	99.05%

The invention uses a Deep Residual Network for object recognition. To obtain training data, the invention crawls data from the Internet with a crawler. The data set crawled by the crawler contains wrong pictures, and the invention uses a verification code to remove them. Finally, the filtered data is used to train the deep residual network. The whole process is carried out iteratively: data can be continuously crawled and filtered, and the deep model continuously trained, so that lifetime learning of the system is achieved. In addition, the invention adds a voice control feedback module, so that the user can flexibly choose the verification mode. In general, the invention not only realizes lifetime learning for an automatic object identification system, but can also use crowdsourcing to label data and continuously expand the data set. This effectively addresses the high cost in manpower, material resources and money of manually labeled data sets, and gives the invention very wide application prospects.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not intended to limit the present invention in any way, and all simple modifications, equivalent variations and modifications made to the above embodiments according to the technical spirit of the present invention are within the scope of the present invention.

Claims

1. The lifetime learning-oriented object automatic identification system is characterized by comprising the following steps of:

step 1: acquiring a data set;

step 2: identifying an object;

step 3: picture verification code;

step 4: voice recognition.

2. The lifetime learning-oriented object automatic identification system according to claim 1, wherein: in step 1 the data set is acquired by collecting pictures with a focused crawler, and the focused crawler can acquire data without interruption;

the focused crawler works as follows:

1) begin running at the initial URL;

2) acquiring a webpage;

3) capturing a new URL and putting the new URL into a URL queue;

4) evaluating the webpage and the URL according to an analysis algorithm;

3. The lifetime learning-oriented object automatic identification system according to claim 1, wherein: the deep learning model adopted for the object identification of step 2 is a deep residual network, and ResNet-50 with a 50-layer network structure is used.

4. The lifetime learning-oriented object automatic identification system according to claim 1, wherein: in step 3 the data crawled by the web crawler is filtered by means of a picture verification code, 6 pictures being selected each time as the verification code, wherein one part of the 6 pictures is selected from the 'non-camera' category pictures of the auxiliary training set, and the other part of the 6 pictures is sampled from the crawler data set.

5. The lifetime learning-oriented object automatic identification system according to claim 1, wherein: in step 4 the data is filtered by voice recognition, the user rejecting wrong pictures in the data set by voice so that the data set is purified, and the voice recognition is performed by calling the Baidu speech recognition REST API.