CN108764459A

CN108764459A - Target identification network design method based on semantical definition

Info

Publication number: CN108764459A
Application number: CN201810465726.0A
Authority: CN
Inventors: 石光明; 谢雪梅; 高大化; 毛思颖; 马丽华
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2018-05-16
Filing date: 2018-05-16
Publication date: 2018-11-06
Anticipated expiration: 2038-05-16
Also published as: CN108764459B

Abstract

The present invention proposes a target identification network design method based on semantic definition. It mainly addresses the problems that existing target identification networks must be driven by big data, consume much energy and time, and have weak generalization and transfer ability. The method is implemented as follows: 1. define a semantic hierarchical structure and semantic primitives based on the human senses; 2. following the visual neurons of the human primary visual cortex V1, design different types of basic semantic neurons that perform different functions, responsible respectively for detecting horizontal lines, vertical lines, oblique lines, straight lines, arcs, triangles, quadrilaterals, polygons and colors; 3. build the semantic recognition network layer by layer according to the hierarchical semantics of the target to be identified; 4. identify pictures with the semantic recognition network. The invention is brain-inspired, does not need to be driven by big data, has low energy consumption and strong generalization and transfer ability, and can be used to identify many kinds of targets.

Description

Target identification network design method based on semantical definition

Technical field

The invention belongs to the field of artificial intelligence and mainly relates to a new target identification network that can be used to identify multiple classes of targets.

Background art

The great progress of information technology is driving the basic theory, methods and techniques of artificial intelligence to develop in depth. Advances in information technology, big data, deep learning and brain science are together pushing artificial intelligence technology toward another leap. Artificial intelligence is showing new characteristics: it is based on information and knowledge processing, blends with human knowledge, and can independently complete more cognitive work.

In recent years, with the rapid development of research in artificial intelligence, more and more neural networks based on deep learning algorithms have achieved remarkable results in computer vision, speech recognition, natural language processing and other areas, especially in target identification within computer vision. Examples are VGGNet, InceptionNet and ResNet, all of which use smaller convolution kernels and deeper network structures. These networks can efficiently extract the salient features of an image and have achieved very good results in target identification tasks. However, current deep-learning-based target identification networks all run on one common working mechanism whose constraint rules can only be described by mathematical functions. Under these constraint rules, the network performs precise computation through forward propagation and backward weight adjustment and finds the minimum-error solution by fitting. This fixed mode of operation brings the following deficiencies:

1. Network training is driven by big data and requires a large amount of labeled data and a tedious parameter-tuning process. Moreover, data storage and computation are separate from each other, so data transfer accounts for a large share of the work, which makes storage and computation complex and energy consumption very high.

2. The network suffers from three difficulties: it is hard to design, hard to predict and hard to interpret.

3. The network lacks transfer learning and generalization ability: it can complete only a single task rather than multiple tasks, and it adapts poorly to samples of the same kind of object under different conditions. For example, a network built only for face recognition can identify only faces and works well only on frontal faces; if the face is turned even slightly, it is not recognized or the recognition result degrades.

Summary of the invention

The object of the invention is to propose a target identification network design method based on semantic definition that overcomes the above deficiencies of existing target identification networks, reduces storage, computational complexity and energy consumption, improves transfer learning and generalization ability, and achieves fast and effective target identification.

The technical idea of the invention is to construct a completely new network from the semantic knowledge definition of the target, so that target information acquired by humans is injected directly into the computer to help it identify targets.

Based on this idea, the target identification network design method based on semantic definition of the present invention comprises:

1) Define semantics and semantic primitives:

A semantic is a hierarchical structure formed by combining multiple sub-semantics, each of which is in turn a combination of sub-semantics, and so on down to the semantic primitives at the bottom level;

A semantic primitive is one of the most basic kinds of human sensory information, namely visual, auditory, olfactory, gustatory or tactile information; different semantic primitives are associated with one another;

2) Design the various primitive filters:

2a) Following the neurons of the human primary visual cortex V1, design different types of primitive filters that perform different functions. They filter the input preprocessed image and detect different visual semantic primitives, which include: point, horizontal line, vertical line, oblique line, arc, triangle, quadrilateral, polygon, circle, circular arc and color;

2b) According to the degree of matching between a primitive filter and a block of the preprocessed image, the output of the primitive filter is 0, 1, or a decimal between 0 and 1, and is converted into a pulse of corresponding amplitude that is input to the next layer;

3) Build the semantic recognition network:

3a) Filter the input preprocessed image with the different primitive filters to obtain images containing semantic primitives, then extract the corresponding semantic primitives from these images to form the bottom layer of the semantic recognition network;

3b) Combine the different semantic primitives of the bottom layer vertically according to the information of the target to be identified, abstracting them into semantics above the bottom layer to form the second layer of the semantic recognition network; then abstract from the second layer into semantics above it to form the third layer, and so on, combining further toward the higher layers of the network until the top layer forms the target semantic;

3c) According to the relative spatial relations between semantic primitives, add lateral connections in the bottom layer, and add cross-layer connections between the layers of the network, where the spatial relations between semantic primitives include: above/below, left/right, intersecting, containing and front/back;

3d) Define the carrier of the semantic information of each class of target in the network as a semantic neuron, and arrange the semantic neurons of every class of target in parallel, layer by layer, to form a semantic recognition network that can identify multiple classes of targets;

4) Recognition process of the semantic recognition network:

Input the preprocessed image into the semantic recognition network. Let $m_t^{(i)}$ denote the t-th semantic neuron of layer i. The output of each semantic neuron is then expressed as a weighted combination of the inputs from the semantic neurons of the previous layer, where i = 1, 2, ..., n, n is the total number of layers of the network, $K^{(i-1)}$ is the set of all semantic neurons in layer i-1, and each semantic neuron of layer i-1 has a corresponding weight coefficient;

When i = 1, the layer is the first layer of the network. The semantic neurons of this layer, i.e. the primitive filters, filter the input preprocessed image, extract semantic primitives and pass them to the second layer, activating the relevant semantic neurons there. The corresponding semantic information of the second layer is then extracted and passed to the third layer, activating the relevant semantic neurons of the third layer, and so on, until all relevant semantic neurons have been activated layer by layer;

When i = n, the recognition result is output at the corresponding semantic neuron of the top layer.
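Writing $m_t^{(i)}$ for the output of the t-th semantic neuron of layer i and $K^{(i-1)}$ for the set of semantic neurons in layer i-1, the weighted form described in step 4) can be sketched as below for the layers above the first. The weight symbol $w_s^{(i-1)}$ and the summation index $s$ are notation assumed here for illustration; only the weighted-sum structure is taken from the description.

```latex
m_t^{(i)} \;=\; \sum_{s \in K^{(i-1)}} w_s^{(i-1)} \, m_s^{(i-1)}, \qquad i = 2, \dots, n
```

Under this reading, a weight of zero simply means that neuron $s$ of layer i-1 does not contribute to neuron $t$ of layer i.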

Compared with the prior art, the present invention has the following advantages:

1. The semantic recognition network proposed by the invention is brain-inspired: the whole structure is designed to imitate the process by which humans come to know the objective world. It proposes a semantic hierarchical structure and performs recognition using the prior knowledge humans have accumulated over many years, letting sense-based semantic primitives replace data bits. It thereby realizes artificial intelligence in the true sense, with the goal that any target a person can identify can also be identified by a computer.

2. The invention does not need large training and test sets and is driven entirely by knowledge. It obtains good recognition results with a small amount of data while avoiding the storage overhead, training energy consumption and tedious parameter tuning that traditional big-data-driven approaches entail.

3. Unlike traditional recognition networks, the invention is not limited by training on a data set. Semantic hierarchies can be added or removed to identify multiple targets and achieve multi-task classification, which gives good transferability.

4. Because the associations between semantic neurons can be defined flexibly, the network has strong generalization ability.

Brief description of the drawings

Fig. 1 is the implementation flowchart of the present invention;

Fig. 2 is a schematic diagram of the semantic hierarchical structure defined by the invention;

Fig. 3 is a schematic diagram of the four classes of line-detecting visual semantic neurons designed by the invention;

Fig. 4 is a schematic diagram of the semantic recognition network built on the semantic hierarchical structure;

Fig. 5 is an example of identifying the handwritten digit "7" with the semantic recognition network proposed by the invention.

Detailed description

The invention is described in detail below with reference to the accompanying drawings and an example.

Referring to Fig. 1, the invention is implemented in the following steps:

Step 1, define semantics and semantic primitives.

1a) Define semantics:

Traditionally, semantics refers to the meaning carried by transmitted signal data that makes it easier to understand. The semantic defined by the present invention is instead a hierarchical structure formed by combining q sub-semantics, each of which is in turn a combination of p sub-semantics, and so on down to the semantic primitives at the bottom level, where q >= 1 and p >= 1. Just as matter is composed of molecules, molecules of atoms, and atoms of electrons and nuclei, a target object in image data is composed of its parts, and each part is composed of smaller units. For example, a little white rabbit can be defined by long ears, red eyes, a three-lobed mouth and a white, fluffy body, where the ears, eyes and mouth can in turn be defined by more specific concepts such as contour and color, as shown in Fig. 2.
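The hierarchy described in 1a) can be pictured as a tree whose leaves are semantic primitives. The following is a minimal sketch of such a tree; the class name `Semantic`, its methods, and the particular decomposition of the rabbit are hypothetical and chosen only for illustration, not taken from Fig. 2.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Semantic:
    """A node of the semantic hierarchy; a node without children is a primitive."""
    name: str
    children: List["Semantic"] = field(default_factory=list)

    def is_primitive(self) -> bool:
        return not self.children

    def primitives(self) -> List[str]:
        """Collect the names of the bottom-level primitives under this node."""
        if self.is_primitive():
            return [self.name]
        found: List[str] = []
        for child in self.children:
            found.extend(child.primitives())
        return found

    def depth(self) -> int:
        """Number of levels from this node down to the deepest primitive."""
        if self.is_primitive():
            return 1
        return 1 + max(child.depth() for child in self.children)


# Illustrative decomposition only: which sub-semantics define each part is an assumption.
rabbit = Semantic("little white rabbit", [
    Semantic("ear", [Semantic("long contour")]),
    Semantic("eye", [Semantic("round contour"), Semantic("red")]),
    Semantic("mouth", [Semantic("three-lobed contour")]),
    Semantic("body", [Semantic("fluffy texture"), Semantic("white")]),
])
```

Here `rabbit.depth()` gives the number of levels in the hierarchy and `rabbit.primitives()` lists the leaves that the bottom layer of a network would have to detect.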

1b) Define semantic primitives:

Recognizing things by describing basic sensations is the first step of human cognition. Such sensations include the shape, color, temperature, taste and feel of things, and the vocabulary used to describe them does not vary with a person's store of knowledge. Humans have five basic senses: sight, hearing, smell, taste and touch. These basic sensations shared by all humans are the basis of the semantic primitives. Each semantic primitive defined by the invention therefore represents one basic sensation, giving visual, auditory, olfactory, gustatory and tactile semantic primitives, where:

Visual semantic primitives include: shape, contrast, saturation and color; shape in turn includes point, line, plane and solid, and color includes red, orange, yellow, green, cyan, blue, purple, black and white;

Auditory semantic primitives include: pitch, volume and timbre;

Olfactory semantic primitives include: fragrant, resinous, fruity, putrid and burnt odors;

Gustatory semantic primitives include: sweet, sour, bitter and salty;

Tactile semantic primitives include: temperature, humidity, pain, pressure and vibration;

1c) Define associations between semantic primitives of different classes. For example, a segment of speech can be associated with semantic primitives of sight, taste, touch or smell. This resembles the way human perception organically combines various sensations and integrates the many attributes of a thing into a meaningful whole; for example, a person can be identified by associating his or her silhouette with his or her voice. This pattern of mutual association is an important basis on which the semantic recognition network forms complex semantics, and it allows the network to carry multimodal information.
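The primitive classes of 1b) and the cross-class association of 1c) can be sketched as a simple lookup table. The names `SEMANTIC_PRIMITIVES`, `sense_of` and `is_cross_modal` are hypothetical, and the English labels are one possible rendering of the lists above.

```python
from typing import Dict, List, Optional

# Semantic primitives grouped by sense, following the lists in step 1b).
SEMANTIC_PRIMITIVES: Dict[str, List[str]] = {
    "visual": ["shape", "contrast", "saturation", "color"],
    "auditory": ["pitch", "volume", "timbre"],
    "olfactory": ["fragrant", "resinous", "fruity", "putrid", "burnt"],
    "gustatory": ["sweet", "sour", "bitter", "salty"],
    "tactile": ["temperature", "humidity", "pain", "pressure", "vibration"],
}


def sense_of(primitive: str) -> Optional[str]:
    """Return the sense a primitive belongs to, or None if it is not listed."""
    for sense, primitives in SEMANTIC_PRIMITIVES.items():
        if primitive in primitives:
            return sense
    return None


def is_cross_modal(primitives: List[str]) -> bool:
    """True if the given primitives span more than one sense, as in step 1c)."""
    senses = {sense_of(p) for p in primitives}
    senses.discard(None)
    return len(senses) > 1
```

For instance, identifying a person from a silhouette and a voice would combine a visual primitive such as `"shape"` with an auditory one such as `"timbre"`, which `is_cross_modal` reports as spanning two senses.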

Step 2, design the various basic semantic neurons.

By imitating the way human perception extracts information from the objective world, basic semantic neurons can be designed for different sensory systems to extract the corresponding semantic primitive information. Basic semantic neurons for different senses can be realized with different physical elements or electronic devices; for example, basic semantic neurons based on touch can be designed as temperature, humidity and pressure sensors.

The present invention is illustrated mainly with the design of basic semantic neurons for the visual system.

2a) Following the neurons of the human primary visual cortex V1, design different types of basic semantic neurons that perform different functions, referred to as primitive filters. They filter the input preprocessed image and detect different visual semantic primitives, which include: point, horizontal line, vertical line, oblique line, arc, triangle, quadrilateral, polygon, circle, circular arc and color. Each primitive filter is in effect a spatial filter with a specific shape or attribute that detects one, and only one, corresponding visual semantic primitive;

2b) According to the degree of matching between a primitive filter and a block of the preprocessed image, the output of the primitive filter is 0, 1, or a decimal between 0 and 1:

When the primitive filter exactly matches the target in the image block, i.e. the semantic primitive extracted by the filter is fully consistent with the information in the block, the output of the primitive filter is 1;

When the primitive filter does not match the target in the image block at all, i.e. the semantic primitive extracted by the filter is entirely inconsistent with the information in the block, the output of the primitive filter is 0;

When the primitive filter partially matches the target in the image block, i.e. the semantic primitive extracted by the filter is only partly consistent with the information in the block, the output of the primitive filter is a number between 0 and 1. For example, when a line of the target in the image block is tilted by some angle and is matched against the primitive filter dedicated to detecting vertical lines, that filter outputs a number between 0 and 1;

2c) Convert the primitive filter output obtained in 2b) into a pulse of corresponding amplitude and input it to the next layer; the pulse amplitude then serves as the weight of each input image block in characterizing higher-level semantics;

Steps 2a) to 2c) amount to the following: the primitive filters of 2a) that detect visual semantic primitives constitute a group of visual neurons, which perceive the visual semantic primitives in the image and convert them into nerve impulses.
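The three output cases of 2b) can be illustrated with a small sketch. The matching measure used here (overlapping pixels divided by the larger of the two pixel counts) is an assumption made for illustration; the description only requires an output of 1 for an exact match, 0 for no match, and a value in between otherwise. The function names are hypothetical.

```python
from typing import List

Block = List[List[int]]  # binary image block, 1 = contour pixel


def vertical_line_template(size: int) -> Block:
    """Template with a single vertical line in the middle column."""
    mid = size // 2
    return [[1 if c == mid else 0 for c in range(size)] for _ in range(size)]


def matching_degree(template: Block, block: Block) -> float:
    """Overlap between template and block, scaled to the range [0, 1]."""
    overlap = sum(t * b
                  for t_row, b_row in zip(template, block)
                  for t, b in zip(t_row, b_row))
    t_count = sum(map(sum, template))
    b_count = sum(map(sum, block))
    # The template is never empty, so the denominator is at least 1.
    return overlap / max(t_count, b_count)


template = vertical_line_template(5)
tilted = [  # a line deflected slightly away from the vertical
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
]
empty = [[0] * 5 for _ in range(5)]

exact_output = matching_degree(template, template)  # exact match
partial_output = matching_degree(template, tilted)  # partial match
no_output = matching_degree(template, empty)        # no match
```

In the sense of 2c), each of these values would then be passed on as the amplitude of the pulse sent to the next layer.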

Step 3, build the semantic recognition network.

Complex semantics can be formed by combining sub-semantics, and this combination is a process of high-level intelligence. Human recognition can be regarded as knowledge-based compositional recognition, accomplished by combining different basic sensations in different ways. This mode of recognition rests on the human neural network. For example, the human visual signal travels from the retina to the lateral geniculate body and then to the visual cortex, which includes the primary visual cortex V1 and the extrastriate cortex. Visual information in the visual pathway is transmitted hierarchically: as the information sensed at one level is passed to the next, more abstract concepts are produced. Inspired by this, and based on the hierarchical structure of target semantics, the invention builds a semantic recognition network that differs from conventional target identification networks in both concept and structure. The construction steps are as follows:

3a) Filter the input preprocessed binary contour-edge image with the different primitive filters to obtain images containing semantic primitives, then extract the corresponding semantic primitives from these images to form the bottom layer of the semantic recognition network:

3a1) Preprocess the target image into an M × M binary contour-edge image and divide it into N × N image blocks, each of size (M/N) × (M/N), where M >= 1 and N >= 1;

3a2) Design spatial filters that detect horizontal lines, vertical lines, oblique lines and arcs respectively, forming four classes of line-detecting primitive filters, as shown in Fig. 3. The darker the color of a primitive filter, the stronger its response at that position and the larger the amplitude of the output pulse;

3a3) Arrange the primitive filters of each class in rows and columns to form a square array, sized to match the image blocks obtained in 3a1);

3a4) Traverse the whole image, filtering all matched image blocks, to obtain images that contain only the corresponding semantic primitives;
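Steps 3a1) to 3a4) can be sketched as follows. This is a minimal illustration that assumes M is divisible by N, uses binary templates for three of the four line classes (an arc template is omitted for brevity), and reuses an overlap ratio as the filter response; all function names are hypothetical.

```python
from typing import Dict, List

Image = List[List[int]]  # binary contour-edge image, 1 = contour pixel


def split_into_blocks(image: Image, n: int) -> List[List[Image]]:
    """Split an M x M image into an n x n grid of (M/n) x (M/n) blocks."""
    m = len(image)
    if m % n != 0:
        raise ValueError("this sketch assumes M is divisible by N")
    s = m // n
    return [[[row[c * s:(c + 1) * s] for row in image[r * s:(r + 1) * s]]
             for c in range(n)] for r in range(n)]


def line_templates(s: int) -> Dict[str, Image]:
    """Binary s x s templates for three line classes (illustrative shapes)."""
    mid = s // 2
    return {
        "horizontal": [[1 if r == mid else 0 for _ in range(s)] for r in range(s)],
        "vertical": [[1 if c == mid else 0 for c in range(s)] for _ in range(s)],
        "oblique": [[1 if r + c == s - 1 else 0 for c in range(s)] for r in range(s)],
    }


def response(template: Image, block: Image) -> float:
    """Overlap ratio in [0, 1]; an assumed stand-in for the matching degree."""
    overlap = sum(t * b for tr, br in zip(template, block) for t, b in zip(tr, br))
    denom = max(sum(map(sum, template)), sum(map(sum, block)))
    return overlap / denom if denom else 0.0


def filter_image(image: Image, n: int) -> Dict[str, List[List[float]]]:
    """Apply every template to every block; one n x n response map per template."""
    blocks = split_into_blocks(image, n)
    templates = line_templates(len(image) // n)
    return {name: [[response(t, blocks[r][c]) for c in range(n)] for r in range(n)]
            for name, t in templates.items()}


image = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
]
maps = filter_image(image, 2)
```

Each entry of `maps` plays the role of an image containing only one kind of semantic primitive: the horizontal map responds fully in the two top blocks, and the oblique map responds fully in the bottom-right block.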

3a5) Design richer primitive filters, including filters that detect triangles, quadrilaterals, polygons and circles, for spatial filtering of complex images. Filtering a more complex image with these primitive filters yields correspondingly richer semantic primitive images. For example, when a preprocessed image containing a bicycle is filtered with several different primitive filters, the filter responsible for detecting circles extracts, at the positions of the two wheels, a semantic primitive image containing only circles; that is, the circles in the input image can activate the circle-detecting primitive filter and only that filter. Likewise, the filter responsible for triangles extracts, at the position of the bicycle frame, a semantic primitive image containing only triangles. In this way, all semantic primitives contained in the target of the input image are obtained after the filtering operations of the primitive filters;

3b) Combine the different semantic primitives of the bottom layer vertically according to the information of the target to be identified, abstracting them into semantics above the bottom layer to form the second layer of the semantic recognition network; then abstract from the second layer into semantics above it to form the third layer, and so on, combining further toward the higher layers of the network until the top layer forms the target semantic, as shown in Fig. 4;

3c) In addition to the vertical semantic pathways of step 3b), the invention adds lateral connections in the bottom layer according to the relative spatial relations between semantic primitives; that is, related basic semantic neurons can be connected to form a new semantic concept. For example, arcs of different angles can be connected into shapes such as a circle, a semicircle, an S shape or an ellipse. The spatial relations between semantic primitives include: above/below, left/right, intersecting, containing and front/back. In addition, cross-layer connections are made between the layers of the network, similar to the relation between ancestor and descendant nodes: a high-level semantic can also connect to low-level semantics one or several layers away to form a new semantic. For example, once a high layer obtains the category concept "apple", combining it with the low-layer semantic "green" forms the new semantic "Granny Smith";
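A lateral connection of the kind described in 3c) can be sketched as a check on the relative positions of two detected primitives. The sketch below covers only the above/below, left/right and intersecting relations, treats two primitives in the same block as intersecting, and takes the weaker of the two activations as the activation of the combined concept; all three choices, and the function names, are assumptions made for illustration.

```python
from typing import Set, Tuple

Pos = Tuple[int, int]  # (row, col) of the image block in which a primitive fired


def spatial_relations(a: Pos, b: Pos) -> Set[str]:
    """Relations of primitive a with respect to primitive b on the block grid."""
    relations: Set[str] = set()
    if a[0] < b[0]:
        relations.add("above")
    elif a[0] > b[0]:
        relations.add("below")
    if a[1] < b[1]:
        relations.add("left of")
    elif a[1] > b[1]:
        relations.add("right of")
    if a == b:
        relations.add("intersecting")  # assumption: same block counts as intersecting
    return relations


def lateral_activation(act_a: float, act_b: float,
                       relations: Set[str], required: str) -> float:
    """Combined concept fires with the weaker input if the relation holds, else 0."""
    return min(act_a, act_b) if required in relations else 0.0


# A horizontal line in the top-middle block and an oblique line in the centre block.
rels = spatial_relations((0, 1), (1, 1))
combined = lateral_activation(1.0, 0.5, rels, "above")
```

A concept such as "horizontal bar above an oblique stroke" would then be active only when both primitives fire and the required relation is present.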

3d) Define the carrier of the semantic information of each class of target in the network as a semantic neuron, and arrange the semantic neurons of every class of target in parallel, layer by layer, to form a semantic recognition network that can identify multiple classes of targets.

Step 4, recognition process of the semantic recognition network.

Input the preprocessed image into the semantic recognition network for target identification. Let $m_t^{(i)}$ denote the t-th semantic neuron of layer i. The output of each semantic neuron is then expressed as a weighted combination of the inputs from the semantic neurons of the previous layer, where i = 1, 2, ..., n, n is the total number of layers of the network, $K^{(i-1)}$ is the set of all semantic neurons in layer i-1, and each semantic neuron of layer i-1 has a corresponding weight coefficient;

The semantic neurons are activated layer by layer according to the value of i:

When i = n, the recognition result is output at the corresponding semantic neuron of the top layer. For example, when an image of the digit "7" from a handwritten digit data set is input to the network, it is divided into 3 × 3 = 9 blocks, and each block is connected to the four line-detecting primitive filters of 3a2) for detection, giving four corresponding semantic outputs. These outputs successively activate the neurons of each level along the corresponding semantic pathways, and a pulse is finally output at the high-level semantic neuron representing "7" in the top layer. In other words, the high-level semantic can be expressed as a weighted sum of these activated neurons, which achieves the identification of the digit "7", as shown in Fig. 5.
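The layer-by-layer activation of step 4 can be sketched as repeated weighted sums. In the sketch below, the neuron names, the weights and the primitive filter outputs are all hypothetical values chosen to mimic the "7" example; they are not taken from Fig. 5, and the real network would use one set of filter outputs per image block.

```python
from typing import Dict, List

Layer = Dict[str, Dict[str, float]]  # neuron name -> {input neuron name: weight}


def forward(primitives: Dict[str, float], layers: List[Layer]) -> Dict[str, float]:
    """Propagate layer by layer; each output is a weighted sum of the previous layer."""
    activations = dict(primitives)
    for layer in layers:
        activations = {
            name: sum(w * activations.get(src, 0.0) for src, w in inputs.items())
            for name, inputs in layer.items()
        }
    return activations


# Layer 1: assumed primitive filter outputs for an image of the digit "7".
primitives = {
    "horizontal@top": 1.0,
    "oblique@centre": 1.0,
    "oblique@bottom": 1.0,
    "vertical@centre": 0.0,
}

# Layers 2 and 3: assumed semantic pathways and weights.
layers: List[Layer] = [
    {
        "top bar": {"horizontal@top": 1.0},
        "slanted stroke": {"oblique@centre": 0.5, "oblique@bottom": 0.5},
        "upright stroke": {"vertical@centre": 1.0},
    },
    {
        "7": {"top bar": 0.5, "slanted stroke": 0.5},
        "1": {"upright stroke": 1.0},
    },
]

top = forward(primitives, layers)
recognized = max(top, key=top.get)  # top-layer neuron with the strongest output
```

With these assumed values only the pathway leading to the "7" neuron is activated, so the strongest top-layer output identifies the digit.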

The above description is only an example of the present invention and does not limit the invention in any way. Clearly, after understanding the content and principle of the invention, a person skilled in the art may make various modifications and changes in form and detail without departing from the principle and structure of the invention, but such modifications and changes based on the inventive concept still fall within the scope of protection of the claims of the present invention.

Claims

1. A target identification network design method based on semantic definition, characterized by comprising:

1) Define semantics and semantic primitives:

A semantic is a hierarchical structure formed by combining multiple sub-semantics, each of which is in turn a combination of sub-semantics, and so on down to the semantic primitives at the bottom level;

A semantic primitive is one of the most basic kinds of human sensory information, namely visual, auditory, olfactory, gustatory or tactile information; different semantic primitives are associated with one another;

2) Design the various primitive filters:

2a) Following the neurons of the human primary visual cortex V1, design different types of primitive filters that perform different functions, for filtering the input preprocessed image and detecting different visual semantic primitives, the visual semantic primitives including: point, horizontal line, vertical line, oblique line, arc, triangle, quadrilateral, polygon, circle, circular arc and color;

2b) According to the degree of matching between a primitive filter and a block of the preprocessed image, the output of the primitive filter is 0, 1, or a decimal between 0 and 1, and is converted into a pulse of corresponding amplitude that is input to the next layer;

3) Build the semantic recognition network:

3a) Filter the input preprocessed image with the different primitive filters to obtain images containing semantic primitives, then extract the corresponding semantic primitives from these images to form the bottom layer of the semantic recognition network;

3b) Combine the different semantic primitives of the bottom layer vertically according to the information of the target to be identified, abstracting them into semantics above the bottom layer to form the second layer of the semantic recognition network; then abstract from the second layer into semantics above it to form the third layer, and so on, combining further toward the higher layers of the network until the top layer forms the target semantic;

3c) According to the relative spatial relations between semantic primitives, add lateral connections in the bottom layer, and add cross-layer connections between the layers of the network, where the spatial relations between semantic primitives include: above/below, left/right, intersecting, containing and front/back;

3d) Define the carrier of the semantic information of each class of target in the network as a semantic neuron, and arrange the semantic neurons of every class of target in parallel, layer by layer, to form a semantic recognition network that can identify multiple classes of targets;

4) Recognition process of the semantic recognition network:

Input the preprocessed image into the semantic recognition network. Let $m_t^{(i)}$ denote the t-th semantic neuron of layer i. The output of each semantic neuron is then expressed as a weighted combination of the inputs from the semantic neurons of the previous layer, where i = 1, 2, ..., n, n is the total number of layers of the network, $K^{(i-1)}$ is the set of all semantic neurons in layer i-1, and each semantic neuron of layer i-1 has a corresponding weight coefficient;

When i = 1, the layer is the first layer of the network. The semantic neurons of this layer, i.e. the primitive filters, filter the input preprocessed image, extract semantic primitives and pass them to the second layer, activating the relevant semantic neurons there. The corresponding semantic information of the second layer is then extracted and passed to the third layer, activating the relevant semantic neurons of the third layer, and so on, until all relevant semantic neurons have been activated layer by layer;

2. The method according to claim 1, wherein the basic visual, auditory, olfactory, gustatory and tactile sensory information in step 1) is as follows:

The basic visual information includes: shape, contrast, saturation and color;

The basic auditory information includes: pitch, volume and timbre;

The basic olfactory information includes: fragrant, resinous, fruity, putrid and burnt odors;

The basic gustatory information includes: sweet, sour, bitter and salty;

The basic tactile information includes: temperature, humidity, pain, pressure and vibration.

3. The method according to claim 2, wherein the shape includes: point, line, plane and solid.

4. The method according to claim 2, wherein the color includes: red, orange, yellow, green, cyan, blue, purple, black and white.

5. The method according to claim 1, wherein filtering the input preprocessed image with the different primitive filters in step 3a) to obtain images containing semantic primitives is carried out as follows:

3a1) Preprocess the input image of size M × M into a binary contour-edge image and divide it into N × N image blocks, each of size (M/N) × (M/N), where M >= 1 and N >= 1;

3a2) Design spatial filters that detect horizontal lines, vertical lines, oblique lines and arcs respectively, forming four classes of line-detecting primitive filters;

3a3) Arrange the primitive filters of each class in rows and columns to form a square array, sized to match the image blocks obtained in 3a1);

3a4) Traverse the whole image, filtering all matched image blocks, to obtain images that contain only the corresponding semantic primitives.

6. The method according to claim 4, wherein the primitive filters designed in step 3a2) further include filters for triangles, quadrilaterals, polygons and circles, used for spatial filtering of complex images.

7. The method according to claim 1, wherein in step 2b) the output of the primitive filter is made 0, 1, or a decimal between 0 and 1 according to the degree of matching between the primitive filter and the preprocessed image block, under the following rules:

When the primitive filter exactly matches the target in the image block, i.e. the semantic primitive extracted by the filter is fully consistent with the information in the block, the output of the primitive filter is 1;

When the primitive filter does not match the target in the image block at all, i.e. the semantic primitive extracted by the filter is entirely inconsistent with the information in the block, the output of the primitive filter is 0;

When the primitive filter partially matches the target in the image block, i.e. the semantic primitive extracted by the filter is only partly consistent with the information in the block, the output of the primitive filter is a number between 0 and 1.