WO2021217282A1 - A method for implementing general artificial intelligence - Google Patents
A method for implementing general artificial intelligence
- Publication number: WO2021217282A1
- PCT application: PCT/CN2020/000108 (CN2020000108W)
- Authority: WO (WIPO, PCT)
- Prior art keywords: machine, memory, data, information, activation
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Definitions
- The present application relates to the field of artificial intelligence, and in particular to a method for implementing general artificial intelligence.
- Although the currently popular deep convolutional neural networks remove some details through filtering, thereby helping the machine obtain a more reasonable selection of middle-layer features, they still require a large amount of training data.
- The machine's final judgment may rest on details that humans would never notice, so the trained model may be easily deceived.
- Current knowledge-graph projects help the machine connect different things during search by extracting associations between texts or concepts from big data.
- However, these relationships lack quantification, and there is no way for the machine to use them to learn and summarize on its own, or to apply the learned knowledge in daily life to achieve its own goals. These methods differ greatly from human learning and cannot produce human-like general intelligence.
- The present application holds that machine intelligence should be based on information theory rather than on data-processing methods, which serve the information theory. The learning method proposed in the present application therefore imitates the human learning process.
- Through the organization of memory, the recombination of memory and reality, and the reorganization of the resulting information, driven by its own motivation, the machine gradually builds responses from input to output, from simple to complex, and thereby exhibits human-like general intelligence. All of this shows that the machine learning method proposed in the present application differs greatly from the existing machine learning methods in the industry; at present, no similar learning method exists in the industry.
- Voice and text are products of recent centuries; information outside of language is our natural learning tool.
- Although learning through images is one of the products evolution has given us, it also has natural disadvantages. First, the amount of data is too large. Second, too much detail leads to poor generalization. Third, images are not connected with the input from other sensors, such as voice, text, touch, and smell. Fourth, many concepts are not represented by images, such as abstract concepts like love, fear, and morality.
- This application therefore proposes a learning method based on feature maps extracted from images. As with images, we also extract features from the other sensors and treat those features the same way as image feature maps.
- The process of processing information is to translate the input information into a feature-map sequence that the machine can understand, then use the relational network and the memory bank to process these feature-map sequences, and finally translate the processed feature-map sequence into the desired output form, such as voice, text, or action.
- Low-level features are features commonly found among things, obtained by the machine by finding local similarities.
- For images, these low-level geometric features mainly include local edges, local curvature, texture, hue, ridges, vertices, angles, parallelism, intersection, size, dynamic patterns, and other local features commonly found in graphics.
- For speech, they are the syllable features ubiquitous in speech. Similar processing is applied to the other sensor inputs.
- The low-level features are established autonomously by the machine through local similarity. While these low-level features are in use, the machine can apply relationship-extraction mechanisms (such as the memory and forgetting mechanism) to them, and they can also be added or removed through human intervention.
- Feature map: building on the ability to extract low-level features, we use the relationship-extraction mechanism to extract the common low-level feature combinations from multiple similar things, similar scenes, and similar processes. These common feature combinations are feature maps.
- Feature maps can be image low-level feature maps, language low-level feature maps, or low-level feature maps of other sensors, and they can be static or dynamic. For example, after retaining the combination of low-level features extracted each time, the machine uses the memory and forgetting mechanism to increase or decrease them: low-level features that reappear in each extraction increase their memory value, while those that do not recur are gradually forgotten, so that the multiple sketches formed by the low-level feature combinations extracted each time retain only the low-level features they have in common.
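- The following Python sketch (ours, not part of the application) illustrates this retain-the-common-part process, assuming a fixed memory increment and a multiplicative forgetting step; the function name and constants are assumptions for illustration only.

```python
# Illustrative sketch: keep the low-level features shared across repeated
# extractions by raising the memory value of recurring features and letting
# the rest decay. Constants are assumed, not taken from the application.
def update_feature_memory(memory, extracted_features,
                          gain=1.0, decay=0.9, drop_below=0.1):
    """memory: dict mapping feature id -> memory value."""
    for f in extracted_features:          # recurring features are reinforced
        memory[f] = memory.get(f, 0.0) + gain
    for f in list(memory):                # non-recurring features fade
        if f not in extracted_features:
            memory[f] *= decay
            if memory[f] < drop_below:    # fully forgotten
                del memory[f]
    return memory
```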
- The local network composed of multiple feature maps is a concept.
- A concept contains multiple feature maps and the relationships between those feature maps.
- The feature maps included in a concept are not necessarily similar to one another; they may be different feature maps connected through the memory and forgetting mechanism.
- Connection value: in the present application, a connection can be established between two feature maps in the cognitive network. These connections have direction and magnitude. For example, the connection value from feature map A to the associated feature map B is Tab; similarly, the connection value from feature map B to the associated feature map A is Tba. Tab and Tba are real numbers, and their values may be the same or different.
- Relationship-extraction mechanism: a mechanism that can extract the common feature combinations among multiple similar images, similar scenes, and similar processes.
- Relationship-extraction mechanisms include, but are not limited to, the various forms of multi-layer neural networks, rule-based methods, logical analysis, and supervised or semi-supervised learning methods currently available in the field, and also include the memory and forgetting mechanism proposed in the present application.
- Memory function: refers to the increase of certain data with the number of repetitions. The specific manner of increase can be represented by a function, and that function is the memory function. It should be pointed out that different memory functions can be adopted for different types of data.
- Forgetting function: refers to the decrease of certain data with the passage of time and training. The specific manner of decrease can be represented by a function, and that function is the forgetting function. Again, different forgetting functions can be adopted for different types of data.
- Memory and forgetting mechanism: in the present application, applying the memory function and the forgetting function to data is the memory and forgetting mechanism. The memory and forgetting mechanism is a relationship-extraction mechanism used widely throughout the present application.
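- As a concrete illustration (ours; the application only requires "increases with repetitions" and "decreases with time"), one possible memory/forgetting function pair might look like this:

```python
import math

# One assumed memory/forgetting function pair; the exact curves are not
# specified by the application, so these are illustrative choices only.
def memory_function(value, strength=0.5):
    """Each repetition raises the memory value, with diminishing returns."""
    return value + strength / (1.0 + value)

def forgetting_function(value, elapsed, half_life=100.0):
    """The memory value decays exponentially with elapsed time."""
    return value * math.exp(-math.log(2) * elapsed / half_life)
```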
- Cognitive network: a network formed by different concepts through shared feature maps. It is a bidirectional, multi-center star network. In essence, the cognitive network is the network the machine forms to organize all of its past memories.
- In the present application, the cognitive network can be a separate network, or it can be a relationship implicit in the entire memory bank.
- Bidirectional connection values: because the connection relationship between two feature maps is not symmetric, bidirectional connection values are needed to express it. Another reason for adopting bidirectional connection values is that, in chain activation, when one node transfers its activation value to another node and activates it, then in order to avoid repeated activation between the two nodes, within the same chain-activation process, after a transfer from A to B the reverse transfer from B to A is prohibited.
- The machine searches the cognitive network and the memory bank, finds the corresponding low-level features, and assigns activation values according to its motivation.
- If a node i is given an activation value (a real number) greater than its preset activation threshold Va(i), node i is activated, and it passes its activation value to the other feature-map nodes connected to it.
- The transfer coefficient is a function of the connection value in the cognitive network, and a function of the memory values at the two ends of the transfer line in the memory bank.
- If a node receives passed activation values, accumulates them with its own initial activation value, and the total exceeds its own preset activation threshold, it is also activated and likewise passes activation values to the other nodes connected to it.
- This activation process propagates from feature map to feature map in a chain until no new activation occurs, at which point the entire transfer of activation values stops. This is called a chain-activation process.
- Chain activation is a search method: a way to find the feature maps most relevant to some combination of low-level features, the concepts most relevant to certain feature maps, the memories (experiences) most relevant to certain concepts, and the concepts most relevant to certain motivations. The chain-activation method is therefore essentially a search method, and it can be replaced by other search methods that achieve similar functions.
- A connection value in the cognitive network is a real number between 0 and 1.
- 0 means there is no connection relationship.
- 1 represents a fully equivalent connection relationship.
- The connection value between the name of an object and its feature map is usually 1.
- These connection values express each feature map's ability to represent the central feature map, and there is no mutual constraint between them; for example, there is no requirement that the connection values around a concept node sum to 1.
- The connection values here are taken as real numbers between 0 and 1 in order to avoid non-convergence during chain activation, because in our embodiment the simplest multiplication is used as the transfer function.
- Connection values can adopt other ranges, but the overall constraint on the choice is that the activation value passed onward must be smaller than the activation value of the node initiating the transfer. Only then can the chain-activation process be guaranteed to stop eventually.
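- To see why multiplicative transfer with connection values below 1 guarantees termination, consider a small worked example (ours, not from the application), with a chain A to B to C and connection values 0.6 and 0.5:

```latex
% Activation decays geometrically along any chain when every connection
% value is below 1 and transfer is multiplicative:
\[
a_A = 1.0, \qquad
a_B = a_A \, T_{AB} = 0.6, \qquad
a_C = a_B \, T_{BC} = 0.3,
\]
\[
a_N = a_A \prod_{k=1}^{N-1} T_{k,k+1} \;\xrightarrow{\;N\to\infty\;}\; 0,
\]
% so every path eventually falls below the activation thresholds Va(i)
% and the chain activation stops.
```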
- Prominence: after the low-level features of the input have been searched in the cognitive network or memory bank, if one or more feature maps are marked one or more times, they become "highlighted" in the cognitive network or memory bank.
- The machine takes these feature maps as possible recognition results and uses them to combine and segment the input features, comparing the overall similarity between the input feature combination and the retrieved feature map as a standard for judging similarity further. For example, when chain activation is used as the search method, if the activation values of some feature maps exceed the noise floor of the entire cognitive network by a preset threshold, those feature maps are considered "highlighted".
- The activation-value noise floor of the cognitive network can be calculated in different ways.
- The machine can use the activation values of a large number of background feature-map nodes in the scene as the noise floor.
- The machine can also use the average activation value of the currently activated nodes as the noise floor.
- The machine can also use a number it presets itself as the activation-value noise floor.
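- A minimal Python sketch of highlight detection, using the mean-of-active-nodes noise floor named above (the margin and all names are our assumptions):

```python
# Illustrative sketch: a feature map is "highlighted" when its activation
# exceeds the noise floor by a preset margin. Here the noise floor is the
# mean activation of the currently active nodes, one of the options above.
def highlighted(activations, margin=0.2):
    """activations: dict mapping feature-map id -> activation value."""
    active = {k: v for k, v in activations.items() if v > 0.0}
    if not active:
        return []
    noise_floor = sum(active.values()) / len(active)
    return [k for k, v in active.items() if v > noise_floor + margin]
```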
- Mirror space: after the machine enters an environment, it can identify specific things, scenes, and processes by extracting the low-level features of images, language, and other sensor inputs. The features of similar things, scenes, and processes found in memory are overlapped with the similar parts of reality, so the machine can infer the temporarily invisible parts of things, scenes, and processes, including the occluded parts of things, the occluded parts of the scene, and the earlier and later parts of a process that the machine has not seen. Since the size of an object is part of its feature map, the machine also compares the size of a specific object in the field of view with the object's normal size in the feature map to help establish the depth of field in the environment. This is likewise a process of using memory to help understand information.
- The machine determines the positional relationship between itself and the environment by overlapping the similar parts of its own first-person view of the real environment with the third-party view of the environment in memory, establishing an overlapping space. The machine therefore has both a first-person and a third-person perspective on its position in the environment; this is why such overlapping spaces are called mirror spaces.
- After the machine recognizes the feature maps of the input information, it calls up memories through those feature maps to establish a mirror space. The machine then reorganizes the memories and the input information through segmented imitation to form a new information sequence, in order to understand the input information and build its output response. This is also the process of generating new memories. Storing a new memory is storing the mirror space: what is stored is not a recording of the input information, but the extracted low-level features and their updated memory values.
- Memory frame: in the mirror space, every time an event occurs, the machine takes a snapshot of the mirror space and saves it.
- The saved content comprises the low-level features in the mirror space and their memory values; this is the memory frame.
- An event in the mirror space means that, compared with the previous mirror space, the combination of low-level features has changed in similarity by more than a preset value, or the memory value of a low-level feature in the mirror space has changed by more than a preset value.
- Memory storage refers to the machine storing the entire mirror space, including all the extracted low-level features and their combination relationships (including relative positions), as well as the memory values those low-level features carry.
- Memory bank: the database formed by memory storage.
- Temporary memory bank: the memory bank can be a combination of multiple subordinate memory banks, and these subordinate banks can adopt different memory and forgetting curves.
- The temporary memory bank can be one of the subordinate memory banks; its purpose is to buffer memory storage and screen the material that should enter long-term memory.
- The temporary memory bank usually uses fast memorization and fast forgetting to screen the material to be placed into the long-term memory bank.
- Relational network: the network formed by the relationships between the feature maps present in memory. It is the product of the machine extracting the similarity, temporal, and spatial relationships from the input information and optimizing them through the memory and forgetting mechanism. It can take the form of a cognitive network with connection values, a memory network with memory values, or a mixture of the two.
- Focus: the one or more feature maps in the relational network that the machine finds most relevant to the input information. For example, when the chain-activation search method is used, they are the one or more feature maps with the highest activation values that can be highlighted.
- Target focus: the feature maps the machine selects, according to its own motivation, for organizing its output.
- Segmented imitation: in essence, a process of reorganization using memory and input information; it is a process of creation. It organizes fragments and parts from memory, together with the input information, into one or more reasonable processes.
- The content that persists in memory for a long time is usually content that is used frequently, such as common words, common actions, or common ways of organizing an expression. These frequently used combinations amount to the process frameworks of things, scenes, and processes; they are formed by survival of the fittest through the memory and forgetting mechanism.
- The machine borrows these process frameworks and adds its own details to form a variety of new processes. The machine imitates these new processes step by step, in segments, to understand the input information and organize its output response.
- The entire intelligence system is divided into three levels. The first is the perception level, which uses similarity as its standard to establish feature maps and simplify the input information. The second is the cognitive level, which recognizes the recurring shared parts and shared relationships in similar things, scenes, and processes; it is the process of establishing temporal and spatial relationships, and it forms a relational network based on similarity. The third is the application level, which uses the relational network as a dictionary to translate between feature maps; uses the relational network as a grammar to translate input and output information from one form into another; uses the relational network to reorganize memory and current information in order to understand the input information and organize the output response; and uses the relational network and memory to weigh the pros and cons among multiple possible output responses and make a choice. It is also where the memory and forgetting mechanism is realized.
- The machine's instinctive motivation is processed as a continuously input piece of information; in the machine's information processing, instinctive motivation is the default input information. Instinctive motivation is a preset motivation.
- The machine's evaluation of gain and loss is taken as a default output; a gain symbol and a loss symbol represent gain and loss respectively, and they are stored in memory. In each memory, the memory values obtained by the specific gain and loss symbols are positively correlated with the gain and loss values they receive.
- Fig. 1 shows the main steps of implementing general artificial intelligence proposed in the present application. These steps are the first aspect of the present application. The steps in Fig. 1 are described in further detail here:
- Step S1: establish a feature library and an extraction model.
- The machine builds a low-level feature library by looking for local similarities, and builds an algorithmic model for extracting these low-level feature maps. This is the preparatory stage of data processing.
- Step S2: extract the low-level features.
- The machine performs low-level feature extraction on the input from all sensors and, according to the position, angle, and size at which each low-level feature is most similar to the original data, adjusts the feature's position, angle, and size and places it so that it overlaps the original data.
- The relative positions of these low-level features in time and space can thus be retained, and the mirror space established; this step simplifies the input information.
- Step S3: identify the input information.
- The machine looks for the focus. This process identifies the input information, removes ambiguity, and performs feature-map translation. It is similar to language translation, where context is used to identify the vocabulary emitted by the information source and the recognized vocabulary is translated into the vocabulary of another language.
- Step S4: understand the input information.
- The machine organizes the focus into one or more understandable sequences. This is similar to language translation, where the vocabulary of the target language is reorganized, using grammar, into an understandable language structure. The specific method used in this step is segmented imitation.
- Step S5: select a response.
- The machine adds its own motivation to the translated input information to find the target focus.
- The machine uses the relational network and memory to build responses to the input information, and uses the gain-and-loss evaluation system to evaluate them, until it finds a response that passes the evaluation. Following the principle of seeking advantage and avoiding harm, the machine drafts various candidate outputs and evaluates their gains and losses.
- Step S6: convert the response into an output form.
- The machine converts the selected sequence into an output form through segmented imitation.
- Step S7: update the databases.
- According to the memory and forgetting mechanism, the machine updates the feature maps, concepts, relational network, and memory based on how the data was used in steps S1 through S6.
- S1 and S2 are simplifications of information. Their essence is the basic assumption of similarity: "things that are similar in some respects may be similar in other respects."
- Our brain uses similarity to classify things; this is an innate human ability.
- The function of classification is the generalization of experience. For example, if something is edible, something that looks and smells similar to it may also be edible. Without this ability, intelligence cannot arise. Therefore, in steps S1 and S2 we establish the low-level features by looking for local similarities among similar things, in order to compare the similarity between things.
- The low-level feature maps extract the similarities in the input from images, sounds, and other sensors to establish classifications, and use these classifications to represent the different types of information. They have nothing to do with language; their purpose is to simplify the input information and pre-process it for subsequent information processing.
- The mirror space is used to store information.
- Mirroring means that we save data that mirrors the outside world, using low-level features instead of the original data and placing each low-level feature in the position where it is most similar to the original data, which preserves the similarity relationships.
- The mirror space also stores some information about the machine itself, such as its motivation and the calculated results of gains and losses.
- A concept is a local network connecting images, voice, text, or any other form of expression of similar information. Because these forms of expression frequently appear together and frequently switch to one another, the connections between them are closer. There are also other recurring combinations in the relational network whose connections are not as close as a concept's, but which we can use by imitating the combination. We call them process frameworks.
- Low-level features cover the sensor input of all external information, including but not limited to video, audio, touch, smell, and temperature, as well as all internal information, including the state of instinctive motivation, gain-and-loss evaluation results, and gravity-sensing and posture-sensing information.
- The different states of instinctive motivation can be represented by emotions.
- Instinctive motivation is a kind of low-level feature that gives the input information an initial activation value.
- Instinctive motivation is a preset motivation, but its parameters are adjusted by the results of the gain-and-loss evaluation.
- Each mirror space carries its own emotions, as well as its own gain-and-loss evaluation results.
- The machine can naturally use weighted summation to estimate the emotional response and the gain-and-loss evaluation that a reorganized mirror space would bring.
- The various components used in the reorganization are connected to many of their original memories. Those memories will be chain-activated by the activation of the components; this is association.
- When the machine calls up a mirror space, it processes the memory information in a way similar to sensor information. It can therefore also use parallax and the relative sizes of things to establish depth of field and build a three-dimensional image sequence from these data.
- The machine views these memories from a third-person perspective, so it can bring itself or others into a role in the virtual mirror space it has created.
- The ways of bringing a role in are: first, dealing by itself with the situation it faces in the virtual space; second, dealing by itself with the situation faced by others in the virtual space.
- The processing method is to treat these situations as a kind of hypothetical input information and follow the same process used for similar data input from the sensors.
- The specific way data is stored in the mirror space is as the combination of low-level features that best matches the original data, stored each time an event occurs. The low-level features can be viewed approximately as two-dimensional data compression, and the event-driven storage mechanism as a compression of the data in time. They can also be replaced, wholly or partly, by other data-compression methods; but whichever method is used, the similarity, temporal, and spatial relationships of things must be preserved.
- We also store the machine's internal information, such as the state of its instinctive motivation at the corresponding time, its gain-and-loss evaluation results, and its gravity and posture sensing.
- The information stored in the mirror space, external and internal alike, has its own memory value and likewise obeys the memory and forgetting mechanism.
- A large amount of such mirror space, stored in the order of actual occurrence, is memory.
- The machine records in an event-driven manner; that is, only when an "event" occurs in the mirror space does the machine need to record the mirror space again.
- The occurrence of an event in the mirror space means that, compared with the previous mirror space, the combination of low-level features has changed in similarity by more than a preset value, or the memory value of a low-level feature in the mirror space has changed by more than a preset value.
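- The following Python sketch (ours) illustrates this event-driven recording, assuming a set-overlap similarity measure and assumed thresholds; the application does not prescribe either:

```python
# Illustrative sketch: snapshot the mirror space into a new memory frame only
# when its feature combination or memory values change beyond preset values.
def maybe_snapshot(prev_frame, mirror_space, frames,
                   sim_threshold=0.8, mem_threshold=0.3):
    """prev_frame / mirror_space: dicts mapping feature id -> memory value."""
    shared = set(prev_frame) & set(mirror_space)
    union = set(prev_frame) | set(mirror_space)
    similarity = len(shared) / len(union) if union else 1.0
    mem_change = max((abs(mirror_space[f] - prev_frame[f]) for f in shared),
                     default=0.0)
    if similarity < sim_threshold or mem_change > mem_threshold:  # an "event"
        frames.append(dict(mirror_space))    # record a new memory frame
        return dict(mirror_space)
    return prev_frame                        # no event: nothing recorded
```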
- The machine uses binocular parallax, the relative sizes of the feature maps, and the size of the region of interest to reconstruct a three-dimensional image of suitable size.
- The purpose of step S3 is to find the focus.
- There are many ways to find the focus. For example, the machine can search memory for the input low-level features through similarity comparison and make a mark each time one is found. When the marks accumulated by a certain combination of low-level features in memory reach a preset threshold, that combination is considered a candidate for the corresponding feature map.
- The machine then refers to this feature map as a whole to segment the input low-level features and further compares the similarity of the feature combinations of the two. As this process continues, all the feature-map candidates can be found. Then, according to the closeness of the connections between these candidates, when multiple candidates correspond to one input, the feature map most closely connected to the other information is selected as the most likely feature map; this is the focus.
- This process can either determine the focus after all the low-level features have been processed, based on the marks and connection relationships, or give priority to recognition as soon as any feature map reaches the preset standard.
- The chain-activation method is a method, proposed in the present application, for searching for feature maps, concepts, and related memories in the relational network.
- When feature map i is given an initial activation value greater than its preset activation threshold Va(i), feature map i is activated and passes its activation value to the other feature-map nodes connected to it. If a feature map receives the passed activation values, accumulates them with its own initial activation value, and the total exceeds its own preset activation threshold, it is activated too, and it likewise transfers its activation value to the other feature maps connected to it.
- This activation process propagates in a chain until no new activation occurs, at which point the entire transfer of activation values stops; this is called a chain-activation process. Within a single chain-activation process, once an activation-value transfer has occurred from feature map i to feature map j, the reverse transfer from feature map j to feature map i is prohibited.
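- A runnable Python sketch of this procedure (ours; the plain multiplicative transfer matches the embodiment described above, while the graph layout and names are assumed):

```python
from collections import deque

# Illustrative sketch of chain activation over a graph of feature maps.
# Connection values lie in (0, 1) and transfer is plain multiplication.
def chain_activation(edges, thresholds, initial):
    """edges: {node: {neighbor: connection value in (0, 1)}}
    thresholds: {node: preset activation threshold Va(i)}
    initial: {node: initial activation value}"""
    activation = dict(initial)
    used = set()                                   # transfers already made
    queue = deque(n for n, v in initial.items()
                  if v > thresholds.get(n, 0.0))
    while queue:
        i = queue.popleft()
        for j, t in edges.get(i, {}).items():
            # no repeated i -> j transfer, and after i -> j the reverse
            # transfer j -> i is prohibited within this chain activation
            if (i, j) in used or (j, i) in used:
                continue
            used.add((i, j))
            activation[j] = activation.get(j, 0.0) + activation[i] * t
            if activation[j] > thresholds.get(j, 0.0):
                queue.append(j)                    # newly activated node
    return activation
```

- Because every transfer is recorded in `used`, and each pair of nodes can carry at most one transfer within a single chain activation, the process necessarily stops, matching the convergence constraint stated earlier.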
- The machine assigns an initial activation value to each extracted input low-level feature map according to its own motivation.
- These initial activation values can all be the same, which simplifies the assignment of initial values.
- The machine selects the feature maps with the highest activation values that can be highlighted and regards them as the focus. This method makes full use of the relationships in the relational network and is an efficient search method.
- The strength of a relationship in the relational network is tied to its latest memory value (or connection value); the machine will therefore be "preconceived". For example, suppose two machines with the same relational network face the same feature map and the same initial activation value, but one of them has just processed a piece of input information about this feature map; after processing that additional information, that machine will have updated the relevant part of its relational network.
- One of its relationship lines may have increased according to the memory curve, and this increased memory value will not fade in a short time. Therefore, facing the same feature map and the same initial activation value, the machine that processed the additional information will spread more activation value along the newly strengthened relationship line, leading to a preconceived phenomenon.
- The activation values in chain activation decrease over time. If the activation values in the relational network did not fade with time, the activation changes brought by subsequent information would not be obvious enough, causing interference between pieces of information: after subsequent information was input, it would be strongly interfered with by the earlier information, making it impossible to find the focus correctly. But if we completely cleared the activation values of the earlier information, we would lose the possible connection between the two pieces of information.
- When the thinking time given to the machine is limited, or there is too much information and the machine needs to complete its response as soon as possible, the machine can also adopt the method of outputting and then re-inputting. In this way the machine emphasizes useful information and suppresses interfering information.
- These methods are commonly used by humans, and in the present application we also introduce them into the machine's thinking.
- The machine can determine, from a built-in program, from its own experience, or from a mixture of the two, whether the current thinking time has exceeded the normal time and whether it needs to refresh the attended information, tell others that it is thinking, or emphasize the key points and eliminate interfering information.
- One method is as follows: there is no limit on the total strength of the connection values emitted by a single feature map, but in the activation process, in order to handle correctly the relationship between a feature map and its attributes, the activation-value transfer function of the feature map can be treated as a normalized transfer. Let the activation value of feature map X be A, let the sum of the connection values in all of X's emitting directions be H, and let the connection value from X toward feature map Y be Txy. The activation value Yxy transferred from feature map X to feature map Y is then

  Yxy = A · Txy / H
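- A quick numeric check of this normalized transfer (our example, not the application's):

```latex
% Feature map X with activation A = 1.0 emits connection values
% 0.8, 0.6 and 0.6, so H = 2.0. The transfer toward Y (Txy = 0.8) is
\[
Y_{xy} = \frac{A \cdot T_{xy}}{H} = \frac{1.0 \times 0.8}{2.0} = 0.4,
\]
% and the three transfers sum to A = 1.0, so normalization conserves the
% emitted activation instead of amplifying it.
```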
- Speech and text are usually connected with all the attributes of a concept.
- The attributes of a concept are all of its feature maps. These feature maps may include many similar images in memory, and the various sounds, smells, touches, and so on connected with that class of images. These feature maps obtain activation values from every branch of the relational network and pass them all on to the voice or text, so the focus usually falls on the concept's voice or text. The virtual output by which the machine filters or emphasizes its own information is therefore usually speech, because this is the most common output method and the one that costs the machine the least energy. Of course, this is closely related to an individual's growth process; for example, people who learn about life from books may convert information into text and then re-input it.
- In step S4, the machine needs to transform the focus into image feature maps.
- This transformation process is concept translation.
- A concept is a network of closely interconnected local relationships. In this network there may be voice, text, and other forms of information representing the concept. For humans, apart from language, the other information retains its original form, such as images, feelings, and emotions.
- What the machine mainly needs to translate is language. The machine therefore substitutes for the language the feature map most closely connected with it, so that language can be translated into the corresponding feature maps. For example, the pronunciation of "happiness" can be converted into the concept of "happiness", which can be represented by a typical memory of happiness.
- Step S4 arranges the image feature maps representing the input information (including static feature maps, scene feature maps, and process feature maps) in an appropriate order and forms a reasonable sequence by adding or removing some content.
- The basis for these adjustments is imitating how such information has been combined in memory.
- This process is like a warehouse manager finding the corresponding parts from an input drawing, then imitating previous products (that is, multiple memories) to combine these parts, and thereby coming to understand the purpose of the drawing.
- First, find the required parts according to the drawing (this is concept translation). Then look at how these parts were put together in the past (this is searching for related memories).
- The machine may find that, in this pile of parts, some combinations of parts appear frequently in various previous products (that is, in memory: the combinations of common feature maps in similar things, retained through the memory and forgetting mechanism). The machine therefore preferentially selects the large assemblies that contain the most input information, and then combines the other parts by reference to the greatest probability. Some parts may combine into another large assembly; some parts may attach to an existing large assembly. All these combinations follow, by reference to the relational network, the strongest connections between parts and between parts and assemblies, finally forming a product (analogous to a virtual process in memory).
- The machine then faces this virtual process it has created. It takes the virtual process as a kind of information input and uses the relational network to search for memories related to it. By incorporating these memories into the selection range for the target response, the machine can select, through the evaluation of gains and losses, a response that suits its own motivation.
- The memories related to the virtual process include: when I faced a similar process before, what my state was and what my response was; when I sent out a similar process before, what the other party's state was and what their response was. These can be found through memory, and these memories can be incorporated into the scope of organizing the target response. Specifically: first, by recalling the state of the information source when it previously sent similar information, the machine understands the source's hidden information beyond the information itself.
- These combinations of components are important context in the relational network.
- In language, they are the common expressions, common vocabulary, and common sentence patterns.
- In actions, they are the key steps of a process, such as buying a ticket, going to the airport, passing security, and boarding. These steps are formed by the memory and forgetting mechanism: in learning time after time, details are forgotten and common features are remembered.
- The temporal and spatial information contained in these key steps is a process framework that the machine can imitate when building a response.
- Humans have attached language symbols to many process frameworks, so on the surface these process frameworks are organized in language.
- The underlying organizational relationship, however, is still image feature maps.
- The machine needs to expand the process frameworks represented by these languages (some process frameworks may have no single concept to represent them, but they can be represented by multiple concepts; they are information that takes a sentence or a paragraph to express), expanding them into the corresponding process feature maps (that is, the feature maps of the key steps of the process) in order to imitate them.
- The feature maps expanded from the concept of "going to the airport" might be the process of driving to the airport, or of taking a car to the airport, and so on; through the memory and forgetting mechanism the specific details are forgotten and only a few symbolic pictorial feature maps are retained.
- These symbolic pictorial feature maps evoke related memories, letting us expand the concept further, for example imitating past memories by starting to hail a ride online, starting to prepare luggage, and so on.
- The machine evaluates the gains and losses brought by this tower-shaped imitation structure and decides whether to choose this structure to imitate and respond with.
- The initial stage is the pool of related memories obtained by recalling memory from the four aspects mentioned above. Then, as concepts unfold, the content of this memory pool keeps growing and the content attended to keeps changing. The memories that have entered the pool are also considered to have been used once, and their memory values are increased according to the memory curve. These parts, being key steps in all kinds of processes, are called often; conversely, because of their high memory values, they are not easily forgotten and are easy to find. This is therefore a positive-feedback reinforcement process, and this process is the segmented imitation proposed in the present application.
- Language output is a process of segmented imitation.
- When the machine imitates previous language experience to make a language response, because the specific scenes differ, the machine can borrow only parts of the previous language experience. The language experiences that can be imitated frequently are the common sentence patterns, common words, and idioms, because they are the common parts present in a large number of utterances, such as conjunctions, auxiliary words, interjections, common vocabulary, and common sentence patterns; they can be imitated in many situations. Used again and again, their activation values increase according to the memory curve, and they finally become process frameworks. When responding, the machine imitates these frameworks and then expands memory to install details on them; this constitutes the language output.
- If, in the subsequent step S5, the machine cannot establish a reasonable response, it may have organized the wrong information in step S4, or an error may have occurred in any of the previous steps. The machine then enters the process of "information not understood". In other words, "inability to understand the information" is itself a result of understanding the information. Based on its own experience, the machine establishes a response to the "information not understood": it may ignore the input, extract the low-level features again, identify the feature maps and establish the focus again, or reselect the response.
- In step S5, the machine needs to add its own motivation on the basis of its understanding of the information and, following the principle of seeking advantage and avoiding harm, select a satisfactory response from the various possible responses.
- This is the most complicated step in the machine's thinking, and most of the machine's thinking time is spent on it.
- Based on its understanding of the input information (the purpose and state of the information source, its own purpose and state, and the state of the environment) and on the initial memory pool established by searching memory from the four aspects, the machine begins to create various possible responses and then selects a reasonable one for external output.
- The selection is based on the machine's preset instinctive motivation and on the gain-and-loss evaluation of the machine's various responses; the response is chosen by seeking advantage and avoiding harm.
- Human motivation, in essence, is to maintain a good state of existence.
- The machine stores the two symbols representing gain and loss in the memories where values are assigned to them, and assigns the gain and loss values, by positive correlation, as their memory values. Since the things in a mirror space are related to one another, and this relationship depends on their mutual memory values, the connections between the gains in a memory and the specific feature maps that appear again and again are continuously strengthened through the memory and forgetting mechanism; loss is handled similarly. Clearly, the memory value obtained from a gain is proportional to the value of the gain, and the memory value obtained from a loss is proportional to the value of the loss. Huge gains and terrible losses stay with the machine for life, while small gains and losses are forgotten as time passes. Things that often bring gain become more closely connected with gain, and the same holds for loss.
- When the machine evaluates its own response, it feeds the virtual response as input into the relational network, naturally obtains the gain value on the gain symbol and the loss value on the loss symbol, and then evaluates them.
- This evaluation procedure can be preset, or it can be adjusted based on feedback received during learning. The machine can therefore sacrifice small gains to seek larger subsequent gains, or choose small losses to avoid larger ones. This also provides a way for humans to ensure that machines think according to human wishes: for example, complying with the "machine convention" is a goal that brings gain, helping the owner is a goal that brings gain, and violating the law is a goal that brings loss.
- The machine also records, in the corresponding memory, the parameter settings of its instinctive motivation when it assigned values to the input information.
- The value assigned by instinctive motivation represents a kind of emotion, such as alertness or level of trust. It is regulated from two directions. The first is the machine's own safety-state parameters, an innate emotion; innate emotions are preset. The second is environmental factors, including the responses to gains and losses and the emotions brought by the environment, which are acquired through learning. The machine constantly adjusts its instinctive-motivation evaluation system to try to expand gains and avoid losses, gradually connecting the satisfactory evaluation parameters with external stimuli; because they all exist in the same memory, this can be achieved using the memory and forgetting mechanism. The machine's instinctive motivational state can be revealed outwardly to provide an additional means of communication: this is expression.
- When the machine prepares a response, it first searches memory from the four aspects above and builds a pool of relevant memories. Through segmented imitation the machine establishes various possible responses and evaluates the gains and losses each may bring. To evaluate the gains and losses, the machine only needs to make a virtual input of the response it has established; after assigning the initial activation values of this information and completing the activation, the gain and loss values are obtained naturally. The machine makes its decision based on these values. Of course, gains and losses may be balanced; when the choice is difficult, the machine needs to add more memory to the input information to break the equilibrium and decide. This process can be carried out iteratively.
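- A small Python sketch of this virtual-input evaluation (ours), reusing the `chain_activation` sketch above; the symbol names and the net-gain decision rule are assumptions:

```python
# Illustrative sketch: evaluate a candidate response by virtually inputting
# it, running chain activation, and reading the activation accumulated on
# the gain and loss symbols.
GAIN, LOSS = "gain_symbol", "loss_symbol"

def evaluate_response(response_features, edges, thresholds, init_value=1.0):
    initial = {f: init_value for f in response_features}   # virtual input
    activation = chain_activation(edges, thresholds, initial)
    return activation.get(GAIN, 0.0) - activation.get(LOSS, 0.0)

def choose_response(candidates, edges, thresholds):
    """Seek advantage and avoid harm: pick the best net-gain candidate."""
    return max(candidates,
               key=lambda c: evaluate_response(c, edges, thresholds))
```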
- Step S6 is a translation process. If, in step S5, the machine chose voice output, the task is relatively simple: it only needs to convert the image feature maps to be output into voice, and then use the relational network and memory to adjust their order by imitating similar language memories. This is the process of organizing vocabulary into sentences by consulting a grammar book (the relational network). The machine then invokes its pronunciation experience and its experience of expressing emotion with each word, and sends out the information. Metaphorically, this is like the warehouse manager putting a shell on the assembled product according to the customer's needs and then shipping it out directly by air.
- For other outputs, the machine needs to treat the image feature-map sequence to be output as its targets (these are the intermediate targets and the final target); according to these targets, different times and spaces are involved.
- The machine needs to divide them in time and space in order to coordinate their execution efficiently.
- The method adopted is to group the objects closely related in time and the objects closely related in space. Because the mirror spaces in memory contain temporal and spatial information, classification can be used in this step. (This step is equivalent to rewriting the main script into sub-scripts.)
- The machine needs to combine the intermediate targets of each link with the actual situation again, use segmented imitation to form multiple possible image sequences, and then use the gain-and-loss system again to select the sequence that suits itself. The machine then takes this selected sequence as a new output.
- This new output is a subdivided realization link under the original large output framework, only a small link in the entire output. (This is the realization process of a sub-script, which uses the same procedure, since a sub-script also requires the organization of an event, but its goal is an intermediate goal.)
- Step S7 is the process of establishing new memory space and updating the relational network that runs through all the steps. It is not a separate step; it is the process of maintaining the memory system within each step. Its core is the memory and forgetting mechanism.
- The second aspect disclosed in the present application includes:
- A feature-map establishment process proposed in the present application, which includes:
- The machine establishes the low-level features by comparing local similarities; a low-level feature is itself also a feature map.
- In step S3, if the machine finds features for which no matching feature map exists in the relational network, it treats the combination of these features as a feature map, stores it in temporary memory, and assigns it a memory value positively correlated with its activation value.
- The feature maps established by the above two methods are not yet the shared features of similar things or processes. After a large number of similar things or processes have been learned, with the help of the relationship-extraction mechanism, the shared features will eventually be saved as long-term memory.
- A feature-map recognition process, which includes:
- The machine finds the relevant feature maps in the relational network by searching with the low-level features, and marks each relevant feature map; the feature maps marked many times may be candidates.
- The machine uses the candidates in the relational network to segment the input low-level features and compares the total similarity between the two. If the similarity reaches the preset standard, the machine considers the feature map recognized.
- Another feature-map recognition process uses chain activation: after assigning initial activation values to the low-level features, the feature maps with high activation values are selected as candidates.
- The machine again uses the candidates in the relational network to segment the input low-level features and compares the total similarity between the two; if the similarity reaches the preset standard, the machine considers the feature map recognized.
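- A minimal Python sketch of the marking-based route (ours; the thresholds and the set-overlap similarity stand in for the segmentation-and-comparison step, which the application does not specify in detail):

```python
# Illustrative sketch: count how many input low-level features "mark" each
# stored feature map, then confirm well-marked candidates by total similarity.
def recognize(input_features, feature_maps,
              mark_threshold=3, sim_threshold=0.7):
    """feature_maps: dict mapping feature-map id -> set of low-level features."""
    inputs = set(input_features)
    recognized = []
    for fm, feats in feature_maps.items():
        marks = len(feats & inputs)               # one mark per matched feature
        if marks < mark_threshold:
            continue                              # not a candidate
        similarity = marks / len(feats | inputs)  # total-similarity check
        if similarity >= sim_threshold:
            recognized.append(fm)
    return recognized
```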
- Looking for the focus means finding the most relevant feature maps directly through the low-level features.
- These feature maps may themselves contain low-level features (such as the feature image of a desk), or they may be voice or text directly (such as the pronunciation of "desk").
- The low-level features have nothing to do with their own size.
- Very large features may also be a kind of low-level feature.
- A table as a whole may also be a low-level feature map; it is not necessarily a combination of the local feature maps it contains.
- When using a small window to extract feature maps, we see local features.
- When using a large window to extract feature maps, we look at the features as a whole. Therefore, when judging a table, the machine may judge it from one overall low-level feature, from multiple parts, or from a combination of the two. It may also identify with the large window first and then use the small window for further identification; of course, this process can also be reversed.
- Size scaling and angle rotation need to be considered.
- With chain activation, concepts and related memories can also be searched for in the relational network.
- The machine assigns initial activation values to the input information according to its own motivation and initiates chain activation. Since chain activation propagates activation values through the relational network, and the activation values a feature map obtains from multiple passes are cumulative, a feature map to which multiple sources of chain activation pass values may obtain a high activation value through accumulation.
- Chain activation initiated by assigning an initial activation value to a single focus yields a local network formed by the high-activation nodes, and this local network is a related concept.
- The memories that contain the feature maps of the related concepts are the related memories. The machine can therefore use chain-activation search to find the memories related to the input information, including virtual input information. For example, by assigning an activation value to each information unit of the input, the machine obtains the focus of the input information; it then assigns a value to a single focus to start chain activation and finds multiple related concepts; it then assigns an initial activation value to each feature map in the related concepts.
- The memories that contain feature maps with high activation values, and the memories that contain multiple activated feature maps, are the memories we need to put into the memory pool.
- The relational network is involved throughout.
- The specific form and establishment process of the relational network are the third aspect of the present application.
- A. Cognitive network and memory bank.
- The cognitive network can be regarded as the frequently used part of the relational network in the memory bank, stored separately for the purpose of fast search. Together with the memory bank, it forms the entire relational network. This arrangement suits an organization of local brains and a central brain.
- The local brain uses the cognitive network to respond quickly and asks the central brain for help only when needed.
- The role of the local brain is more like a local rapid-response nerve center, for example for autonomous driving.
- C Distributed cognitive network, memory bank or their combination.
- The machine can build the above cognitive network or memory bank using distributed data storage. This is better suited to large-scale service-oriented knowledge centers.
- The machine can also build the above cognitive network or memory bank using a shared data storage method.
- This kind of open-source knowledge center is better suited to sharing and joint construction.
- By comparing similarity, the machine builds its own self-built classifications; these are the feature maps.
- The machine extracts the temporal and spatial relationships between things through the memory and forgetting mechanism; this is the relationship network within a memory frame.
- The partial relational networks in memory frames are connected through similar things across frames (including concrete things, concepts, language, etc.) to form the entire relational network.
- The similarity relationship can be extracted with a similarity comparison algorithm or with a trained neural network (including the neural network with the memory and forgetting mechanism proposed in the present application); this is not repeated here.
- The extraction of temporal and spatial relationships between things is achieved through the arrangement of memories.
- The machine considers the feature maps in the same memory frame to be related to one another, and the strength of the relationship between two feature maps is a function of their two memory values.
- The feature maps here include instinctive motivations, gain and loss feature maps, emotional memories, and all other sensor data. The machine therefore does not need to distinguish the classification or closeness of the various relationships, nor does it need to establish a dedicated relationship network.
- The machine only needs to maintain the memory values of the feature maps in each memory frame according to the memory and forgetting mechanism.
- The cognitive network is an extraction of the relational network in the memory bank.
- The method of extraction is to first establish a connection line between the feature maps in each memory frame, with the connection value being a function of the memory values of the feature maps at the two ends of each line; the connection values sent out by each feature map are then normalized. This makes the connection values between two feature maps non-symmetrical.
- The network obtained in this way is the cognitive network extracted from the memory bank.
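- A minimal sketch of this extraction, reusing the S42/S69/S88 numbering from Figure 7 below; taking the product of the two memory values as the raw connection value is one illustrative choice of "a function of the two memory values":

```python
import itertools

def extract_cognitive_network(memory_frames):
    """Build connection values between feature maps co-occurring in memory frames.

    memory_frames: list of dicts mapping feature-map id -> memory value.
    The raw connection value of a pair is the product of the two memory
    values; outgoing values are then normalized per node, which makes the
    connection A->B differ from B->A (non-symmetry), as described above.
    """
    raw = {}
    for frame in memory_frames:
        for a, b in itertools.permutations(frame, 2):
            raw[(a, b)] = raw.get((a, b), 0.0) + frame[a] * frame[b]
    # normalize the connection values sent out by each feature map
    totals = {}
    for (a, _), v in raw.items():
        totals[a] = totals.get(a, 0.0) + v
    return {(a, b): v / totals[a] for (a, b), v in raw.items()}

frames = [{"S42": 0.9, "S69": 0.4, "S88": 0.7},
          {"S42": 0.8, "S69": 0.6}]
net = extract_cognitive_network(frames)
print(net[("S42", "S69")], net[("S69", "S42")])  # asymmetric connection values
```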
- From here on we no longer distinguish between the relational network in the memory bank and the cognitive network; both are collectively referred to as the relationship network.
- Machine learning materials can also be obtained from outside the machine's own memory, including but not limited to expert systems, knowledge graphs, dictionaries, and network big data. These materials can be input through the machine's sensors or implanted directly by hand, but they are all handled as memories during machine learning, so this is not inconsistent with machines using memory to learn.
- The machine's recognition of and response to input information is related not only to the relationship network but also to its "personality".
- "Personality" here refers to the machine's preset parameters. For example, a machine with a low activation threshold readily produces associations, takes a long time to think, considers things more comprehensively, and may be more humorous. A machine with a large temporary memory bank easily remembers many "details". Another example: when making a decision, how much an activation value must rise above the activation-value noise floor to count as "highlighted" is a threshold; a machine with a high threshold may be indecisive, while a machine with a low threshold may follow intuition more easily.
- Another example is the similarity criterion between two node feature maps (which can be concrete things, pronunciations, text, or dynamic processes): how dissimilar two things can be and still count as similar determines the machine's capacity for analogical thinking, and thus whether the machine has a serious personality or a humorous one. Different memory and forgetting curves, and different activation-value transfer curves, likewise bring different learning effects.
- The cognition learned by the machine is closely related to the machine's learning experience. Even if the learning materials are the same and the learning parameter settings are the same, different learning experiences may produce very different cognition.
- Our native language may be connected directly to the feature maps.
- A second language may first be connected to the native language and only then, indirectly, to the feature maps.
- For a machine not proficient in a second language, the path may even run from one second-language representation to another, then to the native language, and only then to the feature map. With such a path the time required increases greatly, so the machine cannot use the second language fluently.
- The machine likewise faces the problem of native-language learning (though the ability to use multiple languages can also be implanted artificially). The machine learning method described in the present application is therefore related not only to the machine's learning materials but also to the order in which the machine learns them.
- Figure 1 shows the main steps of implementing general artificial intelligence disclosed in the present application.
- Figure 2 shows the method of establishing the bottom-level feature maps and the algorithm models for extracting them.
- Figure 3 shows the steps of extracting the underlying feature maps.
- Figure 4 shows the process of using chain activation to find focus points.
- Figure 5 shows the process of understanding the input information.
- Figure 6 shows the process by which the machine organizes and selects a response.
- Figure 7 shows one organizational form of a cognitive network.
- The specific implementation of step S1 is as follows:
- Figure 2 shows a method for implementing step S1.
- S101 divides the input data into multiple channels through filters.
- For graphics, these channels include specific filtering of contours, textures, hue, and change modes.
- For speech, these channels include filtering of audio frequency, pitch, and other aspects relevant to speech recognition.
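- A minimal sketch of such channel splitting for images, using simple gradient and HSV computations as illustrative stand-ins for the actual filters (a real "change mode" channel would need a frame sequence and is omitted here):

```python
import numpy as np
import matplotlib.colors as mcolors

def split_channels(rgb):
    """Split an RGB image (H, W, 3 floats in [0, 1]) into illustrative channels."""
    gray = rgb.mean(axis=2)
    dy, dx = np.gradient(gray)
    contour = np.hypot(dx, dy)                      # edge-like structure
    texture = np.abs(gray - np.roll(gray, 1, 0))    # fine local variation
    hue = mcolors.rgb_to_hsv(rgb)[..., 0]           # tone channel
    return {"contour": contour, "texture": texture, "hue": hue}

channels = split_channels(np.random.rand(32, 32, 3))
print({name: ch.shape for name, ch in channels.items()})
```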
- Step S102 finds local similarities in the input data. This step looks for the common local features in each channel's data while ignoring the overall information.
- The machine first slides a local window W1 to find local features that commonly appear in the data within the window.
- For images, local features are those locally similar graphics commonly found across graphics, including but not limited to local edges, local curvatures, textures, tones, ridges, vertices, angles, parallels, intersections, sizes, and dynamic modes. For speech, they are similar syllables. The same holds for other sensor data; the criterion for judgment is always similarity.
- The machine puts the found locally similar features into a temporary memory bank. Each newly added local feature is assigned an initial memory value; each time an existing local feature is found again, its memory value in the temporary memory bank is increased according to the memory curve.
- The information in the temporary memory bank follows the temporary memory bank's memory and forgetting mechanism. The low-level features that survive in the temporary memory bank and reach the threshold for entering long-term memory can be put into the feature library as bottom-level features in long-term memory. There can be multiple long-term memory banks, each following its own memory and forgetting mechanism.
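- A minimal sketch of such a temporary memory bank; the numbers chosen for the initial value, reinforcement step, decay rate, and promotion threshold are illustrative (the text only requires a memory curve, a forgetting curve, and a long-term threshold):

```python
class TemporaryMemoryBank:
    """Sketch of the memory-and-forgetting mechanism for local features."""

    def __init__(self, init=1.0, step=1.0, decay=0.95, promote_at=5.0):
        self.values = {}
        self.long_term = {}
        self.init, self.step, self.decay, self.promote_at = init, step, decay, promote_at

    def observe(self, feature_id):
        if feature_id in self.values:
            self.values[feature_id] += self.step   # memory curve: reinforce
        else:
            self.values[feature_id] = self.init    # new local feature

    def tick(self):
        """Apply the forgetting curve, then promote survivors to long-term memory."""
        for fid in list(self.values):
            self.values[fid] *= self.decay
            if self.values[fid] >= self.promote_at:
                self.long_term[fid] = self.values.pop(fid)
            elif self.values[fid] < 0.1:
                del self.values[fid]               # fully forgotten

bank = TemporaryMemoryBank()
for _ in range(8):
    bank.observe("local_edge_17")                  # a frequently recurring feature
    bank.tick()
print(bank.long_term)                              # survived into long-term memory
```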
- Next, the local windows W2, W3, ..., Wn are used in turn, where W1 < W2 < W3 < ... < Wn, and the steps of S102 are repeated to obtain further underlying features.
- One local feature extraction algorithm is the similarity comparison algorithm. This is a very mature algorithm and is not expanded on here.
- The machine not only needs to build a bottom-level feature map database, it also needs to build models that can extract these bottom-level features.
- S104 is the low-level feature extraction algorithm model A established by the machine. This algorithm model has in fact already been used in S102 and S103: it is the similarity comparison algorithm.
- S105 is another algorithm model B for extracting the underlying features. It is an algorithm model based on a multilayer neural network; once trained, it is more efficient than the similarity algorithm.
- The machine trains the multilayer neural network using the underlying features in the feature library as the output.
- The window for selecting the input data and the window for selecting the output data need to be about the same size.
- The neural network can be realized as any of a variety of deep learning networks, including convolutional neural networks, as well as the neural network with the memory and forgetting mechanism proposed in the present application.
- The process of training the neural network algorithm model is as follows:
- The machine first uses the local window W1 to extract data and train the neural network algorithm model.
- The machine then uses local windows W2, W3, ..., Wn in turn to train the algorithm model, where W1 < W2 < W3 < ... < Wn.
- One method is to add zero to L (L is a natural number) neural network layers to the previous network model each time the window size is increased.
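- A minimal PyTorch-flavored sketch of this training scheme, shown only as an illustration: the patch data and feature-library labels are random stand-ins, the layer sizes are arbitrary, and carrying the previous model's weights over to the grown model is elided. Only the growth rule, appending hidden layers as the window grows, comes from the text above.

```python
import torch
import torch.nn as nn

def make_model(window, n_features, extra_layers=0):
    """Model B: map a window x window patch to a bottom-level feature id.

    Each time the window size is increased, extra hidden layers (0..L)
    are appended, as described above."""
    layers = [nn.Flatten(), nn.Linear(window * window, 64), nn.ReLU()]
    for _ in range(extra_layers):
        layers += [nn.Linear(64, 64), nn.ReLU()]
    layers.append(nn.Linear(64, n_features))
    return nn.Sequential(*layers)

def train(model, patches, labels, epochs=10):
    opt = torch.optim.Adam(model.parameters())
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(patches), labels)
        loss.backward()
        opt.step()

# W1 < W2 < ... : train on successively larger windows, growing the network.
for i, window in enumerate([5, 9, 13]):
    patches = torch.rand(32, window, window)   # stand-in for window data
    labels = torch.randint(0, 10, (32,))       # stand-in feature-library ids
    model = make_model(window, n_features=10, extra_layers=i)
    train(model, patches, labels)
```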
- the extracted underlying features may also be of different sizes.
- Some underlying features may be as large as the entire image.
- Such underlying features are usually background feature maps of some images or specific scene feature maps.
- Some low-level feature maps may be dynamic processes, because the dynamic mode is also a low-level feature.
- The specific implementation of step S2 is as follows:
- Figure 3 shows the process of implementing step S2.
- Step S2 needs to achieve two goals: first, to extract the underlying features contained in the input data; second, the extracted underlying features must preserve their original temporal and spatial relationships.
- The machine uses algorithm model A or algorithm model B obtained in step S1 to perform low-level feature extraction on the input information from all sensors.
- The machine selects the interval to be recognized and the size of the recognition window W1 according to its own motivation.
- The machine identifies the environment purposefully. Its goals are usually the focus points left unfinished in the previous process; these concerns become the inherited motivation in the new round of information identification.
- These inherited motivations are feature maps, some of whose attributes the machine knows, so it purposefully determines the specific recognition interval and selects the data window W1 according to the expected size of the thing.
- The machine can also select the recognition interval and the recognition window size randomly.
- Local data are extracted by moving the window W1 and are fed into algorithm model A or algorithm model B obtained in step S1.
- The machine obtains the underlying features through these algorithm models. Because a window is used to examine local data, the location of each extracted bottom-layer feature is also determined.
- The machine needs to apply a similar method to extract the underlying features of all sensor input data synchronously, maintaining the relative temporal and spatial relationships of all input information.
- The machine's response to the information may be that the information is not yet certain and identification must continue.
- The machine then re-initiates the information recognition action through segmented imitation, using a smaller or larger window to recognize things in the same interval. This process iterates until the response generated by information processing is no longer "continue to recognize".
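- A minimal sketch of this iterative recognition loop; `extract` and `respond` are hypothetical placeholders for algorithm model A/B and the downstream processing of steps S3-S5, and all names here are illustrative:

```python
def recognize(region, window_sizes, extract, respond):
    """Iterate recognition over one interval with varying window sizes.

    extract(region, w) returns the bottom-level features seen through window w;
    respond(features) returns either a final response or the token
    "continue", meaning the information is not yet certain.
    """
    for w in window_sizes:                 # e.g. large first, then smaller
        features = extract(region, w)
        response = respond(features)
        if response != "continue":
            return response
    return "unable to recognize"

# toy demo: a response function satisfied only by the smallest window
print(recognize("room", [32, 16, 8],
                extract=lambda r, w: f"features@{w}",
                respond=lambda f: "a chair" if f.endswith("@8") else "continue"))
```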
- The specific implementation of step S3 is as follows:
- Figure 4 is a flow chart of using chain activation as the search method to implement step S3, including:
- The machine uses the window W to extract data in the interval of interest, then uses similarity comparison algorithm A or neural network model B from step S1 to extract the underlying features, and then searches the relational network for the corresponding underlying features by similarity comparison.
- The machine assigns an initial activation value to each underlying feature found, according to motivation.
- The initial activation value can be the same for every underlying feature. This value can be adjusted by the machine's motivation, for example by how strongly the machine is motivated to recognize the information.
- S304: the machine initiates chain activation in the cognitive network.
- S306: the machine initiates chain activation in the memory bank.
- S304/S305 and S306/S307 are parallel alternatives: they correspond to the case where a separate cognitive network is used as the relationship network and the case where there is no separate cognitive network.
- The above steps use the "distance" between pieces of information in the cognitive network to let related information transfer activation values to, and support, one another; the related information is highlighted by the accumulation of activation values.
- This is similar to the speech recognition process, but with two differences. First, the focus may include multiple aspects of a concept, such as voice, text, and image, or other feature maps highly correlated with multiple input feature maps; these may be activated synchronously by the underlying features. Second, the relationship network contains a great deal of common sense, which helps the machine identify concerns rather than relying only on the relationships within language.
- One processing method includes:
- The machine memorizes feature maps from various angles. A feature map in memory is a simplified map created by extracting the underlying features of each input; such maps are the common features of similar things retained under the relationship extraction mechanism. Although similar to one another, they may have different viewing angles. The machine memorizes the feature maps of the same thing seen from different angles in life as different feature maps, but through learning they can belong to the same concept.
- When the machine searches memory for similar underlying features, this includes searching for feature maps that can be matched after spatial rotation. At the same time, the machine saves the feature map of the current angle in memory, keeping the original viewing angle, so that when underlying features with a similar perspective are input again later, they can be found quickly. In this method the machine therefore combines different-perspective memory with spatial angle rotation to find similar feature maps, which produces the phenomenon that familiar perspectives are recognized faster. Of course, the machine can also rely solely on comparing similarity after rotating the spatial angle.
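- A minimal sketch of this combined search, assuming the stored and input feature maps are same-sized 2-D arrays; the rotation step and cosine similarity are illustrative stand-ins for the patent's similarity comparison algorithm:

```python
import numpy as np
from scipy.ndimage import rotate

def match_with_rotation(inp, memory_maps, angles=range(0, 360, 30)):
    """Find the best-matching stored feature map, trying spatial rotations.

    memory_maps: dict id -> 2-D array (feature maps memorized at the angles
    they were originally seen). A familiar angle matches at rotation 0 and
    is therefore found without searching the whole rotation range.
    """
    def similarity(a, b):
        a, b = a.ravel(), b.ravel()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    best = (None, -1.0, 0)
    for fid, stored in memory_maps.items():
        for angle in angles:
            cand = rotate(inp, angle, reshape=False, order=1)
            s = similarity(cand, stored)
            if s > best[1]:
                best = (fid, s, angle)
    return best   # (feature id, similarity, rotation that matched)
```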
- Another processing method includes:
- The machine uses the differences between multi-channel inputs (such as the differences between binocular and binaural inputs) to establish a stereoscopic depth of field.
- The machine also uses the size comparison between the input feature map and the memorized feature map to assist in establishing a three-dimensional depth of field.
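- The binocular cue reduces to classical stereo triangulation, and the size cue can be sketched alongside it; the numbers below are illustrative:

```python
def stereo_depth(disparity_px, focal_px, baseline_m):
    """Classical triangulation for a pinhole stereo pair: depth = f * B / d."""
    return focal_px * baseline_m / max(disparity_px, 1e-6)

def size_cue_depth(apparent_px, remembered_px, remembered_depth_m):
    """Auxiliary cue from the text: compare the input feature map's size with
    the memorized feature map's size; a feature seen at half its remembered
    size is roughly twice as far away (assumes the remembered depth is known)."""
    return remembered_depth_m * remembered_px / max(apparent_px, 1e-6)

print(stereo_depth(disparity_px=12, focal_px=700, baseline_m=0.06))              # ~3.5 m
print(size_cue_depth(apparent_px=40, remembered_px=80, remembered_depth_m=1.5))  # ~3.0 m
```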
- Channel 1 consists of the bottom-level features established in step S1; they are also feature maps.
- In step S2, the underlying features are extracted through windows of different sizes, and these feature maps are optimized through the memory and forgetting mechanism.
- Channel 2 is created when an unrecognizable combination of underlying features is encountered in step S3 and memorized. In all steps, the feature maps can be optimized through the memory and forgetting mechanism.
- The earlier step S1 establishes the ability to extract the underlying features; this is the preliminary preparation for the ability to understand information.
- The earlier step S2 extracts the underlying features; this is the beginning of information understanding.
- The purpose of extracting the underlying features is to remove part of the redundant information from the input information.
- In step S3, the implicit connections among the inputs of language, text, image, environment, memory, and other sensors are used to transfer activation values among them, so that related feature maps, concepts, and memories support one another and stand out.
- The difference from the traditional use of "context" to identify information is that the traditional recognition method requires manually building a "context" relation database in advance.
- Step S4 mainly uses the relational network to translate the input information into a language the machine can understand and organizes it into an image sequence.
- The machine can use this sequence to search memory for memories related to similar sequences.
- This is the process of seeking to understand information from experience, and of further understanding information through "empathy".
- These memories enter the memory pool and serve as raw material for the machine to organize output responses.
- The sender and the receiver of a message are likely to omit much information that both parties already know, such as shared cognition, shared experiences, and things already discussed. Through the memory search described in the four aspects above, this missing information can be supplemented.
- Figure 5 is a schematic diagram of how input information is translated and understood.
- The machine searches memory for the converted feature map of each focus point and establishes a memory pool.
- One implementation is to assign activation values to the input-information feature maps found in the memory bank and then initiate chain activation. After the chain activation is completed, the memories that contain a higher sum of activation values are the ones to be placed in the memory pool.
- S402 finds a possible process framework.
- The specific method is to use the memories with the highest sum of activation values first and to extract the process framework from these memories.
- Concretely, this can be done by removing the feature maps with low memory values. They are usually details, waiting to be replaced later by details that better fit the current input information.
- The machine may keep some feature maps that represent key steps.
- The feature maps related to these key steps constitute a process framework according to their original temporal and spatial relationships.
- The machine repeats the above process over the memories in the memory pool, from the highest total activation value downward.
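- A minimal sketch of this framework extraction, assuming each memory is a time-ordered list of (feature map, memory value, activation value) triples; the threshold is illustrative:

```python
def extract_frameworks(memory_pool, keep_above=0.6):
    """Turn memories into process frameworks, in decreasing total activation.

    Low-memory-value feature maps are details and are dropped; the surviving
    key steps keep their original temporal order.
    """
    ranked = sorted(memory_pool,
                    key=lambda m: sum(act for _, _, act in m),
                    reverse=True)
    return [[fmap for fmap, mem, _ in memory if mem >= keep_above]
            for memory in ranked]

pool = [
    [("wake", 0.9, 0.2), ("yawn", 0.3, 0.1), ("drive", 0.8, 0.9), ("park", 0.7, 0.6)],
    [("walk", 0.9, 0.4), ("hum", 0.2, 0.1), ("arrive", 0.8, 0.3)],
]
print(extract_frameworks(pool))   # details like "yawn" and "hum" are removed
```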
- In step S403, the machine combines the imitable parts obtained in step S402 into one large imitable framework. Since the framework extracted from each memory may contain multiple feature maps corresponding to the focus points, and the temporal and spatial relationships among these feature maps already exist in those memories, the machine can form a large process framework by overlapping similar feature maps in time and space. This step is like a warehouse manager finding the interfaces of middleware and connecting them to one another. Another situation is that some middleware cannot be connected with the other middleware.
- The machine's strategy for solving this problem is to unfold the concept representing the process framework through segmented imitation, so that the unfolded process framework contains more details.
- The machine then again finds the connections between similar feature maps by overlapping them.
- In the analogy, the warehouse manager opens the shell of each middleware (because the shell has few information interfaces); this is the process of expanding a representative concept into more detailed concepts. For example, when the machine receives the instruction "drive to the airport to pick up the owner and bring him home", it may be shopping in a store.
- The machine activates this input information through chain activation. The possible focus points obtained are the feature maps, dynamic feature maps, or language for "driving", "going", "airport", "pick up", "owner", and "home"; the environmental information includes the feature map of the "store", "no other arrangements", "bring the things you bought", and similar information. For the memory pool, the machine can use the memories activated while directly looking for the focus points, or it can assign new activation values to the found focus points (for example, if the identified information is virtually input again, these focus points are assigned new activation values and chain activation runs again) and look for the relevant memories anew. Usually the memory ranges of the two largely overlap.
- The machine may obtain some general process frameworks that exist widely in life: "drive...", "go to the airport", "pick up the owner...", "go home", "in the store", and so on.
- These process frameworks may be key feature maps in a sequence of feature maps rather than language.
- The machine consults memory again and obtains the memory of "going to the garage to get the car" before "driving"; in this way the machine connects the whole process. If the machine needs to execute this process, each intermediate step is a goal, so the same segmented imitation is needed to find the process framework of each link and subdivide it again. For going to the garage, for example, the machine draws on previous memories of going to the garage, or refers to memories of going to the garage from other places, to establish the lower-level process framework of going to the garage. Under this framework it may need to subdivide again, decomposing going to the garage into lower process frameworks such as "finding an elevator", "taking the elevator", and "finding the car". The basis for each subdivision is segmented imitation.
- The two cores of segmented imitation are: first, finding a framework and expanding it, a process that can be carried out iteratively; second, imitating memory, using like-for-like substitution to replace the details in memory with the details of reality. In this way the machine can start from a few large concepts and gradually refine them into a tower-shaped feature map sequence.
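- A minimal sketch of segmented imitation as recursive goal expansion; `find_framework` and `executable` are hypothetical placeholders for memory lookup and the bottom-level experience check, and the toy frameworks reuse the "go out" example discussed below:

```python
def segmented_imitation(goal, find_framework, executable, execute):
    """Recursively expand a goal until every step is a bottom-level experience.

    find_framework(goal) returns the sub-goal sequence extracted from the
    memory that best matches the goal (details already replaced by current
    reality); executable(goal) says whether the goal maps directly onto a
    bottom-level experience.
    """
    if executable(goal):
        execute(goal)
        return
    for sub_goal in find_framework(goal):
        segmented_imitation(sub_goal, find_framework, executable, execute)

frameworks = {"go out": ["walk to the door", "open the door"],
              "walk to the door": ["stand up", "step toward the door"]}
segmented_imitation(
    "go out",
    find_framework=lambda g: frameworks.get(g, []),
    executable=lambda g: g not in frameworks,    # leaves are bottom experiences
    execute=print,
)
```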
- The specific implementation of step S5 is as follows:
- The machine's impetus to respond comes from its motivations.
- Machines are driven by "desires" to respond to external stimuli.
- The "desires" of the machine are the instinctive motivations that humans preset in the machine, such as "security needs", "goal achievement", "gaining control", and "curiosity". One can remove "reproduction" and add "compliance with human laws", "compliance with machine conventions", and so on.
- The machine's instinctive motivation is a kind of default input information, so instinctive motivation participates in every aspect of the relationship network.
- The machine only needs to take the information given by its preset control system (such as power monitoring or the machine's own detection system), for example a low-power signal, and assign an initial activation value to the instinctive motivation through a preset algorithm.
- The activation value of the instinctive motivation spreads through the relationship network. It may change the latest distribution of activation values in the relationship network, so that the same input information may produce different activation values at different times.
- When the machine's instinctive motivation is relatively strong, it may change the focus that pure information input would produce; the focus obtained in this case is the target focus.
- The target focus reflects the machine's instinctive information. Since there are few types of instinctive motivation and their assignment is relatively simple, this can be achieved with a preset algorithm together with adjustment experience gained through learning.
- Instinctive motivation is a preset motivation.
- The initial activation value it obtains reflects the machine's attitude toward processing the input information.
- The magnitude of these initial activation values reflects the machine's state at the time, such as alert, relaxed, willing to process things, or refusing to process information. It affects the breadth and depth of the machine's search of memory and thus produces differences in thinking, so it is an emotional response. Its different states reflect a kind of emotion of the machine and are also stored in the mirror space. When the machine later reorganizes through multiple mirror spaces, each space carries its own emotion, as well as its own evaluation of gains and losses.
- Inherited motivations are the machine's unfulfilled goals. For example, while the machine is completing its goals, new information is input, so the machine needs to temporarily interrupt the ongoing process to handle the new input. While processing the new information, the machine still holds the original unfulfilled goals; these unfulfilled goals are the inherited motivations. Inherited motivation is treated as a kind of input information and needs no special treatment.
- Figure 6 shows the main steps of step S5:
- S501: the machine looks for memories similar to the input information. In this step, the information identified in step S4 can be used as virtual input. Since this input is an active identification of information, the machine can give the instinctive motivation a larger activation value through the preset assignment system. The focus points these activation values produce may differ from the focus points in step S4; this time they are the target focus points.
- The machine establishes a memory pool through the target focus points, as in step S4.
- The machine may respond to similar target concerns in many forms: it may ignore the input information, reconfirm it, recall a memory mentioned in it, respond to it verbally or with an action, or use "empathy" thinking to infer the overtones of the information source.
- S502 establishes a virtual response using the memory (experience) with the highest memory value as the framework; this is the machine's instinctive response to the input information.
- S503 searches for memories related to the instinctive response and uses them for gain and loss evaluation.
- S505 is the judgment process. If the response passes, the machine takes it as the output. If it fails, the machine expands its search for memories related to the feature map that brings the greatest benefit and the feature map that brings the greatest loss, and reorganizes the response process with the goal of retaining the greatest gain while eliminating the greatest loss. Retaining the maximum gain and eliminating the maximum loss (seeking advantage and avoiding harm) becomes a temporary goal to be completed first, and the original goal becomes an inherited goal. After working out how to eliminate the loss while keeping the gain, the machine continues to organize the virtual output process and enters gain-loss evaluation again, until the selection is complete.
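- A schematic sketch of this S502-S505 loop; `propose`, `evaluate`, and `acceptable` are hypothetical placeholders for the memory-based response organization, the gain/loss evaluation, and the S505 judgment:

```python
def select_response(propose, evaluate, acceptable, max_rounds=5):
    """Propose a virtual response, evaluate its gains and losses against
    memory, and reorganize until a response passes judgment.

    propose(constraints)   -> candidate response
    evaluate(response)     -> (gain, loss) estimated from related memories
    acceptable(gain, loss) -> bool, the S505 judgment
    """
    constraints = []
    for _ in range(max_rounds):
        response = propose(constraints)
        gain, loss = evaluate(response)
        if acceptable(gain, loss):
            return response
        # temporary goal: keep the largest gain, eliminate the largest loss
        constraints.append(("keep_gain", gain))
        constraints.append(("avoid_loss", loss))
    return "unable to respond"   # itself becomes input to a new S5 process
```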
- During this process, the machine may emit temporary responses such as "um" and "ah" to tell the outside world that it is thinking and should not be disturbed. If the thinking time runs long, the machine may re-input to itself the information understood in step S4 to refresh the focus in the relationship network and avoid forgetting what it is thinking about. The machine may also re-input the information understood in step S4 to eliminate the activation values of other information in the relationship network and avoid their interference; these activation values may be left over from the previous round of thinking. If, after all of the above, the machine still cannot select a suitable response, it establishes a response to the "unable to respond" situation: "unable to respond" becomes a kind of input information, and the machine uses the same S5 process to establish an appropriate response.
- In step S4, the machine begins to understand this information.
- By organizing the input information, the machine builds an understanding sequence, including the image feature map of "out", the image feature map of "buy", the image feature map of "back", the image feature map of "bottle", the image feature map of "Coke", and so on, arranged as a sequence.
- The machine's initial value assignment system queries the state of the instinctive motivations (for example, whether the machine has entered a frustrated state due to previous experience), assigns initial activation values to the information sequence from S4, and then finds the relevant memories. This is the chain activation search method. Related memories can also be found by comparing similarity, and a memory pool is built.
- The machine can increase the initial activation value of the instinctive motivation so that the instinctive response is recognized first.
- The machine realizes that the owner needs it to make a response, based on the responses of itself or others under similar instructions in memory.
- The machine recognizes the owner's state by comparing it with the states in memory when the owner issued similar commands.
- The machine can understand the source of the owner's need (a physical need; the owner may be thirsty) and the owner's emotion by analogy with its own relevant states when issuing similar instructions. This is "empathy" thinking.
- The machine begins to evaluate the instinctive response "go out, buy a bottle of Coke, and bring it back" and finds it borderline in gains and losses (because its power is low at this moment), so it looks again for other possible responses. It may find that it previously took Coke out of the refrigerator for the owner, so it establishes a possible virtual output process: "take Coke out of the refrigerator for the owner". This process passes the gain-loss evaluation with a very good result, so the machine continues to evaluate it in depth. The method of further evaluation is to use the virtual output as input again.
- The machine treats this goal as a new S5 process, and converts the goal sequence contained in the previous goal, "go out, buy a bottle of Coke, and bring it back", into inherited goals.
- The new goal "take Coke out of the refrigerator for the owner" needs to be broken down into target sequences such as "find the refrigerator", "take the Coke", and "give it to the owner".
- The machine again takes the goal "find the refrigerator" as a new S5 process and reassesses which solution to choose for its response.
- In step S506 the machine finally determines the output plan by imitating memory: pointing at the refrigerator with a finger and uttering something like "Master, there is a refrigerator".
- The specific implementation of step S6 is as follows:
- Step S6 is the external output of the machine. If it is language output, it is a translation process plus a simple imitated action process (imitating past experience to produce syllables or output text). If the output is an action process, the whole process is very complicated, comparable to a director organizing an event involving many aspects; the following example illustrates this. Suppose that in the example above, the machine's response is to go out, buy a bottle of Coke, and bring it back. We use this example to analyze the machine's brief workflow under action output.
- The machine has no experience of going out to buy Coke and bringing it back in this city, this hotel, or this room, so it has no complete memory that can be used for imitation. Even if the machine had such a memory, changes in external conditions (such as a different time) or in internal conditions (such as the machine's own schedule) would mean that when the machine imitates this memory, it finds that memory and reality do not match.
- The machine therefore begins to build sub-scripts.
- The standard for dividing the script is division in time and space, so that imitation can be carried out efficiently.
- The machine divides the script by treating each planned goal (including intermediate goals) as a separate goal, in order to determine which goal can currently be imitated.
- The method of determination may be a new chain activation, or a comparison of the similarity between memory and reality.
- The goal that matches the current environment, the hotel room space, is "go out".
- The machine therefore takes "go out" as the goal to achieve.
- The machine takes "go out" as understood information, puts it back into step S5, looks for various possible solutions, and makes decisions based on motivation and on gains and losses. The S5 and S6 steps may therefore interleave continuously, because achieving a series of goals is a process of continuously subdividing and realizing goals. Each process is handled in the same way, iteratively, subdividing layer by layer, down to the machine's bottom-level experience, where it is concretely executed.
- The first concept the machine imitates under this instruction is the concept of "going out".
- This concept is a very simplified framework, so the machine needs to subdivide the concept of "going out".
- The subdivision method is: the machine takes the concept of "going out" as a separate input instruction and looks for memories similar to the current situation among the memories related to the image feature map of "going out".
- The machine thereby establishes a secondary framework that can be imitated: go out through the door. Then the machine begins to imitate this secondary framework.
- The machine may find that the first intermediate goal to imitate is "walk to the door". It therefore takes the concept of "walk to the door" as a separate input command and looks for memories similar to the current situation among the memories related to the image feature map of "walking to the door". The machine thus builds a third-level framework that can be imitated: where is the door.
- The "door" becomes an intermediate target.
- The machine needs to locate the position of the "door".
- The machine searches the environment for doors through the various feature maps included under the concept of "door".
- The machine can search its memory of this room, or start the search directly in the environment using step S2; which it does depends on whether it has already extracted feature maps of the entire environment.
- After locating the position of the "door", the machine continues with segmented imitation, taking its own position, the door's position, and going to the door as input information; after merging this with the environmental information into a whole input, it begins to find the most relevant feature maps, concepts, and memories.
- The machine may find "walking". When imitating "walking", a mismatch is found, because the machine is sitting down. Through the same process, the first concept to be imitated under "walking" in the fourth-level framework is established: "standing". The machine therefore subdivides the concept of "standing" again, turning the instruction "stand" into a fifth-level framework that can be imitated, and then begins to imitate this fifth-level framework.
- The above process is the machine's continuous, iterative use of segmented imitation, adding, step by step, details that conform to reality to a framework process composed of concepts, finally turning it into the machine's rich response process.
- The essence of segmented imitation is the machine's expansion of, and analogy between, concepts.
- Concepts are extracted from life and taken from life.
- Using a concept means expanding the concepts under its framework and replacing the details in memory with the details of reality in order to imitate these concepts.
- Concepts include feature maps, process features, language, and other local networks. A concept is a component the machine uses to compose processes, and a widely used one.
- A concept may or may not have a corresponding language.
- A concept may correspond to a word, a common phrase, a sentence, or even a paragraph of language. This differs across languages.
- The information from step S2 is the basis on which the machine later finds a solution. For example, the machine needs to analyze the various attributes of obstacles (such as size, weight, and safety).
- This step requires the entire information understanding process from S2 to S4. The machine then chooses and implements a solution according to its own motivation.
- That part requires the processes of S5 and S6.
- The specific implementation of step S7 is as follows:
- Step S7 runs through the entire S1 to S6 steps. It is not a separate step, but the application of the relationship extraction mechanism within the preceding steps.
- In step S1, the establishment of low-level features mainly uses the memory and forgetting mechanism.
- When the machine finds a similar local feature through the local field of view: if a similar underlying feature or feature map already exists in the feature library, its memory value is increased according to the memory curve; if there is no similar local feature in the feature library, it is stored there and given an initial memory value.
- The memory values in all feature libraries gradually decrease according to the forgetting curve, with time or with training time (i.e., as the number of training samples grows). In the end, the simple features widely present in various things retain high memory values and become the underlying features or feature maps.
- In step S2, every time a low-level feature or feature map is found, if a similar one already exists in the temporary memory bank or feature library, its memory value increases according to the memory curve; the memory values of everything in the temporary memory bank or feature library decrease over time according to the forgetting curve.
- In steps S3, S4, S5, and S6, the connection relationships between nodes in the cognitive network (including low-level features and feature maps) obey the memory and forgetting mechanism, and the memory values of the underlying features and feature maps in the memory bank likewise follow the memory and forgetting mechanism.
- Step S2 is the step in which the machine extracts the underlying features.
- The machine needs to select the recognition area and the window size according to motivation.
- The motivation comes from inherited motivation. For example, if in the previous activity the machine's response to the information was "further identify information in a specific area", then this specific area is the recognition area the machine selects. When the machine further recognizes information in these specific areas, the expected size of the object to be recognized determines the window size the machine selects.
- The machine assigns initial activation values to the extracted bottom-level features according to instinctive motivation, and adjusts these initial activation values according to the expected benefit and loss attributes.
- Instinctive motivation is treated as a low-level feature, one that is frequently activated and widely connected with other feature maps in memory.
- "Safety requirements" are a preset motivation of the machine. Through experience, this motivation may extend to experiences such as "protecting family members from harm" and "protecting one's own property".
- There are two ways for the machine to assign initial activation values to feature maps according to motivation.
- First, the machine's own inherited motivations are feature maps that carry activation values and exist in the relational network; when the machine looks for target attention points, they may or may not be selected, depending on their activation values.
- Second, the initial activation values assigned to the underlying features by motivation actually come from two parts: one part is the initial activation value assigned to the input information by instinctive motivation, usually a uniform initial value that the machine assigns to the input according to the intensity of the motivation.
- The other part is the activation value propagated from instinctive motivation. These propagated values are not initial values, but they accumulate with the initial values, so different pieces of input information end up with different activation values.
- A specific implementation of a cognitive network is as follows:
- Figure 7 is a schematic diagram of one form of cognitive network.
- The feature map number of "apple" is S42.
- The apple's texture is feature 1, with feature map number S69.
- A certain curve in the apple's shape is feature 2, with feature map number S88.
- Further features follow in the same way, up to feature map number Snn.
- S42 is a central feature map.
- S69 and S88 through Snn are the other feature maps connected to S42.
- S42_S69, S42_S88, and S42_Snn denote the connection values from S42 to S69, S88, and Snn respectively.
- The first central node is S42.
- The connection value from central node S42 to S69 is numbered S42_S69.
- The connection value from central node S42 to S88 is numbered S42_S88.
- From the perspective of S69, S42 is one of its connected features; the connection value from S69 to S42 is S69_S42.
- Likewise, from the perspective of S88, S42 is one of its connected features; the connection value from S88 to S42 is S88_S42.
- In this way, S42, S69, and S88 establish two-way connections.
- A specific implementation for establishing the relationship network is as follows:
- The relationship extraction mechanism is applied at three layers of the intelligent system:
- Perception layer: the only criterion for establishing relationships at the perception layer is similarity.
- The machine compares similarities and takes repeatedly occurring similar data combinations as the underlying features. In step S1, the machine therefore uses the relationship extraction mechanism to extract the underlying features, and this can be a similarity comparison algorithm between data. Whether for images, speech, or other data, many similarity comparison algorithms exist and all are very mature, so they are not repeated here.
- In step S1, the obtained bottom-level features need to be put into the feature library, and these bottom-level features are selected according to the memory and forgetting mechanism.
- The machine can also extract the underlying features from the input data according to a similarity comparison algorithm between the data.
- Another algorithm the machine can use is a neural network model.
- These neural network models can be any current mainstream neural network algorithm, or the neural network with the memory and forgetting mechanism introduced in the present application.
- Cognitive layer: on the basis of the feature maps established by the perception layer, the connection relationships between feature maps are established through learning. They are therefore based on memory and forgetting: relationships are obtained through repeated memorization, and the correct relationships are obtained through forgetting.
- Application layer: the application layer continuously applies the results produced by the perception layer and the cognitive layer, and optimizes these results in accordance with the memory and forgetting mechanism.
- In the feature library, every time the machine finds an underlying feature or feature map, if a similar one already exists, its memory value increases according to the memory curve; the memory values of all underlying features and feature maps in the feature library gradually decrease according to the forgetting curve with time or training time (as the number of training samples grows). In the cognitive network, whenever a connection relationship between nodes is used once, the corresponding connection value increases according to the memory curve; at the same time, all connection values in the cognitive network decrease with time according to the forgetting curve. In the memory bank, whenever an underlying feature or feature map is used once, the corresponding memory value increases according to the memory curve; at the same time, the memory values of all underlying features and feature maps decrease with time according to the forgetting curve.
- The present application proposes a way of understanding the working principle of a multilayer neural network:
- The first transformation linearly transforms the input's impulse-function coordinate base to another coordinate base.
- This coordinate base is implicit and can change.
- The coordinate component coefficients on this base are the linear outputs of the first intermediate layer (before the nonlinear activation function is applied). If the two dimensions are the same, the information expression capabilities of the two coordinate bases are the same, and there is no information loss from the input layer to the first intermediate layer.
- The purpose of the multilayer neural network is to remove the interference information (redundant information) from the input and retain the core information (useful information), so the entire network must remove that interference information.
- The method is to transform the core information and the interference information onto different coordinate bases through coordinate base transformations, so that they become components on different bases; the machine then removes the interference by discarding the components that represent it. This is a process of reducing the dimensionality of information expression.
- The nonlinear activation function is used to zero the information components on part of the coordinate base; the ReLU function, for example, removes half of the coordinate base's information.
- The various modified ReLU functions and other activation functions are in essence the same: they remove part of the coordinate base or compress the information on it in order to remove interference information. Leaky ReLU, for example, compresses the information on half of the coordinate components to remove redundant information.
- Each middle-layer neuron can be regarded as the projection of the information onto one component of a corresponding implicit coordinate base.
- The optimization process of a multilayer neural network is the optimization of the coordinate bases corresponding to the middle layers. Because of each layer's nonlinear activation function, part of the information component on the base is lost. The nonlinearity of the activation function, the number of intermediate neurons, and the number of layers therefore constrain one another: the stronger the nonlinearity, the greater the information loss, and the fewer layers and the more middle-layer neurons are needed to ensure that the core information is not lost.
- The number of layers required satisfies L > ln(Y/X) / ln(1 - D), where L is the number of layers required.
- If the coordinate base dimension is reduced to 1/K, the information expression ability of each layer is reduced to 1/K²; it should be pointed out that this refers to the loss rate of information expression ability, not the information loss rate.
- A coordinate base with too high a dimension may contain redundant dimensions. When information moves from a high-dimensional to a low-dimensional coordinate base, if only the redundant dimensions are removed, the information itself is not lost.
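- The inequality above can be reconstructed as follows, under the reading (an assumption, since the text does not define the symbols) that X is the expressive capacity at the input, Y is the capacity of the core information to be reached, and D is the fraction of expressive capacity removed per layer:

```latex
% Each layer keeps a fraction (1 - D) of the expressive capacity,
% so after L layers a capacity X shrinks to X (1 - D)^L.
% Requiring enough layers to reduce the capacity from X down to Y:
\[
  X (1-D)^{L} \le Y
  \;\Longrightarrow\;
  L \ln(1-D) \le \ln\frac{Y}{X}
  \;\Longrightarrow\;
  L \ge \frac{\ln(Y/X)}{\ln(1-D)},
\]
% the inequality flipping in the last step because \ln(1-D) < 0 for 0 < D < 1.
```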
- The present application proposes several solutions for improving the existing multilayer neural network; their specific implementations are as follows:
- One proposed method uses linear transformations between layers but gradually reduces the number of neurons in each layer, which is also a process of gradual dimensionality reduction. A linear activation function plus the removal of some neurons is still, in essence, equivalent to a nonlinear activation function; but this equivalent activation function can be a new one, even a nonlinear function that is difficult to express in mathematical form.
- The machine can directly perform a linear coordinate-base transformation on the input data and then discard some of the dimensional components according to a preset method.
- This kind of data preprocessing can be regarded as passing the data through a nonlinear filter, the nonlinearity coming from actively discarding some components. Its purpose is to select a certain aspect of the data's characteristics.
- The outputs of different filters can be regarded as data with different emphases, each of which can be fed into the bottom-level feature extraction model of step S1.
- The specific coordinate-base transformation form needs to be optimized in practice.
- Which data to discard also needs to be optimized in practice.
- The specific nonlinear filter form can be set manually (convolution, for example, is one such transformation), or its range can be limited to let the machine optimize it by itself. Since the linear transformation itself is a very mature calculation method, it is not repeated here.
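- A minimal numpy sketch of such a filter, with one illustrative choice of discarding rule: keeping only the k largest-magnitude components, which is data dependent and therefore makes the overall filter nonlinear even though the base change itself is linear:

```python
import numpy as np

def nonlinear_filter(x, basis, k):
    """Linear change of coordinate base followed by discarding components.

    basis: (d, d) matrix defining the new coordinate base;
    k:     number of components to keep (largest magnitude).
    """
    components = basis @ x              # express x in the new base
    out = np.zeros_like(components)
    idx = np.argsort(np.abs(components))[-k:]
    out[idx] = components[idx]          # actively discard the rest
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=8)
basis = rng.normal(size=(8, 8))         # illustrative; convolution is one such transform
print(nonlinear_filter(x, basis, k=4))
```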
- The total sample library is randomly divided into multiple groups, and each group randomly discards some samples from the total sample set.
- Each group is optimized with the same parameters. Among all the samples, those that produce non-shared feature mappings must be in the minority, because they correspond to non-shared features. In some groups, the problematic samples may happen to be among those randomly discarded, so the proportion of problem-causing samples in that group drops sharply. In the parameter optimization of the network obtained from such a group, the coordinate base used by the middle layer is most likely to become an orthogonal base, finally yielding a sparse neuron-layer output. Starting from this group, all samples (or the remaining samples) are then included in the optimization process.
- The forgetting mechanism can be introduced by randomly forgetting some mapping paths.
- The machine can also randomly forget some neurons, which is the drop-out method.
- The drop-out method is not within the claims of the present application and is not repeated here.
- Linear transformation layers are introduced to increase the number of neuron layers (equivalent to increasing the number of coordinate base transformations) while keeping the information unchanged (or its loss small), giving the model a chance to select an orthogonal base. Since the components of an orthogonal basis are independent of one another, if the information has the opportunity during optimization to be placed on an orthogonal coordinate base, then each dimension of the information is optimized independently, giving the optimization a chance to reach the global optimum.
- Another method is to let the machine choose intermediate orthogonal coordinate bases as much as possible by restricting the expression dimension. Since an orthogonal basis means the dimensions are mutually orthogonal, the coordinate outputs in many dimensions become zero in the process of removing redundant information; a sparse neuron-layer output therefore often means the implicit base chosen is orthogonal. By restricting the output of the neurons and rewarding them for becoming sparse, we reward them for choosing an implicit orthogonal coordinate base.
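- A minimal sketch of this sparsity reward in PyTorch-style code; the L1 form of the restriction and the penalty weight are illustrative choices, not prescribed by the text:

```python
import torch

def sparsity_rewarded_loss(task_loss, hidden, lam=1e-3):
    """Add an L1 penalty on middle-layer outputs.

    Rewarding sparse neuron outputs nudges the optimizer toward implicit
    coordinate bases that are (close to) orthogonal, as argued above.
    lam is an illustrative weight.
    """
    return task_loss + lam * hidden.abs().mean()

# usage inside a training step (h is the middle-layer activation tensor):
# loss = sparsity_rewarded_loss(criterion(output, target), h)
```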
- The robot mother and the robot child are at home, and the robot child is about to go out to find friends to play football.
- The following is their dialogue, together with the thinking steps during the dialogue.
- The environmental information is: the time is one afternoon, the setting is the family living room, the weather is fine, the temperature is 20 degrees Celsius, mother and child are both in the house, and the child is wearing sneakers...
- The mother has the ability to extract the underlying features.
- This ability manifests in two ways: 1. she can use windows of different sizes to select input data and, within these windows, extract the underlying features by comparing the similarity between the input data and the underlying feature data in the feature library; or 2. she can use windows of different sizes to select the input data and use a trained neural network to extract the underlying features from the window data.
- The instinctive motivation "safety requirement" of the mother robot regularly assigns, according to a preset procedure, a certain activation value to the feature maps representing instinctive motivation in the relationship network.
- The magnitude of this activation value is an empirical value.
- This empirical value is obtained by the mother through the "response and feedback" reward-and-punishment mechanism in life and through reinforcement learning. It can also be a preset experience given to her by humans.
- In step S5, suppose the mother's built-in self-inspection system sends out a message: tired, need to rest.
- The mother's internal preset program then issues a rest instruction, which is also a preset "safety requirement". This motivation likewise spreads activation values through the relationship network.
- The robot mother may start to execute the simplified processes of "looking" and "listening". Since the output of these two steps takes the form of actions, it may be necessary to decompose them into specific underlying experiences through segmented imitation.
- The robot mother begins to imitate the two concepts of "looking" and "listening" in stages. She needs to subdivide the concept of "looking" down to bottom-level experience: the underlying experience of "looking" is issuing commands to many muscles and some nerves, the parameters of which are an ongoing summary of past experience, or possibly a preset experience. "Listening" is handled in the same way.
- In step S2, the first task is to determine the area to be identified and the size of the window used to identify the underlying features.
- The machine needs to select the recognition area and the window size according to motivation.
- The motivation can come from inherited motivation.
- If the machine's response to information was "further identify information in a specific area", then this specific area is the identification area the machine selects.
- The expected size of the object to be recognized determines the window size the machine selects.
- the robot mother has no clear purpose, it just looks at and listens to the environment randomly. Therefore, the robot mother is likely to randomly select a region, and randomly select the window size used to extract the underlying features.
- In step S2, after the robot mother extracts the underlying features, they are placed at the size, angle, and position that best match the original data, so that the time and space information in the original data is preserved. Suppose a window and a curtain appear in the mother's input video data.
- Using the underlying-feature extraction algorithm established in step S1 and built into her information processing center, the mother extracts the underlying features of the window (possibly several local contour features of varying sizes and several overall frame features) and of the curtain (possibly several local contour features of varying sizes, several local texture features of varying sizes, and several overall frame features). She may also extract underlying features of the window and curtain as a whole, because the combination window-plus-curtain is very common in the data: when the machine uses local windows of varying sizes to extract local similarity, the pair may be picked up as a single overall similarity. This combined underlying feature may be a combination of part of the window's underlying features and part of the curtain's, or a combination of simplified versions of them.
- The machine mother then enters step S3. Using each extracted feature, she searches the relationship network for similar features with a similarity-comparison algorithm and, once found, assigns them an initial activation value. This initial activation value is assigned according to the strength of the current motivation, following a preset program. Alongside these video inputs are underlying features representing the machine's instinctive motivations; they accompany any input information and receive their initial activation value directly from the preset program.
- The strength of a relationship in the relationship network is tied to its latest memory value (or connection value), so the machine exhibits a preconceived bias. For example, take two machines with the same relationship network facing the same feature map and the same initial activation value. If one of them has just processed an extra piece of input information about this feature map, it updates the relevant part of its relationship network after processing it.
- One of the relationship lines may increase along the memory curve, and this increased memory value does not fade in a short time. Facing the same feature map and the same initial activation value, the machine that processed the extra information therefore spreads more activation along the newly strengthened relationship line, which produces the preconceived phenomenon.
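The chain activation and the memory-curve update that together produce this preconceived effect could be sketched as follows; the node names, the activation threshold and the transfer rule (source activation times connection value) are illustrative assumptions. Running the same input twice, with one relation line strengthened in between, shows more activation flowing along the strengthened line.

```python
def chain_activation(connections, initial, threshold=0.2):
    """connections: {src: {dst: connection_value}}; initial: {node: value}.
    Within one chain activation an edge transfers at most once, and reverse
    transfer along a used edge is forbidden, so the process terminates."""
    activation = dict(initial)
    used = set()
    frontier = [n for n, v in initial.items() if v > threshold]
    while frontier:
        node = frontier.pop()
        for dst, conn in connections.get(node, {}).items():
            if (node, dst) in used or (dst, node) in used:
                continue                      # no repeat or reverse transfer
            used.add((node, dst))
            activation[dst] = activation.get(dst, 0.0) + activation[node] * conn
            if activation[dst] > threshold and dst not in frontier:
                frontier.append(dst)          # newly activated node spreads on
    return activation

net = {"window": {"curtain": 0.5}, "curtain": {"window": 0.5, "home": 0.3}}
print(chain_activation(net, {"window": 0.4}))   # curtain stays below threshold

# Preconceived bias: processing one extra input about window/curtain
# strengthens that relation line along the memory curve ...
net["window"]["curtain"] = min(1.0, net["window"]["curtain"] + 0.2)
# ... so the same initial activation now spreads further along it.
print(chain_activation(net, {"window": 0.4}))
```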
- By contrast, the current multi-layer neural network is a relationship network in which only the input and output feature maps are visible.
- The relationship network, in turn, resembles a network trained layer by layer from low-level features up to concepts (from simple to complex material); whenever the machine adds material, it adds network layers and retrains. Moreover, the weight coefficients of the mappings between layers are bidirectional rather than unidirectional, and its intermediate layers can produce output.
- The neural network is optimized with the error back-propagation algorithm, whereas the relationship network is optimized by the memory and forgetting mechanism.
- The neural network is trained on all of the training data, with separate training and application processes; the relationship network makes no such distinction, and it requires far fewer learning samples than a neural network.
- After the machine mother extracts the underlying features of the window and the curtain and assigns them initial activation values, her instinctive motivation also propagates activation values to those underlying features.
- The propagated activation values are very low, because in the relationship network these features are not closely connected to safety. The activation values of the window and curtain underlying features in memory are therefore not high, and the range of chain activation they can initiate is very limited.
- Step S3 is thus a process of identifying the input information.
- The robot mother then enters step S4: she uses the method of segmented imitation to understand the input information.
- The robot mother uses the two focus points, window and curtain, to search for the most relevant memories, and may find a few memories related to them.
- The search method is: the machine uses the window and curtain feature maps to search the memory bank. Clearly, understanding these two focus points does not require retrieving many memories to assist understanding.
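Such a memory search might be sketched as scoring each memory by the summed memory values of the focus-point feature maps it contains and keeping only the best few; the memory bank below is an illustrative assumption.

```python
# Hypothetical memory bank: each memory lists the feature maps it contains
# together with their memory values.
memory_bank = [
    {"id": "m1", "features": {"window": 0.8, "curtain": 0.7, "home": 0.4}},
    {"id": "m2", "features": {"window": 0.3, "street": 0.6}},
    {"id": "m3", "features": {"football": 0.9, "child": 0.8}},
]

def search_memories(focus_points, bank, top_k=2):
    """Score each memory by the summed memory values of the focus-point
    feature maps it contains; return only the few best matches."""
    scored = []
    for memory in bank:
        score = sum(memory["features"].get(f, 0.0) for f in focus_points)
        if score > 0:
            scored.append((score, memory["id"]))
    scored.sort(reverse=True)
    return [mid for _, mid in scored[:top_k]]

print(search_memories({"window", "curtain"}, memory_bank))  # ['m1', 'm2']
```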
- In step S5, because the robot mother is in a safe and leisurely environment, her motivation preset program assigns a low value to her motivation.
- While the machine mother continuously takes in video and audio data, her preset program may issue a "power saving" demand. She may then use a large window and largely ignore the details of the environment. In this mode, she uniformly assigns a preset initial activation value to the extracted underlying features, while the underlying motivations remain always active: they are periodically assigned activation values that spread through the machine's relationship network.
- After recognizing the information, the mother robot makes her own response. She imitates one or more memories, and in these memories the response is usually an action that further identifies the information. This is an empirical motivation related to the "safety requirement"; it recurred throughout the mother's growth, so similar memories have become permanent memories that can be recalled unconsciously. This is an instinctive reaction.
- The robot mother imitates previous experience (these memories can also be preset instinctive experience): identify more specific information within an interval. Following these experiences, she begins to issue various muscle commands to move the attention of her eyes and ears to this area.
- The recognition interval the robot mother delineates is the interval containing the person, and the recognition window used is the one usually used to identify a "person" at a similar distance.
- Suppose the robot mother subsequently notices a specific hairstyle and a hand reaching toward shoes.
- Following a similar information-processing process, she assigns activation values to the underlying features related to the specific hairstyle and the shoes.
- The focus points finally obtained may be feature maps such as "my child" and "putting on shoes".
- Driven by the "safety requirement", the target focus may be "protecting the child".
- Using segmented imitation, the robot mother finds that in such an environment the usual response is to further identify risk factors. She therefore adjusts the parameters of the motivational assignment system and enlarges the environmental range for recognition. This time she finds a "football" next to the child. The child and the football give each other relatively high activation values, so their activation values rise above the others and become the focus of attention. After searching memory, the robot mother finds a number of best-matching memories.
- Note that activation values in the relationship network also decrease over time. If some focus points are not dealt with for a long time, they may be forgotten. Conversely, if the activation values fade too slowly, too much activation information interferes with itself, and the machine cannot reasonably find a target focus. At such a moment, under the motivation of saving energy, the machine mother may instinctively refresh the activation values. The method is to convert the current key information into output information, which may not actually be emitted; instead, the information is transferred back to the input, re-activating the key information and letting the non-key information be forgotten faster. This is the thought-arranging process, and it is one way for the machine to highlight key information.
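A sketch of this refresh step, under assumed decay and boost parameters: all activation fades each cycle, while converting the key information to a virtual output and feeding it back as input re-activates only the key nodes.

```python
def decay(activation, rate=0.7):
    # everything fades over time (assumed decay rate)
    return {node: value * rate for node, value in activation.items()}

def refresh_key_info(activation, key_nodes, boost=0.5):
    # "output" the key nodes virtually, then treat them as fresh input
    refreshed = dict(activation)
    for node in key_nodes:
        refreshed[node] = refreshed.get(node, 0.0) + boost
    return refreshed

state = {"child_going_out_to_play_football": 0.6, "sofa": 0.4, "weather": 0.3}
state = decay(state)                                      # all values fade
state = refresh_key_info(state, ["child_going_out_to_play_football"])
print(state)   # the key item stands out; non-key items keep fading
```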
- Speech and text are usually connected with all the attributes of a concept. The attributes of a concept are all of its feature maps; these feature maps obtain activation values from each branch of the relationship network and transmit them all toward the speech or the text, so the usual focus point is the concept's speech or text. In this self-filtering of information, the machine's intermediate output is therefore usually speech, because it is the most common output method and the one the machine can emit with the least energy. This is, of course, closely tied to an individual's growth process.
- To highlight key information, the machine usually emphasizes it one or more times, so that the activation values of the non-key information and of the recalled memories fade.
- Suppose the mother outputs and re-inputs the message "the child is going to go out to play football" once, relatively lowering the activation values of the other information.
- The message "the child is going to go out to play football" then becomes prominent in the relationship network.
- the "safety requirement” tends to prevent the child from playing football because the child may have been injured in the game.
- some experts said that this is in line with the goal of "building physical fitness" and that young people should exercise more. Therefore, the choice of which response needs to be evaluated by the profit and loss system.
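The profit-and-loss choice could be sketched as below. In the application's terms, each candidate response would be run as a virtual input, and the gain and loss values deposited on the gain and loss symbols by chain activation would be read off; the candidate table here simply stands in for that result, and its numbers are illustrative assumptions.

```python
# Hypothetical gain/loss values obtained from virtual-input evaluation of
# two candidate responses in the football scenario above.
candidates = {
    "forbid playing football": {"gain": 0.4, "loss": 0.5},  # safety vs. "build fitness"
    "let him go, remind him":  {"gain": 0.7, "loss": 0.2},
}

def choose_response(candidates):
    # seek gain and avoid loss: pick the response with the best net benefit
    return max(candidates, key=lambda r: candidates[r]["gain"] - candidates[r]["loss"])

print(choose_response(candidates))   # 'let him go, remind him'
```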
- Suppose the mother's final output is "I hope the child brings an umbrella when he goes out to play football." Based on past experience, she omits the subject when only she and the child are in the house; she has also realized that the child already knows he is going out to play football, so from experience she does not need to repeat that information. In previous memories, when the mother gave an instruction, the child followed it, so this time she again refers to previous experience and expects the child to comply. Otherwise, driven by her own motivation, she would not issue an instruction but would adopt another method. Because experience says the child will comply, the choice with the highest activation value when selecting the response is "give him an instruction". As a result, the robot mother may finally output a voice message such as "bring an umbrella".
- After the child receives this information, he determines the correct words of the speech through information recognition. After translating, imitating his own past memories in segments, and adding the fact that the message came from his mother, he concludes that the omitted subject is himself.
- Driven by his instinctive motivation, the child's thinking process may be: comply with the mother's purpose, agree to bring the umbrella, organize the output through segmented imitation, and evaluate the gains and losses of bringing the umbrella.
- This "empathy" is a kind of position-swapping thinking, which is also a feature of the machine intelligence proposed in the present application. This information processing runs in parallel, serially, or in a mixed manner: a virtual output is transferred back to the input, and the input information then passes the gain value and the loss value to the gain symbol and the loss symbol through chain activation. Suppose that after this input the child sees a high loss value.
- The child's purpose in responding to the mother is then to let himself explain why he will not bring an umbrella. Based on his own motivation, the child judges this response to be the better one, and it also passes the assessment of gains and losses, so the child organizes language and begins to express his reasons.
- The purpose of the child's response is to make the mother understand him, because long-term experience shows that when the child explained himself before, the mother understood him. If, in the child's life, the mother seldom understood him, the child would find from experience that explaining cannot achieve his goal and carries very low benefit, and his chosen response might instead be silence.
Abstract
A method for implementing general artificial intelligence. It proposes that only the similarity relations, time relations, and spatial relations between things need to be extracted; under the optimization of the memory and forgetting mechanism, a relationship network between things can then be established. It proposes that information such as instinctive motivations and gain-and-loss evaluations is also treated as input information and kept in memory. The machine only needs to arrange the relations in memory, reorganize memory with these relations, use the gain-and-loss evaluation system to choose a response, and realize the response through imitation, in order to build general artificial intelligence similar to that of humans.
Description
本发明申请涉及人工智能领域,尤其涉及通用人工智能的实现方法。
当前人工智能还处于专用人工智能阶段。这样的人工智能只能应用于单一领域,难以把所学的技能应用于各种场景,也无法产生类似于人类的通用智力。当前人工智能的训练和响应过程和人类的学习、思考和决策过程存在巨大差异。比如深度学习中通过优化系数来寻找误差最小的多层映射。机器对中间层的特征是随机选择,通过误差函数来约束。为了确保机器能选出合理的中间层特征,需要极其大量的数据来训练,而且训练好的模型,难以迁移到训练领域之外运用。目前流行的深度卷积神经网络,虽然通过滤波的方式去掉了部分细节,从而帮助机器得到更加合理的中间层特征选取,但它依然需要大量的训练数据。机器最终的判断依据有可能是基于某些人类不会注意到的细节,所以训练的模型有可能很容易被欺骗。目前的知识图谱工程,通过在大数据中提取文本或者概念之间的关联,来帮助机器搜索时联系不同事物。但这些关系缺乏量化,缺乏一种方法来帮助机器利用这些关系来自我学习、自我总结,并通过所学知识在日常生活中应用,来达到自己的目的。这些方法和人类的学习方法差异很大,无法产生类似于人类的通用智能。
而本发明申请认为机器的智能应该基于信息理论,而不应该基于数据处理方法,数据处理方法是为信息理论服务的。所以本发明申请提出的学习方法,是模仿人类学习过程,通过记忆整理、记忆和现实重组和对重组后的信息模仿,在机器的动机驱动下,机器逐步获得从简单到复杂的从输入到输出的响应,从而表现出和人类相似的通用智能。这些都展现了本发明申请提出的机器学习方法和目前业界已有的机器学习方法存在巨大差异,目前业界还没有与本发明申请相似的学习方法。
发明内容
本发明申请提出了一种新的学习方法和实现步骤,下面做具体说明:
语音、文字是人类后天的产物,语言之外的信息才是我们天然的学习工具。比如通过图像来认识世界,存在天然的优势。一,图像存在天然的相似性。通过相似度对比,机器可以对图像自我分类。二,图像在生活中存在天然的逻辑关系,比如水和河流。三,通过图像学习,存在天然的大数据可以利用。比如日常生活,图像无处不在。这样机器可以在通过日常生活,自然而然的学习,这个学习过程类似于人类的学习过程。尽管通过图像学习是进化带给我们的产物之一,但通过图像学习也存在天然的劣势。一,数据量过大。二,细节过多导致概括性差。三,没有和其他传感器输入信息联系起来,比如语音、文字、触觉、嗅觉等。四,很多概念没有图像来代表,比如热爱、恐惧、道德等抽象概念。
为了充分利用图像学习的优点,克服图像学习的缺点,本文提出了一种提取图像的特征图,并以特征图为基础的学习方法。类似于图像,我们对其他传感器也提取特征,并把这些特征和图像特征图同样对待。
为了描述事物之前纷繁复杂的关系,在本发明申请中,我们只需要提取3种关系:相似性、时间关系和空间关系。这就大大的简化了对事物之间纷繁复杂的关系的提取。机器认为相似的事物之间存在关系;同时出现在同一空间中的信息,彼此存在关系;同时出现在同一空间中的信息关系,构成一个横向的关系网络;不同的横向的关系网络之间,通过把相似的信息连接起来,就构成了整个关系网络。在关系网络中的关系,随重复出现的次数增加而增加;关系网络中的关系,随时间的增加而递减;通过这样的机制,我们就把那些能够重复出现关系总结出来,并形成认知。代表认知的就是关系网络。
所以,本发明中,处理信息的过程,就是把输入信息翻译成机器能够理解的特征图序列,然后使用关系网络和记忆库来处理这些特征图序列,然后把处理后的特征图序列翻译成需要的输出形式,比如语音、文字或者动作输出。
(一)本发明申请中涉及到的概念定义。
为了简明的说明本发明申请的主要步骤,我们首先定义本发明申请中涉及到的概念。 后面在应用这些概念时,本发明申请不再进一步解释。
底层特征:是指机器在事物之间,通过寻找局部相似性而得到的一些普遍存在于事物之间的特征。比如对图形而言,这些底层几何特征主要包括局部边缘、局部曲率、纹理、色调、脊、顶点、角度、曲率、平行、相交、大小、动态模式等普遍存在于图形中的局部特征。对于语音而言就是普遍存在于语音中的音节特征。对于其他传感器输入也是做类似处理。底层特征是机器自主地通过局部相似性而建立的。在使用这些底层特征的过程中,机器可以通过关系提取机制(比如记忆和遗忘机制),也可以通过人为干预对它们进行增减。
特征图:在有了提取底层特征能力的基础上,我们通过关系提取机制,在多个类似事物、类似场景和类似过程中提取其中的共有底层特征组合,这些共有特征组合就是特征图。特征图可以是图像底层特征图、语言底层特征图和其他传感器底层特征图,可以是静态的,也可以是动态的。比如机器把每一次提取到的底层特征组合保留下来后,采用记忆和遗忘机制增减:那些在每一次提取过程中重复出现的底层特征增加记忆值,而那些不能重复的底层特征被逐渐遗忘,使得这些每一次提取的底层特征组合构成的多个简图最终只保留下共有底层特征组合。
概念:多个特征图构成的局部网络就是概念。概念包含多个特征图和这些特征图之间的关系。概念包含的特征图不一定相似。它们有可能是不同特征图,通过记忆和遗忘机制而连接起来的。
连接值:在本发明申请中,认知网络中两个特征图之间可以建立连接。这些连接是有方向和大小的。比如特征图A到与它存在关联的特征图B之间的连接值为Tab。同样,特征图B到与它存在关联的特征图A之间的连接值为Tba。Tab和Tba都是实数,他们的值可以相同或者不相同。
关系提取机制:能够在多个类似图像、类似场景和类似过程中提取其中的共有特征组合的机制,就是关系提取机制。关系提取机制包括但不限于目前本领域已有的各种形式的多 层神经网络、基于规则、逻辑分析、监督或者半监督学习等方法,也包括本发明申请中提出的记忆和遗忘机制。
记忆函数:是指某些数据随重复次数增加而增加。具体的增加方式可以采用一个函数来表示,这个函数就是记忆函数。需要指出,对不同类型的数据可以采取不同的记忆函数。
遗忘函数:是指某些数据随时间和训练时间增加而递减。具体的减小方式可以采用一个函数来表示,这个函数就是遗忘函数。需要指出,对不同类型的数据可以采取不同的遗忘函数。
记忆和遗忘机制:在本发明申请中,对数据使用记忆函数和遗忘函数,就是记忆和遗忘机制。记忆和遗忘机制是本发明申请中广泛使用的关系提取机制。
认知网络:认知网络是由不同概念通过共有特征图而形成的网络。它是双向连接的多中心星形网络。认知网络的本质是机器对过去所有记忆,通过记忆整理而形成的网络。认知网络在本发明申请中可以是单独的网络形式。也可以是隐含在整个记忆库中的关系。
采用双向连接值的目的是因为特征图之间的连级关系并不是对等的。所以我们需要采用双向连接值来表述。采用双向连接值的另外一个原因是:在链式激活中,当一个节点传递激活值给另外一个节点,并将其激活后,为了避免两个节点之间反复彼此激活,我们采用在同一次链式激活过程中,A到B传递后,B到A的反向传递就被禁止。
链式激活:当信息输入时,机器搜索认知网络和记忆库,找到相应的底层特征,并根据动机来赋予其激活值。当某个节点(i)被赋予一定的激活值(实数)。如果这个值大于自己的预设激活阈值Va(i),那么节点(i)将被激活。它会把激活值传递到和它有连接关系的其他特征图节点上。传递系数在认知网络中是连接值的函数,在记忆库中是传递线两端的记忆值的函数。如果某个节点收到传过来的激活值,并累计上自己的初始激活值后,总激活值大于自己节点的预设激活阈值,那么自己也被激活,也会向和自己有连接关系的其他特征图传递激活值。这个激活过程链式传递下去,直到没有新的激活发生,整个激活值传递过程停止,这个 过程称为一次链式激活过程。
链式激活是一种搜索方法,是寻找和某些底层特征组合最相关的特征图的一种方法。也是寻找和某些特征图最相关的概念的一种方法。也是用于寻找和某些概念最相关的一段或者多段记忆(经验)的一种方法。也是寻找和某些动机最相关的概念的一种方法。所以链式激活的方法本质上是一种搜索或者查找方法,它可以被能够实现类似功能的其他搜索或者查找方法代替。
认知网络中连接值是一个0~1之间的实数。0代表没有连级关系。1代表对等连接关系。比如物体的名称和特征图之间连接值通常就是1。这些连接值是各自代表中心特征图的能力,它们彼此之间并没有限制。比如这里没有一个概念节点周边的连接值之和必须为1的限制。需要指出,这里连接值采用0~1之间的实数,目的是避免链式激活过程中,链式激活过程出现不收敛的现象。这是因为在我们的实施例中,我们采用最简单的乘法作为传递函数。如果采用其他传递函数,连接值可以采用其他的区间范围,但选取的总体约束是:传递出去的激活值,需要小于发起激活节点的激活值。这样才能保证链式激活过程最终能停止下来。
凸显:当对输入的底层特征在认知网络或者记忆库中完成搜索后,如果有一个或者多个特征图获得一次或者多次标记,在认知网络或者记忆库中“凸显”出来。机器就把这些特征图作为可能的识别结果。并用它们来组合和分割输入特征,来比较输入特征组合和搜索到的特征图之间的整体相似性,作为进一步判断相似性的标准。比如在采用链式激活作为搜索方法时,如果某些特征图的激活值比整个认知网络的激活值噪声底高出预设阈值,那么我们就认为这些特征图被“凸显”出来。认知网络的激活值噪声底可以有不同的计算方法。比如机器可以依据场景中大量的背景特征图节点的激活值作为激活值噪声底。机器也可以采用目前被激活的节点的激活值平均值作为噪声底。机器也可以采用自己预设一个数字作为激活值噪声底。具体的计算方法需要在实践中优选。这些计算方法只是涉及到基本的数学统计方法,对本领域的从业人员而言是公知的知识。这些具体实现方法不影响本发明申请对方法和步骤 的框架权利要求。
镜像空间:机器进入一个环境后,通过提取图像、语言和其他传感器输入的底层特征来识别具体的事物、场景和过程。并把在记忆中找到的同类事物特征、场景特征和过程特征和现实中相似部分重叠,于是机器就能够推测目前事物、场景和过程暂时看不见的部分。包括事物被遮挡的部分,包括场景被遮挡的部分,包括一个过程没有被机器看到的前后部分。由于事物的大小是特征图的内容之一,所以机器也使用视野中的具体事物的大小和特征图中事物正常的大小相比较,来协助机器建立环境中的景深。这也是通过记忆来帮助理解信息的过程。机器通过对把现实环境的自身视角和记忆环境的第三者视角的相似部分重叠,建立的重叠空间来确定自己和环境的位置关系。所以机器对自己在环境中的位置,是同时具有第一人称视角和第三人称视角的。这就是这样的重叠空间被称为镜像空间的原因。
机器在识别了输出信息的特征图后,通过特征图调用记忆,来建立镜像空间。机器随后通过分段模仿,来重组记忆和输入信息,组成新的信息序列来理解输入信息和建立输出响应。这也是新记忆的产生过程。机器存储新记忆的过程,也是存储镜像空间的过程,存储的内容不是对输入信息的录制,而是存储提取的底层特征和它们更新后的记忆值。
记忆帧:在镜像空间中,每发生一次事件,机器就把这个镜像空间做一个快照,保存下来。保存下来的内容包括镜像空间中的底层特征和它们的记忆值,这就是记忆帧。镜像空间中发生一次事件,是指镜像空间中底层特征组合和前一个镜像空间相比较,发生了超过预设值的相似度的改变,或者镜像空间中底层特征的记忆值发生了超过预设值的改变。
记忆存储:记忆存储是指机器对整个镜像空间的存储,包括所有提取到的底层特征和它们的组合关系(包括相对位置关系),以及这些底层特征所拥有的记忆值。
记忆库:记忆存储形成的数据库就是记忆库。
临时记忆库:记忆库可以是多个下属记忆库的组合。这些下属记忆库可以采用不同的记忆和遗忘曲线。临时记忆库可以是下属记忆库之一,其目的是对记忆存储的缓冲和对需要 进入长期记忆的材料进行筛选。
本发明申请中,我们采用有限容量的堆栈来限制临时记忆库容量的大小,并采用记忆和遗忘来维护临时记忆库。临时记忆库通常采用快速记忆和快速遗忘的方式,来对准备放入长期记忆库中的材料进行筛选。机器在面对大量的输入信息时,那些已经习以为常的事物、场景和过程,或者远离关注点的事物、场景和过程,机器对它们缺乏深入分析的动机,所以机器可能不去识别这些数据,或者赋予给它们的激活值很低。机器按照事件驱动的方法把信息存入临时记忆库时,机器对新特征或者新特征组合赋予的记忆值和其激活值正相关。那些记忆值低的记忆有可能很快就从临时记忆库中被忘记,而不会进入长期记忆库。这样我们只需要把那些我们关注的信息放入长期记忆库,而不用把每天琐碎的、不需要再提取连接关系的事物都记忆下来。另外,因为临时记忆库容量有限制,所以临时记忆库也会因为堆栈容量接近饱和而被动加快遗忘速度。
关系网络:关系网络是指存在于记忆中的特征图彼此之间的关系构成的网络。它是机器提取输入信息的相似性、时间关系和空间关系后,并通过记忆和遗忘机制优化后的产物。它的表现形式可以是带连接值的认知网络,或者是带记忆值的记忆网络,或者是两者的混合形式。
关注点:关注点就是机器通过输入信息,在关系网络中找到一到多个和输入信息最相关的特征图。比如采用链式激活搜索方法时,激活值最高,并能凸显的一到多个特征图。
目标关注点:机器根据自己的动机,选取用来组织输出的特征图就是目标关注点。
分段模仿:分段模仿的本质是一个使用记忆和输入信息重组的过程,是一个创造的过程。它利用记忆中的一些片段和局部,和输入信息一起组织成一个或者多个合理的过程。记忆中能长期存在的内容通常是经常使用的内容,比如经常使用的常用语、常用动作或者常用表达组织方式等。这些经常使用的组合相当于事物、场景和过程的过程框架,它们是通过记忆和遗忘机制优胜劣汰而形成的。机器借用这些过程框架,增加上自己的细节,就构成了形 形色色的新过程。机器利用逐步分段模仿这个新过程来理解输入信息和组织输出响应。
(二)本发明申请中概念之间的关系。
整个智能体系分为三个大的层次;第一个层次是感知层,它是通过相似性为标准来建立特征图,对输入信息做简化;第二个层次是认知层,它识别那些能重复出现的,在类似事物、场景和过程中共有的部分和共有关系,它是建立时间和空间关系的过程,它和相似性一起组成关系网络;第三个层次是应用层,它利用关系网络作为词典,来做特征图之间的翻译;它利用关系网络作为语法,来把输入/输出信息从一种形式翻译成另外一种形式;它利用关系网络来重组记忆和现实的信息,来理解输入信息,来组织输出响应;也利用关系网络和记忆来在多种可能的输出响应中权衡利弊,做出选择。它同时也是实现记忆和遗忘机制的过程。
在本发明申请中,机器的本能动机是作为一种持续输入的信息来处理的;在机器处理信息中,机器的本能动机是默认的输入信息;机器的本能动机是一种预置动机。在本发明申请中,机器对收益和损失的评估结果作为一种默认的输出,并使用收益符号和损失符号来分别代表收益和损失,把它们存储于记忆中。每段记忆中,具体的收益和损失符号每次获得的记忆值和它们获得的收益值和损失值正相关。
(三)本发明申请中通用人工智能实现步骤。
图1为本发明申请提出了一种实现通用人工智能的主要步骤。这些步骤是本发明申请的第一方面,这里对图1中的步骤做进一步详细说明:
步骤S1:建立特征图库,建立提取模型。机器通过寻找局部相似性来建立底层特征图库,并建立提取这些底层特征图的算法模型。这是数据处理的前期准备过程。
步骤S2:提取底层特征。机器对所有传感器的输入信息做底层特征提取,并按照底层特征和原始数据相似度最高的位置、角度和大小,来调整底层特征的位置、角度和大小,把它们和原始数据重叠放置,这样就能保留这些底层特征在时间和空间上的相对位置,并建立镜像空间;这个步骤是对输入信息的简化过程。
步骤S3:识别输入信息。机器寻找关注点。这个过程是识别输入信息,去掉歧义,并做特征图翻译的过程。它类似于在语言翻译过程中,利用上下文来识别信息源发出的信息词汇,并把识别出来的词汇,翻译成另外一种语言的词汇。
步骤S4:理解输入信息。机器把关注点组织成一个或者多个可以理解的序列。这个过程类似于语言翻译中,把目标语言的词汇,利用语法重新组织成可以理解的语言结构。这个步骤采用的具体方法是分段模仿。
步骤S5:选择响应。机器把翻译后的输入信息,加入自己的动机,寻找目标关注点。机器利用关系网络和记忆建立对输入信息的响应;并使用收益和损失评估系统对响应做评估;直到找到能通过评估系统的响应为止。这是机器在趋利避害的原则下,做各种输出的预设,并评估收益和损失。
步骤S6:把响应转换为输出格式。机器把选择出来的序列,通过分段模仿,转化为输出形式。
步骤S7:更新数据库。机器根据步骤S1、S2、S3、S4、S5、S6中对数据的使用情况,按照记忆和遗忘机制对特征图、概念、关系网络和记忆进行更新。
以上步骤中,S1和S2是对信息的简化,它的本质是:“某些方面相似的东西,可能在其他方面也相似”,这就是相似性关系基本假设。我们的大脑正是采用相似性来给事物分类,这是人类与生俱来的能力。分类的作用是经验的泛化。比如某个东西是能食用的,和它看上去、闻上去相似的东西可能也是能食用的。没有这种能力,就不可能产生智力。所以在S1和S2步骤中,我们通过寻找类似事物中的局部相似性,来建立底层特征,用于对比事物之间的相似性。
由于这个世界上没有两样事物是完全一样的,所以相似性对比是一个去掉细节,比较核心信息的过程。所以,我们通常需要把输入信息做预处理,把事物的轮廓(特定的边缘)、动态(变化模式)、纹理等信息提取出来作对比。这也是进化带给生命的礼物,因为我们正是 处于这样一个世界:那些轮廓、动态模式和纹理等属性相似的东西,其他方面确实很可能也相似。如果我们处于一个物体的形态、纹理和动态模式可以任意变化的世界,我们可能需要发展出不一样的大脑思维方式,也包括不一样的底层特征提取方法。同理,在那样的世界,我们也需要发展出不一样的人工智能。在有了先天对比相似性的基础能力后,人类才通过后天的学习,建立语言符号,不断的总结经验,把人类分类的能力向更概括、更细微两端推进,并给这些分类赋予语言符号来代表。也用语言符号来表示这些概念之间的关系,这就是我们能领先于动物的地方。
人类的智力是一种进化的结果。我们的祖先,在没有语言符号产生之前,他们探索世界时,一定是使用图像、声音、气味等基础传感器给他们的信息来认知这个世界,并通过这些信息来对比相似性。在本发明申请中,我们采用同样的方法,把所有输入的信息,重新还原到我们祖先的思维方法上去,进行信息处理。这是因为进化过于漫长,相比祖先,我们并没有进化出新的底层信息处理方式,而是通过语言,在底层信息处理方式的上面,增加了一层底层信息和语言之间的转换工具。通过语言这个层次,我们把看上去、听上去、闻上去、品尝上去等并不相似的信息建立某种联系,并把这种联系传承给我们的后代。
想想说着不同语言的民族甚至不同的人种,但他们思维相通、行为类似,这些都表明我们的底层思维是和语言无关的。这也是本发明要建立底层特征图的目的。底层特征图是提取图像、声音和其他传感器输入信息中的相似性,来建立分类,并使用这些分类,来代表不同类别的信息。它们和语言无关,它们的目的是把输入信息中可以简化的部分简化掉,为后续信息处理做预处理。
我们认为事物出现的时间和空间是一种关系。这样的关系也是显而易见的,因为存在时间关系的事物,常常和同一个过程相联系。比如一只野兽冲向我们的祖先,这时不仅仅有野兽图像,还可能有野兽的运动模型,还可能有特定的声音,还可能有特定的环境信息,这些信息同时进入我们的祖先的信息处理系统,经过多次类似的处理后,我们的祖先就会把这 些能够重复的,同时出现或者存在时间次序的信息联系起来,作为经验,来更好的适应环境,寻求生存。同样,事物同时出现的空间也是一种关系。比如鱼和水,洞穴和动物,太阳和白天等信息。这些信息之所以能同时同地出现,是因为它们确实有内在的联系,所以它们才会同时同地出现。我们的祖先通过记忆和遗忘,来总结这些关系,把那些能反复出现的信息之间联系存入长期记忆,提高了生存机会。所以记忆和遗忘机制也是进化带给我们的礼物。
所以在本发明申请中,我们采用镜像空间,来保存信息。镜像的意思是我们保存的是对外界的镜像数据,使用的是底层特征代替原始数据,把底层特征按照和原始数据最相似的位置放置,这就保留了相似性关系。镜像空间也存储机器本身的一些信息,比如动机,比如收益和损失计算结果。在本发明申请中,我们是把这些信息使用一种底层特征符号来代表的,所以把它们也按照其他底层特征一样对待。
当我们采用镜像空间的方式来存储信息,就是把我们所得、所感的信息上的时间关系和空间关系,和信息一并存储下来。再通过记忆序列中的相似信息把每段记忆串接起来,也就把相似信息的时间和空间关系串接起来了,这就构成了一个立体的关系网络。但我们还必须寻找那些共性关系,而去掉那些不能重复的干扰信息。而完成这一步的方法就是记忆和遗忘机制,这是一个去伪存真的过程。
这样,我们不为事物之间纷繁复杂的关系所干扰,而直接通过三个要素:相似关系、时间关系和空间关系,并使用重复性来量化这些关系,把事物之间纷繁复杂的关系简化,建立起它们之间的关系网络。
如果我们把记忆看作是一个包含了无数底层特征图的立体空间,那么关系网络,就是这个空间中的脉络。这些脉络的出现,是因为记忆和遗忘机制,那些不能重复出现的关系被遗忘了,而那些能重复出现的关系得到加强。那些通过粗大的关系脉络连接起来的特征图就组成了概念。概念是一个局部网络,它连接同类信息的图像、语音、文字或者其他任何表达形式。由于这些表达形式频繁出现在一起,并频繁相互转换,所以它们之间的连接更加紧密。 关系网络中还有一些能重复出现的组合,它们之间的联系没有概念那么紧密,但我们可以通过模仿这种组合来使用它们,我们称它们为过程框架。如果把记忆看着是一个立体的产品存储仓库,那么概念就是这些产品中频繁使用的小部件,而过程框架就是一些中间件,而一段具体的记忆就是一个产品。小部件、中间件和那些各种零件共同组成了整个记忆仓库中的所有产品,它们在产品中广泛存在。而识别出它们的就是记忆和遗忘机制,体现它们的就是关系网络。
在S2步骤中,为了提高效率,我们只需要识别我们感兴趣的区域,采用适合我们预期事物的识别精细程度就可以了。这就是在S1和S2中,我们使用大小不同的数据提取窗口来反复提取数据的底层特征的目的。在S2中,我们感兴趣的区域和采用的识别精度都来自于机器的本能动机和继承动机,它们是机器在自身需求和活动目标的双重作用下来建立的,在后续会做详细说明。
正是因为我们需要保留事物之间的相似性、时间和空间关系,所以我们采用一种称之为镜像空间的方法,来建立这个大量底层特征构成的立体空间。这些底层特征包括:所有外部信息的传感器输入,包括但不限于视频、音频、触觉、嗅觉、温度等;也包括所有内部信息,包括本能动机状态、利益损失评估结果、重力感应和姿态感应信息等。本能动机的不同状态可以用情绪来代表。每一次记忆中,本能动机都是一种给输入信息赋予初始激活值的底层特征。本能动机是一种预置动机,但它的参数受到收益和损失评估结果的调整。它的不同状态,反应了机器的一种情绪,也一并存储在镜像空间中。那么当我们通过多个镜像空间重组时,每个空间都带有自己的情绪,也带有自己的收益和损失评估结果。机器自然就可以采用加权求和的方式,来预估重组后的镜像空间带个我们的情绪反应和带给我们的收益和损失的评估结果。
重组中使用的各种组件,它们联系着众多各自原先的记忆,这些记忆会因为组件的激活而被链式激活,这就是联想。机器调用镜像空间时,是和处理传感器信息类似的方法来处 理这些记忆信息。所以,同样可以采用视差,采用事物之间相对大小的方法,来建立景深,来把这些数据建立一个立体的图像序列。机器是以第三人称视角来观看这些记忆的,所以它能把自己或者他人带入自己创建的虚拟镜像空间中的角色。带入的方法就是:1,自己来处理虚拟空间中自己面对的情况。2,自己来处理虚拟空间中,他人面对的情况。而处理的方法,就是把这些情况作为一种假设的输入信息,来走自己平时处理传感器输入类似数据的流程。
因为重力感应是持续输入的信息,它存在于所有记忆之中。它和记忆中的所有事物都有连接关系,并且这些关系由记忆和遗忘机制来优化。这些图像和重力感应之间的方向关系是广泛的存在于这些记忆中,所以我们会对上下颠倒非常敏感,而对左右颠倒却没有那么敏感。这是因为上下颠倒导致我们找不到熟悉的特征图组合方式,使得我们不得不提高注意力进行第二次识别,在第二次时,我们可能通过扩大记忆搜索范围,或者通过角度旋转来找到对应的特征图,这要求我们付出更多的注意力,这就是我们对上下颠倒如此敏感的原因。
当我们处于现实环境中,我们调用镜像空间时,是把镜像空间和现实空间重叠,或者是调用多个镜像空间中的局部,来和现实空间的局部重叠。这样,我们就能根据被借鉴的镜像空间的其他部分,来了解现实空间中目前看不到的部分。这里面包含:1,包括空间上暂时看不到的部分,机器可以通过想象(镜像空间调用)而补上这一部分。比如,柜子里面的图像。2,包括在时间上看不到的部分。比如故乡的食物引发了我们对故乡的记忆。这是一种记忆利用过程。在S4、S5和S6步骤中,我们会大量的运用这样的方法,来理解输入信息,来选择符合我们目标的响应,来建立输出响应。
镜像空间中数据的具体存储方式,是底层特征按照和原始数据最匹配的组合方式,每发生一个事件就存储一次数据。可以近似认为底层特征是2维数据压缩,而事件存储机制就是一种数据在时间上的压缩。它们也可以被其他数据压缩方法代替或者部分代替。但无论哪种方法,都必须保留事物的相似性、时间和空间关系。同时,我们还把对应时刻的机器本能动机状态、机器利益损失评估结果、机器重力感应和姿态感应等机器内部信息也一并存储。 这些存储在镜像空间中信息,包括外部信息和内部信息,都是带有自己的记忆值的,它们也遵守记忆和遗忘机制。大量的这种按照实际顺序存储下来的镜像空间,就是记忆。机器按照事件驱动的方式来记录,就是说只有镜像空间上发生了一个“事件”,机器才需要再次记录镜像空间。而镜像空间中发生一次事件,是指镜像空间中底层特征组合和前一个镜像空间相比较,发生了超过预设值的相似度的改变,或者镜像空间中底层特征的记忆值发生了超过预设值的改变。机器在调用记忆时,机器通过双目视差,通过特征图的相对大小,通过关注区域的大小,来重新构建合适大小的立体图。
S3步骤的目的是寻找关注点。寻找关注点的方法很多。比如通过相似度对比在记忆中寻找底层特征图,每找到一个就对其做标记。当记忆中某一个底层特征组合包含的标记达到预设阈值,于是就认为它可能是对应的特征图候选者。机器参照这个特征图的整体来对输入底层特征做分割,并进一步比较两者之间的特征组合方式的相似性。这个过程不断进行下去,就能找到所有的特征图候选者。然后根据这些特征图候选者彼此间的连接紧密程度,在多个候选者对应一个输入的情况,选用和其他信息连接最紧密的特征图作为最可能特征图,它们就是关注点。这个过程既可以在所有底层特征处理完后再根据标记和连接关系来确定关注点,也可以在任何特征图达到预设标准时优先识别。
除了相似性对比,本发明申请中提出另外一种方法:链式激活方法。这是本发明申请中提出的一种基于关系网络搜索特征图、概念和相关记忆的方法。在关系网络中,当特征图i被赋予初始激活值,如果这个值大于自己的预设激活阈值Va(i),那么特征图i将被激活,它会把激活值传递到和它有连接关系的其他特征图节点上;如果某个特征图收到传过来的激活值,并累计上自己的初始激活值后,总激活值大于自己节点的预设激活阈值,那么自己也被激活,也会向和自己有连接关系的其他特征图传递激活值,这个激活过程链式传递下去,直到没有新的激活发生,整个激活值传递过程停止,这个过程称为一次链式激活过程;在单次链式激活过程中,但特征图i到特征图j发生激活值传递后,特征图j到特征图i的反向传递 就被禁止。
需要进行链式激活时,机器通过给提取到的底层特征,按照自己的动机给输入底层特征图赋予一个初始激活值。在单次初始激活值赋值中,这些初始激活值可以是相同的,这样可以简化初始值赋值系统。这些节点在得到初始激活值后,会启动链式激活过程。在链式激活过程完成后,机器选取激活最高,并能凸显出来的特征图,把它们作为关注点。这个方法充分利用了关系网络中的关系,是一种高效率的搜索方法。
但这里需要特别指出,由于存在激活阈值,所以即使特征图之间传递系数是线性的,特征图的累计函数也是线性的,但由于激活阈值的存在,无论是在单次链式激活过程中,还是在多次链式激活过程中,相同特征图和相同初始激活值,但因为激活次序选择不一样,最终的激活值分布是不一样的。这是因为激活阈值的存在带来的非线性。不同的传递路径,带来的信息损失是不一样的。激活次序选择的偏好,这相当于机器个性的差异,所以在相同输入信息下,产生不同的思考结果,这个现象和人类是一致的。
另外,关系网络中的关系强度和最新的记忆值(或者连接值)是相关的。所以机器会有先入为主的现象。比如拥有同样的关系网络的两个机器,面对同样一个特征图和同样的初始激活值,其中一个机器突然处理了一条关于这个特征图的输入信息,那么这个机器在处理了额外的这条信息后,它会更新关系网络中的相关部分。其中某一个关系线可能会按照记忆曲线增加。这个增加的记忆值在短时间内不会消退。所以在面临同样的特征图和同样的初始激活值时,处理了额外信息的机器,将会把更多的激活值沿刚刚增强了的关系线传播,从而出现先入为主的现象。
另外,为了合理地处理信息输入的先后次序,确保后面输入的信息带来的激活值,不会被前面的信息所屏蔽,在本发明申请中,链式激活中的激活值,会随时间而递减。因为如果关系网络中的激活值不随时间消退,后面信息带来的激活值变化就不够明显,这会带来信息间干扰。如果激活值不消退,后面的信息输入后,会受到前面信息的强烈干扰,导致无法 正确的寻找自己的关注点。但如果我们完全清空前面信息的记忆值,那么我们又丢失了前后两段信息可能存在的连接关系。所以,在本发明中,我们提出采用渐进消退的方法来实现前后段信息的隔离和连接之间的平衡。这个消退参数需要在实践中优选。但这带来了维护一个信息的激活状态的问题。如果我们在S3中找好了关注点,但在S4步骤中,迟迟无法完成信息理解,或者在S5中,迟迟无法找出满足机器收益和损失评估系统的响应方案,随时间流逝,这些激活值就会消退,导致机器遗忘了这些关注点,忘了自己要干什么。这时机器需要把这些关注点的激活值再次刷新。一种刷新方法是:把这些关注点转变成虚拟输出,再把这个虚拟输出作为信息输入,走一遍流程,来强调这些关注点,这就是我们在思考时,为什么有时候,不理解时或者找不到思路时,喜欢喃喃自语,或者自己在心中默念。另外,在这种情况下,如果出现新的输入信息,机器不得不打断思考过程,去处理新的信息。所以,从节省能量的角度看,机器是倾向于完成思维,避免浪费的。这时机器可能会主动发出“嗯…啊…”等缓冲辅助词来发出输出信息,表示自己正在思维,请勿打扰。还有一种可能是给予机器的思考时间有限,或者信息过多,机器需要尽快完成信息响应,这时机器也可以采用输出再转输入的方式。通过一次这样的方式,机器就强调了有用信息,抑制干扰信息。这些方式在人类普遍使用,在本发明申请中,我们也把它也引入机器的思维。机器可以根据内置的程序,或者自己的经验,或者两者混合,来确定是不是目前的思考时间超过了正常时间,需要刷新关注信息,或者告诉别人自己正在思考,或者强调重点,排除干扰信息。
另外,为了正确的确定特征图和特征图之间的连接强度,一种方法就是:同一个特征图发出的连接值强度彼此之间没有限制,但在激活过程中,为了正确的处理特征图和它的属性之间的关系,特征图的激活值传递函数可以考虑归一化传递。假设特征图X的激活值为A,它所有发出方向的连接值之和为H,它向特征图Y的传递值是Txy,那么一种简单的激活值传递就是Yxy=A*Txy/H。其中Yxy为X特征图向Y特征图传递的激活值。
由于人类交流最频繁的是语音和文字,所以一个概念的局部网络中,语音和文字通常 和这个概念中所有属性相连。概念的属性就是概念的所有特征图,这些特征图可能包含记忆中很多类似的图像,都和一类图像连接的各种语音、气味和触觉等等。这些特征图从关系网络的各个支路获得激活值,并都向语音或者文字传送,所以通常的关注点就是概念的语音和文字。所以,机器的自我信息过滤或者强调的方法,虚拟输出通常是语音,因为这是最常见的输出方式。机器输出它们耗能最少。当然,这和一个人的成长过程密切相关。比如,从书本中学习生活的人,有可能是把信息转变成文字,再重新输入。
通过S3步骤,机器找关注点后,机器进入S4步骤。在S4步骤中,机器需要把关注点转变成图像特征图。这个转变过程就是概念翻译。概念就是彼此连接紧密的局部关系网络。在这个网络中,可能存在语音、文字和代表一个概念其他形式信息。对人类而言,除了语言外,其他信息都是保留其原始形态的,比如图像、感觉和情绪等。机器要翻译的主要就是语言。所以,机器使用和对应语言联系最紧密的特征图代替语言,就可以把语言翻译成对应的特征图。比如把“幸福”这个语音转换为“幸福”这个概念下,能代表幸福的典型记忆。
然后,机器需要把这些特征图组合起来,并成一个可以理解的序列。基本上,S4步骤就是把代表输入信息的图像特征图(包括静态特征图、场景特征图和过程特征图)做适当的次序调整,并通过增减部分内容,形成一个合理的序列。而调整的依据就是模仿记忆中这些信息的组合方式。
这个过程就好像仓库管理员,把输入的图纸,找到对应的零部件,然后模仿以前的产品(就是多段记忆),把这些对应零部件组合起来。然后来理解这个图纸的目的。理解时,先根据这个图纸,找到需要的零部件(这就是概念翻译)。然后看看这些零部件以前是怎么组合在一起的(这就是寻找相关记忆)。机器可能发现这堆零部件中,有一些零部件的组合频繁出现的以前的各种产品中(就是记忆中,那些通过记忆和遗忘机制保留下来的,类似事物中的共有特征图组合)。于是,机器优先选择那些包含输入信息最多的大部件,然后参考最大概率,把其他零部件组合起来。有些零部件可能组合成另外一个大部件。有些零部件可能是附加到 其中的大部件上。这些组合方式都是通过参考关系网络,按照零件之间、部件之间关系、大部件之间连接关系最强的方式去组合,最终形成一个产品(类似于记忆中的一段虚拟过程)。
机器面对这个自己创建的虚拟过程,机器把这个虚拟过程作为一种信息输入,利用关系网络来搜索和这个虚拟过程相关的记忆。通过把这些记忆也纳入目标响应的选择范围中,机器通过收益和损失评估,就能选择出符合自己动机的响应。和虚拟过程相关的过程包括:以前自己在面对类似的过程,自己的状态时什么,然后自己的响应是什么。以前自己发出类似的过程,别人的状态是什么,被人的响应是什么。这些都可以通过记忆找到,并把这些记忆纳入目标响应的组织范围。具体为:1,通过回忆以前信息源发出类似信息时的状态,理解信息源在信息之外的隐含信息。信息源发出这个信息时的状态,包含了信息源为什么发出这个信息。2,通过自己以前收到类似信息后,做出的响应,来推测信息源的目的。信息源发出这样的信息,一定是基于机器以往在这个信息下的响应,这就是信息源预期的目的。否者,信息源没有必要发出这个信息。3,通过机器在什么状态下会发出类似的信息的记忆,对信息源进行“共情”。就是调用自己发出类似信息的状态,来进一步理解信息源更多可能的隐含信息。4,通过自己发出类似信息后,收到的反馈,来评估如果满足信息源的预期下,带给自己的收益和损失结果。机器把这4类记忆都纳入相关记忆池中,并使用相关记忆池中的零部件来组合成自己的各种可能响应,并使用收益和损失系统来评估这些响应,从而选择符合自己目标的响应。
在关系网络中,零部件的组合关系就是关系网络中的重要脉络。它们在语言中,就是语言的常用语、常用词汇和常用句型。它们在动作中,就是一个动作过程中的共有关键步骤。比如买机票、去机场、安检和登机等关键步骤。这些步骤是通过记忆和遗忘机制,在一次次学习中,忘记细节并记住共有特征而形成的。这些关键步骤包含的时间和空间信息,它们是机器建立响应时可以模仿的过程框架。人类通过语言,给很多过程框架增加了语言符号。所以,这些过程框架,从表面上看,是用语言来进行组织的。但其底层的组织关系,依然是图 像特征图。机器需要把这些语言代表的过程框架(有些过程框架可能没有一个概念来代表,但可以用多个概念来代表,它们就是需要一句话或者一段话才能表达的信息),展开成对应的过程特征(就是这个过程中关键步骤的特征图),进行模仿。比如“去机场”的概念所展开的特征图可能是开车去机场,或者打的去机场等过程中,经过记忆和遗忘机制,忘记了具体细节,只保留下来的几个象征性图片式的特征图。这些象征性图片式的特征图,就会激发相关记忆,让我们进一步把这个概念展开,比如模仿过去的记忆,开始网约车,开始准备行李等等。在准备行李时,以前的记忆是使用箱子,但这次没有箱子。于是机器需要搜索在类似情况下整理行李的所有记忆,如果在随后的步骤中没有建立起来满足自己的收益和损失评估要求的响应,机器就再次展开更多的记忆,来扩大可以模仿的范围。机器对大过程框架下的每个概念都做类似处理,并参考相同部分在每个记忆中的时间和空间信息,把它们组合起来。如果无法组织起来,就进一步扩大记忆,展开概念。这个过程迭代进行,最终构成一个塔型的模仿结构,这个塔型模仿结构就是虚拟过程。机器通过评估这个塔型的模仿结构带来的收益和损失,来决定是否选用这个结构来模仿并给出响应。而用来组合这个塔型结构的零部件,初始阶段就是我们前面提及的通过4个方面调用记忆而得到的相关记忆池,然后随着概念的展开,这个记忆池的内容不断增加,被关注的内容也在不断改变。而那些进入了记忆池的记忆,也会认为被使用了一次,按照记忆曲线增加记忆值。所以这些零部件,因为它们是各种过程中的关键步骤,所以经常被调用。而反过来,这些零部件,也因为它们记忆值高,不容易被忘记,而容易被找到。所以,这是一个正反馈强化过程。这个过程就是本发明申请中提出的分段模仿过程。
需要指出,语言输出就是一个分段模仿过程。当机器模仿以前的语言经验来做出语言响应时,由于具体场景的差异,机器只能借鉴以前语言经验中的部分经验(零部件)。而这些能够被频繁模仿的语言经验就是常用句型、常用语和习惯用语。因为它们是存在于大量语言中的共同部分,比如语言中的连接词、助词、叹词、常用词汇、常用句型等,是可以在众多 情况下被模仿的对象。这些对象被一次一次的使用,并按照记忆曲线增加激活值,最终也变成过程框架。在做出响应时,机器模仿这些框架,然后扩大记忆,来把细节安装到这些框架上,就构成了语言输出。
如果机器在后续的S5步骤中,无法建立合理的响应。有可能是在S4步骤中,组织了错误的信息,还有可能是在前面任何步骤中出现差错。这时机器进入对“无法理解信息”流程的处理。也就是说,“无法理解信息”本身就是一种对信息的一种理解结果。机器根据自己的经验,建立对“无法理解信息”的响应。这些响应可能是置之不理,可能是再次提取底层特征,可能是再次识别特征图和建立关注点,可能是重新选择响应等。
在S5步骤中,机器需要根据对信息的理解,加入自己的动机,从各种可能的响应中,按照趋利避害的原则,选择出满意的响应。这一步,是机器思维中最复杂的一步。机器大部分的思维时间都用在这一步中。机器根据对输入信息的理解:信息源的目的和状态,自己的目的和状态,环境的状态和从4个方面寻找记忆而建立的初始记忆池,开始创造出各种可能的响应,然后从中挑选出一个合理的响应来对外输出。
而挑选的方法就是基于人工给机器预置的本能动机和机器对各种响应的收益和损失评估,按照趋利避害的方式来挑选响应。人的动机,从本质上说,就是维持自己良好的生存状态。那些对本能动机有利的,就是“利益”。那些对这个目标有损失的,就是“损失”。人类在出生后,本能的监测系统就开始运转,不断判断“利益”和“损失”。比如“奶”能够满足孩子的本能需求,那么这是一种“利益”。被责骂意味着对生存的威胁,这是一种“损失”。获得拥抱和关注是一种对“安全需求”的满足,是一种“利益”,而被忽视则是一种“安全需求”不能被满足的“损失”。随着学习的展开,孩子还可能总结出“食物”是一种利益,“钱”是一种利益,“支配权”是一种利益,“良好的人际关系”是一种利益,这些都是在本能动机基础上发展出来的,为本能动机服务的。同样,我们把这个机制引入到机器智能中。让机器把遵守“机器公约”作为一个收益和损失评估的基本标准。在机器的学习过程中,机器每次 存储记忆镜像空间时,会同步存储这一段记忆的收益和损失评估结果:收益值和损失值两个数字。
我们不可能去告诉机器哪些可以做,哪些不能做。这些需要通过机器自己学习来理解。我们只需要给机器预置一个代表收益的表达和代表损失的表达,或者进一步,再增加一些不同强度,在机器学习过程中,通过预置的方法告诉机器收到的是收益还是损失,它们的大致强度就可以了。当然,也可以预置和机器自身状态传感器数据相关的收益和损失,比如被撞击、缺电、有水侵入等信息连接损失;比如缺电时充电、维护自身数据在安全区间等信息连接收益。机器把代表收益和损失的两个符号存入它们被赋值的记忆中,并且把收益和损失值按照正相关赋值作为它们的记忆值。由于镜像空间中的事物彼此之间存在关系,这种关系和它们彼此的记忆值相关。那些一次次出现的同一个记忆中的收益和特定特征图之间,就通过记忆和遗忘机制,不断加强了连接关系。对损失也是类似的处理。显然,由于收益得到的记忆值就是正比于收益值,损失得到的记忆值就是正比于损失值,那些巨大的收益和可怕的损失会让机器终身难忘,而那些小的收益和损失会随时间而被忘记。那些经常带来收益的事物和收益有更加紧密的连接,而损失也是样的。机器在评估自己的响应时,把虚拟响应作为输入在进入关系网络,自然就得到收益符号上的收益值和损失符号上的损失值。然后评估。这个评估程序可以是预置的,也可以根据学习过程中得到的反馈而进行调整。所以,机器可以做到牺牲小的收益,寻求后续更大的收益;也可以做到选择小的损失,避免更大的损失。这也为人类确保机器按照自己的意愿来思考提供了实现方式。比如,遵守“机器公约”是一种带来利益的目标,帮助主人是一种带来利益的目标,违反法律是一种带来损失的目标。
另外,机器也把本能动机给输入信息赋值时的参数设置记录到对应的记忆中。本能动机给机器赋值大小代表了是一种情绪。比如警觉度,戒备心,信任程度等。它受到两个方面的调控。一是机器自身的安全状态参数,这是先天的情绪。先天的情绪是预置。二是环境的因素,包括面对收益和损失后的响应,自己面对的环境带来的情绪,这是通过后天的学习获 得的。机器通过不断调整本能动机赋值系统,来尝试扩大收益和避免损失,并逐渐把满意的赋值参数和外界刺激联系起来因为它们都存在于同一个记忆中,所以采用记忆和遗忘机制就可以实现。机器的本能动机状态可以采用一种方式外显出来,提供一种额外的交流方式,这就是表情。
当机器准备做出响应时,机器首先从前面提高的4个方面寻找记忆,建立一个相关记忆池。机器通过分段模仿来建立各种可能的响应,并评估这些响应可能带来的收益和损失。机器评估收益和损失,只需要把自己建立的响应,做一次虚拟输入。输入后,通过赋予这些信息的初始激活值,激活完成后,自然就得到了收益和损失值。机器根据这些收益和损失值,来决定取舍。当然,收益和损失还可能处于中间过渡状态,难以取舍时,机器需要在输入信息中加入更多的记忆,从而打破这个平衡状态,来做出决策。这个过程可以迭代进行。
机器在完成了S5步骤后,机器进入S6步骤。S6步骤是一个翻译过程。如果在S5步骤中,机器选用的是语音输出,这就比较简单,只需要把准备输出的图像特征图转变为语音,然后利用关系网络和记忆,采用模仿类似的语言记忆来调整它们的次序。这就是一个参考语法书(关系网络)来组织词汇变成句子的过程。然后机器调用关于每个词语的发音经验和表达情绪的经验,把信息发出去。用比喻来说,这相当于仓库管理员对组装的产品,按照客户需求,做了一层外壳,然后直接航空发送出去了。
如果在S5中,机器选用的是动作输出,那么问题就会变得复杂很多。这相当于给客户交付的产品是组织起一场活动。在S5中,仓库管理员给出产品只是一个活动计划书,它可能有主要步骤和最终目标,其余都需要在实践中随机应变。
1,机器需要把准备输出的图像特征图序列作为目标(这是中间目标和最终目标),按照这些目标涉及到不同的时间和空间。机器需要对它们在时间和空间上做划分,便于协调自己的执行效率。采用的方法是通过选择时间上紧密联系的目标和空间上紧密联系的目标作为分组,因为记忆中镜像空间是带有时间和空间信息的,所以这一步可以采用归类方法。(这一 步相当于从总剧本改写到分剧本)。
2,机器需要把每个环节中的中间目标,再次结合现实情况,采用分段模仿的方法,来构成多个可能实现的图像序列,然后再次采用收益和损失系统,来挑选出符合自己的序列。然后机器把这个挑选出来的序列,作为新输出。这个新的输出是原来大的输出框架下的一个细分实现环节,只是整个输出中的一个小环节。(这是分剧本的实现过程,还是使用一样的流程。因为分剧本也是要求组织一场活动,只是目标是中间目标而已)。
3,这个过程不断迭代下去,每次采用的方法都是一样:通过分段模仿来找到可能的解决方案。然后通过收益和损失系统,来挑选出符合自己的方案。这是一个把大目标分解成小目标,然后分解成更小的目标,层层细分下去,直到分解到机器的底层经验能够直接实现的目标。(类比于在分剧本执行中,发现还是有无法实现的情节,需要再次做分剧本,再次走向更小中间目标的活动组织流程。这个过程不断迭代进行,直到完成最终目标。)一直要细分到底层经验:对语言来说就是调动肌肉发出音节。对动作而言,就是分解到对相关“肌肉”发出驱动命令。通过这样的方法,机器最终可以实施并完成一个响应过程。
4,在这个过程中,随时可能碰到新信息,导致机器需要处理各种信息,而这些原来的目标就变成继承动机。(这就相当于组织活动的过程中,不断碰到新情况,需要立即解决,否者活动就无法组织下去了。于是导演叫停其他活动,先来解决眼前碰到的问题。解决后,活动继续进行。另外一种情况就是在这个过程中,导演突然接到一个新任务,于是导演权衡利弊后,决定活动先暂停,优先处理新任务)。
S7步骤是贯穿于所有步骤中的新记忆空间的建立和关系网络的更新过程。它不是一个单独的步骤,它是在每个步骤中维护记忆系统的过程。它的核心就是记忆和遗忘机制。
还需要说明,这里的步骤划分,是为了方便说明整个过程。把以上方法重新划分成其他步骤,依然是属于本发明专利申请的权利要求的范围内。
在以上的步骤中,涉及到了特征图的建立、识别和优化,涉及到概念的建立、识别和 优化,涉及到寻找关注点,涉及到通过关注点去寻找最相关记忆,涉及到对一段或者多段记忆进行分段模仿,也涉及到记忆数据的筛选和存储过程。这些都是实现本发明申请中第一方面的具体手段。它们是本发明申请公开的第二方面。
本发明申请公开的第二方面,包括:
在本发明申请中提出一种特征图建立过程,包括:
机器在S1步骤中通过对比局部相似性建立底层特征,底层特征也是一种特征图。机器在S3步骤中,如果发现部分特征在关系网络中找不到匹配的特征图。机器把这些特征组合作为一个简图,存入临时记忆,并赋予其一个和激活值正相关的记忆值。通过上述两种方法建立的特征图还不是类似事物或者过程中的共有特征,还需要通过在学习大量的同类事物或者过程后,在关系提取机制的帮助下,那些共有特征最终变成长期记忆而保留下来。
在本发明申请中,提出一种特征图识别过程,包括:
机器通过对底层特征的搜索,在关系网络找到相关的特征图,然后对这个相关特征图做标记。那些被多次标记的特征图就可能是候选者。机器使用关系网络中的候选者对输入底层特征做分割,并比较两者的总相似度。如果相似度达到预设标准,机器就认为识别出了特征图。另外一种特征图识别过程是采用链式激活。通过给底层特征赋予初始激活值后,然后选择那些激活值高的特征图最为候选者。机器还是使用关系网络中的候选者对输入底层特征做分割,并比较两者的总相似度。如果相似度达到预设标准,机器就认为识别出了特征图。相比本发明申请前面提出的寻找关注点的方法,差异在于寻找关注点是直接通过底层特征找最相关特征图,这些特征图可能还是包含了底层特征的特征图(比如书桌的特征图像),也有可能直接就是语音或者文字(比如书桌这个发音)。
在本发明申请中,提出一种特征图优化过程,包括:
假设底层特征图A到包含它的上层特征图W之间存在连接关系,这个连接关系每使用一次就按照记忆曲线增加A在W中的权重。同时,所有的底层特征在特征图W中的权重都会按 照遗忘曲线随时间而递减。在这种情况下,如果A是特征图W代表的事物、场景和过程中的共有特征,那么就有可能被反复找到,从而获得更多的权重。这个过程不断进行,直到那些共有特征组合变成长期记忆,而那些非共有特征,其权重逐渐降低。这就是使用记忆和遗忘机制来优化特征图方法之一。具体来讲,在记忆库中,就是每次找到一个特征图后,按照记忆曲线增加其记忆值。在认知网络中,每使用一次关系线传递激活值后,就按照记忆曲线增加其连接值。
需要指出,由于机器采用了不同大小的窗口来提取底层特征,所以底层特征和自身的大小没有关系,那些很大的特征也可能是一种底层特征。比如一个桌子本身整体可能也是一个底层特征图。它不一定是其包含的局部特征图组合而成的。在使用小窗口提取特征图时,我们看到局部特征。在使用大窗口提取特征图时,我们是从整体上来寻找特征。所以我们判断一个桌子,既有可能从一个整体底层特征来判断,还有可能是从多个局部来判断,还有可能两者的组合。还有可能是先使用大窗口识别,然后使用小窗口来进一步识别。当然,这个过程也可以反过来进行。另外,在对比底层特征相似度时,需要考虑大小缩放和角度旋转。这些都是目前图像处理里非常成熟的算法,这里不再赘述。
在本发明申请中,采用链式激活,还可以在关系网络中搜索概念和相关记忆。机器根据自己的动机,对输入信息赋予初始激活值,并启动链式激活。由于链式激活会在关系网络中传播激活值,而每个特征图多次获得的激活值是累计的,所以一个特征图中,如果有多个启动链式激活的源信息向它传递激活值,那么它就可能因为多次累计激活值而获得高的激活值,这些拥有高激活值的特征图,就是关注点。通过给单个关注点赋予初始激活值进行链式激活,那些激活值高的节点形成的局部网络就是相关概念。包含相关概念中的特征图的记忆就是相关记忆。所以,机器可以使用链式激活搜索方法,去搜索那些和输入信息,包括虚拟的输入信息相关的记忆。比如,通过对输入信息的每个信息单元都赋予激活值,得到输入信息的关注点。然后对单个关注点赋值启动链式激活,找到多个相关概念。然后对相关概念中 的每个特征图赋予初始激活值,那些包含高激活值特征图的记忆,和那些包含多个激活特征图的记忆,就是我们需要放入记忆池的记忆。
在以上的步骤中,涉及到了关系网络。而关系网络的具体形式和建立过程,是本发明申请的第三方面。
在本发明申请中,提出一种关系网络的组织方式,包括:
A,认知网络和记忆库。
认知网络可以认为是记忆库中的关系网络中常用的一部分被单独存放,用于快速搜索目的。它和记忆库共同组成整个关系网络。这种方式适合地方大脑和中央大脑的组织形式。地方大脑使用认知网络快速响应,需要时才求助中央大脑。地方大脑的角色更像一个本地快速反应神经中枢,比如用于自动驾驶。
B,只有记忆库。
在这种组织形式中,没有单独的认知网络。所有关系包含在记忆库中。这种方式适合个体机器人。
C,分布式的认知网络、记忆库或者它们的组合。
机器可以采用数据分布式存储的方法,来建立上述认知网络或者记忆库。这种比较适合大型服务型知识中心。
D,共享式的认知网络、记忆库或者它们的组合。
机器可以采用数据共享式的存储方法,来建立上述认知网络或者记忆库。这种比较适合共享共建的开源知识中心。
在本发明申请中,提出一种关系网络建立方法,包括:
尽管事物之间的关系看上去纷繁复杂,难以分类和描述。但在本发明申请中,我们提出一种描述事物之间关系的方法:只需要提取事物之间相似性关系,事物之间时间和空间关系,而不需要去进一步分析其他关系。机器对比相似性来建立机器的自建分类,这就是特征图。机 器通过记忆和遗忘机制来提取事物之间的时间和空间关系,这就是记忆帧中的关系网络。记忆帧中的局部关系网络,通过网络间的相似事物(它包括具体事物、概念和语言等)连接起来,就构成了整个关系网络。
1,相似性关系的提取,可以使用相似性对比算法,或者使用训练好的神经网络(包括本发明申请中提出的引入了记忆和遗忘机制的神经网络)来进行。这里不再赘述。
2,事物之间的时间和空间关系的提取,是通过对记忆的整理来实现的。机器认为处于同一个记忆帧中的特征图彼此之间存在关系,两个特征图之间的关系强度是这两个记忆值的函数。这里的特征图包含了本能动机、收益和损失特征图、情绪记忆和所有其他传感器数据。所以机器不需要去区分各种关系的分类和紧密程度,也不需要去建立具体的关系网络。机器只需要按照记忆和遗忘机制,对每个记忆帧中特征图的记忆值维护就可以了。
3,认知网络是记忆库中的关系网络的提取。提取的方法就是:把每个记忆帧中的特征图先建立连接线,它们的连接值是每个连接线两端的特征图的记忆值的函数。然后对每个特征图发出的连接值归一化。这样就会导致两个特征图彼此之间的连接值不是对称的。
4,把记忆帧之间的相似特征图按照相似度的程度连接起来,连接值就是相似度。
通过上述步骤后,获得的网络就是从记忆库中提取出来的认知网络。后面,我们不再区分记忆库中的关系和认知网络,统称为关系网络。
(四)本发明申请公开中的其他说明。
需要指出,在本发明申请公开中,机器的学习材料也可以从自身记忆之外的材料获得,包括但不限于专家系统、知识图谱、字典、网络大数据等。这些材料可以通过机器的传感器输入、也可以采用人工方法直接植入。但它们在机器学习中都是作为记忆来处理的。所以这和机器使用记忆来学习不矛盾。
需要指出,在本发明申请公开中所提出的所有学习步骤并不存在时间分割线,它们是相互交织进行的,每个步骤没有先后之分。划分这些步骤是为了说明方便,整个过程也可以 划分成其他步骤。
还需要指出,机器对输入信息的识别和响应,除了和关系网络有关,还和“性格”有关。这里的“性格”是指机器的各项预设参数。比如激活阈值低的机器就喜欢产生联想,思考时间长,考虑的比较全面,也有可能比较幽默。临时记忆库大的机器容易记住很多“细节”。比如在做出决定时,激活值比激活值噪声底高多少就算“凸显”,这是一个阈值。这个阈值高的机器可能优柔寡断,而这个阈值低的机器可能更容易跟着直觉走。再比如两个节点特征图(可以是具体事物、发音、文字或者动态过程)有多少相似就算相似,确定了机器的类比思维的能力,这决定了机器是属于一本正经的个性,还是一个幽默风趣的机器。不同的记忆和遗忘曲线,不同的激活值传递曲线这些都带来机器不同的学习效果。
还需要指出的是,通过本发明申请所述方法,机器学到的认知和机器的学习经历密切相关。即使学习材料相同和学习参数设置相同,但学习的经历不同,机器最终形成的认知可能有很大差异。举例说明:我们的母语可能和特征图之间是直接连接。而第二语言,可能是先和母语连接,然后间接连接到特征图。在没有熟练掌握第二语言时,甚至可能是从第二语言到第二文字,再到母语文字,再转到特征图这样一个流程。当使用这样的流程时,需要的时间大大增加,导致机器无法熟练的应用第二语言。所以,机器也存在母语学习问题(当然,也可以通过人工植入的方法,直接让机器获得使用多种语言的能力)。所以,本发明申请所述的机器学习方法,除了和机器的学习材料相关外,还和机器的对这些材料的学习次序密切相关。
在本发明申请的基础上,是否采用不同的记忆和遗忘曲线,是否采用链式激活作为搜索方法,是否采用不同的激活值传递函数,是否采用不同的激活值累计方式,是否也采用记忆和遗忘机制之外的方法的其他关系提取机制,是否采用本发明申请中的数据存储形式,是否在链式激活中采用不同的激活阈值,是否采用不同的“凸显”阈值,是否采用不同的激活值噪声底计算方法,是否在多次链式激活时对节点采用不同的时间次序,是否在单次链式激 活时对节点采用不同的时间次序,每次选取关注点的多少,按照动机采用的不同赋予初始激活值的具体方式,甚至是采用不同的硬件配置(比如计算能力,记忆容量等),具体采用哪种母语进行学习,是否采用人工干预来获得的认知等,上述这些差异都是本发明申请中,提出的实现通用人工智能框架下的具体优选方法,都是可以通过本行业公知知识来实现的,这些都不影响本发明申请提出的权利要求。
图1为本发明申请公开的实现通用人工智能的主要步骤。
图2是建立底层特征图和提取底层特征图算法模型的方法。
图3是提取底层特征图的步骤。
图4是采用链式激活来寻找关注点的流程。
图5是对输入信息的理解过程。
图6是机器组织和选择响应的过程。
图7是一种认知网络的组织形式。
下面结合附图和具体的实施例对本发明申请作进一步的阐述。应该理解,本申请文本主要是提出了实现通用人工智能的新方法和实现这些方法的主要步骤。这些主要步骤中,每一个步骤都可以采用目前公知结构和技术方法来实现。所以本发明申请的重点在于说明这些新方法和实施步骤,而不是局限于采用已知技术来实现主要步骤的具体细节上。所以这些实施例描述只是示例性的,而并非要限制本申请文本的范围。在以下说明中,为了避免不必要地混淆本申请文本的重点,我们省略了对公知结构和技术的描述。本领域技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都应当属于本申请文本保护的范围。
步骤S1的具体实施方式如下:
图2是一种实现S1步骤的方法。S101是通过滤波器把输入数据分成多个通道。对于图像,这些通道包括针对图形的轮廓、纹理、色调、变化模式等方面做特定的滤波。对于语音,这 些通道包括对音频、音调等语音识别方面做滤波。这些处理方式和目前行业内已有的图像、语音处理方法一样,这里不再赘述。
S102是对输入数据寻找局部相似性。这一步是对每一个通道的数据,在数据中寻找共有的局部特征,而忽略整体信息。在S102步骤中,机器首先是使用一个局部窗口W1,来滑动寻找窗口内的数据中普遍存在的局部特征。局部特征对图像就是指那些普遍存在于图形中的局部相似图形,包括但不限于局部边缘、局部曲率、纹理、色调、脊、顶点、角度、曲率、平行、相交、大小、动态模式等普遍存在于图形中的局部特征。对语音就是相似的音节。其他传感器数据也一样,判断的标准就是相似性。
机器把找到的局部相似特征放入临时记忆库中。每新放入一个局部特征,就赋予其初始记忆值。每发现一个已有的局部特征,就对临时记忆库中的底层特征的记忆值按照记忆曲线增加。临时记忆库中的信息都遵守临时记忆库的记忆和遗忘机制。那些在临时记忆库中存活下来的底层特征,达到进入长期记忆库阈值后,就可以放入特征图库中,作为长期记忆的底层特征。长期记忆库可以有多个,它们也遵从自己的记忆和遗忘机制。
S103是逐次使用局部窗口W2,W3…,Wn,其中W1<W2<W3<…<Wn,重复S102的步骤,来获取底层特征。S102和S103中,一种局部特征提取算法就是相似度对比算法。这是很成熟的算法,这里不再展开。
在S1中,机器不仅仅需要建立底层特征图数据库,还需要建立能够提取这些底层特征的模型。在S104中,是机器建立的一种底层特征提取算法模型A。这种算法模型其实在S102和S103中已经使用了,它们就是相似性对比算法。
在S105中,是另外一种提取底层特征的算法模型B。它是基于多层神经网络的算法模型。这种模型训练好后,比相似度算法的计算效率要高。
在S105中,机器采用特征图库中的底层特征,作为输出来训练多层神经网络。训练过程中选择输入数据窗口和选择输出数据的窗口,需要差不多的大小。在S105中,神经网络 的实现形式,可以是包括卷积神经网络在内的多种深度学习网络,也包括本发明申请提出的引入了记忆和遗忘机制的神经网络。在S105中,训练神经网络算法模型的过程如下:
在S105中,机器首先使用局部窗口W1来提取数据来训练神经网络算法模型。
在S106中,机器再逐次使用局部窗口W2,W3…,Wn来训练算法模型,其中W1<W2<W3…<Wn。
在优化时,一种方法是每次增加窗口大小后,就在对应的前一个网络模型上增加零到L(L为自然数)层神经网络层。对这个增加了层的神经网络优化时,有两个选择:
1,每次只优化增加的零到L(L为自然数)层神经网络层;这样,机器就可以把所有网络模型叠加起来,构成一个有中间输出的整体网络。这样计算效率最高。
2,每次优化所有的层。这样机器得到的是n个算法网络模型。在提取底层特征时,它们需要都使用。
所以,在S107中,有两种算法网络模型。一种是多输出层的单个算法网络,其优点是运算资源需求小,但对特征的抽取不如后者。多个单输出算法网络模型需要的运算量大,但特征提取更准确。
需要指出,上述方法可以对图像、语音处理,也可以对任何其他传感器的信息采用类似的方法处理。
还需要指出,由于选用大小不一的窗口,所以提取的底层特征也可能大小不一样。有些底层特征可能和整个图像一样大。这样的底层特征通常是一些图像的背景特征图或者特定的场景特征图。有些底层特征图可能是动态过程,因为动态模式也是一种底层特征。
步骤S2的具体实施方式如下:
图3是实现S2步骤的过程。S2步骤需要达到两个目的:一是需要从输入数据中提取出包含的底层特征。二是底层特征需要保持原来在时间和空间上的关系。
S201是机器面对输入信息时,机器使用S1步骤获得的算法模型A或者算法模型B, 对所有输入传感器的输入信息进行底层特征提取。机器根据自己的动机,选择需要识别的区间和选择识别窗口W1的大小。机器是带有目的地去识别环境的。这些目的通常是上一个过程中,没有完成的目标关注点。这些关注点,在新的信息识别中,就成为继承动机。这些继承动机是一些特征图,机器是知道它们的部分属性,所以才会有目的地去确定特定的识别区间,根据预期事物的大小选择取数据的窗口W1。当然,也存在机器没有目的地监控环境,这时机器可以随机的选取识别区间和识别窗口大小。
S202是通过移动窗口W1来提取局部数据。把这些局部数据输入到S1步骤获得的算法模型A或者算法模型B中。机器通过这些算法模型获得底层特征。同时,由于使用窗口来检测局部数据,所以提取的底层特征位置也是确定的。机器需要采用类似的方法,同步把所有传感器输入数据提取底层特征,并保持所有输入信息在时间和空间上的相对关系。
S202提取的底层特征图进入后续的信息处理过程后,机器对这些信息产生的响应可能是:信息还不够确定,需要继续识别。这时,机器通过分段模仿,再次发起信息识别动作,对同样的区间,但使用更小或者更大的窗口来识别事物。这个过程反复迭代,直到机器通过信息处理,产生的响应不再是继续识别为止。
步骤S3的具体实施方式如下:
图4是采用链式激活作为查找方法来实现步骤S3的流程图,包括:
S301:机器在自己感兴趣的区间,使用W窗口提取数据后,使用S1步骤中的相似度对比算法A或者神经网络模型B来提取底层特征。然后在关系网络中使用相似度对比方法来搜索对应的底层特征。
S302:机器对找到的每个底层特征,按照动机赋予它们初始激活值。每一个底层特征获得的初始激活值可以是相同的。这个值可以通过机器的动机来调整,比如机器识别这些信息的动机有多强烈。
S303:每一个被赋予了初始激活值的底层特征,如果它的激活值超过预设的激活阈值。 那么它就开始启动链式激活。
S304:机器在认知网络中启动链式激活。
S305:所有底层特征链式激活完成后,那些一到多个激活最最高,并且能凸显的特征图就是关注点。
S306:机器在记忆库中启动链式激活。
S307:所有底层特征链式激活完成后,那些一到多个激活最最高,并且能凸显的特征图就是关注点。
其中S304/S305和S306/S307的过程是平行2选1的。它们分别是采用单独的认知网络来作为关系网络和没有单独的认知网络两种情况。
以上步骤是利用了认知网络中的信息之间的“距离”,让那些相关的信息彼此传递激活值而互相支持,通过激活值累计后凸显出来。这类似于语音识别过程。但存在两点差异:1,在这里的关注点,有可能包括一个概念的多个侧面,比如语音、文字和图像等,或者和多个输入特征图相关性高的其他特征图,它们有可能被底层特征同步激活。2,关系网络中包含了大量的常识。这些常识会帮助机器识别关注点,而不仅仅是借助语言之间的关系。
在比较输入和关系网络中的底层特征和特征图的相似性过程中,机器需要处理大小缩放和角度匹配的问题。一种处理方法包括:
(1)机器把各种角度的特征图都记忆下来:记忆中的特征图,是通过对每一次输入信息提取底层特征后建立的简图。它们是在关系提取机制下保留下来相似事物的共有特征。虽然它们彼此相似,但它们可能存在不同的观察角度。机器把生活中同一个事物,但不同角度的特征图都记忆下来,构成不同的特征图,但它们可以通过学习来归属于同一个概念。
(2)用所有角度的视图,重叠这些特征图的共有部分,模仿它们的原始数据,把它们组合起来,构成一个立体特征图。
(3)在机器内部嵌入对立体图像做大小缩放和空间旋转后的视图变化程序。这一步 是业内已经非常成熟的技术,这里不再赘述。
(4)机器在记忆中寻找相似的底层特征时,包括了在记忆中寻找经过空间旋转后能匹配的特征图。同时机器把目前角度的特征图存入记忆,保留原始视角。后续再次有类似视角的底层特征输入时,就能快速的搜索到。所以这种方法下,机器是采用了不同视角记忆和进行空间角度旋转相结合的方法来寻找相似特征图,这会带来我们对熟悉视角识别更快的现象。当然,机器也可以只使用空间角度旋转后进行相似度对比的方法。
在比较输入和记忆中的底层特征和特征图的相似性过程中,机器还需要处理立体景深的问题,因为机器需要把记忆数据重建为镜像空间。而建立三维镜像空间,需要景深信息。一种处理方法包括:
机器通过多通道输入的差异(比如双目、双耳输入的差异)来建立立体景深。
同时,机器也采用对输入特征图和记忆中特征图的大小对比,来辅助建立立体景深。
在实现S3的步骤中,可能会出现有些底层特征组合没有在记忆中搜索到,那么机器就把这些底层特征组合按照原始的空间和时间位置存入记忆库,并同时把它们的激活值按照正相关比例作为记忆值。这些底层特征组合在后续的学习中,通过关系提取机制来进行优化。这也是特征图的建立过程。
所以特征图来自于多个渠道:渠道1是S1步骤建立的底层特征,它们也是特征图。在S2步骤中通过不同大小的窗口提取的底层特征,并通过记忆和遗忘机制优化这些特征图。渠道2是在S3步骤中,碰到不能识别的底层特征组合创建而成,并被记忆下来的。在所有的步骤中,都可以通过记忆和遗忘机制来优化特征图。
前面的S1步骤是建立提取底层特征的能力,这是信息理解能力的前期准备。前面的S2步骤提取底层特征,这就是信息理解的开始。提取底层特征的目的是对输入信息去除部分冗余信息。在S3步骤中,利用了语言、文字、图像、环境、记忆和其他传感器的输入信息之中的隐含的连接关系,来相互传递激活值,从而让相关的特征图、概念和记忆彼此支持而凸 显出来。它和传统的“上下文”来识别信息的差异在于,传统的识别方法需要预先人工去建立“上下文”关系库。而本发明申请中,我们提出了“相似性、同时间、同空间信息彼此存在隐含的连接”这个基础假设。在这个基础假设上,简化了形形色色的关系,从而让机器自己去建立关系网络。它不仅仅包含语义,更包含本能和常识。
在S4步骤中,主要是利用关系网络把这些输入信息翻译成机器能够理解的语言,并把它们组织起来,形成一个图像序列。这样机器就能使用这个序列,在记忆中寻找类似序列相关的记忆。寻找自己收到类似序列后的响应,和收到类似序列时的状态。寻找自己发出类似序列后收到的响应,寻找自己发出类似序列时的状态。这就是从经验中寻求对信息的理解,从“共情”中进一步理解信息的过程。这些记忆进入记忆池,作为机器用来组织输出响应的原材料。在谈话中,发出信息的人和接收信息的人,很可能省略掉很多双方都知道的信息。比如共有的认知、经历和曾经讨论过的事情等。而通过上面4个方面的记忆搜索,这些缺失的信息就能补充上。
图5是输入信息实现信息翻译和信息理解的示意图。
S401中,机器在记忆中搜索每一个关注点转换后的特征图并建立起记忆池。一种实施方法是,对记忆库中找到的输入信息特征图赋予激活值,然后启动链式激活。链式激活完成后,那些包含的激活值之和比较高的记忆,就是需要放入记忆池中的记忆。
S402是寻找可能的过程框架。具体方法就是优先使用激活值之和最高的记忆,从这些记忆中提取出过程框架。这个过程具体操作可以是:去掉那些低记忆值的特征图。它们通常是细节,等待后续补充上更加符合目前输入信息的细节。去掉低记忆值特征图后,机器可能留下的是一些代表关键步骤的特征图,这些关键步骤相关的特征图按照他们原来的时间和空间关系,就构成了一个过程框架。机器通过对记忆池中的记忆,按照总激活值从高到低的方式,重复上述过程。机器最后得到多个过程框架。这一步,相当于仓库管理员在寻找可以匹配输入图纸的、现成的中间件。
在S403步骤中,机器把S402步骤中获得的可模仿部分组合成一个大的可模仿框架。由于每一段记忆提取的框架可能包含多个关注点对应的特征图,而这些特征图之间的时间和空间关系本来就存在于这些记忆中。机器可以通过在时间上和空间上重叠类似特征图,就构成了一个可以大的过程框架。这一步,相当于仓库管理员把中间件找到接口,彼此连接起来。还有一种情况是,有一些中间件无法和其他中间件连接起来。
在S404中,机器解决这个问题的策略是通过分段模仿的方法把代表过程框架的概念展开,这样展开的过程框架就会包含更多的细节。机器再次通过把相似的特征图重叠起来,来寻找到它们之间的联系。用类比来说明:如果两个框架无法连接起来,仓库管理员就会打开每个中间件的外壳(因为外壳信息接口点少),这是把代表中间件的概念向更细节的概念展开过程。举例说明:当机器收到“开车去机场接主人回家”的指令时,它可能正在商店里面购物。机器对这些输入信息通过链式激活,可能得到的关注点是:“开车”、“去”、“机场”、“接”、“主人”、“回家”的特征图,动态特征图或者是语言,它的环境信息包括了“商店”的特征图、“没有其他安排”、“带上买好的东西”等类似信息。它可以用直接寻找关注点时激活的记忆用于记忆池,也可以重新对找到的关注点赋予新的激活值后(比如对识别后的信息再次虚拟输入一遍,这些关注点就被赋予了新的激活值并再次链式激活),再次寻找相关记忆。通常两者记忆范围大部分是重叠的。机器通过去掉相关记忆中低记忆值(如果还有的话),获得的可能是一些比较概括,但广泛存在于生活中的过程框架:“…开车…”,“去机场”,“接主人…”、“回家”、“在商店…”等过程框架。这些过程框架可能是一串特征图序列中的关键特征图,而不是语言。显然,通过借鉴不同记忆中的组织方式,把“…开车…”,“…去机场”,“接主人回家”通过参考以往的记忆,通过展开这些记忆,很容易就能连接成一串代表“开车去机场接主人回家”的图像序列。因为根据过去的记忆,“开车”后才是“去…”的图像,它们可能是存在于“开车去上班”等记忆中。“去机场接主人回家”可能存在于机器以往“自己坐地铁回家”的记忆中,所以“回家”是在“机场”特征图之后。但这里存在一个机器目 前在商店这样一个信息,它无法和其他信息连接起来。于是机器再次展开“在商场…”相关的记忆,寻找和“开车去机场接主人回家”的信息之间的连接点。于是机器回忆起自己从商场去车库的记忆,并且发现“车”是这两段记忆的共同点。机器再次参考记忆,得到“要先去车库取车”然后“开车”的记忆。这样,机器就把整个过程连接起来了。如果机器需要去执行这个过程,那么每一个中间过程都是一个目标,所以需要采用同样的分段模仿来找到每个环节的过程框架,并再次细分下去。比如去车库,通过以前的记忆,或者参考以前的从其它地方去车库的记忆,建立了一个去车库的下层过程框架。在这个框架之下,可能还需要再次细分,把去车库分解成“找电梯”、“坐电梯”和“找车”等下层过程框架。每一次细分的依据都是分段模仿。分段模仿的两个核心,一是找框架,展开框架,这个过程可以迭代进行;二是模仿记忆,采用同类替换的方式,把记忆中细节换成现实中的细节。这样机器就可以从一些大的概念,通过逐步细化而建立一个塔型特征图序列。
步骤S5的具体实施方式如下:
机器在获得对输入信息的理解后,机器需要做出响应。机器做出响应的动力来自于机器的动机。机器和人类一样,是在“欲望”的驱动下,来对外界刺激做出反应。机器的“欲望”就是人类给机器预设的本能动机。本能动机是机器的预置的,比如“安全需求”、“目标达成”、“获取支配权”、“好奇心”等,可以去掉“繁衍”,加入“遵守人类法律”“遵守机器公约”等人类希望机器拥有的本能动机。在本发明申请中,机器的本能动机是一种默认的输入信息。所以本能动机参与了关系网络的方方面面。机器只需要根据自身预置的控制系统(比如监控电量,机器自身检测系统等)给出的信息,比如电量很少了这样的信息,通过预置算法,给本能动机赋予一个初始激活值。本能动机的激活值将会在关系网络中传播。它可能会改变关系网络中,最新的激活值分布情况,导致相同的输入信息可能带来不同的激活值分别。如果机器的本能动机赋值比较高,它就可能改变纯粹信息输入下的关注点,这时获得的关注点就是目标关注点。目标关注点反映了机器本能反应的信息。由于本能动机类型很少,赋值也相 对比较简单,所以可以采用预设算法来实现,并通过学习来获得调整经验。
本能动机是一种预置动机,它获得的初始激活值大小,反应了机器对输入信息处理态度。这些赋予初始激活值的大小,反应了机器此时的状态,比如很警觉,或者很放松,或者愿意处理事情,或者拒绝处理信息,它会影响机器在记忆中搜索记忆的广度和深度,从而带来思维差异。所以它是一种情绪反应。它的不同状态,反应了机器的一种情绪,也一并存储在镜像空间中。那么当我们通过多个镜像空间重组时,每个空间都带有自己的情绪,也带有自己的收益和损失评估结果。机器自然就可以采用加权求和的方式,来预估重组后的镜像空间带个我们的情绪反应。所以影响机器做出决策的,除了本能动机外,还有情绪,还有理智。而理智就是机器的收益和损失评估系统。机器的另外一种动机是继承动机。继承动机是机器还没有完成的目标。比如机器正在处于完成这些目标的过程中,又有新的信息输入,所以机器需要暂时中断正在进行的过程,来处理这些新输入信息。这时,机器在处理这些新信息过程中,是带有原来未完成目标的,这些未完成目标就是继承动机。继承动机是作为一种输入信息来处理,它不需要特别对待。
图6是S5步骤的主要步骤:
S501是机器寻找记忆中和类似于输入信息相关的记忆。这一步,可能采用把S4步骤中识别出来的信息做一次虚拟输入,在这次输入时,作为一种主动识别信息的输入,机器可以通过预置的赋值系统,对本能动机赋予更大的激活值。这一次,这些激活值可能带来的关注点和S4步骤中的关注点有差异,这一次的关注点是目标关注点。
机器通过目标关注点,建立和S4步骤类似的记忆池。机器在记忆中,对类似目标关注点的响应可能有很多种形式:比如可能是对输入信息置之不理,可能是再次确认输入信息,可能是调用一段输入信息提及的记忆,可能是对输入信息做出语言响应,可能是对输入信息做出动作响应,还可能是通过“共情”思维,来推测信息源的弦外之音。
S502是基于记忆值最高的记忆(经验)作为框架,建立虚拟响应,这是机器对输入信 息的本能响应。
S503是寻找和本能响应相关的记忆,用于收益和损失评估。
S504中,机器评估本能反应的收益和损失情况。
S505是判断过程。如果通过,机器就把这个响应作为输出。如果不能通过,机器需要针对带来利益最大的特征图和带来损失最大的特征图,扩大搜索相关记忆,并再次组织响应流程,目的是保留最大收益,排除最大损失。保留最大收益,排除最大损失,在这时会成为一个临时目标(趋利避害)先去完成(去找到怎么能排除损失,保留收益的途径)。这时原来的目标就成为一个继承目标。找到怎么排除损失,保留收益的方法后,机器继续组织虚拟输出过程,并再次进入收益和损失流程评估。直到完成选择为止。如果这个过程中,机器迟迟找不到合适的选择,它可能发出“嗯”“啊”等临时响应来告诉外界自己正在思考,请勿打扰。或者思考时间有点长,机器需要把S4步骤中理解了的输入信息,再次输入给自己,用于刷新关系网络中的关注点,避免遗忘了自己的思考内容是什么。机器还可能采用把S4步骤中理解了的输入信息,再次输入给自己,用于排除关系网络中的其他信息激活值,避免他们干扰。这些激活值可能是之前思考过程遗留下来的。如果通过上述方式,机器依然无法选择出合适的响应,于是机器就建立面对“无法做出响应”情况的响应。这时,“无法做出响应”就成为一种输入信息,机器走同样的S5流程来建立合适的响应。
我们举例来简要说明以上过程:假设在一个陌生的城市酒店房间里,机器收到主人发出的“出去买一瓶可乐拿回来”这样的指令。通过S2步骤,机器提取了很多底层音节输入和很多环境信息的底层特征。经过S3步骤,找到的关注点可能是:“酒店”、“出去”、“买”、“一瓶”、“可乐”、“拿回来”、“傍晚”、“电不多了”、“还没有交房费”等,并把这些特征图翻译成机器方便处理的形式,比如图像特征图。在S4步骤中,机器开始理解这些信息。通过把输入信息组织起来,机器建立了一个理解序列,包含了“出去”的图像特征图,“买”的图像特征图,“回来”的图像特征图,“一瓶”的图像特征图、“可乐”的图像特征图等,并建立了先 后次序。在S5步骤中,机器的初始值赋值系统,查询本能动机的状态(比如这个机器有没有因为前面的经历,进入了沮丧的状态),给S4中的信息序列赋予初始激活值,然后寻找到相关记忆。这是链式激活寻找方法。也可以通过对比相似性来寻找相关记忆,建立记忆池。在这一步中,机器可以提高本能动机的初始激活值,从而让本能响应能够优先被识别出来。
机器根据记忆中自己或者别人在类似指令下的响应,意识到主人是需要自己做出类似的响应。机器通过记忆中主人发出类似指令时的状态,通过比较识别到主人的状态。机器根据自己发出类型指令时的相关状态,通过类比就能够体会到主人此时的需求来源(生理需求,可能是渴了)和情绪,这是“共情”思维。
机器开始评估本能响应“出去买一瓶可乐拿回来”,发现处于收益和损失的边缘(因为这时自己的电量并不充足),于是机器再次寻找其他可能响应。有可能找到之前给主人拿可乐是从冰箱里拿出的,于是机器建立了一个“从冰箱里拿出可乐给主人”这样一个可能的虚拟输出过程,这个过程通过收益和损失评估时,结果非常好。于是机器开始继续深入评估这个过程。进一步评估的方法就是把虚拟输出再次作为输入来一遍。于是,机器把这个目标作为新的S5流程,把前面的目标“出去买一瓶可乐拿回来”包含的目标序列转化为继承目标。要实现“从冰箱拿可乐给主人”这个新目标,需要分解成“找冰箱”、“拿可乐”、“给主人”等其他目标序列。机器再次把“找冰箱”这个目标作为新的S5流程,再次评估选用什么方案来响应。这一次,有可能得到的收益和损失比最好的是“用眼睛看”。这是一个可以直接分解到底层经验的目标,所以机器可以开始执行这个动作,寻找冰箱,因为根据过去的记忆总结的经验,这是本次大目标的第一个必须实现的目标。
假设房间里面有冰箱,在找到冰箱后,“拿可乐”成为第二个目标。机器根据以往的记忆,比如自己说“主人,那儿有一个…”这样的语言,主人的响应是“把注意力转到那儿”。还有其他记忆中“主人走到他(她)注意到的地方”,还有“主人从冰箱里面拿可乐”等记忆,把这些记忆串起来,机器组织了一个虚拟的输出“主人自己去冰箱那儿拿可乐”。这个响应最 符合利益和损失(因为耗电最少)。机器再次根据经验,发现以前类似情况下,自己一般都是提醒主人一下,主人就可以把注意力转到那儿,而提醒的方式一般是用手去指。
于是在经过层层记忆调用,收益和损失评估后,机器最终通过模仿记忆,在S506步骤中确定了输出计划:用手指向冰箱,并发出“主人,那儿有一个冰箱”这样的语音。
步骤S6的具体实施方式如下:
S6步骤是机器对外输出。如果是语言输出,那么就是一个翻译过程和一个简单动作模仿过程(模仿过去经验发出音节或者输出文字)。如果输出的是动作过程,那么整个过程就非常复杂。这相当于一个导演组织一场活动,涉及到方方面面,下面举例说明。假设在上面的例子中,机器的响应是出去买一瓶可乐拿回来,我们通过这个例子来分析机器在动作输出下的简要流程。
机器并没有在这个城市、这个酒店、这个房间出去买可乐并拿回来的经验。所以它没有一段完整的记忆可以用于模仿。即使机器有这样的一段记忆,但由于外部条件变化(比如时间不同),或者由于内部条件变化(比如机器自身的日程安排)等原因,机器在模仿这段记忆时,就会发现记忆和现实不匹配。
于是机器开始建立分剧本。分剧本的标准就是在时间和空间上划分,使得模仿可以进行,并且有效率。机器分剧本的方式就是把每个计划目标(包括中间目标),作为单独的目标,来确定目前能模仿的目标。确定的方法可能是新的链式激活,也可以是记忆和现实相似度对比。显然,在这个计划目标序列中,和目前环境(酒店房间空间上)匹配的目标就是“出去”这个目标。于是机器开始把“出去”作为一个目标来实现。实现的方法就是:把“出去”作为理解后的信息,重新放入S5步骤中,去寻找各种可能的方案,并根据动机、根据收益和损失情况来做出决策。所以S5和S6步骤可能是不断交叉进行的。因为实现一系列的目标,就是一个不断对目标细分并实现的过程,它每一个过程都是一样的处理方式,但是迭代进行,层层细分,一直细分到机器的底层经验才可以具体执行。
举例说明:机器对这段指令的第一个模仿概念是“出去”这个概念。机器在模仿“出去”这个概念时,这个概念是一个很简化的框架,机器需要对“出去”这个概念进行细分。细分的方法是:机器把“出去”这个概念作为一个单独的输入指令,寻找关于“出去”这个图像特征图相关的记忆中,和目前情况相似的记忆。于是,机器建立了一个可以模仿的二级框架:走出门去。然后,机器开始对这个二级框架进行模仿。
在模仿这个二级框架时,机器可能发现第一个需要模仿的中间目标是“走到门那儿”。于是机器把“走到门那儿”这个概念作为一个单独的输入指令,寻找关于“走到门那儿”这个图像特征图相关的记忆中,和目前情况相似的记忆。于是机器建立了一个可以模仿的三级框架:门在哪儿。
在模仿“走到门那儿”的过程中,“门”成为一个中间目标。机器需要定位“门”的位置。机器通过“门”这个概念下面包含的各种特征图,在环境中搜索门。机器可以搜索关于这个房间的记忆,也可以是直接在环境使用S2步骤开始搜索,这取决于机器是否对整个环境做过特征图提取。
在定位了“门”的位置后,机器继续使用分段模仿,把自己的位置、门的位置和去门那儿作为输入的信息,和环境信息合并后,作为整体的输入,开始寻找最相关的特征图、概念和记忆。在模仿“走到门那儿”这个框架时,机器可能发现“走”。模仿“走”时,发现不匹配。因为自己是坐下的。所以通过一样的流程,建立“走”下面需要模仿的四级框架中第一个概念:“站立”。于是机器需要对“站立”这个概念再次细分。把“站立”这个指令变成一个可以模仿的五级框架。然后,机器开始对这个五级框架进行模仿。
在模仿“站立”这个五级框架时,机器可能发现需要模仿的概念是“从沙发上站立起来”。于是机器需要对“从沙发上站立起来”这个概念再次细分。在模仿“从沙发上站立起来”这个五级框架时,机器可能发现需要模仿的概念是“腿使劲”、“身体前倾”、“保持平衡”、“手伸开保护自己”等一串更加细节的目标。于是机器需要对每一个细节目标再次细分。然 后,机器开始对这些六级框架进行分段模仿。
在模仿“腿使劲”这个细节目标建立的六级框架时,机器通过把“腿使劲”作为一个指令,通过寻找类似情况下,记忆中相关经验,通过组合这些相关经验,把“腿使劲”这个目标变成对各个肌肉发出一系列驱动命令。这些驱动命令本身也是通过大量的、在类似环境下的模仿,通过强化学习和记忆和遗忘机制而得到的记忆。这些记忆经过反复的模仿,已经变成永久记忆,它们就是经验。我们在搜索和使用它们时基本都不会意识到这个过程。
在经过上述的步骤后,机器从沙发上站起来了。但机器并没有完成“出去”这个概念的模仿。机器通过对记忆的模仿,发现记忆中“出去”都是从“门”那儿出去的。到了“门”那儿,机器继续通过模仿记忆中“出去”的过程。这些过程中可能存在“打开门”的过程特征。于是“打开门”就是机器模仿的对象。但机器并没有在这个房间里面打开门的经验。于是,机器把“打开门”作为一个概念,在整个记忆中搜索。得到的最高的关注点可能是一个简化的“打开门”的过程特征,里面的图像可能还是基于打开自己家里房间门的图像。在这个记忆图像里,是通过按住门把手,然后旋转,然后向后拉动来开门的。但机器并没有在这个酒店房间的门上找到一样的门把手。
于是机器不得不再次使用分段模仿的方法。把“门把手”这个概念和目前现实环境合并作为整体输入,在记忆中寻找最相关特征图和最相关记忆。然后机器可能得到在这个房间的门上,有一个特征图获得很高的激活值,成为关注点。那么这个和“门把手”在门上位置相似,形状相近的东西,可能就是这个房间门上的门把手。这就是机器通过分段模仿,把以前关于门把手的经验用于现实环境,找到了门把手。然后通过门把手这个概念,通过记忆模仿,把以往使用门把手的方法移植到这个新找到的门把手上,这就是知识泛化的过程。机器按照记忆中门把手使用方法,开了门,并走出去,这就完成了“出去”这个概念的模仿。
以上的过程就是机器通过不断迭代使用分段模仿的方法,把一个由概念组成的框架过程一步步加入符合现实的细节,最终变成机器丰富多彩的响应过程。分段模仿的本质是机器 对概念的展开和类比。概念是从生活中提取而来,取自于生活。概念的运用又是在概念的框架下把概念展开,把记忆中的细节替换成现实中的细节,来模仿这些概念。概念包括特征图和过程特征、语言等组成的局部网络。它是机器用来组成过程的部件,是一种广泛使用的部件。概念可能有对应语言,也可能没有对应语言,概念可以对应一个词、常用语、一句话甚至一段语言,这种情况在不同的语言中还不一样。
在机器的分段模仿过程中,还有可能碰到各种新的信息输入的情况。比如在完成了规划去门那儿的路径后,机器开始模仿“走”这个动作来实现去门那儿的过程。在这个过程中,机器可能发现新情况:“在自己的规划路线上有障碍物”。那么,面对这些新的输入信息,机器在保持原来的目标情况下,把原来的目标暂停,进入处理新信息的过程中,而这些原来的目标,就变成新过程的继承目标。
机器在这一次面临的模仿框架和现实情况不匹配的问题时,碰到了新的输入信息。机器不得不从S2步骤来处理新的信息输入。这些信息是机器后面找到解决方案的基础。比如,机器需要分析障碍物的各种属性(比如大小、重量和是否安全等)。这一步需要走S2到S4的整个信息理解过程。然后机器根据自己的动机,来选择和实施解决方案。这一步需要走S5和S6的过程。
步骤S7的具体实施方式如下:
S7步骤是贯穿于整个S1到S6步骤中的,它不是一个单独的步骤,是对前面步骤中的关系提取机制的应用。
在S1步骤中,建立底层特征主要是使用记忆和遗忘机制。机器通过局部视野每发现一个相似的局部特征,如果特征图库中已经有相似的底层特征或者特征图,就按照记忆曲线增加它的记忆值。如果特征图库中没有相似的局部特征,就把它存入特征图,并赋予它初始记忆值。所有特征图库中的记忆值随时间或者训练时间(随训练样本数量增长)而按照遗忘曲线逐渐递减。最终那些广泛存在于各种事物中的,共有的简单特征会拥有高记忆值,成为 底层特征或者特征图。
在S2步骤中,每发现一个底层特征或者特征图,如果临时记忆库或者特征图库中已经有相似的底层特征或者特征图,它的记忆值就按照记忆曲线增加;临时记忆库或者特征图库中所有的底层特征或者特征图,遵从记忆和遗忘机制;在S2步骤中,机器首先把镜像空间存入到临时记忆库。机器在记忆库中存储这些镜像空间时,会同时存储镜像空间中的特征图和它们的记忆值,这些特征图的初始记忆值和其存储时的激活值正相关。在镜像空间中,只有当特征图激活值发生了超过预设阈值的变化时,才需要在镜像空间中更新记忆值。在镜像空间中,只有当镜像空间发生了和前一个镜像空间相比,相似度的改变超过了预设阈值时,才需要建立新的镜像空间。我们称之为发生了一个事件,这就是记忆存储的事件机制。
在S3、S4、S5和S6步骤中,在认知网络中,节点(包括底层特征和特征图)之间连接关系遵从记忆和遗忘机制;在S3、S4、S5和S6步骤中,记忆库中底层特征和特征图记忆值遵从记忆和遗忘机制;
在以上的步骤中,涉及到了认知网络和记忆库的组织形式,涉及到了具体的链式激活过程,涉及到了记忆和遗忘机制。这些内容的具体实现,是本发明申请中第一方面和第二方面提出方法的进一步细化。它们是本发明申请的第三方面。
本发明申请中,提出一种根据动机来给特征图赋予初始激活值的方法,包括:
在本发明申请中,机器的S2步骤是机器提取底层特征的步骤。机器需要根据动机来选择识别区域和使用的窗口大小。这里的动机来自于继承动机。比如之前的活动中,机器对信息的响应是“对特定区域进一步识别信息”,那么这个特定区域就是机器选择的识别区域。机器对这些特定区域进一步识别信息时,预期识别事物的尺寸大小,就决定了机器选用的窗口大小。机器按照本能动机来赋予提取到的底层特征初始激活值,根据对预期事物的收益和损失属性来调整这些底层特征的初始激活值。另外,本发明申请中,本能动机是作为一个底层特征来处理的,是一个频繁被激活的底层特征,它在记忆中和其他特征图广泛存在连接关系。比如 “安全需求”是机器预置的动机,这个动机在经验中,就可能扩展到“保护家人不受伤害”、“保护自己的财产”等经验。
所以,机器根据动机给特征图赋予初始激活值的方法,包括两个方法:1,机器的继承动机本身就是一种带有激活值的特征图,它们存在于关系网络中,机器在寻找目标关注点时,可能会选择它们,也可能不会选择到,这取决于激活值高低。2,在S3步骤中,那些底层特征被动机赋予的初始激活值,实际上来两个部分:一是自于本能动机对输入信息赋予的初始激活值,它们通常是机器根据动机强度赋予输入的一个统一初始值。二是来自于本能动机传播过来的激活值,这些激活值不是初始值。但它们会和初始值累计,从而让输入信息拥有不同的激活值。
一种认知网络的具体实施方式如下:
图7是一种认知网络组成形式的示意图。假设苹果的特征图编号为S42。假设苹果纹理是特征1,假设特征图编号是S69;苹果形状中某一段曲线是特征2,特征图编号是S88。……苹果的底层几何特征N,特征图编号是Snn。在图7中,S42是中心特征图。S69、S88到Snn是其他与S42存在连接关系的特征图。S42_S69/S42_S88/S42_Snn分布代表S42到S69/S88/Snn的连接值。
在图7中,第一个中心节点为S42,从中心节点S42到S69的连级值编号是S42_S69,从中心节点S42到S88的连级值编号是S42_S88。而在以S69为中心的数据条目中,S42是它的特征。S69到S42的连级值就是S69_S42。而在以S88为中心的数据条目中,S42是它的特征。S88到S42的连级值就是S88_S42。这样,S42,S69和S88就建立了双向的连接关系。因为我们采用图7这样的数据条目来存储认知网络,所以有时我们也称一个数据条目中特征图为认知数据库的索引,它所有的特征称为属性,其对应连接值称为属性的连接值。大量这样的数据条目就能建立起认知网络。而特征图和特征图编号可以通过表格的方式对应起来。
一种建立关系网络的具体实施方式如下:
关系提取机制应用于3个智能体系层:
1,感知层:感知层建立关系的唯一标准就是相似性。机器是通过对比相似性,把那些能够重复出现的相似性数据组合认为是底层特征;所以机器在S1的步骤中,提取底层特征采用的一种关系提取机制,可以是数据间的相似度对比算法。无论是图像,还是语言,还是其他数据,关于相似度对比的算法众多,都是很成熟的算法,这里不再赘述。在S1步骤中,获得的底层特征,需要放入特征图库,并按照记忆和遗忘机制来对这些底层特征取舍。在S2步骤中,机器也可以按照数据间相似度对比算法,来从输入数据中提取底层特征。在S2步骤中,机器可以采用的另外一种算法是使用神经网络模型。这些神经网络模型,可以是目前主流的任何神经网络算法,也可以是本发明申请中提出的引入了记忆和遗忘机制的神经网络算法。
2,认知层:这是在感知层建立的特征图基础上,通过学习来建立特征图之间的连接关系。所以它们建立的基础是记忆和遗忘,获得关系的方法是重复记忆,获得正确的关系是通过遗忘实现的。
3,应用层:应用层是对感知层和认知层中产生的成果不断应用,并按照记忆和遗忘机制来优化这些成果。在特征图库中,机器每发现一个底层特征或者特征图,如果特征图库中已经有相似的底层特征或者特征图,它的记忆值就按照记忆曲线增加;特征图库中所有的底层特征或者特征图,记忆值随时间或者训练时间(随训练样本数量增长)而按照遗忘曲线逐渐递减;在认知网络中,每当节点之间连接关系被使用过一次,对应的连接值就按照记忆曲线增加;同时,所有认知网络的连接值随时间按照遗忘曲线递减;在记忆库中,每当底层特征或者特征图被使用过一次,对应的记忆值就按照记忆曲线增加;同时,所有底层特征或者特征图的记忆值随时间按照遗忘曲线递减;
在本发明申请中,还提出了多种对现有神经网络改进的方法,具体实施方式如下:
本发明申请提出一种理解多层神经网络的工作原理的方法:
我们可以认为输入数据是在冲击函数坐标基底下的坐标分量系数。每一次层间变换,就是一 次信息表达方法的变换过程。比如,第一次变换,就是把输入的冲击函数坐标基底,线性变换到另外一个坐标基底。这个坐标基底是隐含的,可以改变的。这个坐标基底的坐标分量系数就是第一个中间层的线性输出(使用非线性激活函数之前)。如果两者维度相同,那么这两个坐标基底的信息表达能力是相同的,输入层到第一个中间层没有信息损失。
但多层神经网络的目的,就是把输入信息中的干扰信息(冗余信息)去掉,保留核心信息(有用信息),所以整个网络必须去掉那些干扰信息。而去掉这些干扰信息方法是通过坐标基底变换,把核心信息和干扰信息分别变换到不同的坐标基底上去,让它们成为不同坐标基底的分量。然后,机器通过抛弃那些代表干扰信息的分量,来去掉这些信息。这是一个降低信息表达的维度的过程。
为了达到这个目的,一种很方便的方式就是采用非线性激活函数。通过非线性激活函数来把部分坐标基底上的信息分量置零,比如ReLU函数,就是把一半的坐标基底信息去掉。比如各种变形ReLU函数,或者其他激活函数,它们的本质都是去掉部分坐标基底或者压缩这些坐标基底上的信息来实现去掉干扰信息的目的,比如Leaky ReLU,就是通过对一半的坐标分量进行信息压缩来去掉冗余信息。
每个中间层神经元输出,都可以看作是信息在一个对应的隐含坐标基底上的分量投影。多层神经网络的优化过程,就是优化中间层对应的坐标基底。每一层由于非线性激活函数,都会带来部分基底上的信息分量损失。所以激活函数的非线性、中间神经元的数量和层数之间是相互制约的。激活函数的非线性越强,信息损失越多,这时需要使用更少的层数,更多的中间层神经元数量来保证核心信息的层间传递不受损失。假设输入信息包含的信息量是X,输出信息包含的信息量是Y,中间层每一次映射的信息表达能力损失率为D,那么需要的层数就是L>ln(Y/X)/ln(1-D),其中L是需要的层数。
在坐标基底维度缩减到1/K的情况下,每层信息表达的能力缩减到1/K
2;需要指出,这里是指信息表达能力损失率,不是指信息损失率。在很多时候,针对特定的信息,维度过 高的坐标基底可能有冗余的维度。当这个信息从高维转向低维坐标基底时,如果去掉的都是那些冗余的维度,那么这个信息本身就没有损失。在坐标基底维度不变的情况下,假设R为非线性激活函数的输入输出取值范围之比,那么每层信息表达的能力缩减到1/R
2;所以在本发明申请中,可以通过上述约束条件来约束每一层到下一层的信息表达损失率,从而决定采用的激活函数、神经元数量和层数。
基于上述工作原理分析,本发明申请提出了多种对现有多层神经网络改进的方案,它们的具体实施方式如下:
(A)线性变换+降低维度。
本发明提出一种方法是:层间采用线性变换,但逐步降低每一层神经元的数量,这也是一种逐步降维的过程。但线性激活函数+去掉一些神经元,本质上依然等效为一个非线性激活函数。但这个等效激活函数可以是一种新的激活函数,甚至可以是难以用数学形式来表达的非线性激活函数。
(B)对输入数据做非线性变换预处理,去掉部分维度上的分量。
由于输入数据的坐标基底是已知的(可以看作多维冲击函数基底),机器可以对输入数据直接做坐标基底线性变换,然后按照预设方法来丢弃部分维度上的分量。这种对数据的预处理,可以看作是数据通过了一个非线性滤波器,非线性来自于主动丢弃了部分分量。它的目的是选取数据特征的某一个方面。不同滤波器的输出可以认为是不同侧重点的数据,可以分别进入S1步骤的底层特征提取模型中。
具体坐标基底变换形式需要根据实践来优选。丢弃的数据也需要根据实践来优选。具体的非线性滤波器形式可以人为设定(比如卷积就是一种这样的变换),也可以限定范围来让机器自己通过尝试来优化。由于线性变换本身是非常成熟的计算方法,这里不再赘述。
(C)对映射路径遗忘。
1,通过随机遗忘一些样本,来引入遗忘机制:
对总的数据样本库随机的分成多个组,每个组都是在总样本的基础上,随机放弃一些样本。每一组采用同样的参数进行优化。在所有的样本中,那些带来非共有特征映射的样本,因为是样本中的非共有特征,所以它们必定是少数派。在某些分组内,有可能使得一些带来问题的样本被随机放弃了。那么在这个组,带来问题的样本数量占比就可能急剧减小。那么通过这个组获得的网络,它在参数优化过程中,中间层使用的坐标基底最有可能变成正交基底,最终获得稀疏化的神经元层输出。再以这一组为基础,对所有样本或者其余样本纳入优化过程。
2,通过随机遗忘一些映射路径,来引入遗忘机制:
可以随机地将一些神经元输出权重系数w置零来实现,也可以随机地通过把特定神经元的偏置项b设置到很大的值,用来确保对应神经元输出为零来实现。
3,对映射路径的渐进遗忘,来引入遗忘机制:
在使用非线性激活函数时,通过对神经元输出的权重系数w,引入让它们在每次神经网络系数更新后,对所有权重系数w的绝对值减小一个delta值,然后再次进行权重系数优化。也可以通过对神经元输出的偏置项b,引入让它们在每次神经网络系数更新后,对所有偏置项b改变一个delta值,使得神经网络的输出向零靠近,然后再次进行神经网络系数优化。其中delta是一个大于或者等于零的实数;每一次减小的delta值可以是不同的值。
4,机器也可以随机遗忘一些神经元,这就是Drop-out方法。Drop-out方法不在本发明申请的权利要求中,这里不再赘述。
(D)对优化梯度正交化。
在多层神经网络中插入一个到多个线性变换层(或者弱非线性变换层);这些线性变换层(或者弱非线性变换层),它们可以采取不同的线性或者弱非线性激活函数;线性变换层可以在优化开始之前插入,也可以在优化过程中再插入。
这些线性变换层(或者弱非线性变换层)引入目的是:在保持信息不变(或者损失很 小)的情况下,增加神经元层(相当于增加坐标基底变换次数)让模型有机会选取正交坐标基底。由于正交基底的分量是彼此独立的,在优化过程中,如果信息有机会放到正交坐标基底上,那么对信息的每一个维度的优化就是彼此独立的,从而有机会走到全局最优点上。
另外一种方法就是通过对表达维度的限制来让机器尽可能选择中间正交坐标基底。由于正交基底意味着它们的维度彼此正交,在去掉冗余信息的过程中,很多维度上的坐标输出将是零。意味着神经元层输出的稀疏化,常常就代表其选择的隐含基底是正交化的。我们通过对神经元输出做限制,奖励其走向稀疏化,就是奖励其选用隐含的正交坐标基底。
在目前的神经网络中,由于每一层都信息损失,那么层数必然受到限制。那么信息被映射到正交坐标基底上的概率就会降低。当坐标基底不是正交时,改变一个坐标分量的系数,会同时影响其他坐标基底的分量系数,这就会带来优化问题,把优化引入局部最优点或者带来有用信息的损失。需要说明,以上方法的组合方法,依然在本发明的权利要求范围内。
本发明申请中,我们通过如下实施例来说明如何使用本发明申请中提出的方法来实现通用人工智能:
比如在一个下午时分,机器妈妈和机器孩子在家,机器孩子正准备出门和找朋友踢球。下面是他们的对话,以及在对话过程中的思维步骤。
环境信息是:时间是一个下午时分,环境是家庭客厅,天气晴朗,温度在20摄氏度,家里有妈妈和孩子两人,孩子正在穿球鞋….
妈妈通过S1步骤,已经具备了提取底层特征的能力。这种能力表现为:1,可以使用大小不同的窗口来选取输入数据,并在这些窗口中,通过对比输入数据和特征库中的底层特征数据的相似度,来提取底层特征。2,或者是:可以使用大小不同的窗口来选取输入数据,并对窗口数据使用已经训练好的神经网络来提取底层特征。
在家庭环境下,悠闲的时刻,机器妈妈的本能动机“安全需求”会按照预设程序,定期给关系网络中,代表本能动机的特征图赋予一定激活值。这个激活值的大小,是一种经验 值。这种经验值,是妈妈在生活中通过“响应和反馈”的奖罚机制,采用强化学习而获得的。它也可以是人类给她的预设经验。
由于妈妈处于家庭环境下和悠闲的时刻,机器妈妈的本能动机“安全需求”获得的激活值并不高。在妈妈的关系网络中,“安全需求”通常和“查看环境”联系紧密。所以,在只有本能动机输入的情况下,S2步骤得到的就是本能动机,S3步骤不需要去识别输入信息,S4步骤可能获得的关注点就是“查看环境”。在S5步骤中,如果这时妈妈内嵌的自检系统发出信息:累,需要休息。妈妈内部预置的程序就会发出休息的指令,这也是一种预置的“安全需求”。这种动机也会在关系网络中传播激活值。这时,由于有新信息输入(“累,需要休息”),妈妈需要中断当前信息处理过程。妈妈转而处理新信息,这一次可能收益和损失系统觉得继续查看环境收益更大,于是妈妈继续查看环境。
通过关系网络和分段模仿,机器妈妈可能开始执行“看的简图”和“听的简图”。由于这两个步骤输出的形式是动作,所以可能需要通过分段模仿把它们分解到具体的底层经验上去。机器妈妈开始分段模仿“看”和“听”这两个概念。机器妈妈需要把“看”这个概念细分下去,细分到底层经验。“看”的底层经验就是对很多肌肉发出命令,和对一些神经发出命令。这些命令的参数是过去经验的不断总结。也可以是预置经验。同理,“听”也是一样的处理。
于是,机器妈妈进入下一轮信息处理。她开始处理视觉输入和听觉输入的信息。机器妈妈进入新的S2步骤中。在S2步骤中,首先需要确定需要识别区域和使用的多大的窗口来识别底层特征。机器需要根据动机来选择识别区域和使用的窗口大小。这里的动机可以来自于继承动机。比如上一个活动中,机器对信息的响应可能是“对特定区域进一步识别信息”,那么这个特定区域就是机器选择的识别区域。机器对这些特定区域进一步识别信息时,预期识别事物的尺寸大小,就决定了机器选用的窗口大小。在这里,由于机器妈妈没有明确的目的,只是对环境的随机看和听。所以机器妈妈很可能随机的选取一片区域,并随机的选取使 用的窗口大小来提取底层特征。这些行为和人类在同样环境下的行为是类似的。
由于妈妈已经处于这样的环境中,所以她可能已经建立了这个环境的镜像空间。在S2步骤中,机器妈妈提取到底层特征后,按照和原始数据最匹配的大小、角度和位置来放置,这样就保留了原始数据中的时间和空间信息。假设妈妈的输入视频数据中,出现了窗户和窗帘。妈妈通过在S1步骤中已经建立好,并内置于她的信息处理中心中的底层特征提取算法,提取出来窗户的底层特征(它们可能是多个大小不等局部轮廓特征、多个整体框架特征)和窗帘的底层特征(它们可能是多个大小不等局部轮廓特征、多个大小不等的局部纹理特征,多个整体框架特征),还可能提取出窗户和窗帘作为一个整体的底层特征(因为窗户+窗帘在数据中很常见,机器在使用大小不等的局部窗口提取窗口内的局部相似性时,它们有可能作为一个整体相似性被提取了),这个组合底层特征可能是部分窗户的底层特征和部分窗帘的底层特征的组合,或者它们简化版本的组合。
这时,机器妈妈进入S3步骤。她使用每个提取到的特征,开始使用相似度对比算法,在关系网络中搜索相似的特征。找到后,给它们赋予一个初始激活值。这个初始激活值是按照目前动机的强度,按照预设程序赋予它们的。同这些视频输入一起的,还有代表机器本能动机的底层特征,它们会加入任何输入信息并从预设程序那儿直接获得初始激活值。
这里需要特别指出,由于存在激活阈值,所以即使传递系数是线性的,累计函数也是线性的,但由于激活阈值的存在,无论是在单次链式激活过程中,还是在多次链式激活过程中,相同特征图和相同初始激活值,但因为激活次序选择不一样,最终的激活值分布是不一样的。这是因为激活阈值的存在带来的非线性。不同的传递路径,带来的信息损失是不一样的。在单次链式激活过程和多次链式激活过程都有这种现象。激活次序选择的偏好,这相当于机器个性的差异,所以在相同输入信息下,产生不同的思考结果,这个现象和人类是一致的。
In addition, the relation strength in the relation network is correlated with the latest memory values (or connection values), so the machine exhibits a first-impression effect. For example, take two machines with identical relation networks facing the same feature map and the same initial activation value, where one of them has just processed an extra piece of input information about this feature map. After processing that extra information, it updates the relevant part of its relation network, and one of the relation lines may increase along the memory curve. That added memory value does not fade in a short time. So, facing the same feature map and the same initial activation value, the machine that processed the extra information will propagate more activation value along the just-strengthened relation line, producing the first-impression effect.
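The use-strengthens, time-decays rule behind this effect can be sketched as a tiny class. An exponential forgetting curve is assumed here purely for illustration; the application does not fix the curve's functional form.

```python
import math
import time

class Relation:
    """A relation line in the relation network: its strength grows each time
    it is used and decays with elapsed time (exponential decay is an assumed
    stand-in for the memory/forgetting curve)."""
    def __init__(self, strength=1.0, decay_rate=0.01):
        self.strength = strength
        self.decay_rate = decay_rate
        self.last_update = time.time()

    def _apply_decay(self):
        now = time.time()
        self.strength *= math.exp(-self.decay_rate * (now - self.last_update))
        self.last_update = now

    def current_strength(self):
        self._apply_decay()
        return self.strength

    def use(self, boost=1.0):
        """Each use of the relation increases its strength along the curve."""
        self._apply_decay()
        self.strength += boost
        return self.strength

r = Relation()
r.use()                      # just-processed information strengthens the line
print(r.current_strength())  # the strength then fades as time passes
```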
Using analogy, we can roughly compare the machine mother's video-processing procedure with the currently popular convolutional neural network (CNN). The process by which the machine extracts low-level features from input data is roughly analogous to convolution. The process by which the machine propagates activation values starting from the low-level features and finally finds the focus points is roughly analogous to the mapping process of a multilayer neural network. The memory and forgetting mechanism is roughly analogous to gradient optimization. But the differences are also significant: the relation network has no explicit layered neurons; it is one integral network. Every unit in the relation network is visible and meaningful, and every step of image processing in the relation network is understandable and observable to humans. To summarize the difference more fundamentally: today's multilayer neural network is a relation network in which only the input and output feature maps can be seen, whereas the relation network resembles a network trained layer by layer from low-level features to concepts (from simple to complex material). Each time the machine adds material, it adds network layers and retrains. Moreover, the inter-layer mapping weight coefficients are bidirectional rather than unidirectional, and its intermediate layers can produce output. A neural network is optimized by error backpropagation, while the relation network is optimized by the memory and forgetting mechanism. A neural network is trained on the full training data, with separate training and application phases, while the relation network makes no such distinction and needs far fewer learning samples than a neural network.
After the machine mother has extracted the low-level features of the window and the curtain and assigned them initial activation values, the machine's instinctive motivations also propagate activation values to these low-level features. Obviously, the propagated activation values are very low, because in the relation network these features are not closely connected with safety. So the activation values of the window and curtain low-level features in memory are not high, and the range of chain activation they can initiate is very limited.
After the chain activation completes, the machine mother looks for focus points in the relation network. The result is most likely that the feature maps of the window and the curtain are the focus points. Step S3 is thus a recognition process for the input information. The machine mother then enters step S4: understanding the input information by means of segmented imitation. The specific process is as follows: she uses the two focus points, window and curtain, to search for the most relevant memories, and may find several memory segments related to them. The search method is: the machine uses the window and curtain feature maps to search the memory library. Clearly, understanding these two focus points does not require retrieving many memories to assist. Whether more memories are brought in depends on many factors: for example, whether in the latest relation network the window and curtain are connected to a high-memory-value memory, or whether there is a related inherited goal. The machine mother may have called up only one segment of long-term memory from the relation network. This long-term memory, after elimination by the memory and forgetting mechanism, retains only the parts that recur, everything else having been forgotten: this combination exists in one memory frame together with the speech sounds of "window" and "curtain", both with relatively high memory values. The machine mother thus understands the input information and enters step S5. In step S5, since she is in a safe and leisurely environment, her motivation preset program assigns very low values to motivations; she may have no particular motivation, perhaps only a weakly activated "safety need". Out of habit she may imitate this memory and silently rehearse the sounds "window" and "curtain" in her head, or she may produce no output at all.
While the machine mother is looking and listening at random, video and audio data stream in continuously, and her preset program may issue a "save power" demand. She may then use a very large window, essentially ignoring the details of the environment. She assigns a uniform preset initial activation value to the extracted low-level features, while the low-level motivations remain continuously active: they are periodically assigned activation values that propagate through the machine's relation network.
Suppose that while randomly observing the environment the machine mother suddenly extracts low-level features such as the outline of a person bending over, the outline of clothing, and colors. These low-level features need no translation and enter the relation network directly for processing. The machine assigns them initial activation values, which propagate through the relation network, and the final focus points may be feature maps such as "person", "bending over", and "red clothes". These feature maps can be combined directly through segmented imitation; through similar memory-based segmented imitation, "a person in red clothes, bending over" may be the result of understanding the information.
After recognizing the information, the machine mother makes her own response. She imitates one or more memory segments, in which the usual action is to identify the information further. This is an experience motivation related to the "safety need" that recurred throughout the mother's development, so similar memories have become permanent memory that can be invoked unconsciously. It is an instinctive reaction.
The machine mother therefore imitates previous experience (these memories may also be preset instinctive experience): identify more specific information within a region. Following this experience she issues various muscle commands, moving the attention of her eyes and ears to this region. The recognition region she delimits is the region containing the person, and the recognition window she uses is the one usually used to recognize a "person" at a similar distance.
Suppose the machine mother then detects a particular hairstyle and a hand reaching toward a shoe. Following a similar information-processing procedure, she assigns activation values to the low-level features related to the particular hairstyle and the hand reaching toward the shoe, and the resulting focus points may be feature maps such as "my child" and "putting on shoes". Driven by a similar motivation, and passing the evaluation of the gain-and-loss system, the mother continues to identify the information. She uses a smaller window to examine the images within the relevant region carefully.
The new input may add the "Nike" shoe logo. After the "Nike" logo feature map is assigned a value and activated, the focus point the machine mother obtains this time is "Nike" football shoes. Combining the earlier information, the highly activated items in her relation network are "my child", "putting on shoes", "Nike logo", and so on. Under her "identify information" motivation, she combines these through segmented imitation into "my child is putting on Nike shoes". With this information, and driven by the "safety need", protecting her child's safety is a strong experience motivation for her; under this motivation the target focus point may be "protect the child". Through segmented imitation the machine mother finds that in the current situation the usual step is to further identify risk factors, so she adjusts the parameters of the motivation assignment system and enlarges the environmental region to be identified. This time she discovers a football beside the child. The child and the football assign each other relatively high activation values, so their activation values exceed the others and they become focus points. Searching her memory, the machine mother finds several best-matching memory segments. They may be memories of the child going out to play football on several afternoons, memories of the pitch near home, or experience she summarized herself earlier: "the child habitually goes out to play football in the afternoon when the weather is good." Self-summarized experience is also part of memory, because it too is an activity, and the machine may deliberately repeat such activity to reinforce these summarized memories. This is a method the machine obtains through self-learning in order to adapt better to the environment.
It must be specially pointed out here that activation values in the relation network also decay over time. If some focus points go unprocessed for a long time, they may be forgotten. If activation values in the relation network fade too slowly, too much activated information interferes with itself, and the machine cannot reasonably find the target focus points. At such times, under the energy-saving motivation, the machine mother may instinctively refresh the activation values. The method is to convert the current key information into output information, possibly without actually outputting it, and instead feed that information back in as input, re-activating the key information so that the non-key information is forgotten faster. This is the thought-tidying process, one of the machine's ways of highlighting key information. It is similar to the filler words such as "um… ah…" that buffer the thinking process; it is a behavior the machine exhibits while thinking. These outputs may never actually be produced, or they may be actually produced, for example as muttering to oneself. In learning and daily life the machine encounters itself and others using this method, so when it meets such thought-buffering filler words, or a person muttering, it understands them correctly.
Because speech and text are what humans exchange most frequently, in the local network of a concept the speech and text are usually connected to all attributes of that concept. The attributes of a concept are all of its feature maps. These feature maps obtain activation values from the various branches of the relation network and all pass them toward the speech or text, so the usual focus point is the concept's speech and text. Hence, in this self-filtering of information by the machine, the intermediate output is usually speech, because it is the most common output form and costs the machine the least energy to produce. Naturally, this is closely tied to an individual's developmental history.
Suppose that, after segmented imitation, the mother combines the relevant feature maps into "the child is going out to play football". This information is a sequence of image feature maps; before it is organized into an output form it is a kind of subconscious content. At this point many feature maps are activated in the machine mother's relation network and many memories have been called up, so she has too much information and low computational efficiency. Guided by experience, she needs a thinking buffer, which is a form of self-protection of thought; it can be learned experience or preset experience.
To highlight key information, the machine usually emphasizes it once or several times, so that the activation values of the non-key information and of the called-up memories fade. Suppose the mother runs the information "the child is going out to play football" through one round of self output-and-input, which relatively lowers the activation values of the other information. After passing through the virtual output procedure for speech information, "the child is going out to play football" becomes prominent in the relation network. Under the assignment of her instinctive motivations, the "safety need" inclines her not to let the child play football, because the child may have been injured playing before. Yet in other memories an expert said this matches the goal of "strengthening the constitution" and that adolescents should exercise more. Choosing which response to make therefore requires evaluation by the gain-and-loss system.
Assume here that neither response meets the mother's requirements in the gain-and-loss evaluation. She therefore creates another response through segmented imitation: she chooses a compromise set of target focus points: agree, play football, be careful, safety, come back on time, and so on. Suppose that while imitating and organizing these target focus points, the word "safety" brings up a memory: "once the child caught a cold from being rained on." The mother then needs to search for her response again. She works out that the greatest loss comes from the cold, but the cold was the result of being rained on, so she needs to exclude "getting rained on" from the goals to be achieved. In this way the overall gain of the whole process can be maximized.
For the goal "avoid getting rained on", segmented imitation quickly finds ready-made experience: "check the weather", "take an umbrella". These become intermediate goals. Driven by these memories, the machine mother may look up at the sky. To achieve the intermediate goal "rule out rain", the response plan she finds may be to take an umbrella.
The output the mother finally composes is thus "hope the child takes an umbrella when he goes out to play football". But from past experience, with only herself and the child in the house, she omits the subject. She also realizes the child is about to go out to play football; by experience, she need not repeat that information. In earlier memories, when the mother gave an instruction, the child obeyed, so this time she likewise refers to past experience and judges that the child will probably obey. Otherwise, driven by her own motivation, she would not issue an instruction but would adopt other means. Precisely because experience predicts the child will obey, the choice with the highest activation value during motivation selection is "give him an instruction". So the machine mother may finally output the speech "take an umbrella".
After receiving this information, the child identifies the correct words of the speech through information recognition. Through translation, and by segmented imitation of his own past memories, he fills in that the subject of the mother's message should be himself. The child's thought process, driven by his instinctive motivations, may be to comply with his mother's purpose, agree to take the umbrella, organize the output through segmented imitation, and evaluate the gains and losses of taking the umbrella.
The child builds an image of himself carrying an umbrella and, using parts of football memories and memories of being with friends, also takes as input the reactions he has seen when someone brought an umbrella to football. This is a method of "empathy", of thinking from the other's position, which is also a characteristic of the machine intelligence proposed in the present application. These pieces of information undergo one virtual output-to-input pass, in parallel, in series, or in a mixed manner, and then all propagate gain values and loss values to the gain symbol and loss symbol through chain activation. When the input ends, the child sees a high loss value.
Because the amount of information to process may be rather large, he needs a thinking buffer and needs to highlight the key information. He therefore rehearses the possible situation once more as input, which amounts to emphasizing this possible outcome again and analyzing it more comprehensively, searching anew for his own and others' responses in similar situations. This time, in the gain-and-loss evaluation, he finds that taking the umbrella would bring a large loss. So he changes his mind; following a new motivation choice and the goal of excluding the greatest loss, he imitates past experience, organizes his output, and utters the word "no".
After receiving the child's feedback and running the same processing flow, the mother recognizes that the child has refused her request. Driven by motivation, her target focus points may still be "take the umbrella", "why", "confirm further", "express displeasure", "maintain authority", "protect the child", and so on. She needs segmented imitation to achieve these goals, but there are too many to realize in one complete pass, so she again searches her experience for how to handle this. She may split the goals into several groups to achieve step by step, while the other goals temporarily become "inherited goals", kept in her memory as goals to achieve in subsequent plans.
In segmented imitation, based on long-accumulated experience, the first step may be "find out what is going on". Imitating this experience, she asks "what did you say?", expressing her displeasure in the words, while "maintain authority", "take the umbrella", and "protect the child" become "inherited goals" that the mother will keep trying to realize in the time that follows.
After receiving this information, the child performs information recognition and, imitating his own purpose when uttering such a message and his usual response on receiving one, identifies that the mother's purpose is to have him explain why he will not take the umbrella. Following his motivation he decides that compliance is better; this also passes the gain-and-loss evaluation, so the child organizes language and begins to state his reasons. The purpose of this response is to make his mother understand him, because long experience shows that after he explained in the past, his mother understood him. If in the child's life his mother seldom understood him, then by experience he would find that explaining cannot achieve his purpose and yields little gain, and his chosen response might be silence.
Suppose the child's feedback is: "my friends will think I look stupid…".
This feedback is outside the mother's expectation. After receiving, translating, and processing it, from her own feelings when sending similar messages, from other people's experience (for example, experience heard from parenting experts), or from her own earlier similar experience, she senses the child's concern. Driven by the instinctive motivation to protect the child, she decides to reason with him so that he forms a correct understanding. This is a large goal, so she begins to search her memory and decompose it into a series of small goals. Imitating memory (perhaps experience in handling similar situations given in a parenting expert's televised lecture), she feels "it is time to have a proper talk with the child…"
So the mother takes a deep breath and, imitating the habitual gesture of the owner who trained her long ago, begins to organize her thoughts…
The above demonstrates how general intelligence is realized through the methods and steps proposed in the present application. It can be seen that the essential difference between this dialogue and current speech interaction systems lies in whether the exchanged information is truly understood or merely mechanically imitated. The methods and steps proposed in the present application can therefore realize a human-like thinking process; they are built on the three elements of information summarization, information imitation, and motivation drive.
Claims (16)
- A method for establishing a relation network, characterized by: extracting three kinds of basic relations to establish the relation network, namely 1. similarity relations between information; 2. temporal relations between information; 3. spatial relations between information; each time a relation in the relation network is used, its relation strength increases; relations in the relation network decay over time.
- An information storage method, characterized by: when the machine stores data, it preserves the original similarity, spatial relations, and temporal relations between the data; the machine uses numerical values or symbols as memory values to represent how long these data can persist in the database; these memory values increase each time the data are used and decay as time passes; data stored in the same time period are related to one another; and the relation strength between any two data items is correlated with the memory values of those two items.
- A method for screening information at storage time, characterized by: the machine first stores data in a temporary memory library; the temporary memory library uses its own memory-forgetting curve; after a datum's memory value in the temporary memory library reaches a preset standard, the information can be transferred to the long-term memory library; only when the input data have changed beyond a preset threshold relative to the previous input does the machine need to store the data anew and create a new data record; and only when a datum's corresponding memory value has changed beyond a preset threshold does the machine need to update that memory value.
- A method for organizing information storage, characterized by: the data stored by the machine include the externally input data received during the same time period or the data features extracted from that input, the data given by the machine's internal monitoring system, and the analysis data the machine produces from these data, together with the memory values of these data; these stored data are regarded as mutually related, and the relation strength between any two data items is correlated with the memory values of those two items.
- The method according to claim 4, characterized by: when the machine stores in memory the analysis data for internal and external input information, the memory values of these analysis data are correlated with the numerical magnitude of the analysis data themselves.
- A data feature selection method, characterized by: the machine uses windows to select data and looks for local similarity among the selected data; data similar to one another are taken as a data feature; when this data feature is stored in the data feature library, it is assigned a preset memory value representing how long it can persist in the database; this memory value increases with the number of times the local similarity recurs and decays with time; the machine may repeat the above operation on the same data using windows of different sizes.
- A method for training a neural network to recognize whether input data contain data features, characterized by: the machine uses windows to select input data and labels the data features contained in the data; the labeling may be implemented by comparing the similarity between the data in the selected window and the data features; the machine trains a neural network to recognize these labeled data; the machine repeats the above operation on the same data with windows from small to large, and each time the data-selection window is enlarged, the machine adds zero or more neuron layers on top of the previously trained network to form a new network; in the new training process, if the machine trains only the newly added neuron layers, it finally obtains a single network some of whose intermediate neuron layers are also output layers; if the machine trains all neuron layers, it keeps the previous networks, and finally obtains one network for each window.
- A method for extracting data features from input data, characterized by: the machine uses windows to select input data, confirms whether the selected data contain data features from the data feature library, and determines the positions of these features in the input data from the window positions; the machine may repeat the above operation on the same data using windows of different sizes.
- A method of chain activation in a relation network, characterized by: in the relation network, when feature map i is assigned an initial activation value, if this value is greater than its own preset activation threshold Va(i), feature map i is activated and passes activation value to the other feature-map nodes connected to it; if a feature map receives the transferred activation value and, after adding its own initial activation value, its total activation value is greater than its own node's preset activation threshold, it is also activated and likewise passes activation value to the feature maps connected to it; this activation process is transmitted along the chain until no new activation occurs and the whole activation-value transfer process stops, which is called one chain-activation pass; within a single chain-activation pass, once activation value has been transferred from feature map i to feature map j, the reverse transfer from feature map j to feature map i is prohibited.
- The method according to claim 9, characterized by: the activation-value transfer coefficient from feature map A to feature map B is positively correlated with the relation strength between A and B, and also positively correlated with the weight of the A-to-B relation strength within all of A's relation strengths.
- A method for propagating activation values in a relation network, characterized by: the activation values of nodes in the relation network can propagate through the relation network, and activation values in the relation network decay over time.
- A method for a machine to use memory data, characterized by: the machine recombines local information from different memory data and adds information related to the input information, forming a new information sequence; the machine responds to the input information by imitating this new information sequence; during imitation, the machine continues to use the same method to organize new lower-level information sequences, and achieves the intermediate goals of the imitation process by imitating these lower-level sequences; this method can be applied iteratively.
- A method for a machine to use memory data, characterized by: in the process of recombining memory and input information, the machine can take part of the recombined new information sequence as its own input in order to adjust the recombination process.
- A neural network, characterized by: for the neuron output weight coefficients w, after each update of the network coefficients the absolute value of every weight coefficient w is reduced by a delta value, and the weight coefficients are then optimized again; alternatively, for the neuron output bias terms b, after each update of the network coefficients every bias term b is changed by a delta value so that the network output moves toward zero, and the network coefficients are then optimized again; delta is a real number greater than or equal to zero, and the delta value of each reduction may differ.
- A neural network structure, characterized by: one to several linear transformation layers (or weakly nonlinear transformation layers) are inserted into a multilayer neural network; these linear (or weakly nonlinear) transformation layers may adopt different linear or weakly nonlinear activation functions; the linear transformation layers may be inserted before optimization begins or during the optimization process.
- An artificial intelligence implementation method, characterized by: it obtains the relation network between things by organizing memory; it understands input information and establishes output responses by recombining memory and input information; it selects among different recombination results through the motivation of seeking benefit and avoiding harm, and responds by imitating, or imitating by analogy, the selected memory recombination result.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010370939.2 | 2020-04-30 | ||
CN202010370939.2A CN111553467B (zh) | 2020-04-30 | 2020-04-30 | 一种实现通用人工智能的方法 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021217282A1 (zh) | 2021-11-04 |
Family
ID=72000250
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/000108 WO2021217282A1 (zh) | 2020-04-30 | 2020-05-15 | 一种实现通用人工智能的方法 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111553467B (zh) |
WO (1) | WO2021217282A1 (zh) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112016664A (zh) * | 2020-09-14 | 2020-12-01 | 陈永聪 | 一种实现类人通用人工智能机器的方法 |
WO2021218614A1 (zh) * | 2020-04-30 | 2021-11-04 | 陈永聪 | 通用人工智能的体系建立 |
CN112231870B (zh) * | 2020-09-23 | 2022-08-02 | 西南交通大学 | 一种复杂山区铁路线路智能化生成方法 |
WO2022109759A1 (zh) * | 2020-11-25 | 2022-06-02 | 陈永聪 | 一种类人通用人工智能的实现方法 |
CN113626616B (zh) * | 2021-08-25 | 2024-03-12 | 中国电子科技集团公司第三十六研究所 | 航空器安全预警方法、装置及系统 |
CN115359166B (zh) * | 2022-10-20 | 2023-03-24 | 北京百度网讯科技有限公司 | 一种图像生成方法、装置、电子设备和介质 |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109202921A (zh) * | 2017-07-03 | 2019-01-15 | 北京光年无限科技有限公司 | 用于机器人的基于遗忘机制的人机交互方法及装置 |
CN109657791A (zh) * | 2018-12-14 | 2019-04-19 | 中南大学 | 一种基于大脑神经突触记忆机制的面向开放世界连续学习方法 |
CN110070188A (zh) * | 2019-04-30 | 2019-07-30 | 山东大学 | 一种融合交互式强化学习的增量式认知发育系统及方法 |
CN110163233A (zh) * | 2018-02-11 | 2019-08-23 | 陕西爱尚物联科技有限公司 | 一种使机器胜任更多复杂工作的方法 |
WO2019232335A1 (en) * | 2018-06-01 | 2019-12-05 | Volkswagen Group Of America, Inc. | Methodologies, systems, and components for incremental and continual learning for scalable improvement of autonomous systems |
CN110705692A (zh) * | 2019-09-25 | 2020-01-17 | 中南大学 | 一种基于空间和时间注意力的长短期记忆网络对工业非线性动态过程产品质量预测方法 |
CN110909153A (zh) * | 2019-10-22 | 2020-03-24 | 中国船舶重工集团公司第七0九研究所 | 一种基于语义关注度模型的知识图谱可视化方法 |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5736530B1 (ja) * | 2015-02-17 | 2015-06-17 | オーナンバ株式会社 | 太陽光発電システムの未来の電流値または発電量の低下の時期を予測する方法 |
EP3471623B1 (en) * | 2016-06-20 | 2023-01-25 | Butterfly Network, Inc. | Automated image acquisition for assisting a user to operate an ultrasound device |
CN107609563A (zh) * | 2017-09-15 | 2018-01-19 | 成都澳海川科技有限公司 | 图片语义描述方法及装置 |
CN107818306B (zh) * | 2017-10-31 | 2020-08-07 | 天津大学 | 一种基于注意力模型的视频问答方法 |
US10885395B2 (en) * | 2018-06-17 | 2021-01-05 | Pensa Systems | Method for scaling fine-grained object recognition of consumer packaged goods |
EP3617947A1 (en) * | 2018-08-30 | 2020-03-04 | Nokia Technologies Oy | Apparatus and method for processing image data |
US11200424B2 (en) * | 2018-10-12 | 2021-12-14 | Adobe Inc. | Space-time memory network for locating target object in video content |
CN109492679A (zh) * | 2018-10-24 | 2019-03-19 | 杭州电子科技大学 | 基于注意力机制与联结时间分类损失的文字识别方法 |
CN109740419B (zh) * | 2018-11-22 | 2021-03-02 | 东南大学 | 一种基于Attention-LSTM网络的视频行为识别方法 |
2020
- 2020-04-30: CN application CN202010370939.2A filed; granted as CN111553467B (status: active)
- 2020-05-15: PCT application PCT/CN2020/000108 filed as WO2021217282A1 (application filing)
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118503893A (zh) * | 2024-06-06 | 2024-08-16 | 浙江大学 | 基于时空特征表示差异的时序数据的异常检测方法和装置 |
Also Published As
Publication number | Publication date |
---|---|
CN111553467B (zh) | 2021-06-08 |
CN111553467A (zh) | 2020-08-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20933768; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 20933768; Country of ref document: EP; Kind code of ref document: A1 |