WO2021223042A1

WO2021223042A1 - Method for implementing machine intelligence similar to human intelligence

Info

Publication number: WO2021223042A1
Application number: PCT/CN2020/000107
Authority: WO
Inventors: 陈永聪; 曾婷; 陈星月
Original assignee: Chen Yongcong
Priority date: 2020-05-06
Filing date: 2020-05-15
Publication date: 2021-11-11
Also published as: CN111563575B; CN111563575A

Abstract

A learning method that imitates the human learning process. By searching for various recombination schemes through information summarization, information combination, and motivation, and by dividing a process into a plurality of intermediate links in order to find experience that can be imitated, a machine gradually obtains responses ranging from simple to complex and from input to output, and has emotional expressions similar to those of human beings.

Description

A realization method of machine intelligence similar to human intelligence

Technical field

The application of the present invention relates to the field of artificial intelligence, in particular to the field of establishing general machine intelligence similar to human intelligence.

Background art

Current machine intelligence is usually designed for specific tasks, and there is no general-purpose machine that can complete a variety of uncertain tasks. For example, in deep learning, multi-layer neural networks use error backpropagation to find the multi-layer mapping that minimizes the error function. The machine does not understand the meaning of the input information, nor can it predict how this information may subsequently develop. The convolutional neural network adds preprocessing of the data to the multi-layer neural network, and it has the same problem. Current knowledge graph projects help the machine connect different things during search by extracting the associations between texts or concepts from big data. However, these relationships are not quantified, and they give machines no way to infer the causes of a piece of information or to predict its possible results. Through learning, human beings can guess the cause, predict the result, make choices, and respond to input information. Current machine intelligence therefore differs greatly from human learning, and it cannot produce general intelligence similar to that of humans.

The present application holds that machine intelligence should be built on information extraction and experience rather than on data processing methods, which merely serve to facilitate information reuse. The learning method proposed in the present application therefore imitates the human learning process. By summarizing information, recombining information, searching for various recombination schemes under the drive of motivation, and implementing responses through imitation, the machine gradually obtains general intelligence similar to that of humans. The machine learning method proposed in the present application thus differs greatly from the machine learning methods currently used in the industry. It aims to realize a machine intelligence that is similar to, or even surpasses, human intelligence and that resembles humans in emotion and motivation, and no similar method exists in the industry.

Summary of the invention

Human intelligence is a result of evolution. Before language symbols were created, our ancestors had to explore the world using the information obtained through basic senses such as sight, hearing, and smell, and they used this information to sum up their experience. In the present application, we use the same method: all input information is first restored to the form in which our ancestors processed it, and language is then used as input and output.

Human beings understand the relationships between things, so they can make choices that suit their own interests based on these relationships and carry out these choices. This is the form that human intelligence takes. In the present application, the machine does the same. It processes the input information, uses the relationship network to recombine information into candidate responses, uses the evaluation system to select the optimal response, and uses gradual imitation to output the optimal response. These steps are explained separately below.

1. The establishment of similarity.

In the present application, the first basic assumption is: "If some attributes of two pieces of information are similar, other attributes contained in the two pieces of information may also be similar." This is the starting point of machine learning. Fortunately, the world we live in is exactly such a world. For example, if two apples have similar textures, colors, and shapes, their other attributes, such as taste, weight, price, or hardness, may also be similar. The same holds for information that precedes the observation, such as both having grown on apple trees and both having ripened in autumn, and for information that can be predicted to follow it, such as that they will gradually rot under natural conditions and can be stored for a long time when frozen. Similarity also shows up in dynamic processes. For example, for two pieces of information about "a person going to buy something", we can reasonably infer that the preceding information may be "she (he) needs this product and currently lacks it", and that the following information may be "she (he) needs to pay and take the goods home". Inferring similarity over a larger range from local similarity in this way is the starting point of our learning. In essence, "similarity" presupposes that the comparison is made at a given resolution. If we keep increasing the resolution, no two apples in the world are the same. If we keep reducing the resolution, all apples in the world are the same, because they are all "apples"; taken further, all objects in the world are the same, because they are all "objects". Therefore, we can use different resolutions to find similarities between things, scenes, and processes and, based on these similarities, reasonably infer the similarity of other attributes (such as causes and results) at that resolution. This is the summarization of experience.

1.1 Look for static similarity.

To compare similarity, we must first determine the resolution of the comparison. For example, when two houses are compared roughly, their shapes are similar, so the houses are similar. When they are compared in detail, their windows and colors differ, so the houses are not similar.

To solve this problem, the present application proposes a local similarity comparison method. Specifically, windows of different sizes are used to fetch data, and the data in each window is then processed (for example by convolution, contour extraction, or various coordinate basis transformations and filters). Different windows can use different data preprocessing algorithms. (These are mature image processing algorithms that are not part of the claims of the present invention, so they are not described further here.) The similarity of the processed graphics is then compared. The machine may need to apply different windows to the same data repeatedly in order to compare similarity at different resolutions.
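The window-based comparison described above can be illustrated with a minimal Python sketch. It is not the implementation of the present application: the function names, the mean-subtraction preprocessing, the cosine similarity measure, and the 0.9 threshold are all assumptions chosen for illustration, and any mature image processing algorithm could stand in for `preprocess`.

```python
from typing import List, Tuple

Image = List[List[float]]


def extract_windows(image: Image, size: int, stride: int) -> List[Tuple[int, int, Image]]:
    """Slide a size x size window over the image; return (row, col, patch) triples."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - size + 1, stride):
        for c in range(0, cols - size + 1, stride):
            patch = [row[c:c + size] for row in image[r:r + size]]
            patches.append((r, c, patch))
    return patches


def preprocess(patch: Image) -> List[float]:
    """Stand-in preprocessing (assumption): subtract the patch mean and flatten."""
    flat = [v for row in patch for v in row]
    mean = sum(flat) / len(flat)
    return [v - mean for v in flat]


def similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity; two completely flat patches are treated as identical."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    if na == 0 and nb == 0:
        return 1.0
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def similar_patches(img_a: Image, img_b: Image, size: int, stride: int,
                    threshold: float = 0.9):
    """Return pairs of window positions whose preprocessed patches are similar."""
    wa = [(r, c, preprocess(p)) for r, c, p in extract_windows(img_a, size, stride)]
    wb = [(r, c, preprocess(p)) for r, c, p in extract_windows(img_b, size, stride)]
    return [((ra, ca), (rb, cb))
            for ra, ca, fa in wa
            for rb, cb, fb in wb
            if similarity(fa, fb) >= threshold]
```

Calling `similar_patches` several times on the same pair of images with different `size` values corresponds to comparing them at different resolutions; each matching pair would become a feature map candidate.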

During data processing, every time the machine finds a piece of similar local data, it puts this data into the temporary memory bank as a feature map candidate and assigns a memory value to the candidate. The machine applies the above process to the data iteratively with windows of different sizes, so that it obtains a large number of feature map candidates in the temporary memory bank.

In the temporary memory bank, we use memory and forgetting mechanisms to maintain these feature maps. Specifically, every time a similar feature map candidate is found, the memory value of that candidate is increased according to the memory curve. At the same time, all memory values in the temporary memory bank gradually decrease over time according to the forgetting curve. If a memory value decreases to zero, the feature map candidate is deleted from the temporary memory bank. If the memory value of a feature map increases to a preset standard, the feature map is moved to the long-term memory bank and becomes a long-term memory. Here, the memory value represents how long the corresponding feature map can exist in the database: the larger the memory value, the longer it exists. Different databases can have different memory and forgetting curves.
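A minimal Python sketch of this memory and forgetting mechanism follows. The source does not specify the shape of the memory or forgetting curves, so the fixed gain per repetition, the exponential decay, the small `floor` that stands in for "zero", and the promotion threshold are all assumptions; the class and attribute names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MemoryBank:
    gain: float = 1.0         # assumed memory curve: fixed increment per repetition
    decay_rate: float = 0.1   # assumed forgetting curve: exponential decay
    floor: float = 0.05       # values below this are treated as having reached zero
    promote_at: float = 3.0   # assumed "preset standard" for long-term memory
    temporary: Dict[str, float] = field(default_factory=dict)
    long_term: Dict[str, float] = field(default_factory=dict)

    def observe(self, key: str) -> None:
        """A similar feature map candidate was found again: raise its memory value."""
        if key in self.long_term:
            self.long_term[key] += self.gain
            return
        value = self.temporary.get(key, 0.0) + self.gain
        if value >= self.promote_at:
            # The candidate reached the preset standard: move it to long-term memory.
            self.temporary.pop(key, None)
            self.long_term[key] = value
        else:
            self.temporary[key] = value

    def tick(self, dt: float = 1.0) -> None:
        """Let time pass: temporary memory values decay and may be deleted."""
        factor = math.exp(-self.decay_rate * dt)
        for key in list(self.temporary):
            self.temporary[key] *= factor
            if self.temporary[key] < self.floor:
                del self.temporary[key]
```

In this sketch only the temporary bank decays. Because the text allows different databases to have different curves, a slower decay could be added for `long_term` in the same way.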

In the training process of the machine, in daily life, the above process is used continuously, and finally a large number of feature maps are obtained.

In the same way, we can apply the same processing to sensor information other than images. For example, for speech, we can treat the frequency composition and relative intensity of different speech sounds as static features and find local similarities among them. Similar methods can be used for touch and other sensory data. We only need to look for similarities in the different dimensions of the data at different resolution scales to obtain similarity comparison results at different resolutions, and thereby establish the static feature maps of the data. It should be pointed out that a static feature map is established at a particular resolution and represents the machine's own classification of things based on similarity. For example, two tables may belong to the same category at a rough resolution but to different categories at a fine resolution. Our ancestors created linguistic symbols to represent some of these classifications and used the symbols to express the classifications conveniently when exchanging information.

1.2 Looking for dynamic similarity.

In dynamic images, there are two kinds of similarity. One is the similarity between the images contained in a process and the images in other processes. The machine only needs to handle the feature maps in this process and those in other processes using the static feature map extraction method; they are essentially static feature maps. But a dynamic process has another kind of similarity: the similarity of the movement pattern. Here the machine ignores the details of what the moving objects are made of and focuses on comparing how they move. This comparison also depends on resolution. For example, a person may walk towards us, slide over, or run over. At a rough level, we do not even notice the difference between these motions, so we consider their movement patterns to be the same. But when we increase the resolution, we find that the person who slides over moves smoothly, whereas the person who walks over and the person who runs over show a variety of motion characteristics, including the relative motion of the various parts of the body, the overall motion of the body as a whole, and the speed at which these change. We therefore find that their movement patterns are different.

To solve this problem, the present application proposes a dynamic local similarity comparison method. Specifically, windows of different sizes are used to track different parts of things, with different windows representing different resolutions. Take a person who runs over, walks over, or slides over. When we use a large window that treats the person as a whole and track the movement pattern of this window, we find that the movement patterns are the same in all three cases. But when we use smaller windows to extract the movement patterns of the hands, legs, head, waist, buttocks, and other parts separately, we can distinguish the three movement patterns. Furthermore, if we use more windows to focus on the movement of the hand alone, we obtain the movement pattern at an even finer resolution.
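The difference between a large tracking window and smaller per-part windows can be sketched in Python as follows. This is an illustrative simplification under stated assumptions: each window is reduced to a single tracked point per frame, the large window is approximated by the centroid of all parts, and a movement pattern is represented by its frame-to-frame displacements. All names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Track = List[Point]  # one tracked position per frame


def displacements(track: Track) -> List[Point]:
    """Frame-to-frame motion, independent of where the object starts."""
    return [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(track, track[1:])]


def centroid_track(parts: List[Track]) -> Track:
    """Large window (assumption): the whole object, taken as the mean of its parts."""
    n = len(parts)
    frames = len(parts[0])
    return [(sum(p[t][0] for p in parts) / n, sum(p[t][1] for p in parts) / n)
            for t in range(frames)]


def relative_track(part: Track, centre: Track) -> Track:
    """Small window: motion of one part relative to the whole object."""
    return [(px - cx, py - cy) for (px, py), (cx, cy) in zip(part, centre)]


def same_pattern(a: Track, b: Track, tol: float = 1e-6) -> bool:
    """Two tracks share a movement pattern if their displacements match within tol."""
    da, db = displacements(a), displacements(b)
    return len(da) == len(db) and all(
        abs(ax - bx) <= tol and abs(ay - by) <= tol
        for (ax, ay), (bx, by) in zip(da, db))
```

With a walker whose two legs swing in opposite directions and a slider whose legs stay still, `same_pattern` applied to the two centroid tracks reports the same pattern, while applying it to the relative track of one leg separates the two cases.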

In addition to spatial resolution, the machine also needs to establish different temporal resolutions. For example, we may describe the steady flow of people on a street as one pattern of crowd movement. But at a finer temporal resolution, we find peaks in the crowd flow during the morning and evening commutes. By comparing the changes in a motion trajectory at different temporal resolutions, we obtain the rate of change, which is an important dynamic feature of movement over time.

Therefore, the extraction of motion patterns is based on a certain time resolution and a certain spatial resolution. The machine processes a large amount of dynamic data to find common dynamic features.
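The effect of temporal resolution on the rate of change can be sketched in a few lines of Python. The averaging window, the function names, and the hourly pedestrian counts in the usage note are assumptions made up for illustration, not data from the present application.

```python
from typing import List


def resample(series: List[float], window: int) -> List[float]:
    """Average consecutive non-overlapping windows (a coarser temporal resolution)."""
    return [sum(series[i:i + window]) / window
            for i in range(0, len(series) - window + 1, window)]


def rate_of_change(series: List[float], window: int) -> List[float]:
    """Change per original time step between neighbouring windows."""
    coarse = resample(series, window)
    return [(b - a) / window for a, b in zip(coarse, coarse[1:])]
```

For a made-up series of 24 hourly counts that is flat except for one peak in the morning and one in the evening, `rate_of_change(flow, 12)` sees two equal half-day averages and reports no change, which matches the description of a steady flow, whereas `rate_of_change(flow, 1)` exposes the sharp rises and falls around the two peaks.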

Every time the machine finds a similar movement pattern, the machine puts the data representing this movement pattern into the temporary memory bank as a candidate for the dynamic feature map, and assigns a memory value to the candidate for the dynamic feature map. The machine uses windows of different sizes and iteratively uses the above process on the data, so that the machine can obtain a large number of dynamic feature map candidates in the temporary memory bank.

As with static feature maps, the machine uses the memory and forgetting mechanism to apply survival of the fittest to the extracted dynamic feature maps. Movement patterns that are widely present in various moving objects are discovered again and again, so their memory values increase again and again until they enter the long-term memory bank and become long-term memories.

In the same way, we can apply the same processing to sensor information other than images. For example, for speech, we can use time windows of different sizes as the resolution, take a specific attribute (feature) of the speech as the observed object, and compare the change patterns (movement patterns) of the observed object to find local similarities in how it changes (such as rising pitch, falling pitch, vibrato, or popping). A similar method can be used for touch and other sensory data. We only need to take a certain feature as the observed object in each dimension of the data, at each resolution scale, and find the change pattern of that observed object. The similarities between these change patterns establish the dynamic feature maps of these objects.

It should be pointed out that the dynamic feature map is established based on the dual resolution of space and time. It represents the machine's self-built classification of dynamic processes based on the similarity of dynamics. They have nothing to do with the static characteristics of the observed object.

In life, because dynamic features are independent of the objects that carry them, dynamic features are reused very often. They obtain high memory values because of this high repetition, and we do not even notice when we retrieve and use these dynamic feature maps. Moreover, precisely because a dynamic feature map is independent of the object that carries it, the machine can easily use analogy (in the same way as substitution within a concept) to generalize the scope of application of these dynamic features. Dynamic features are therefore a key tool for generalizing experience.

2. The establishment of a network of relationships.

Through acquired learning, human beings have assigned language symbols to the classifications established at different resolutions in order to express them better. These are basic concepts. Also through learning, humans adjust the resolution, merge or subdivide these categories to build more categories, and use more language symbols to represent the new categories. Because this process can be carried out iteratively, humans have established general concepts and abstract concepts (concepts established by taking concepts themselves as the objects operated on). Humans have also established a network of relationships among all categories; this is knowledge. Our ancestors passed this knowledge on to us through language. Based on this knowledge, we continue to discover new classifications and new relationships, thereby expanding human knowledge and passing it on to our descendants through language symbols.

Our ancestors discovered two types of relationships between things through observation and summary. The first type is similarity, which is based on comparison at different resolutions. The second type is the connection relationship. The things connected by this kind of relationship are not similar, but our ancestors discovered in their lives that dissimilar things are connected and that these connections mattered greatly to their lives. So they summed up these relationships as experience and used language to pass the experience on to future generations. Suppose a beast rushes at one of our ancestors. At this moment there is not only a static feature map of the beast but also a movement pattern of the beast (a dynamic feature map), a specific sound, and a specific change of sound (a dynamic feature map). There may also be a specific scene (static feature maps, such as the edge of a pond) and specific patterns of scene change (dynamic feature maps, such as other animals running around). All of this information enters the information processing system of our ancestors at the same time. After many similar experiences, our ancestors connected the information that kept recurring and treated it as experience, in order to adapt better to the environment. In the present application, we refer to these relationships as environmental relationships. The network established by environmental relationships and similarity relationships is called the relationship network.

In the present application, the second basic assumption is that "things in the same environment have a connection relationship with each other". When our ancestors first encountered a beast, they connected the beast to the entire environment. On the second encounter with the beast, the information that was the same as before was further strengthened in memory. As similar experiences accumulate, information that recurs is strengthened further, while occasional information that does not recur is gradually forgotten. For example, the movement pattern of the beast may appear every time, whereas information such as a flower that happened to be next to the beast may be forgotten. Likewise, "fish" always appear in water, so the connection between fish and water is strengthened step by step. What performs this selection is the memory and forgetting mechanism. This mechanism is a gift of evolution: it is well suited to realization in nerve cells, and it is an efficient way of summarizing experience. In machine learning, we also introduce this mechanism. However, other mechanisms that can summarize rules in a similar way can also be used as the rule summarization mechanism of machine intelligence.

For each piece of input information, the machine selects a region of interest and extracts feature maps from the data at the resolution the machine is interested in. It then searches memory for the extracted feature maps (static and dynamic). If a similar feature map is found in a memory, this feature map has been repeated in that memory, and the machine increases its memory value in that memory according to the memory curve. At the same time, the machine decreases the memory values of all memories over time according to the forgetting curve. In this way, only recurring feature maps can retain their memory values in the relevant memories for a long time.

If several of the feature maps extracted from a piece of input information are found in the same segment of memory, the relationship between these feature maps is repeatable. According to the memory and forgetting mechanism, we then directly increase the memory value of each of these feature maps. In the present application, the machine does not need to process these recurring relationships explicitly; in fact, these relationships are very complicated and difficult to process. Therefore, the present application proposes the third basic assumption: "For feature maps in the same memory, the strength of the connection between any two feature maps is positively correlated with the memory values of these two feature maps in this memory (the relationship is not necessarily linear)." Consequently, when a combination of feature maps recurs, the memory values of its members in the same segment of memory increase, and so does the strength of the connections between them. The feature maps (static or dynamic) in each memory constitute a local network, and these local networks are connected to each other through the similarity of feature maps. In this way, a three-dimensional memory network organized by temporal relationship is formed; this is the relationship network.
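The third basic assumption can be sketched in Python as follows. The source only requires that connection strength be positively correlated with the two memory values; the product used here is one assumed choice among many monotone functions, and the function names, labels, and fixed gain are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# One segment of memory: feature map label -> memory value within that segment.
Memory = Dict[str, float]


def connection_strength(memory: Memory, a: str, b: str) -> float:
    """Assumed form: the product of the two memory values (increasing in both)."""
    return memory[a] * memory[b]


def local_network(memory: Memory) -> Dict[Tuple[str, str], float]:
    """All pairwise connections inside one memory, derived from memory values only."""
    return {(a, b): connection_strength(memory, a, b)
            for a, b in combinations(sorted(memory), 2)}


def reinforce(memories: List[Memory], observed: List[str], gain: float = 1.0) -> None:
    """Raise the memory value of each observed feature map in every memory holding it."""
    for memory in memories:
        for label in observed:
            if label in memory:
                memory[label] += gain
```

Note that no edge weights are stored or updated directly. Reinforcing "beast" and "roar" together raises the strength of the beast-roar connection as a side effect, while a one-off "flower" in the same memory stays weakly connected.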

3. The establishment of the concept.

Our ancestors invented languages and used these languages to represent the categories established by comparing similarities, such as stones, trees, figs, rabbits, and lions that are closely related to life. Language is also used to represent those dynamic classifications established by comparing similarities, such as running, jumping, knocking, grinding, planing, throwing, and flowing dynamic patterns closely related to life. After having these languages, we can organize these languages and express our thoughts through certain organizational methods. This is a process of convention.

The concrete way in which the machine establishes concepts is the same as for humans. For example, when a certain image feature map is input into the machine, we simultaneously give it the language that represents this image feature map. After multiple repetitions, the machine establishes a closer connection in the relationship network between this image feature map and the corresponding language feature map. Similar image feature maps exist in different memories, but their similarity across memories may not be as high as the similarity of the language across memories. When we link different memories through images and language, the language symbols (such as speech or text) are used frequently (resulting in high memory values) and are highly similar to each other (resulting in a large transfer coefficient between memories). Therefore, among the information contained in one concept (such as various apple images, various spoken forms of "apple", and various written forms of "apple"), language symbols are likely to have the highest memory values. When searching for concepts in memory, we therefore often find the language symbols first and use them to represent the concepts.

4. Extension of the static concept.

The extension of a static concept is to extend the object of similarity to the concept.

In the use of language, it may be very cumbersome or even difficult to express some information using only concepts that denote entities (images, actions, or human-perceivable features). For example, if we open a restaurant, we need to say that we sell pizza. Think how cumbersome it would be to describe the whole process using only words such as wheat, meat, grinding, cutting, and heating. So we must combine frequently used information, use one symbol to represent it, and reach a consensus within the group. Then, when we exchange information, we can use this symbol to represent the whole string of combined information concisely. This is the creation of new concepts on the basis of existing concepts.

The way to create new concepts is to extend the objects of similarity to concepts. We can attribute different concepts to one concept, it must be because these different concepts contain certain common attributes. These common attributes are the similarities between concepts. Through this similarity, we think that these concepts are similar to each other, so we use a concept to represent this concept group.

For example, we collectively refer to the people who come to eat as customers, and collectively refer to the various amounts of money that customers give us as tips. This reduces the resolution of things and retains only their common attributes, so that they are similar to each other and are summarized as one concept. In the same way, we also divide apples into Red Fuji apples, American Red Delicious apples, and Yantai apples. This increases the resolution of things in order to distinguish their differences. Given these expanded categories, humans can either create a new language symbol to represent them or combine existing language to represent them. For example, we can call people who come to eat "customers" or "people who come to eat". As another example, we can say "sweet love" and "bitter life". This extends the object of classification from food to concepts in general and extends the taste attribute to the sensation brought by taste, so that classification is based on the attribute "feelings similar to those obtained after tasting food". Only when the machine expands both the objects of comparison and the attributes used can it understand "sweet love" and "bitter life".

We can think of extending a concept as creating a new concept from the original concepts. This process can be carried out iteratively: the expanded concepts can be further changed in resolution to form more abstract or more specific concepts. Therefore, concepts are not only in parallel relationships with one another. One concept may also contain another, be contained in it, partially contain it, overlap it, or partially overlap it.

5. Expansion of the dynamic concept.

The expansion of the dynamic concept is to extend the object that recognizes the dynamic mode to the concept.

The extraction of dynamic features is a crucial part of machine intelligence. Because a dynamic feature is a way of moving, it has no necessary connection with the subject that moves in this way. The subject of a movement feature is therefore a generalized subject, and the machine can use mass points or three-dimensional graphics to represent abstract moving subjects. Precisely because the subject of motion is generalized, the machine can substitute any entity or concept into a movement feature and thereby generalize experience. For example, consider the sentence "After all the information is separated and filtered, and then aggregated and processed, we get a product with a solid foundation." Here we take information as an object and substitute it into the dynamic patterns we have established for "filtering", "aggregating", and "processing", and we take the result of the information processing as an object described by "solid foundation" and "product". Precisely because abstract concepts can be substituted into movement features as subjects, the machine can understand and use the true meaning of expressions such as "emotions have been running high recently", "an arrow that has left the bow cannot turn back", and "he slides into the abyss step by step".

A concept that expresses a relationship between things is also a dynamic feature; it treats the objects at both ends of the relationship as a virtual whole. Therefore, in the present application, a dynamic feature is assigned to each concept that represents a relationship, and the machine can use the concept correctly through this dynamic feature. For example, the relationships represented by expressions such as "although...but...", "but...", "though...", "but..." can be represented by a dynamic feature of transition. Parallel concepts such as "on one side... on the other side..." and "both... and..." can be represented by a dynamic feature of parallel operation. The relational concept "contained in" can be represented by a dynamic feature of inclusion.

The specific methods for establishing the dynamic feature of such a relationship are as follows. 1. The machine applies the memory and forgetting mechanism to a large amount of language to find what is common. These common points are usually concepts of dynamic patterns or relationships; because they are independent of specific objects, they can be used widely. The way these words are organized gradually becomes common words, common sentence patterns, and grammar. This method is similar to the current method of language organization in artificial intelligence and is a method of mechanical imitation. 2. In the present application, the machine needs to go further and understand the meaning of these concepts. The machine does so by memorizing the specific static feature maps and dynamic feature maps associated with each use of these concepts and then retaining them through the memory and forgetting mechanism. In the process of describing a relationship, the specific objects always change, and what does not change is the dynamic feature of the relationship. For example, "on one side... on the other side..." is usually applied to two objects carrying out activities in parallel. After accumulation, the machine can therefore represent "on one side... on the other side..." as two specific objects plus a dynamic feature representing "two objects acting in parallel". The next time the machine receives a message of the form "on one side... on the other side...", the dynamic feature it retrieves is still "two objects acting in parallel", although the two specific objects may have changed. Through many such repetitions, the machine finally establishes a close connection between the words "on one side... on the other side..." and the dynamic feature "two objects acting in parallel", but not between these words and any specific subject. When the machine needs to use words like "on one side... on the other side...", it can refer to past experience. Even when facing new things, it can reasonably substitute the new things into the relationship by replacing objects that share the same attributes. Only in this way can the machine correctly understand and use the meaning contained in the relational concept "on one side... on the other side...". In the same way, the relationships represented by "although...but...", "but...", "though...", "but..." can be represented by a dynamic feature of "turning", and "both...and..." by a dynamic feature of superposition. From the perspective of language understanding and organization, the language processing method proposed in the present application is therefore essentially different from currently known language processing methods: it does not require a semantic database to be built by hand, and it allows the machine to truly understand the meaning of the language.
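The way a relational phrase becomes tied to its dynamic feature rather than to any particular object can be sketched in Python. This is a deliberately simplified illustration: plain occurrence counts stand in for memory values, forgetting is omitted, and the class name, method names, and example labels are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


class RelationLearner:
    """Accumulates which feature labels accompany each use of a relational phrase."""

    def __init__(self) -> None:
        # phrase -> count of every feature label seen together with it
        self.assoc: Dict[str, Counter] = defaultdict(Counter)

    def observe(self, phrase: str, objects: List[str], dynamic_feature: str) -> None:
        """Record one use of the phrase with its specific objects and dynamic feature."""
        for label in objects:
            self.assoc[phrase][label] += 1
        self.assoc[phrase][dynamic_feature] += 1

    def meaning(self, phrase: str) -> str:
        """The most strongly associated label, i.e. what stays constant across uses."""
        return self.assoc[phrase].most_common(1)[0][0]
```

Because the specific objects change from one use to the next while the dynamic feature recurs, the count of the dynamic feature pulls ahead after only a few observations, and `meaning` returns it for the phrase.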

Another aspect of dynamic expansion is this: in our lives, many processes are composed of multiple entity concepts or expanded abstract concepts that together constitute a generalized movement pattern, which we call a process feature. A process feature is an extended dynamic feature with two characteristics: 1. It has multiple observed objects, which are not necessarily a whole. 2. The movement pattern as a whole has no clear repeating trajectory. For example, going home, going on a business trip, washing hands, and cooking are each a generalized movement pattern made up of multiple entity concepts or expanded abstract concepts. They are called patterns because they recur in our lives. Since they recur, the processes represented by each of these concepts must have common features; otherwise, we could not use one concept to represent them.

For example, we can decompose a business trip into "departure", "on the road" and "arrival". It can also be decomposed into more detailed links such as "departure", "drive to the airport", "arrival at the airport", "buy a ticket", "pass security", "board the plane", "on the way", "arriving at the target airport", "leaving the target airport", and "going to the target hotel". The level of detail depends on the temporal and spatial resolution of the machine. An intermediate link of a process can be regarded as an intermediate state that recurs in similar processes. Through these intermediate states, we can divide a large number of similar processes into the same set of links. Each link may itself contain multiple common intermediate states, and these next-level intermediate states divide a single link into multiple next-level links. In this way, we subdivide a type of process step by step into many similar links in series. The result of the decomposition is a tower structure. The lowest common links have the finest temporal and spatial resolution, and the top link has the coarsest. The lowest links are usually connected with specific details, and the objects they operate on are usually specific things, so imitating these links usually involves specific things. At higher levels, the objects operated on are usually concepts and abstract concepts, so these links can be imitated in a wider range of situations. When a machine imitates, it usually starts at a coarse temporal and spatial resolution, first imitating at the level of concepts, and then unfolding the concepts layer by layer. When understanding information, the machine may only need to expand this tower structure down to specific images (the level at which the machine can use similarity for comparison, which is the underlying language the machine uses to process information). When imitating in order to act, it may be necessary to expand this tower structure down to the bottom-level experience of the machine (the bottom-level experience is the experience in which the machine, through preset programs, calls experience parameters to emit a single syllable or make a single action).

Process characteristics are usually dynamic processes involving large space and long time. The specific details of its implementation are closely related to the environment, so it is difficult to find similarities. But these links are usually represented by language symbols. Therefore, when we look for a process feature, we can first look for the repetitiveness of the language symbols of each link involved in the process of each trip to the airport. Each time the machine goes to the airport, the language symbols corresponding to each link form a gradually unfolding tower-shaped conceptual relationship. For example: the top level of this concept is "going to the airport", the next level is "ready to go", "on the way", "arrival", and the next level is "preparing luggage", "finding a car", "farewell to friends", "By car", "On the way", "Arriving at the airport garage", "Out of the garage", "Arriving at the airport entrance". The next level is "Prepare clothes", "Prepare toiletries", "Prepare money", "Prepare related materials".... This process can be subdivided continuously. At the beginning, the distinction of each link can be arbitrary. But every time we go to the airport, we get a tower-shaped conceptual organization. This tower-shaped conceptual organization goes through a memory and forgetting mechanism, and finally at each resolution level, only a small amount, indispensable, and frequently appearing concepts can be retained in memory. They are process characteristics at the corresponding resolution. These process characteristics are a series of concepts, organized in a temporal and spatial order. Especially on the ground floor, usually only static feature maps and dynamic feature maps that may be available every time you go to the airport can be left. These feature maps are few in number, but they are indispensable. These are static feature maps or dynamic feature maps that represent key links, such as "security check" or "boarding". 
The upper-level concepts connected to these key links are also indispensable (they may be fewer in number). Pushing upward level by level, in the end only the single top-level concept of "going to the airport" remains. Therefore, process features are established through the memory and forgetting mechanism, by positive selection (links deliberately memorized by learning from other people's experience) and reverse selection (upper-level links that correspond to something occurring every time).
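The retention of indispensable links across repeated trips can be sketched as follows. This is only a minimal illustration and not the method of the present application: it replaces the memory and forgetting curves with a simple frequency threshold, and the function name `extract_process_features`, the parameter `keep_ratio`, and the example link labels are all hypothetical.

```python
from collections import Counter


def extract_process_features(episodes, keep_ratio=0.8):
    """Keep the link labels that recur in most observed runs of a process.

    episodes: list of runs, each run a list of link labels in time order.
    keep_ratio: assumed stand-in for the memory/forgetting mechanism; a label
    survives only if it appears in at least this fraction of the runs.
    """
    counts = Counter()
    first_seen_order = []
    for links in episodes:
        # Count each label at most once per run, preserving its order.
        for label in dict.fromkeys(links):
            if label not in counts:
                first_seen_order.append(label)
            counts[label] += 1
    threshold = keep_ratio * len(episodes)
    return [label for label in first_seen_order if counts[label] >= threshold]


trips = [
    ["prepare luggage", "find a car", "security check", "boarding"],
    ["prepare luggage", "farewell to friends", "security check", "boarding"],
    ["find a car", "buy coffee", "security check", "boarding"],
]
core_links = extract_process_features(trips, keep_ratio=0.9)
print(core_links)
```

With these three example trips, only the links present in every trip survive, which mirrors how rarely repeated details fade while key links remain.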

These preserved tower-shaped concepts and underlying feature maps are the objects of imitation every time we go to the airport. We only need to put specific things in the real environment into the characteristics of this process according to the analogy method, and we can build up the ability to plan goals at all stages of going to the airport from anywhere. In the specific implementation, it is necessary to use segmented imitation to unfold these abstract concepts layer by layer, adding more links in line with the reality. In this way, we have established the ability of the machine to go to the airport in a variety of different environments.

The essence of segmented imitation is a process of reorganization using memory and input information, and it is a process of creation. It uses dynamic features and process features in memory, together with the input information, to organize one or more reasonable processes. The content that can exist in memory for a long time is usually content that is used often, such as dynamic features and process features. Because they are independent of specific objects, they are widely used. They are common words, common actions, or common ways of expressing and organizing. These frequently used combinations are equivalent to process frameworks for things, scenes, and processes. They are formed by survival of the fittest under the memory and forgetting mechanism. The machine borrows these process frameworks and adds its own details to form a variety of new processes. The machine takes the most relevant memory it finds and removes from it the feature maps with low memory value and the static feature maps that have nothing to do with the current reality; what remains is the required process framework. It then fills the actual information into this framework. This process is called segmented imitation. Segmented imitation is an iterative process. Each upper-level link is expanded, through segmented imitation, into multiple lower-level links that meet realistic conditions. Then, during imitation, the same method is used to expand each lower-level link into further lower-level links that meet realistic conditions. This process continues to iterate until the machine can actually take action.
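The layer-by-layer expansion described above can be sketched as a recursive unfolding of a concept tower. This is a simplified illustration under stated assumptions: the dictionary `experience` (mapping each concept to its remembered lower-level links), the predicate `can_execute`, and the function name `expand` are hypothetical, and the step of adapting each link to the real environment is omitted.

```python
def expand(concept, experience, can_execute):
    """Unfold a concept into executable steps, layer by layer.

    experience: assumed mapping from a concept to its lower-level links.
    can_execute: assumed predicate telling whether the machine can act on a
    concept directly (its bottom-level experience).
    """
    if can_execute(concept):
        return [concept]
    if concept not in experience:
        # No remembered experience to imitate for this link.
        raise LookupError(f"no experience available for {concept!r}")
    steps = []
    for lower_link in experience[concept]:
        steps.extend(expand(lower_link, experience, can_execute))
    return steps


experience = {
    "go to the airport": ["get ready", "travel", "arrive"],
    "get ready": ["pack luggage", "find a car"],
    "travel": ["ride in the car"],
    "arrive": ["enter the terminal"],
}
primitive_actions = {"pack luggage", "find a car", "ride in the car", "enter the terminal"}

plan = expand("go to the airport", experience, primitive_actions.__contains__)
print(plan)
```

Stopping the recursion earlier, for example with a `can_execute` that accepts mid-level concepts, corresponds to expanding the tower only as far as is needed for understanding rather than for action.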

6. Expansion of the network of relationships.

The expansion of the relationship network is to use the concept as the object of operation to establish the relationship network.

In the application of the present invention, the third basic hypothesis is "the feature map in the same segment of memory, and the strength of the connection relationship between the two feature maps is positively correlated with the memory value of the two feature maps in this memory". Here, we believe that this assumption holds true for the concept as well. The purpose of this: 1. To treat language symbols as entities. Language symbols (phonetic or text) in the same memory, their relationship is positively related to the memory value of these two symbols in this memory (not necessarily a linear relationship). 2. Treat concepts as entities. Introduce dynamic feature diagrams (including relational concepts) and process features for conceptual operations. These operated concepts include the motivation of the machine, the demand type and state data of the machine, as well as the emotional type and state data of the machine. They are all connected with other information in the same memory.

Since humans have accumulated a large number of relationships (knowledge) between concepts, the relationships between concepts are largely obtained directly through learning. The specific method is: the machine first learns the concepts of specific things, and connects the language symbols of these concepts with the forms of information that the machine can use for computation (forms other than language symbols, such as images, sounds, smells, touch and other sensor data). The methods of connection are: 1. When this information occurs, it is given language symbols at the same time. 2. The machine directly learns the explanation of these concepts. The explanation of a concept is what the concept contains. In this way, these concepts are connected, through an indirect route, with the forms of information that machines can use for computation.

Among the specific methods of machine learning, one method is to imitate human learning and strengthen memory through repetition. That is, the language and the content corresponding to the language are made to appear in the same memory, and repetition is used to raise the memory value. In order to improve efficiency, humans can directly give machines the related memories. For example, the various linguistic forms of "wheat" (including different languages, dialects, and intonations) and various images of "wheat" are put directly into the memory of the machine under the concept of "wheat" and given high memory values, so that the machine directly has the ability to identify "wheat". Furthermore, all kinds of knowledge about "wheat" can be put into the same memory. This knowledge can also be put into different memories, and these memories can be connected by common information related to "wheat" to form a large network. With this form of memory implantation, the learning efficiency of machines will far exceed that of humans. Since all knowledge exists in memory in the application of the present invention, different machines can directly share memories and use these memories in the same way. Therefore, the method proposed in the present application can create intelligence whose knowledge far exceeds that possessed by individual humans.

The expansion of relational concepts refers to the use of language symbols to represent the relationships between concepts. These relationships include but are not limited to "contain or partially contain", "juxtapose", "opposite", "overlap or partially overlap", "turn", "repetition", "arrangement", "symmetry", "increase", "decrease", "gradual change", "mutation", and other ways of indicating the relationship between things. The way humans learn these relationships is to use dynamic features to represent the relationships between objects. For example, when we learn the relationship "increase", we remember many processes in which something increases. In these processes, the language symbol "increase" appears and the dynamic feature of increasing also appears, but the objects on which the dynamic feature operates may differ. For example, at first they are "water", "milk", and "food", then "exam scores" and "banknotes". Later, we discover that the objects can also be "love", "time" and "life", which have no physical entity. We therefore apply memory and forgetting to these relationships. Across them, we find that the shared feature map with the highest memory value is a dynamic feature map (a dynamic pattern of increasing quantity), and that the only other feature they share is that they are all "some kind of object". In this way we can connect the language symbol "increase" with a form (a dynamic feature map) that the machine can understand, and this form can be generalized to any concept. The machine can then correctly understand and use the language symbol "increase". It is because relationships generalize in this way that we can feel the beauty brought to us by symmetry, parallelism and rhythm in literature: these forms share the same dynamic relationships as those exhibited by real objects.
Evolution has given us the ability to appreciate the beauty of these dynamic patterns and triggers our corresponding emotions. In the present application, we record emotions and all related information in the same memory. When the machine faces the same beautiful literary forms, through the network of relationships, the dynamic features contained in these forms will also transfer activation values to the corresponding types of needs (such as symmetrical beauty), and the satisfaction of needs will also affect emotions. In this way, the machine can also feel the beauty of literature.

In order to improve search efficiency, we can separate the relationship network from memory and build it as a separate network. One possible method is to first establish connection lines between the feature maps in each memory frame, where the connection value of each line is a function of the memory values of the feature maps at its two ends. The connection values sent out by each feature map are then normalized. As a result, the connection values between two feature maps are not symmetric. Next, similar feature maps in different memory frames are connected according to their degree of similarity, with the similarity as the connection value. The network obtained after these steps is the cognitive network extracted from the memory bank. We can place the cognitive network alone in a quick-search library (a kind of memory bank) for instinctive responses that must be fast, such as in autonomous driving applications, or for some simple smart applications (such as production lines). Memory and forgetting in the cognitive network apply to the connection values: each time a relationship is used, its connection value increases according to the memory curve, and all connection values decrease over time according to the forgetting curve. It should be pointed out that a separate relationship network established in any way, as long as it is based on the basic assumptions proposed in the present application, is a variant of the relationship network in the present application and has no essential difference from it, and therefore still falls within the claims of the present application.
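The within-frame construction and the resulting asymmetry can be illustrated with a small sketch. The text only states that the connection value is "a function of the memory value" of the two ends, so the product used here is an assumption, as are the function name `build_connections` and the example feature names. Cross-frame similarity links and the memory and forgetting of connection values are not modeled.

```python
def build_connections(frame, combine=lambda a, b: a * b):
    """Build normalized directed connection values inside one memory frame.

    frame: mapping from feature-map name to its (positive) memory value.
    combine: assumed function of the two end memory values (product here).
    Returns a mapping (source, target) -> connection value, normalized so the
    values sent out by each source sum to 1.
    """
    raw = {}
    for src, m_src in frame.items():
        for dst, m_dst in frame.items():
            if src != dst:
                raw[(src, dst)] = combine(m_src, m_dst)

    # Total connection value sent out by each feature map.
    outgoing_total = {}
    for (src, _), value in raw.items():
        outgoing_total[src] = outgoing_total.get(src, 0.0) + value

    return {(src, dst): value / outgoing_total[src]
            for (src, dst), value in raw.items()}


frame = {"wheat image": 4.0, "word wheat": 2.0, "field": 1.0}
net = build_connections(frame)
print(net[("wheat image", "word wheat")])  # share of what "wheat image" sends out
print(net[("word wheat", "wheat image")])  # share of what "word wheat" sends out
```

Although the raw value between the two feature maps is the same in both directions, the normalized values differ because each feature map distributes its outgoing connections over a different set of neighbors.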

7. Understanding and response to the input information.

The machine processes input information by imitating the experience of others or its own experience. Imitation is an ability that is built into human genes. For example, for a babbling child, if every time he (she) returns home we greet him (her) by saying "you are back", then after several times, when he (she) comes home again, he (she) will take the initiative to say "you are back". This shows that he (she) has begun to learn by imitating others without understanding the meaning of the information.

In the same way, we let machine learning use the same method. The machine also imitates the experience of others or its own to understand and respond to the input information. The specific method is:

When inputting information, the machine first finds one or more segments of the most relevant memories in the memory. These memories are past responses to similar input information, or past responses to multiple pieces of information that are partially similar to the input information. The sender of these responses can be either the machine itself or other things. The machine takes the most frequently-occurring response between itself and the information source and related to the input information as the purpose of the information source. If there is no frequent interaction between the machine and the information source, then the machine considers the response most used by others as the purpose of the information source. This is reasonable, because the purpose of the information source is to get a response. The information source has preset possible responses based on its own experience. These pre-determined responses are established based on the interaction between the information source and the machine or the interactive experience of the information derived from others. When the machine understands the purpose of the information source, it also understands the input information.

After the machine understands the purpose of the information source, it needs to establish a corresponding response. The method is as follows: the machine finds the process features of these responses in the one or more segments of most relevant response memory. Process features are dynamic processes, and they are independent of the specific objects on which the dynamic processes operate. Therefore, the machine can generalize past experience through these dynamic processes. If the machine uses a dynamic process from its own experience, it refers to the connection relationships between commonly used actions and objects in memory and, following the principle that objects with the same attribute under the same concept can be substituted for each other, replaces the objects of the dynamic process in memory with the objects in the input information. If the machine uses a dynamic process from the experience of others, it first needs to replace the other party with itself according to the same substitution principle, and then, referring to the connection relationships between commonly used actions and objects in memory, replace the objects of the dynamic process in memory with the objects in the input information. A more concise way to achieve the above purpose is to take the most relevant memory found, remove the feature maps with low memory value, remove the static feature maps that are unrelated to the input information, and use the remaining part as a process framework. This kind of process framework is composed of process features plus the action objects in memory that match reality. In the same way, after bringing in suitable objects, the machine can establish a reasonable response to the information through the generalization ability of dynamic features.
The basic assumption established by the above method is that dynamic processes are usually independent of the object, and they are repeated more frequently in life, so the memory value is usually higher. Deleting the content with low memory value usually means deleting the details with few repetitions, and what remains is the correct process framework.
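The concise method of deriving a process framework can be sketched as a filter over one recalled memory. This is a minimal sketch under assumptions: the `FeatureMap` record, the threshold `min_memory`, and the use of simple name matching to decide whether a static feature map is related to the input are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class FeatureMap:
    name: str
    kind: str            # "static" or "dynamic"
    memory_value: float


def process_framework(memory, input_names, min_memory=1.0):
    """Keep what remains of a recalled memory after the two removals.

    1. Drop feature maps whose memory value is below min_memory (assumed).
    2. Drop static feature maps unrelated to the input; relatedness is
       approximated here by membership in input_names (assumed).
    """
    kept = []
    for fm in memory:
        if fm.memory_value < min_memory:
            continue
        if fm.kind == "static" and fm.name not in input_names:
            continue
        kept.append(fm)
    return kept


recalled = [
    FeatureMap("greet on arrival", "dynamic", 5.0),
    FeatureMap("mother", "static", 3.0),
    FeatureMap("red coat", "static", 0.2),
    FeatureMap("child", "static", 4.0),
]
framework = process_framework(recalled, input_names={"child"})
print([fm.name for fm in framework])
```

The dynamic feature map survives regardless of the objects involved, which reflects the assumption that dynamic processes are object-independent and frequently repeated.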

The machine needs to assess the responses it has established according to the principle of "seeking advantages and avoiding disadvantages". Only after a response passes this evaluation is it output. The evaluation method is to assume that the output has occurred, and to have the machine search its memory for the feedback that was obtained after such an output occurred. The machine may find feedback memories from a completely similar situation, or it may have no feedback from a completely similar situation, but it can always find feedback memories from a partially similar situation. These memories may be about the machine itself, or they may be about others. The machine replaces the objects in these memories with itself, and uses the relationship network to judge what changes in its demand status it might obtain if these responses did occur. Then, according to the principle of "seeking advantages and avoiding disadvantages", it determines whether the planned response is actually output. If the evaluation fails, the machine seeks to eliminate the static things or dynamic processes that bring negative results, and after eliminating them, it constructs its output response again in the same way. This process is repeated until the machine finds a response that passes the "seeking advantages and avoiding disadvantages" assessment. If no such response can be found, the machine enters the process of "nowhere to process information".
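The evaluate, eliminate, and rebuild loop can be sketched as follows. This is an illustrative simplification rather than the evaluation system of the present application: a response is modeled as a list of steps, the feedback memory is reduced to a hypothetical dictionary `predicted_effect` giving the expected change in need satisfaction per step, and "passing" is assumed to mean a positive total.

```python
def evaluate_and_revise(response, predicted_effect, max_rounds=5):
    """Return a revised response that passes the evaluation, or None.

    response: list of step names making up the planned output.
    predicted_effect: assumed mapping step -> predicted change in need
    satisfaction, standing in for feedback found in memory.
    """
    current = list(response)
    for _ in range(max_rounds):
        if not current:
            return None
        total = sum(predicted_effect.get(step, 0.0) for step in current)
        if total > 0:
            return current  # benefits outweigh harms: output is allowed
        # Eliminate the step expected to bring the most negative result.
        worst = min(current, key=lambda step: predicted_effect.get(step, 0.0))
        if predicted_effect.get(worst, 0.0) >= 0:
            return None  # nothing harmful left to remove, still no benefit
        current.remove(worst)
    return None


effects = {"greet": 0.5, "interrupt": -2.0, "answer the question": 1.0}
planned = ["greet", "interrupt", "answer the question"]
final = evaluate_and_revise(planned, effects)
print(final)
```

A return value of `None` corresponds to the case where no acceptable response can be found and the machine has to fall back to a separate handling procedure.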

In the application of the present invention, we propose an information processing process as shown in Figure 1. In S1, the machine selects information features at different resolutions and establishes algorithms for extracting these features from input data. In S2, the machine uses the algorithms from S1 to extract the features in the input information and to establish the environment space. S3 is the process of interpreting concepts and establishing the relationship network. In S4, the machine looks for memories related to the input information sequence through the relationship network and, based on these memories, infers the purpose of the information source. In S5, the machine assembles its own response plans based on its experience, and evaluates the different response plans through the evaluation system to determine the final choice. In S6, the machine imitates its own experience (which may be extracted from its own past memory, or obtained from others, for example by being told or by learning knowledge), using segmented imitation to expand the concepts layer by layer down to static feature maps and dynamic feature maps. The machine then imitates experience to combine these static and dynamic feature maps into a series of language or action responses of its own. This completes one round of information processing. S7 is a database update process that runs through the entire information processing flow.

It should be pointed out that in the disclosure of the present application, machine learning materials can also be obtained from sources outside the machine's own memory, including but not limited to expert systems, knowledge graphs, dictionaries, and network big data. These materials can be input through the sensors of the machine or directly implanted by manual methods, but they are all handled as memories in machine learning. It should also be pointed out that the learning steps proposed in the application of the present invention are not separated by dividing lines in time; they are interwoven with each other, and no step has priority over another. Feedback on the machine's information processing is itself processed as new input information. This process therefore iterates continuously, which constitutes the interaction between the machine and the outside world. The most essential differences between the intelligence displayed by the machine in this process and existing machine intelligence are: 1. The machine intelligence proposed in the present application responds to information on the basis of a true understanding of that information, rather than by mechanical imitation. 2. Every step of the machine intelligence proposed in the present application can be seen, understood, and intervened in by humans. Therefore, the machine intelligence proposed in the present application is controllable and understandable for humans, whereas the information processing of current artificial intelligence is closer to a black box. 3. The machine intelligence proposed in the present application can have emotional responses similar to those of humans.

It should also be pointed out that the recognition of and response to input information by the machine is related not only to the relationship network, but also to its "personality". The "personality" here refers to the preset parameters of the machine. For example, a machine with a low activation threshold readily produces associations, takes a long time to think, considers matters more comprehensively, and may be more humorous. A machine with a large temporary memory bank easily remembers many "details". As another example, when making a decision, how far an activation value must exceed the noise floor of the activation values to be considered "highlighted" is a threshold. A machine with a high threshold may be indecisive, and a machine with a low threshold may follow intuition more readily. Another example is how similar two node feature maps (which can be specific things, pronunciations, text, or dynamic processes) must be before they are treated as similar; this criterion determines the analogical thinking ability of the machine, and thus whether the machine has a serious or a humorous personality. Different memory and forgetting curves, and different activation value transfer curves, all lead to different learning results for the machine.
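The effect of the "highlight" threshold can be illustrated with a small sketch. The text does not specify how the noise floor is computed, so taking the mean of all activation values is an assumption, as are the function name `highlighted`, the parameter `threshold_ratio`, and the example node names.

```python
from statistics import mean


def highlighted(activations, threshold_ratio):
    """Return the nodes whose activation stands out from the noise floor.

    activations: mapping node -> activation value.
    threshold_ratio: assumed "personality" parameter; a node is highlighted
    when its value exceeds noise_floor * threshold_ratio.
    The noise floor is assumed here to be the mean activation value.
    """
    noise_floor = mean(activations.values())
    return {node for node, value in activations.items()
            if value > noise_floor * threshold_ratio}


activations = {"a": 9.0, "b": 3.0, "c": 1.0, "d": 1.0, "e": 1.0}
strict = highlighted(activations, threshold_ratio=2.0)
lenient = highlighted(activations, threshold_ratio=0.9)
print(strict, lenient)
```

With the same activation values, a higher ratio leaves fewer candidates standing out, while a lower ratio lets more candidates through, so the preset parameter alone changes how readily the machine settles on a choice.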

It should also be pointed out that through the method described in the present application, the cognition learned by the machine is closely related to the learning experience of the machine. Even if the learning materials are the same and the learning parameter settings are the same, but the learning experience is different, the cognition formed by the machine may be very different. For example: our native language may be directly connected to the feature map. The second language may be connected to the native language first, and then indirectly connected to the feature map. When you are not proficient in the second language, it may even be a process from the second language to the second language, to the native language, and then to the feature map. When using such a process, the time required is greatly increased, resulting in the machine being unable to proficiently use the second language. Therefore, the machine also has the problem of native language learning (of course, it can also be artificially implanted to directly allow the machine to acquire the ability to use multiple languages). Therefore, the machine learning method described in the present application is not only related to machine learning materials, but also closely related to the machine's learning order of these materials.

On the basis of the application of the present invention, implementations may differ in: whether different memory and forgetting curves are used; whether chain activation is used as the search method; whether different activation value transfer functions are used; whether different activation value accumulation methods are used; whether relationship extraction mechanisms other than the memory and forgetting mechanism are also used; whether different activation thresholds are used in chain activation; whether different "highlight" thresholds are used; whether different methods of calculating the activation value noise floor are used; whether different time sequences are used for the nodes across multiple chain activations or within a single chain activation; which initial activation value assignment method is used; which hardware configuration (such as computing power and memory capacity) is used; which native language is used for learning; whether manual intervention is used to obtain cognition; and so on. These differences are specific choices of preferred methods for realizing the general artificial intelligence framework proposed in the application of the present invention. They can be realized with knowledge available in the industry, and they do not affect the claims of the present application.

Description of the drawings

Fig. 1 is a schematic diagram of the information processing process proposed in the application of the present invention.

Figure 2 is a schematic diagram of information feature extraction methods at different resolutions.

Figure 3 is the process in which the machine processes the input information and uses the information to establish the environment space.

Figure 4 is the process of information processing in the relational network.

Figure 5 is the process of the machine establishing a response.

Figure 6 is a schematic diagram of a module for realizing general machine intelligence.

Detailed description of the embodiments

The application of the present invention will be further described below in conjunction with the drawings and specific embodiments. It should be understood that the text of this application mainly proposes the main steps to realize general artificial intelligence. Among these main steps, each step can be implemented using currently known structures and technologies. Therefore, the focus of this application text is on these steps and their composition, rather than being limited to the details of implementing each step using known technologies. Therefore, the description of these embodiments is only exemplary, and is not intended to limit the scope of the application text. In the following description, in order to avoid unnecessarily confusing the key points of the text of this application, we omit the description of well-known structures and technologies. All other embodiments obtained by those skilled in the art without creative work shall fall within the scope of protection of the text of this application.

1. Preliminary preparation for machine information processing.

1.1 Selection of information features.

We believe that in our world, there cannot be two exactly the same things. When we say that two objects are of the same kind, we mean that they are the same at the information resolution we use. Therefore, in the application of the present invention, we need to gradually use different resolutions to identify information from details to abstraction.

At the same time, we believe that in the history of evolution, when organisms recognize information, they evolve in the direction that saves most energy. Because for living things, saving energy consumption means a higher chance of survival. Therefore, we also introduce this idea into machine learning.

Combining the above two aspects, we propose that the selection criteria for information features are: 1. These features are widely present in our world. In this way, we can reuse these features in the information processing process, which saves energy the most. 2. The same data has different data characteristics under different resolutions. In this way, we can compare the similarities between the two at different resolutions.

1.2 The establishment of information characteristics.

We propose a method for establishing information features as shown in Figure 2. In S201, the input data is divided into multiple channels through filters. For images, these channels include specific filtering for the contour, texture, tone, and dynamic mode of the graphic. For speech, these channels include filtering used in speech recognition, such as for audio composition and pitch change (a dynamic mode). These preprocessing methods can be the same as the existing image and voice preprocessing methods in the industry, so they are not repeated here.

S202 uses a window of a specific resolution on the data in each channel to find local similarity. This step finds, for the data of each channel, the local features that are common within the data window, while ignoring the overall information. In step S202, the machine first uses a local window W1, and searches for local features that commonly occur in the data by moving W1. For images, local features are the locally similar patterns that are commonly found in graphics, including but not limited to the lowest-level features such as points, lines, surfaces, gradients, and curvatures, and then the features formed by combining these lowest-level features, such as local edges, local curvature, texture, hue, ridges, vertices, angles, parallels, intersections, size, dynamic modes and other local features that are commonly found in graphics. For speech, they are similar audio, timbre, tone, and their dynamic patterns. The same is true for other sensor data, and the criterion for judgment is similarity.
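The search for commonly occurring local features with a moving window can be sketched on one-dimensional data. This is a minimal illustration under assumptions and not the similarity comparison algorithm of the present application: similarity is reduced to equality of quantized neighbor-to-neighbor differences, and the names `common_local_patterns`, `min_count`, and `step_size` are hypothetical.

```python
from collections import Counter


def common_local_patterns(signal, window, min_count=2, step_size=1.0):
    """Slide a window over the signal and count recurring local shapes.

    Each window position is described by the quantized differences between
    neighboring samples, so the description ignores the absolute level and
    keeps only the local shape (an assumed, very simple similarity measure).
    Returns the shapes that occur at least min_count times.
    """
    counts = Counter()
    for start in range(len(signal) - window + 1):
        patch = signal[start:start + window]
        shape = tuple(round((b - a) / step_size)
                      for a, b in zip(patch, patch[1:]))
        counts[shape] += 1
    return {shape: n for shape, n in counts.items() if n >= min_count}


signal = [0, 1, 2, 3, 3, 3, 2, 1, 0, 1, 2, 3]
patterns = common_local_patterns(signal, window=3)
print(patterns)
```

In this example the rising shape and the falling shape recur, while shapes seen only once are discarded. Repeating the call with larger `window` values would correspond to using successively coarser windows.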

It needs to be pointed out here that windows of different resolutions can be time windows, space windows, or a mixture of the two. When comparing the similarity of the data in a window, a similarity comparison algorithm is used. The similarity comparison algorithm may involve preprocessing the data again, and may involve segmenting the data again before comparison. Different windows correspond to different resolutions. The similarity comparison algorithm at each resolution needs to be selected through practice. This step is equivalent to our attempt to reproduce the innate feature extraction capabilities of humans. The human feature extraction ability was established through constant trial and error in the process of evolution. Similarly, in the application of the present invention, the machine also needs human assistance to establish similarity comparison algorithms at different resolutions through constant trial and error. Although these algorithms need to be optimized through practice, the algorithms themselves are very mature and can be implemented by professionals in the industry based on public knowledge, so they are not repeated here.

The machine puts the local similar features it finds into a temporary memory bank. Every time a new local feature is added, it is assigned an initial memory value. Every time an existing local feature is found again, the memory value of that feature in the temporary memory bank is increased according to the memory curve. The information in the temporary memory bank follows the memory and forgetting mechanism of the temporary memory bank. Low-level features that survive in the temporary memory bank and reach the threshold for entering the long-term memory bank can be put into the feature library as long-term memory features. There can be multiple long-term memory banks, and they also follow their own memory and forgetting mechanisms. In S203, the local windows W2, W3,..., Wn are used successively, where W1<W2<W3<...<Wn (n is a natural number), and the steps of S202 are repeated to obtain the bottom-level features.
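The temporary memory bank described above can be sketched as follows. This is only an illustration under stated assumptions: a fixed increment stands in for the memory curve, multiplicative decay stands in for the forgetting mechanism, and the class name and all parameter values are hypothetical.

```python
class TemporaryMemoryBank:
    """Toy temporary memory bank with promotion to long-term memory."""

    def __init__(self, initial_value=1.0, gain=1.0, decay=0.5,
                 forget_below=0.1, promote_at=3.0):
        self.initial_value = initial_value  # value given to a new feature
        self.gain = gain                    # assumed stand-in for the memory curve
        self.decay = decay                  # assumed stand-in for the forgetting curve
        self.forget_below = forget_below    # features below this are forgotten
        self.promote_at = promote_at        # threshold for long-term memory
        self.values = {}                    # temporary memory: feature -> value
        self.long_term = {}                 # long-term feature library

    def observe(self, feature):
        """Record one occurrence of a local feature."""
        if feature in self.values:
            self.values[feature] += self.gain
        else:
            self.values[feature] = self.initial_value
        if self.values[feature] >= self.promote_at:
            self.long_term[feature] = self.values.pop(feature)

    def tick(self):
        """Apply one step of forgetting to the temporary memory bank."""
        for feature in list(self.values):
            self.values[feature] *= self.decay
            if self.values[feature] < self.forget_below:
                del self.values[feature]
```

A feature that keeps recurring is promoted, while a feature seen once decays and is dropped after a few calls to `tick()`.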

In S1, the machine needs to build not only a database of bottom-level feature maps, but also a model that can extract these bottom-level features. S204 is the bottom-level feature extraction algorithm model A established by the machine. This model is an algorithm for finding local similarities, that is, a similarity comparison algorithm. S205 is another algorithm model, B, for extracting bottom-level features. It is based on a multilayer neural network. After this model is trained, it is more efficient than the similarity comparison algorithm.

In S205, the machine uses the selected information features as possible outputs to train the multilayer neural network. There are not many information features at the bottom level. In an image, for example, they are mainly the most essential features such as points, lines, surfaces, gradients, and curvatures, and the image features formed by combining them. A layer-by-layer training method can therefore be used. In S205, the machine first uses the local window W1 to select a data interval and uses the data in that interval to train the neural network. The outputs of the neural network are chosen from the information features selected at a resolution similar to that of the W1 window.

In S206, the machine then successively uses the local windows W2, W3,..., Wn, where W1<W2<W3<...<Wn (n is a natural number), to train the algorithm model. During optimization, every time the window size is increased, zero to L (L is a natural number) neural network layers are added on top of the corresponding previous network model. When optimizing the network with the added layers, there are two options: 1. Optimize only the newly added zero to L layers each time. In this way, the machine can stack all the network models to form one overall network with intermediate outputs. This is the most computationally efficient option. 2. Copy the current network to a new network each time, and then optimize the new network with its zero to L added layers. In this way, the machine finally obtains n neural networks, each corresponding to one resolution. When extracting features from information, the machine needs to select one or more of these networks according to the purpose of the current extraction. Therefore, in S207, the machine may obtain two kinds of neural networks for extracting information features. One is a single network with multiple output layers. Its advantage is that it requires fewer computing resources, but its ability to extract features is not as good as the latter. The other is multiple single-output neural networks. This approach requires a large amount of calculation, but extracts features better.

It should be pointed out that the above method can process images and voices, and can also process information from any other sensors in a similar way. It should also be pointed out that choosing different resolutions means choosing different windows and different feature extraction algorithms. So the extracted feature size is also different. Some underlying features may be as large as the entire image. Such underlying features are usually background feature maps of some images or specific scene feature maps.

Dynamic features are extracted by treating the things inside the spatial resolution window as a whole, which can be regarded as a mass point, and extracting the similarity of its motion trajectories. Once the motion trajectories are determined, they can be viewed as static data. Therefore, the selection of motion features and the algorithm for extracting them are similar to those for static data. The rate of change is a motion feature extracted at a time resolution (time window). The machine samples the entire process over time and determines the rate of change by comparing the differences in similarity between the motion trajectories of different samples. The motion feature therefore has two resolutions. One is spatial: a spatial sampling window is used to treat the data in the window as a mass point. The other is temporal: the process is sampled through a time window, and the rate of change of motion is determined from the changes in the motion trajectory across these samples.
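The two resolutions of a motion feature can be illustrated with a short sketch. It assumes two-dimensional point data, uses the centroid of the points in the spatial window as the mass point, and uses displacement between time samples as a simple stand-in for the trajectory comparison; all function names and these choices are illustrative assumptions.

```python
import math

def centroid(points):
    """Reduce the points inside a spatial window to a single mass point."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def trajectory(frames):
    """Spatial resolution: one mass point per frame gives the motion track."""
    return [centroid(points) for points in frames]

def rate_of_change(track, time_step, dt=1.0):
    """Temporal resolution: sample the track every `time_step` frames and
    return the speed between consecutive samples (assumed measure)."""
    samples = track[::time_step]
    speeds = []
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        distance = math.hypot(x1 - x0, y1 - y0)
        speeds.append(distance / (time_step * dt))
    return speeds
```

Once computed, the track is an ordinary list of coordinates, so it can be handled like static data, as the paragraph above notes.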

2. The machine processes the input information and establishes the environment space.

Figure 3 is the process in which the machine processes the input information and uses the information to establish the environment space. In S301, the machine determines the resolution it needs and the information interval that it needs to recognize.

When the machine needs to process input information, it first determines the resolution it needs and the interval that needs to be recognized according to its inherited goals. An inherited goal is an unfinished goal that the machine produced in a previous information processing cycle. The machine usually has commonly used temporal and spatial resolutions for these inherited goals, and this information is stored in memory. Similarly, the interval that needs to be recognized is also a result of the machine's previous information processing. It reflects a deliberate act by the machine to recognize a specific interval. For example, in the last information processing cycle, the response generated by the machine may have been "further identify the information in a specific interval". If the machine has neither an inherited goal nor an interval it plans to identify, it may choose a coarser resolution at random to identify the surrounding environment, driven by the underlying motivation of "safety requirements".

S302 is the process in which the machine extracts information features. When information is input to the machine, it is first preprocessed in multiple channels, and the machine then extracts features from each channel at the resolution it has chosen. The extraction follows the process from S201 to S207, except that here the machine does not need to extract the same data again at different resolutions, and only needs to use either feature extraction algorithm model A or algorithm model B.

S303 is the process in which the machine establishes the environment space. Because the similarities and environmental relationships between things need to be preserved, we store data using a method called the environment space. After the machine extracts information features from the input, it uses these features to build the environment space. The machine scales and rotates each extracted bottom-level feature to the position, angle, and size at which it is most similar to the original data, and overlays it on the original data. This retains the relative positions of the bottom-level features in time and space and establishes the environment space. When the machine recalls a memory, it can use the inputs of sensors at different angles, such as video and audio, and exploit parallax or differences in hearing to reconstruct the three-dimensional environment space. The machine also uses the size comparison between the input feature map and the feature map in memory to help establish three-dimensional depth of field.

Because gravity sensing is a continuous input, it is present in all memories. It has connections with everything in memory, and these relationships are optimized by the memory and forgetting mechanisms. The directional relationship between images and the sensed direction of gravity is widespread in these memories, so we are very sensitive to an image being upside down, but not so sensitive to it being reversed left to right. An upside-down image departs from the familiar combination of feature maps and the direction of gravity. When we overlay the extracted feature maps on the input data to establish the environment space, the direction of gravity serves as a default reference coordinate system. An upside-down object departs from the way feature maps are stacked in memory, so that when past experience is borrowed to place it, the object's local coordinate system does not match the overall coordinate system. This forces us to raise our attention and recognize the object a second time. In the second pass, we may find the corresponding feature map by expanding the memory search range or by rotating the angle, which requires more attention. This is why we are so sensitive to things being upside down.

S304 is a process in which the machine stores other relevant information in the memory. There are three types of data in the memory stored by the machine, and each type has its own memory value. The first category is the information features of external input, including the features of all external sensor input. They include visual, auditory, smell, touch, taste, temperature, humidity, air pressure, and other information. This information is closely related to the specific environment. It is stored in the organization of the original data, so the three-dimensional environment space can be reconstructed from it, and it maintains its memory values according to the memory and forgetting mechanism. The second category is internal self-information, including power, gravity direction, body posture, the operation of various functional modules, and so on. This information is independent of the environment, and its memory values are set by a preset program. The third category is data on the machine's needs and the state of those needs, including data such as the safety value, danger value, profit value, loss value, goal achievement value, dominance value, and the evaluation value of the machine's own body state, together with the status data related to these needs. The machine also generates various emotions based on how well its own needs are satisfied. The relationship between these emotions and the degree to which the machine's needs are met is set through a preset program. The machine can also work in reverse, using the relationship between internal conditions, external conditions, and the state in which its own needs are met to adjust the preset parameters of emotion generation, thereby using its own emotions to influence the outside world. To achieve this, the method we adopt is to establish distinct symbolic representations for the machine's own demand types and emotion types.
When an event occurs in the environment space of the machine, the machine needs to store the current environment space in the memory bank. The machine stores all feature maps (including information features, demand symbols, and emotional symbols) together with their initial memory values (positively correlated with the activation value at the time of storage, but not necessarily linearly) in memory. We call a demand symbol together with the memory value it obtains the demand state.

The requirements of the machine can be varied, and each type of requirement can be represented by a symbol, such as safety and danger, gain and loss, dominating and being dominated, or respect and neglect. The kinds and number of demand types do not affect the claims of the present application, because in the present application all requirements are handled in the same way.

The emotions of the machine can be varied, and each type of emotion can be represented by a symbol, such as excitement, anger, sadness, tension, anxiety, embarrassment, boredom, calmness, confusion, disgust, pain, jealousy, fear, happiness, romance, sympathy, and satisfaction. The kinds and number of emotion types do not affect the claims of the present application, because in the present application all emotions are handled in the same way.

S305 is the memory screening mechanism the machine uses when storing the environment space. It consists of an event-driven mechanism and a temporary memory bank mechanism. Every time an event occurs in the environment space, the machine takes a snapshot of the environment space and saves it. The saved content includes the features in the environment space (information, machine states, needs, and emotions) and their memory values. A memory value is positively correlated with the activation value at the time of storage, but not necessarily linearly. We call the data stored by one snapshot of the environment space a memory frame. Memory frames are like movie frames: by playing back multiple frames in succession, we can reproduce the dynamic scene at the time the memory was formed. The difference is that the information in a memory frame may be forgotten over time. An event in the environment space means that the similarity between the current combination of features and that of the previous environment space has changed by more than a preset value, or that a memory value in the environment space has changed by more than a preset value. A memory bank is the database that stores these memory frames. The temporary memory bank is one kind of memory bank, and its purpose is to filter the information stored in memory frames. If a memory frame in the temporary memory bank contains a feature whose memory value reaches the preset standard, the frame can be moved to the long-term memory bank for storage. In the present application, we use a limited-capacity stack to limit the size of the temporary memory bank, and use fast memorization and fast forgetting in the temporary memory bank to screen the material to be put into the long-term memory bank.
When the machine faces a large amount of input information, it lacks the motivation to analyze in depth the things, scenes, and processes it is already accustomed to, or those far from its focus of attention. The machine may therefore not recognize these data at all, or may assign them very low activation values. When the machine stores information in the temporary memory bank in an event-driven manner, the memory value it assigns to each information feature is positively correlated with the activation value at the time of storage. Memories with low memory values may soon be forgotten from the temporary memory bank and never enter the long-term memory bank. In this way, only the information the machine cares about is put into long-term memory, and the trivial everyday things whose connection relationships do not need to be extracted are not memorized. In addition, because the capacity of the temporary memory bank is limited, forgetting in the temporary memory bank passively speeds up as the stack approaches saturation.
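The event-driven storage of memory frames can be sketched as follows. The sketch makes several assumptions that are not in the application: an event is detected with the Jaccard distance between feature sets, the memory value is taken to equal the activation value, a frame is promoted immediately if any value reaches the standard, and a bounded `collections.deque` stands in for the limited-capacity stack. The names `FrameStore` and `changed_enough` are hypothetical.

```python
from collections import deque

def changed_enough(previous, current, threshold):
    """Assumed event test: Jaccard distance between two feature sets."""
    union = previous | current
    if not union:
        return False
    return 1 - len(previous & current) / len(union) > threshold

class FrameStore:
    """Toy event-driven store of memory frames."""

    def __init__(self, capacity=3, promote_at=2.0, event_threshold=0.3):
        self.temporary = deque(maxlen=capacity)  # oldest frame drops out when full
        self.long_term = []
        self.promote_at = promote_at
        self.event_threshold = event_threshold
        self.last_features = set()

    def observe(self, activations):
        """activations maps feature -> activation value.
        Returns True if an event occurred and a frame was stored."""
        features = set(activations)
        if not changed_enough(self.last_features, features, self.event_threshold):
            return False
        self.last_features = features
        frame = dict(activations)  # memory value = activation value (assumed)
        if max(frame.values(), default=0.0) >= self.promote_at:
            self.long_term.append(frame)
        else:
            self.temporary.append(frame)
        return True
```

An unchanged scene produces no new frame, and low-value frames are pushed out of the temporary bank as it fills, which mirrors the passive acceleration of forgetting near saturation.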

3. The establishment of a network of relationships.

Although the relationships between things look complicated and are difficult to classify and describe, in the present application we propose a method for describing them: 1. Extract the similarity relationships between things; 2. Extract the environmental relationships between things. In the present application, we only need to extract these two relationships and do not need to analyze any others.

The similarity relationship refers to the first hypothesis proposed in the application of the present invention: "If some attributes of two pieces of information are similar, other attributes contained in this information may also be similar." According to this basic assumption, the machine establishes classification based on the similarity of features at different resolutions. These classifications include static attribute classification and dynamic attribute classification.

The environmental relationship refers to the other two basic assumptions proposed in the present application: "things in the same environment have a connection relationship with each other", and "for feature maps in the same memory, the strength of the connection between any two feature maps is positively correlated (not necessarily linearly) with the memory values of these two feature maps in this memory". It needs to be pointed out that a memory also contains demand information and emotional information. In this way, the information in the same memory frame constitutes a local network. The information in one local network is connected to other local networks (other memory frames) through similarity, and the connection strength is positively correlated with the similarity (not necessarily linearly).

Two feature maps with high memory values in the same local network are closely related. In contrast, consider feature map A with a high memory value in local network 1 and feature map B with a high memory value in local network 2. They are connected only indirectly: feature map A in local network 1 is connected to feature map B in local network 1, and feature map B in local network 1 is then connected, through similarity, to feature map B in local network 2. Although A has a high memory value in local network 1 and B has a high memory value in local network 2, there is no close connection between them. This reflects the fact that although feature map A and feature map B each appear repeatedly, they rarely appear together, so the connection between them is not close, which matches real life. For example, taking a bath is something we do again and again, and so is driving, but the two rarely appear in the same memory, so the connection between them is not close. When information about bathing is input into our brains, it is difficult for us to think of driving directly. Bathing, water, shampoo, soap, and bath towels, on the other hand, frequently appear together in one memory, so they are closely related. When information about bathing is input into the brain, water, shampoo, soap, and bath towels are all activated through transmission across multiple memories. We only need to accumulate these activation relationships to reflect clearly how closely things are connected. It should be pointed out that when the activation values of the same feature map in different memories are accumulated, the specific accumulation algorithm needs to be optimized through practice; it may be simple addition, accumulation according to the memory curve, or some other accumulation function.
In this way, we have established a three-dimensional network of relationships through stored memories. The storage of this network is in chronological order, but the use is global.
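The way connection strengths emerge from stored memory frames can be sketched as follows. The geometric mean of the two memory values is an assumed example of a function that is positively correlated with both values, and simple addition is used as the accumulation rule, which is one of the options the text mentions. The function names are hypothetical.

```python
import math
from itertools import combinations

def frame_connections(frame):
    """Connection strength between every pair of feature maps in one memory
    frame. frame maps feature -> memory value. The geometric mean is an
    assumed choice; any function that grows with both values would do."""
    return {
        (a, b): math.sqrt(frame[a] * frame[b])
        for a, b in combinations(sorted(frame), 2)
    }

def accumulate_connections(frames):
    """Accumulate pairwise strengths over many frames by simple addition."""
    total = {}
    for frame in frames:
        for pair, strength in frame_connections(frame).items():
            total[pair] = total.get(pair, 0.0) + strength
    return total
```

Features that repeatedly share a frame, such as bathing and soap, accumulate a strong connection, whereas features that never share a frame, such as bathing and driving, get no direct connection at all.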

In the application of the present invention, the machine only needs to maintain the memory value in the memory frame to automatically establish a relationship network without special processing. The following respectively explains how to maintain the memory value of the three types of data in the memory frame.

In the relational network, a concept is a local network composed of closely connected feature maps. The attributes of a concept are all the feature maps, and combinations of feature maps, contained in the concept. These feature maps may include many similar image features and combinations in memory. Besides images, they may also be speech, smell, touch, and so on. These feature maps obtain activation values from each branch of the relationship network and pass them on to speech or text (because speech and text are used most frequently and have the highest memory values), so within the local network of a concept we usually use speech or text to represent the concept. The machine can therefore determine the range of the concept represented by a language symbol or a feature map by setting a requirement on the closeness of the connection values.

In the process of comparing the similarity between the input feature map and the feature maps in the relational network, the machine may need to deal with size scaling and angle matching. One processing method includes the following. (1) The machine memorizes feature maps from various angles. A feature map in memory is a simplified map created by extracting the bottom-level features of each input. Feature maps are the common features of similar things retained under the relationship extraction mechanism. Although they are similar to each other, they may have different viewing angles. The machine memorizes feature maps of the same thing seen from different angles in daily life as different feature maps, but through learning they can come to belong to the same concept. (2) The machine uses views from all angles, overlaps the common parts of these feature maps, and combines them, imitating their original data, to form a three-dimensional feature map. (3) A view transformation program that handles size scaling and spatial rotation of the three-dimensional image is embedded in the machine. This is a very mature technology in the industry, so it is not repeated here. (4) When the machine searches memory for similar bottom-level features, the search includes looking for feature maps that match after spatial rotation. At the same time, the machine saves the feature map at the current angle in memory, keeping the original viewing angle. When bottom-level features with a similar viewing angle are input again later, they can be found quickly. In this method, therefore, the machine combines memories of different viewing angles with spatial rotation to find similar feature maps, which produces the phenomenon that familiar viewing angles are recognized faster. Of course, the machine can also use only the method of comparing similarity after spatial rotation.

The machine searches for feature maps in the memory frames through similarity comparison and places a mark each time one is found. To improve efficiency, the machine can search only those memory frames that contain memory values greater than a preset value. When the number of marks accumulated by a concept in memory reaches a preset threshold, that concept is considered a possible candidate. The machine then refers to the feature combination contained in this concept to segment the input features, and further compares the similarity of the feature combinations of the two. By continuing this process, all concept candidates can be found. Then, where multiple candidates correspond to one input, the machine selects, according to how closely these candidates are connected, the concept most closely connected to the other information as the most likely concept. These concepts are the focus of attention and are the result of recognizing the input information. Here, we define the focus as the concept most relevant to the input information.
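The marking step of this search can be sketched in a few lines. The sketch assumes that feature maps have already been matched by similarity, so each one is represented by a plain label, and it covers only the selection of candidates by mark count, not the later choice among candidates by connection closeness. The function name and data layout are hypothetical.

```python
def concept_candidates(input_features, concepts, mark_threshold):
    """Count one mark per input feature found in each concept and return
    the concepts whose mark count reaches the threshold.

    input_features: set of feature labels extracted from the input.
    concepts: dict mapping concept name -> set of feature labels.
    """
    marks = {}
    for name, features in concepts.items():
        count = len(features & input_features)
        if count >= mark_threshold:
            marks[name] = count
    return marks
```

For an input with whiskers, a tail, and fur, both a "cat" and a "dog" concept may pass a threshold of two marks; the closeness of their connections to the rest of the input would then decide between them.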

The above process can determine the concept from the marks and connection relationships after all the input features have been processed, or a concept can be recognized as soon as any feature map reaches the preset standard. In this process, whenever a feature map similar to the input is found in memory, its memory value is increased according to the memory curve. This updates the relationship network in memory.

In addition to similarity comparison, the present application proposes another method for finding concepts related to the input: the chain activation method. This is a method, based on the relational network, for searching feature maps, concepts, and related memories. In the relational network, when feature map i is given an initial activation value that is greater than its preset activation threshold Va(i), feature map i is activated and passes activation values to the other feature map nodes connected to it. If a feature map receives a passed activation value, adds it to its own initial activation value, and the total exceeds the preset activation threshold of its own node, then it is activated too and passes activation values to the other feature maps connected to it. This activation is passed on in a chain until no new activation occurs, at which point the entire activation value transfer process stops. This process is called a chain activation process. Within a single chain activation process, once an activation value has been transferred from feature map i to feature map j, the reverse transfer from feature map j to feature map i is prohibited.
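A minimal sketch of one chain activation process is given below. It assumes that the relational network is a dictionary of transfer coefficients, that each node fires at most once using the activation it holds at that moment, and that nodes are processed in first-in, first-out order; these simplifications, like the function name, are assumptions rather than details from the application.

```python
from collections import deque

def chain_activation(graph, thresholds, initial):
    """Run one chain activation process.

    graph[x][y]: assumed transfer coefficient from node x to node y.
    thresholds[n]: activation threshold Va(n) of node n.
    initial: dict of initial activation values for the input nodes.
    Returns (activation values, set of nodes that were activated).
    """
    activation = dict(initial)
    fired = set()
    used_edges = set()
    queue = deque(n for n, v in initial.items() if v > thresholds[n])
    while queue:
        x = queue.popleft()
        if x in fired:
            continue
        fired.add(x)
        for y, coefficient in graph.get(x, {}).items():
            if (y, x) in used_edges:
                # y already passed activation to x: reverse transfer is prohibited.
                continue
            used_edges.add((x, y))
            activation[y] = activation.get(y, 0.0) + activation[x] * coefficient
            if y not in fired and activation[y] > thresholds[y]:
                queue.append(y)
    return activation, fired
```

In this sketch a node that stays below its threshold still holds the activation it received but does not pass it on, so the chain stops there.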

When chain activation is required, the machine assigns initial activation values to the feature maps extracted from the input information according to its own motivation. These initial activation values can all be the same, which simplifies the initial value assignment system. After these nodes receive their initial activation values, they start the chain activation process. After the chain activation process for all input information is complete, the machine selects the 1 to N (N is a natural number) highlighted feature maps with the highest activation values and takes the concepts they represent as the focus. This method makes full use of the relationships in the relationship network and is an efficient search method.

Highlighting means the following: when chain activation is used as a search method, if the activation values of some feature maps exceed the activation value noise floor of the entire relationship network by more than a predetermined threshold, we consider these feature maps to be "highlighted". The activation value noise floor of the relationship network can be calculated in different ways. For example, the machine can use the activation values of the large number of background feature map nodes in the scene as the noise floor. It can also use the average activation value of the currently activated nodes, or a preset number of its own. The specific calculation method needs to be optimized in practice. These calculation methods involve only basic statistical methods, which are well known to practitioners in this field. These specific implementation methods do not affect the framework claims for the methods and steps of the present application.
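The highlighting test and the selection of the focus can be sketched as follows. By default the sketch uses the average activation value of the activated nodes as the noise floor, which is one of the options listed above; a preset floor can be passed instead. The function names and parameters are hypothetical.

```python
def highlighted(activation, margin, noise_floor=None):
    """Return the nodes whose activation exceeds the noise floor by more
    than `margin`. If no floor is given, the mean activation is used."""
    if not activation:
        return set()
    if noise_floor is None:
        noise_floor = sum(activation.values()) / len(activation)
    return {n for n, v in activation.items() if v - noise_floor > margin}

def focus(activation, margin, top_n, noise_floor=None):
    """Pick up to `top_n` highlighted nodes with the highest activation."""
    candidates = highlighted(activation, margin, noise_floor)
    return sorted(candidates, key=lambda n: activation[n], reverse=True)[:top_n]
```

With many weakly activated background nodes, the mean stays low and only the few strongly activated feature maps are selected as the focus.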

It needs to be pointed out here that, because of the activation threshold, the final distribution of activation values depends on the activation order, even if the transfer coefficients between feature maps are linear and the accumulation function of the feature maps is also linear. Whether in a single chain activation process or across multiple chain activation processes, the same feature maps with the same initial activation values yield different final activation value distributions when a different activation order is chosen. This is due to the non-linearity introduced by the activation threshold: different transmission paths bring different information losses. A preference in the choice of activation order is equivalent to a difference in machine personality, so different thinking results are produced from the same input information. This phenomenon is consistent with human beings.

In addition, the strength of a relationship in the relationship network depends on the latest memory value (or connection value), so the machine can be subject to preconceptions. For example, suppose two machines with the same relationship network face the same feature map with the same initial activation value, and one of them has just processed an additional piece of input information about this feature map. After processing this additional information, that machine updates the relevant part of its relationship network, and one of the relationship lines may be strengthened according to the memory curve. This increased memory value does not fade in a short time. Therefore, when facing the same feature map and the same initial activation value, the machine that processed the additional information propagates more activation along the newly strengthened relationship line, which leads to a preconception effect.

In addition, the sequence of information input must be handled reasonably, so that the activation values brought by later information are not masked by earlier information. For this reason, in the present application, the activation values in chain activation decrease over time. If the activation values in the relational network did not fade with time, the changes in activation value brought by later information would not be distinct enough, and the pieces of information would interfere with each other: the later information would be strongly interfered with by the earlier information, making it impossible to find its focus correctly. But if we completely cleared the activation values of the earlier information, we would lose the possible connections between the earlier and later pieces of information. Therefore, in the present invention, we propose gradual fading to balance the isolation and the connection of successive pieces of information. The fading parameter needs to be optimized in practice. However, this raises the problem of keeping a piece of information active. Suppose the machine finds the focus in S3 but cannot complete the understanding of the information in step S4, or cannot find a response plan that satisfies the machine's evaluation system in S5. As time goes by, these activation values subside, causing the machine to forget these focuses and forget what it wanted to do. At this point, the machine needs to refresh the activation values of these focuses. One way to refresh them is to turn these focuses into a virtual output, and then feed this virtual output back in as input information and run through the process again to emphasize them.
This is why, when we are thinking and do not understand something or cannot find a train of thought, we sometimes mutter to ourselves, aloud or silently. This kind of virtual input, like a real input process, can also search memories and update memory values. The machine can therefore use this method to deliberately strengthen its memory of certain information, just as people read aloud or silently to improve memorization. In addition, if new input information appears in this situation, the machine has to interrupt its thinking process to handle the new information. From the perspective of saving energy, machines therefore tend to finish thinking to avoid waste. At such times, the machine may actively emit filler words such as "Hmm... ah..." as output, indicating that it is thinking and should not be disturbed. Another possibility is that the thinking time given to the machine is limited, or there is too much information, and the machine needs to complete its response as soon as possible. In this case, the machine can also use the output-then-input method. In this way, the machine emphasizes useful information and suppresses interfering information. These methods are commonly used by humans, and in the present application we introduce them into machine thinking as well. Based on a built-in program, its own experience, or a mixture of the two, the machine can determine whether the current thinking time exceeds the normal time and whether it needs to refresh the focus information, tell others that it is thinking, or emphasize the key points and eliminate interfering information.
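The fading of activation values and their refreshing through virtual input can be sketched as follows. Geometric decay per time step and a fixed additive boost are assumptions made for the illustration; the application only states that the fading parameter needs to be optimized in practice. The function names are hypothetical.

```python
def fade(activation, rate, steps=1):
    """Let all activation values fade over `steps` time steps.
    Each step keeps a fraction (1 - rate) of the value (assumed form)."""
    factor = (1.0 - rate) ** steps
    return {node: value * factor for node, value in activation.items()}

def refresh(activation, focus_nodes, boost):
    """Feed the focus back in as virtual input: add `boost` to the
    activation value of every focus node (assumed form)."""
    updated = dict(activation)
    for node in focus_nodes:
        updated[node] = updated.get(node, 0.0) + boost
    return updated
```

Calling `refresh` on the current focus before its values have faded away keeps the unfinished goal active, while values that are not refreshed continue to subside.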

In addition, in chain activation, the activation value transfer coefficient between one feature map and another must be determined correctly. One method is as follows: although the strengths of the connection values emitted by the same feature map do not constrain one another, in the activation process, in order to handle the relationship between a feature map and its attributes correctly, the activation value transfer function of the feature map can be a normalized transfer. Assume that the activation value of feature map X is A, that the sum of the connection values in all of its outgoing directions is H, and that its connection value to feature map Y is Txy. Then a simple activation value transfer is Yxy = A*Txy/H, where Yxy is the activation value transferred from feature map X to feature map Y.
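The normalized transfer Yxy = A*Txy/H can be written as a short sketch. The function name `transfer_values` and the example connection values are hypothetical; only the formula itself comes from the description.

```python
def transfer_values(activation, connections):
    """Apply Yxy = A * Txy / H to every outgoing connection of X.

    activation  -- A, the activation value of feature map X
    connections -- mapping from target feature map Y to connection value Txy
    """
    total = sum(connections.values())          # H, sum of outgoing connection values
    if total == 0:
        return {target: 0.0 for target in connections}
    return {target: activation * value / total
            for target, value in connections.items()}


# Hypothetical example: X has activation 2.0 and two outgoing connections.
passed = transfer_values(2.0, {"pizza": 3.0, "western_food": 1.0})
# passed == {"pizza": 1.5, "western_food": 0.5}
```

With this normalization the transferred values sum to A, so a feature map with many outgoing connections does not pass on more activation than it holds.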

Since voice and text are the most frequent forms of human communication, in the local network of a concept, the other feature maps obtain activation values from the branches of the relational network and pass them on to the voice or text of that concept, so the usual focus is the voice and text of the concept. Therefore, the virtual output that the machine uses to filter or emphasize its own information is usually speech, because this is the most common output method and the one that costs the machine the least energy. Of course, this is closely related to an individual's growth process. For example, people who learn about life mainly from books may instead convert information into text and then re-input it.

The search method using chain activation uses the implicit connections among the input information from language, text, images, the environment, memory and other sensors to transfer activation values among them, so that related feature maps, concepts and memories support one another and stand out. It differs from traditional "context"-based recognition in that the traditional method requires a "context" relation database to be established manually in advance. In the application of the present invention, we put forward the basic assumption of "similarity, and implicit connection between information in the same environment". Based on this basic assumption, all kinds of relationships are simplified, allowing the machine to build a relational network on its own. This network contains not only semantics but also common sense. It should be pointed out that chain activation is a search method; it is not itself a necessary step in the application of the present invention and can be replaced by other search methods that achieve similar purposes. When using chain activation, the machine can treat every feature map in memory whose activation value exceeds a preset value as having been used once, and maintain its memory value according to the memory and forgetting mechanism of the memory bank to which it belongs.

In the memory frame, the machine stores not only external input information but also two other types of information: the internal state data of the machine, and the demand and emotional data of the machine. In S402, the initial activation value assigned by the machine to the input information is therefore also propagated through the relational network to the machine's demand and emotional data, producing the machine's instinctive response to this information. The demand and emotional data of the machine are a very important type of "anthropomorphic" data. They are closely related to external input information and to the machine's own internal information. Their relationship is as follows:

When external or internal data is input, the machine responds, and these responses obtain external feedback and change the internal state (for example, the battery level drops). In the application of the present invention, we give the machine demand types similar to those of a human, together with demand gain values that represent the degree to which each demand is satisfied. At the same time, in order to communicate better with humans, we use preset programs to connect the satisfaction of the machine's demands with the emotions of the machine. The machine only needs to store its own demand state and emotional state in memory whenever it stores external information or internal state information. Through the mechanism that establishes the relational network, these demand states and emotional states become connected with the external input information and the internal state information. The connection strength is optimized by the memory and forgetting mechanism, so the machine naturally learns the connections between its demand and emotional states and the internal and external information. These connections are a very important part of the relational network.

The specific implementation method can be as follows. In the process of training the machine, humans use preset symbols (such as language, actions or eye contact) to tell the machine which environments are safe and which are dangerous, and can further tell the machine different grades of danger. Just as when training a child, it is enough to tell it "very dangerous", "more dangerous" or "a little dangerous". In this way, through training and the memory and forgetting mechanism (because of the increased number of repetitions), the machine gradually increases the connection strength between the common features of dangerous environments or processes and the built-in demand symbol for danger. The next time the machine processes input information, after the input information is given the same initial activation values, the features that are closely connected with the danger symbol transmit a large activation value to it. The machine is immediately aware of the danger and immediately processes this dangerous information based on its own experience (which can be preset experience or experience it has summarized itself). Of course, since humans already have a great deal of experience to pass on, during training we can also tell the machine directly how dangerous specific things or processes are. This is a way to preset experience for the machine. Preset experience can use language to let the machine establish a memory frame that connects the dangerous factors with danger, or it can be realized by directly modifying the machine's existing relational network (modifying the memory value of the danger symbol in the corresponding memory frame). The two values of safety and danger tell the machine how to identify safe and dangerous factors, so that it learns how to protect itself. The benefit value and loss value tell the machine which behaviors we encourage and which behaviors will be punished. This is a reward and punishment system. 
Just as when training children, we only need to reward or punish the machine after it performs certain behaviors, or tell it the reason when a reward or punishment occurs. Of course, we can also preset experience, for example by telling it in advance which behaviors will be rewarded and which will be punished, or by directly modifying its brain neural connections (the brain neural connections of the machine are the relational network). That achieving a goal brings happiness (a reward) is a gift that evolution has given us, and it is the driving force for our species to continue to develop. We can give machines similar instinctive motives, allowing them to build up the motivation for self-development. Therefore, when the machine achieves a goal, it can be rewarded either by humans or by a preset program, which motivates the machine to keep trying. Domination and being dominated tell the machine, through gains and losses, the range it can control. This range changes with different environments and different processes. It is also a reward and punishment system, but it differs from the benefit and loss system in that the benefit and loss system focuses on the result of behavior, while domination and being dominated focus on the scope of behavior. It uses the same training method as the benefit and loss system. We can also associate the machine's evaluation of its own body state with its demands, its emotions and external input information, so that the machine understands the relationships among them. For example, if on a rainy day the machine finds that its power or other performance is declining rapidly, it stores these memories. If the same situation is repeated many times, the machine establishes a closer connection between performance degradation and rain. 
Later, when the machine is choosing its own response process, these connections mean that an activated rain feature transmits a larger value to the loss symbol. The loss value is one of the indicators the machine uses to evaluate which response to choose, so the machine may tend to choose a plan that avoids the loss caused by rain. Therefore, in the present invention, we only need to put the rewards and punishments into memory together with all external and internal information, and the machine can incorporate these rewards and punishments into its own thinking. There is no need to write many "rules" to tell the machine how to recognize the environment, what to do and how to express emotions (which is in fact an impossible task).
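As a rough illustration of how repetition can strengthen the connection between a feature (such as rain) and a demand symbol (such as loss), the following sketch applies one forgetting step and one optional reinforcement step per observation. The update rule, the `gain` and `forget` parameters and the function name `update_strength` are assumptions made for this example; the description only states that repeated connections are strengthened and unrepeated ones are forgotten.

```python
def update_strength(strength, co_occurred, gain=0.2, forget=0.05):
    """Return the new connection strength after one observation.

    Assumption: every observation first applies a small proportional
    forgetting, then, if the two items appeared together, moves the
    strength toward 1.0 by a fraction `gain` of the remaining distance.
    """
    strength *= (1.0 - forget)
    if co_occurred:
        strength += gain * (1.0 - strength)
    return strength


# Hypothetical history: three rainy days with a power drop, then two dry days.
rain_to_loss = 0.0
history = []
for rained_and_power_dropped in [True, True, True, False, False]:
    rain_to_loss = update_strength(rain_to_loss, rained_and_power_dropped)
    history.append(rain_to_loss)
# The strength rises over the first three days and slowly fades afterwards.
```

A stronger connection means that an activated rain feature passes a larger value to the loss symbol during chain activation, which is what biases the machine's choice of response.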

The emotion of the machine is an important way for the machine to communicate with human beings. Therefore, in the application of the present invention, we also take the emotion of the machine into consideration. Human emotional response is an innate response to whether one's own needs are met, but through acquired learning we gradually learn to adjust this response, control it, and even hide it. In the same way, we use preset programs to link the emotions of the machine with whether the needs of the machine are met. For example, when a danger is identified, the emotion of the machine ranges from "worry" to "fear", depending on the degree of danger. When the various internal operating parameters of the machine are in the correct range, the machine is given emotions such as "comfort" and "relaxation". If some parameters are out of the correct range (equivalent to the machine being sick), the machine may express "discomfort" and "worry". Using this method, we can assign all the emotions that humans have to the machine. The emotion itself is expressed through the facial expressions and body language of the machine. In the same way, these instinctive emotions of the machine are adjusted by the reward and punishment mechanism. During the machine's life, in different environments or processes, the trainer continually tells the machine which of its emotional expressions are rewarded and which are punished. The trainer can also tell it directly what the appropriate emotion is in a particular environment or process, or directly modify its neural network connections to adjust its emotional response. In this way, the machine can adjust its emotions to a degree similar to that of humans. Furthermore, emotions and other memories are stored together in the same memory, and when a machine needs a certain result, it imitates the memory that brought that result. 
For example, if a certain type of behavior repeatedly brings a certain result, the machine will imitate the memories that contain this type of behavior, and of course it will also imitate the emotions in these memories, so it adjusts its emotions for a certain purpose. This is a way of using emotions.

It needs to be pointed out that the thoughts and emotions of the machine intelligence established by the method proposed in the present application are visible and controllable to humans, and are completely understandable. Therefore, such machine intelligence will not bring danger to humans, which is also a feature of the general artificial intelligence implementation method proposed in the present application.

4. Understand the input information through the relationship network and memory.

Figure 4 shows the process of information processing in the relational network. In S401, the machine preprocesses the input information according to the required resolution and extracts the static feature maps and dynamic feature maps at that resolution. In S402, the machine finds the correct concept for the feature maps obtained. A language feature map may carry a great deal of ambiguity; for example, a language input may have several meanings. The strategy adopted by the machine is to use the relational network as a semantic library and find the correct concept through the connections of the context. This step can be achieved by identifying how tightly the pieces of input information are connected. A quick search method for doing so is to assign initial activation values to all input information features and start chain activation to find the focus: find the 1 to N (N a natural number) feature maps with the highest activation values and highlight them. The concept that contains the highlighted feature maps connected to the language feature map is the correct concept.
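The disambiguation step in S402 can be sketched as a small chain activation over a toy relational network. Everything below is illustrative: the graph contents, the `attenuation` and `threshold` parameters and the function names are assumptions, and the normalized transfer A*Txy/H is used as the transfer rule.

```python
import heapq


def chain_activate(graph, initial, threshold=0.1, attenuation=0.5):
    """Spread activation from the input features through the network.

    graph   -- mapping: feature map -> {neighbour: connection value}
    initial -- mapping: input feature map -> initial activation value
    A transferred value below `threshold` is dropped; since every hop
    scales the value by at most `attenuation` (< 1), spreading terminates.
    """
    activation = dict(initial)
    pending = list(initial.items())
    while pending:
        node, value = pending.pop()
        edges = graph.get(node, {})
        total = sum(edges.values())
        if total == 0:
            continue
        for target, weight in edges.items():
            passed = attenuation * value * weight / total
            if passed < threshold:
                continue
            activation[target] = activation.get(target, 0.0) + passed
            pending.append((target, passed))
    return activation


def top_n(activation, n, prefix=""):
    """Return the n names with the highest activation that start with `prefix`."""
    candidates = [(k, v) for k, v in activation.items() if k.startswith(prefix)]
    return [k for k, _ in heapq.nlargest(n, candidates, key=lambda kv: kv[1])]


# Hypothetical network: the sound "bank" is ambiguous, the image of water is not.
graph = {
    "sound:bank": {"concept:river_bank": 1.0, "concept:money_bank": 1.0},
    "image:water": {"concept:river_bank": 2.0},
}
result = chain_activate(graph, {"sound:bank": 1.0, "image:water": 1.0})
focus = top_n(result, 1, prefix="concept:")   # -> ["concept:river_bank"]
```

Here the context (the water image) and the language input both pass activation to the same concept, so that concept stands out over the competing meaning.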

S403 is the step in which the machine establishes an environment space. When the machine is in a real environment, it calls the concepts identified in step S402 and, through these concepts, superimposes the other image feature maps under them (that is, earlier similar feature maps in memory, which fall under the same concept because they are similar) on the feature maps of the current input, scaling and rotating them to maximize similarity. Obviously, to achieve such a superposition there must be global coordinates and local coordinates. Local coordinates are the customary coordinates of specific objects, that is, the local coordinates commonly used in memory, and are usually established along the edges or the center of the object. Global coordinates are usually established along the horizon, the direction of gravity and the depth of field. The method for superimposing the feature maps on the original data can be a preset program. Its specific implementation is a mature and well-known technique in the industry, so it is not repeated here. After the environment space is established, the machine searches memory for spaces that are similar or partially similar to it and overlaps the memory space with the real space, so that, based on the other parts of the referenced memory space, it can understand the parts of the real space that are not currently visible. For example, when we look at a familiar cabinet, we seem to be able to see what is inside it, but this is actually because we have superimposed the remembered image of the cabinet's interior. This is a way for machines to understand the environment. All activities and decisions of the machine are based on a specific environment, so identifying the environment is the first step for the machine in processing information from the outside world.

The specific storage method of data in the environment space is to store the data every time an event occurs. We can approximately think that the feature extraction of input information is the compression of two-dimensional data, and the event storage mechanism is the compression of data in time. The data compression method can also be replaced or partially replaced by other data compression methods. But no matter which method, the similarity of things and environmental relations must be preserved. These different compression methods will not affect the claims of other methods in the present application.

In S404, the machine organizes the feature maps into a reasonable order. The machine arranges the feature maps representing the input information in an appropriate order and forms a reasonable sequence by adding or removing part of the content. The basis for the adjustment is to imitate the way these concepts are combined in memory. A metaphor illustrates this: the process is like a warehouse manager who takes the input drawings (S401) and, according to the current workshop (the environment) and the relationships between the parts on the drawings (chain activation), finds the correct parts (S402 and S403).

If we regard memory as a three-dimensional space containing countless feature maps, then the relational network is the set of threads running through this space. These threads emerge from the memory and forgetting mechanism: relationships that are not repeated are forgotten, while those that are repeated are strengthened. The feature maps connected by thick relational threads constitute a concept. A concept connects the images, voice, text or any other form of expression of similar information. Because these forms of expression frequently appear together and are frequently converted into one another, the connections between them are closer. The tightest local connections constitute the basic concepts (including static feature maps and their language, and dynamic feature maps and their language); somewhat looser than the basic concepts are the static extended concepts and dynamic extended concepts (including concepts representing relationships and process feature maps); looser still than concepts is memory. In the relational network, static feature maps (or concepts) are usually the small parts, dynamic feature maps (including concepts representing relationships) are the connectors, and process features are large frames in which multiple small parts (static objects) and connectors (dynamic features) are organized in a certain temporal and spatial order. These are the key components with which we organize information. These parts are called often because they are common to many things, scenes and processes, and every time one is used, its memory value is increased according to the memory curve. Conversely, because their memory values are high, they are not easily forgotten and are found often. Therefore, the formation of a correct concept is a positive feedback strengthening process.

After finding the correct parts, the machine first looks for dynamic concepts (action features, relational concepts or process features) in this information. They are usually connected to multiple objects, and their objects can be generalized, so they usually appear more frequently in life than static feature maps and usually have higher memory values. The dynamic process is therefore a crucial way for the machine to generalize its experience. These dynamic processes serve to connect different objects. Through them, the machine can connect the static images and the dynamic images of the input information to form feature map sequences that the machine can understand.

The machine determines how to combine dynamic feature maps and static feature maps by imitating similar memories, substituting items that belong to the same concept or share the same attributes. For example, a person receives the input information "eating steak". Although he has no relevant experience of "eating steak", he searches and finds that the most relevant memory is "eating", and that "pizza" also has a relatively high activation value. This is because when the feature map of "steak" is activated, the activation value is transferred to the feature maps of foods such as "pizza"; "steak" also passes activation to "pizza" through the concept of "Western food"; and at the same time the environment of a "Western restaurant" transfers activation to "pizza" through the relational network. So he may choose the memory of "eating pizza". He refers to the way the static and dynamic feature maps of "eating pizza" are connected, and combines the feature maps of the input information into a feature map sequence of "eating" and "steak". If the input information contains multiple concepts representing dynamic features, the machine may form multiple feature map sequences. In that case, the machine needs to use the relational concepts expressed in the input information to determine the temporal and spatial relationships of these feature map sequences. For example, if the received message is "you eat pizza first, then dessert", the relational dynamic feature "...first...then..." clearly fixes the order of the two processes. If, even with the relational concepts, the multiple feature map sequences cannot be combined into a single sequence from the input information alone, the machine needs to use memory to determine their temporal and spatial relationships. 
For example, the message received is "You pay the bill and go home after eating the steak". There are two feature map sequences in this piece of information, but their order in time cannot be determined from the information itself. The machine needs to determine the intention of the information source based on the memories it shares with the information source or on other information channels. For example, if this restaurant takes payment first and then serves the food, the machine refers to that memory and understands the message as paying the bill first, eating the steak afterwards, and then going home. If this restaurant serves the food first and takes payment later, the machine refers to that memory and understands the message as eating the steak first, paying the bill afterwards, and then going home.

Therefore, the feature map sequences that the machine obtains by recalling memories, combining them with reality, and imitating and recombining them segment by segment each have their own location in time and space. Once combined, they form a three-dimensional, continuous dynamic process. When the machine takes these feature map sequences as input again in order to understand them, it is in effect watching a "movie" created by recombining "memory + reality". This is because the environment reconstructed by the machine through memory is three-dimensional, and the memory reconstructed by the machine through dynamic features (including relational concepts) is dynamic. There is no difference between the machine's understanding of these reconstructed dynamic processes and its understanding of real processes, except that the three-dimensional dynamic process reconstructed from memory contains only partial information, because the information with low memory value has been forgotten. Language (text and speech) also exists in the reorganized "movie", but it exists as images and sounds. The machine needs to re-recognize the language-related images and sounds in the "movie" to understand their meaning. This is because the recognition of language is a higher level that the brain builds on top of the underlying forms of information.

When the machine reconstructs the remembered environment from the environmental information in memory, the same environment may have multiple memories from different angles. The machine handles this by building a three-dimensional environment space from the memories of these different angles. This space may include parts that are not currently visible to the machine. The specific method for reconstructing a three-dimensional environment is a mature technology in the industry and is especially widely used in electronic games. When the machine reconstructs dynamic features (or process features) in the three-dimensional environment space, the object involved in these dynamic processes is often the machine itself. Therefore, the machine also needs to reconstruct, as required by the dynamic process, one of the objects of that process: the image of the machine itself. The machine reconstructs itself in the same way that it reconstructs the environment: from memories of itself at different angles, it builds a three-dimensional figure that represents itself. This three-dimensional figure can have different resolutions. For example, when reconstructing dynamic features at high resolution, the machine may need to reconstruct its own hand movements or even finger movements. At lower resolutions, it may only need to reconstruct a single object that represents itself as a whole.

The dynamic features of the outside world can be obtained by the machine through observation and can be reconstructed from vision. But often, when humans need to reconstruct their own movements, they have no visual record of some of them, such as the movements of our hands when they are out of sight. In this case, we reconstruct the movements from the gravity sensing, posture sensing and tactile data stored in memory at the time of the action. In the present invention, we introduce the same mechanism into the machine. The machine stores visual motion together with gravity sensing, posture sensing and tactile data in the memory frame. When an action is outside the field of vision, the machine looks for visual memory images that are closely connected to similar gravity sensing, posture sensing and tactile data, and uses these memory images to reconstruct the action it cannot see. This is how we can seem to see the movement of our hands behind us, and the same is true for machines.

In this way, the dynamic process between the reconstructed three-dimensional image of the machine itself and other objects in memory can be reconstructed within the reconstructed three-dimensional environment. The components of this reconstructed dynamic process are derived from the reorganization of multiple memories, so recalling a memory is in fact recalling a reorganized memory. We use different memories to piece together the information necessary for us to understand information and make decisions, so our memory itself may be wrong. In the application of the present invention, the machine adopts the same method and can make the same mistake.

After the machine creates a three-dimensional environment and a three-dimensional self-image, it also reconstructs the dynamic process in memory. The machine can thus create "animated movies" composed of multiple memories as needed, and watch these "animated movies" from a third-party perspective. The reason we can observe ourselves from a third-party perspective is that we create an "object" based on memory to represent ourselves in the dynamic process, and give this object different resolutions as needed. At the same time, based on internal data in memory such as similar gravity sensing, posture sensing and tactile data, we reconstruct the movements of this object under similar data, even if these movements are not in our visual memory. In this respect the machine is similar to human beings, who in memory can observe their own activities as if from behind themselves. The machine takes the dynamic process it has created as a virtual input, looks in memory for the causes and consequences of similar dynamic processes, and can thereby understand the input information. In addition, when the machine creates a virtual response, it uses the same method: it takes the response plan it has created as an input information sequence, reconstructs the dynamic process representing that sequence by reconstructing the three-dimensional environment and the three-dimensional self-image related to it, observes this dynamic process from a third-party perspective, and looks in memory for the consequences of similar dynamic processes in order to evaluate gains and losses. A quick way to realize this evaluation is to apply the chain activation method to the relevant information in the dynamic process. 
Therefore, the chain activation method is a search method; it is not a necessary step for realizing general machine intelligence in the application of the present invention, but a specific method for realizing certain steps.

In S405, the machine uses the feature map sequence established in S404 to understand the purpose of the information source. To understand information is to understand the purpose of the information source. The information source sends information on the basis of the responses that such information has previously drawn from the machine, and obtaining such a response is its intended purpose. Otherwise, there would be no need for the information source to send the message, because a way of communicating that fails to achieve its purpose is soon abandoned by the information source. Therefore, the machine takes the response that has occurred most frequently between itself and the information source in relation to the input information as the purpose of the information source. If there has been no frequent interaction between the machine and the information source, the machine takes the response most commonly made by others as the purpose of the information source. When the machine understands the purpose of the information source, it also understands the input information.

6. Establish a response to the input information through the network of relationships and memory.

Figure 5 shows the process by which the machine establishes a response. In S501, the machine uses the feature map sequence formed by combining the input information to find memories related to similar sequences: 1. the machine's own responses after receiving a similar sequence; 2. the responses of others after receiving a similar sequence; 3. the responses the machine received after sending a similar sequence; 4. the responses others received after sending a similar sequence. When looking for these memories, the machine does not need to distinguish among them. It only needs to take the feature map sequence formed from the input information, combine it into a dynamic process as input, and assign initial activation values again. After the chain activation process is completed, it looks for the 1 to N (N a natural number) memory frames with the highest sums of activation values; these are the memory frames in the four aspects above. In the present application, we call them the memories most relevant to the input information, because the memory frames in these four aspects are the memories most relevant to the input information sequence. The purpose of searching by the sum of activation values is to find the memory frames that contain higher activation values and those that contain more activated items. Therefore, summation is not required, and other methods that achieve these objectives are also possible. In order to eliminate interfering information, the machine can repeat the above process one or more times in step S501.
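The selection of the most relevant memory frames in S501 can be sketched as follows, assuming chain activation has already produced an activation value for each feature map. The data layout (a memory frame as a list of feature map names) and the function name `rank_memory_frames` are hypothetical, and summation is only one of the scoring methods the description allows.

```python
def rank_memory_frames(frames, activation, n=2):
    """Return the identifiers of the n memory frames with the highest
    sum of activation values over the feature maps they contain.

    frames     -- mapping: frame identifier -> list of feature map names
    activation -- mapping: feature map name -> activation value
    """
    scores = {
        frame_id: sum(activation.get(feature, 0.0) for feature in features)
        for frame_id, features in frames.items()
    }
    return sorted(scores, key=lambda frame_id: scores[frame_id], reverse=True)[:n]


# Hypothetical memory bank and activation values after chain activation.
frames = {
    "m1": ["eat", "pizza", "restaurant"],
    "m2": ["eat", "rain"],
    "m3": ["rain", "battery"],
}
activation = {"eat": 1.0, "pizza": 0.6, "restaurant": 0.5, "rain": 0.1}
most_relevant = rank_memory_frames(frames, activation, n=2)   # -> ["m1", "m2"]
```

Summation favours both frames that hold a few strongly activated feature maps and frames that hold many moderately activated ones, which matches the two objectives stated above.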

By looking for responses to information in these experiences, the machine finds answers not only from experience but also from "empathy", because these referenced memories also contain the state of the machine itself when it sent out similar information sequences and the responses it obtained. In the subsequent creation of the machine's response, these memories are also used, reorganized together with the real information, and the resulting responses may include those the machine arrives at through "empathy". In addition, in communication, the person who sends a message and the person who receives it are likely to omit a great deal of information that both parties know, such as shared cognitions, experiences and things that have already been discussed. Through the memory search above, this omitted information can be supplemented.

The machine's response to the input information may take many forms. For example, it may ignore the input information, reconfirm the input information, recall a memory mentioned in the input information, respond verbally to the input information, or respond by using "empathy" to infer what the information source left unsaid. To decide which form to adopt, the machine creates a virtual response, determines whether it is appropriate by evaluating it, and finally selects a suitable response. The standard by which the machine judges whether a response is appropriate is "seeking advantages and avoiding disadvantages".

S502 is the process in which the machine establishes a virtual response. It is a process of creation and evaluation, and it is the most concentrated embodiment of machine intelligence. In an information exchange, in order to get the response it needs, the information source must specify a range of information in the message it sends, so that it can expect the correct response from the machine. The machine therefore needs to extract this range from the input information. The range includes the static feature maps in the input information and the dynamic feature maps that connect them (including concepts representing relationships). Because the objects that dynamic feature maps operate on can be generalized, dynamic feature maps exist more widely in memory. The machine takes the most relevant memories found in S501 and follows the way dynamic features are organized in them; for the objects that the dynamic features operate on, it brings in the input-related static feature maps by concept substitution. The resulting feature map sequence is the virtual response sequence established by the machine. Such a sequence is a response that the machine forms by reorganizing past experience and real information with reference to past experience and its own motives. This is the machine's usual response, that is, the response that meets the expectations of the information source. Whether the machine actually makes this response, however, can only be determined after evaluation.

In S503, the machine evaluates the virtual response established in S502. Specifically, it treats the virtual output as an event that has already occurred and evaluates its possible consequences, that is, it estimates from experience how those consequences would affect its various needs. The specific method used by the machine is:

1. Use the feature map sequence of the planned output to find what followed when a similar situation occurred, that is, the memories recorded after similar situations. If no fully similar sequence exists, select several locally similar feature map sequences and look for the results that followed them.

2. The memories related to the consequences contain the machine's demand states (their memory values are positively correlated with the corresponding demand values at the time the memory was stored). By accumulating them, the machine can determine the possible consequences (the influence on its own demand state) if the planned response were actually output.

A quicker way to find these memories and obtain the impact on demand is chain activation. The machine treats the output sequence as input and performs chain activation on these feature maps in the relational network. After activation is completed, the cumulative demand state obtained by the machine indicates the possible consequences. In the chain activation process, the most relevant memories receive the highest activation values, and they spread activation according to how tightly the feature maps are connected to the demand states in those memories, so the result properly reflects the possible changes in the demand state.
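The accumulation of demand states described above can be sketched as a weighted sum. This is a minimal illustration, assuming that each consequence memory carries the memory values of its demand symbols and that the relevance of each memory (for instance its activation after chain activation) is already known. The function name, the dictionary layout, and the sample values are hypothetical.

```python
def accumulate_demand(memories, relevance):
    """Accumulate the demand symbols stored in the consequence memories.

    memories: {memory name: {demand symbol: memory value}} (assumed layout).
    relevance: {memory name: weight}, e.g. the activation each memory received.
    Returns the cumulative value per demand symbol.
    """
    totals = {}
    for name, demand_values in memories.items():
        weight = relevance.get(name, 0.0)
        for symbol, memory_value in demand_values.items():
            totals[symbol] = totals.get(symbol, 0.0) + weight * memory_value
    return totals


# Hypothetical consequence memories for one virtual output.
memories = {
    "scolded after opening the cabinet": {"loss": 0.8},
    "thanked for bringing beer": {"gain": 0.9, "loss": 0.1},
}
relevance = {
    "scolded after opening the cabinet": 0.5,
    "thanked for bringing beer": 1.0,
}
demand_state = accumulate_demand(memories, relevance)
```

The resulting per-symbol totals are the kind of evaluation values that the later judgment step would take as input.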

This works because, in our relational network, whenever a memory frame is stored, the machine's demand symbols at that moment and their memory values are stored with it. These memory values are positively correlated with the state values of the demand symbols at that moment. For example, suppose the machine is blamed after a certain behavior. Blame is a loss (this experience can be preset, expressed through the trainer's language, or set by directly modifying the relational network), and the degree of blame (conveyed, for example, by words that express degree) gives the machine a different loss value. The stronger the blame, the higher the memory value the machine assigns to the loss symbol in that memory. Since the memory value of the loss symbol in this memory is relatively high, the other feature maps with high memory values in the same memory frame are strongly connected to the loss symbol. If, in a similar environment, a similar action of sending or accepting an object occurs and the machine is blamed again, then the loss-causing feature maps and the loss symbol in this memory frame have been repeated, and their memory values in this frame all increase according to the memory curve, which strengthens the relationship between the loss-causing feature maps and the loss symbol. Through many repetitions, the memory and forgetting mechanism singles out the relationship between the loss symbol and the feature maps that actually caused the loss. At first the machine does not know why it was scolded, but later it becomes clear what led to the scolding. This process is similar to the way human children learn.
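The selection effect described above can be illustrated with a toy simulation. The present application does not give the shape of the memory curve or the forgetting curve, so this sketch simply assumes a saturating increase and a fixed per-step retention; the function names, the rates, and the episode contents are hypothetical.

```python
def reinforce(value, rate=0.5):
    """Assumed memory curve: move the value part of the way towards 1.0."""
    return value + rate * (1.0 - value)


def forget(value, retention=0.8):
    """Assumed forgetting curve: keep a fixed fraction per time step."""
    return value * retention


def replay(episodes):
    """Store each episode's feature maps; all stored values fade between episodes."""
    memory = {}
    for features in episodes:
        memory = {name: forget(value) for name, value in memory.items()}
        for name in features:
            memory[name] = reinforce(memory.get(name, 0.0))
    return memory


# Three scoldings: the opened object differs each time, "not_found" recurs.
episodes = [
    {"open_cabinet", "not_found", "loss"},
    {"open_drawer", "not_found", "loss"},
    {"open_refrigerator", "not_found", "loss"},
]
memory = replay(episodes)
likely_cause = max((name for name in memory if name != "loss"), key=memory.get)
```

In this toy run the recurring feature map ends with the highest memory value alongside the loss symbol, while the incidental ones fade, which is the sense in which repetition and forgetting pick out the actual cause.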

In the same way, the machine's profit value, safety value, risk value, goal achievement value, and dominance value are handled similarly. All of them continuously link behaviors to their results through the machine's past experience, and the way to connect them is to place them in the same memory frame. Even if the machine did not get timely feedback when the behavior occurred, the trainer can later point out the behavior and give feedback, so that the behavior and the result are connected in a single memory frame. The trainer does not even need to specify which behavior was good or bad. The machine only needs to receive correct feedback each time; through memory and forgetting, it gradually establishes the connection between the correct behavior and the demand value. For example, behaviors that are always followed by reward or punishment are stored together with that reward or punishment every time. With each repetition their memory values increase, and eventually the connection between the two becomes closer than any other connection.

The evaluation system of the machine is a preset program. This program determines whether a virtual output should be transformed into a real output based on the satisfaction state of the machine's demand for gains and losses, safety and risk values, goal achievement values, and dominance values. These types of needs are given by humans to machines. Of course, we can give machines more goals that humans expect them to have, such as "compliance with the robot convention", "compliance with human laws", "compassionate", "ethical", "behaving gracefully" and other goals. These goals can be achieved by setting demand symbols in the memory and adjusting the behavior of the machine through feedback from the trainer, so as to achieve human expectations. It needs to be pointed out that these goals can be increased or decreased in accordance with human expectations. The addition or reduction of these objectives does not affect the claims of the present application.

In order to communicate better with humans, the present application proposes to use the actual satisfaction state of the machine's needs as the input of the emotion system and to use a preset program to convert it into the machine's emotions. The purpose is anthropomorphic: to imitate the emotional responses of humans in different states of need satisfaction. Only in this way can machines communicate better with humans. At the same time, we use the following methods so that the machine can use its own emotions to achieve its own goals: 1. Each time the machine stores a memory, it stores its emotions at the same time. 2. The trainer gives feedback on the machine's emotions, and through this feedback the machine determines how its emotions should be adjusted. 3. The machine can modify the parameters of the preset program by itself and output emotions according to its own experience. With these three points, the machine can connect emotions with feedback. Emotions are then not only a form of expression but also a means that can be used, because certain emotions are connected with certain external feedback. When the machine seeks specific feedback, the emotions stored in memory may become an object of imitation as the machine tries to reproduce a specific result. It should be pointed out that the types and intensities of emotions can be increased or decreased according to human expectations. Such additions or reductions do not affect the claims of the present application.

In S504, the machine makes a judgment based on the evaluation values established in S503 (the values obtained for each demand state), combined with its own internal state values (such as whether it is low on power or whether some of its own systems are faulty). The result is pass or fail. This is the step that gives the machine its personality: different choices are equivalent to different personalities. The step can be implemented with a preset logical judgment program, or some parameters can be left adjustable by the machine itself, so that the machine tries different options, experiences their different consequences, and gradually establishes the responses that best meet its needs. This step can be implemented with existing publicly known technology and is not described further here.
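One possible form of the preset logical judgment in S504 is a weighted score with hard checks on the internal state. The sketch below is an assumption made for illustration, not the method of the present application: the weights stand in for the adjustable "personality" parameters, and the names `judge`, `power`, `system_fault`, and `required_power` are hypothetical.

```python
def judge(evaluation, weights, internal_state, required_power=0.0, threshold=0.0):
    """Return True (pass) or False (fail) for one virtual response.

    evaluation: value obtained for each demand state in S503.
    weights: how strongly each demand counts (negative for losses and risks).
    internal_state: the machine's own condition, e.g. remaining power.
    """
    if internal_state.get("system_fault", False):
        return False
    if internal_state.get("power", 1.0) < required_power:
        return False
    score = sum(weights.get(name, 0.0) * value for name, value in evaluation.items())
    return score > threshold


evaluation = {"gain": 0.9, "loss": 0.6}
cautious = {"gain": 1.0, "loss": -2.0}   # losses weigh heavily
bold = {"gain": 1.0, "loss": -0.5}       # losses weigh lightly

cautious_passes = judge(evaluation, cautious, {"power": 0.8})
bold_passes = judge(evaluation, bold, {"power": 0.8})
low_power_passes = judge(evaluation, bold, {"power": 0.2}, required_power=0.5)
```

With the same evaluation values, the two weight sets reach different verdicts, which is one way to read the statement that different choices correspond to different personalities.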

If, in S504, the response established by the machine fails to pass the evaluation system, the machine needs to establish another response. After returning to step S502, it needs to remove the behaviors that brought heavy losses, danger, or other negative results in the last evaluation. These behaviors are the combinations of static feature maps and dynamic feature maps that bring losses. Removing negative behaviors is itself a fairly complicated thinking process. In this process, the machine converts all of its current goals into inheritance goals, freeing computing power for a temporary goal such as removing the negative behavior. One method is for the machine to give itself a short buffer period so that the existing activation values in the relational network fade. The machine then looks for all memories of this negative behavior and for experience of how to exclude it. If the machine cannot find a suitable choice during this process, it may send out temporary responses such as "um" and "ah" to tell the outside world that it is thinking and should not be disturbed. If the thinking time becomes long, the machine may need to re-input the object of its thinking to itself to refresh the activation values in the relational network, so that it does not forget what it is thinking about. This re-input also helps eliminate interference from other information in the relational network.

After removing the behavior that brought negative results, the machine establishes a new response according to the method in S502. As before, it gives priority to the dynamic feature maps, substitutes the static feature maps through concept replacement, and uses similar memories to determine how they are combined. Once the new response is established, the machine re-enters steps S503 and S504 for evaluation.

If, after many repetitions, the machine still cannot establish a response that passes the evaluation, there may have been an error in the previous steps, or the machine may have encountered a problem it cannot solve. At this point the machine enters the "unable to process information" flow. In other words, "unable to process information" is itself a result of processing information, and the machine builds a response to it based on its own experience. The response may be to ignore the information, to confirm the information with the information source again, or to identify the information again at a higher resolution, and so on. These are also reasonable responses similar to human behavior.
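The cycle of establishing, evaluating, and re-establishing a response, with the "unable to process information" fallback, can be sketched as a bounded loop. This is a simplified illustration: the creation step S502 is reduced to a pre-built list of candidate responses, and `evaluate` is a hypothetical callable that reports which behaviors caused negative results. The names and the attempt limit are assumptions.

```python
def respond(candidates, evaluate, max_attempts=3):
    """Try virtual responses in turn; fall back when none passes.

    candidates: virtual responses, each a list of behaviors (assumed form).
    evaluate: callable returning the set of behaviors that brought negative
              results; an empty set means the response passes.
    """
    excluded = set()
    attempts = 0
    for response in candidates:
        if attempts == max_attempts:
            break
        if excluded & set(response):
            continue  # contains a behavior already known to bring losses
        attempts += 1
        negative = evaluate(response)
        if not negative:
            return response
        excluded |= negative
    return ["unable to process information"]


def low_battery_evaluation(response):
    """Hypothetical evaluation: going out is not acceptable on a low battery."""
    return {"go out"} & set(response)


candidates = [
    ["go out", "buy beer"],
    ["go out", "ask at the front desk"],
    ["open the refrigerator", "take beer"],
]
chosen = respond(candidates, low_battery_evaluation)
```

Here the second candidate is skipped without evaluation because it repeats a behavior that already failed, and the third passes. When every candidate fails, the fallback response is returned and would itself be handled as a new input according to the machine's experience.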

7. Perform response.

Performing the response is a translation process. If, among the possible forms of response, the machine chooses voice output, the task is relatively simple. It only needs to convert the image feature maps to be output into speech, then use the relational network and memory to combine the dynamic feature maps (including concepts that represent relationships) with the static concepts, organize them into a language output sequence, and draw on its pronunciation experience to carry it out. It should be pointed out that, based on experience (its own or other people's), the machine may choose dynamic features that color the whole sentence (for example, using different patterns of tone, pitch, or stress to express doubt, mockery, or distrust, or to emphasize key points, as people commonly do). Because machines learn these expressions from human life, in theory they can learn all the expressions that humans have.

If the machine uses motion output, or a mixed output of voice and motion, the problem becomes much more complicated. It is equivalent to organizing an event. The machine's response plan may contain only the main steps and the final goal, and the rest has to be worked out in practice as the situation develops.

1. The machine takes the image feature map sequence to be output as its targets (the intermediate targets and the final target). These targets involve different times and places, so the machine needs to divide them by time and space in order to coordinate their execution efficiently. The method is to group together targets that are closely related in time and targets that are closely related in space. Because dynamic and static feature maps are combined into information combinations, and the environment space of the related memories contains time and space information, this step can use a classification method. (This step is equivalent to rewriting the overall script into sub-scripts.)

2. The machine needs to combine the intermediate targets in each link with the real environment again and expand them layer by layer by segmented imitation. The response plan that the machine proposes at the top level usually consists only of highly generalized process features and highly generalized static concepts (because such highly generalized processes match many similar memories, the response built by drawing on them is also highly generalized). For example, under the overall output response "business trip", "going to the airport" is an intermediate goal. But this goal is still very abstract, and the machine cannot imitate it directly.

The machine therefore divides the goals by time and space. The link that needs to be executed in the current time and space is the current goal, and the goals belonging to other times and places become inheritance goals that are put aside for the time being. After taking an intermediate link as its target, the machine still needs to subdivide time and space further (writing sub-scripts once more). This is a process of increasing temporal and spatial resolution. Converting a target into multiple intermediate links is again a process of creating various possible responses, evaluating them with the evaluation system, and selecting a response according to the principle of "seeking advantages and avoiding disadvantages". The process iterates continuously, and every division of a goal into intermediate goals follows exactly the same processing flow, until the goals are broken down to the machine's bottom-level experience. For language, the bottom-level experience is mobilizing muscles to produce syllables. For action, it is issuing drive commands to the relevant "muscles". The result is a tower-shaped decomposition structure. The machine starts from the top-level goal and decomposes each goal into multiple intermediate goals. These intermediate goals are created virtually: if they "meet the requirements", they are kept; if they do "not meet the requirements", they are re-created. The process unfolds layer by layer and finally produces the machine's rich and varied responses.
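The tower-shaped decomposition can be sketched as a recursive expansion in which each goal is replaced by the first acceptable set of intermediate goals until only directly executable steps remain. This is a minimal sketch under assumptions: `experience` maps a goal to alternative decompositions, `acceptable` stands in for the evaluation system, and the sample steps below "going to the airport" are hypothetical.

```python
def decompose(goal, experience, acceptable):
    """Expand a goal into the ordered bottom-level steps beneath it.

    experience: {goal: [alternative lists of intermediate goals]} (assumed form).
    acceptable: callable standing in for the evaluation of an intermediate goal.
    A goal with no entry in experience is treated as directly executable.
    """
    if goal not in experience:
        return [goal]
    for plan in experience[goal]:
        if all(acceptable(step) for step in plan):
            steps = []
            for step in plan:
                steps.extend(decompose(step, experience, acceptable))
            return steps
    raise ValueError(f"no acceptable decomposition for {goal!r}")


experience = {
    "business trip": [["go to the airport", "board the plane"]],
    "go to the airport": [["take a taxi"], ["take the subway"]],
}


def meets_requirements(step):
    """Hypothetical evaluation result: the taxi option is rejected."""
    return step != "take a taxi"


plan = decompose("business trip", experience, meets_requirements)
```

A fuller version would interleave execution with decomposition and keep the not-yet-expanded goals as inheritance goals, as described in items 3 and 4 below; the sketch only shows the layer-by-layer expansion.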

3. During this process, the machine may encounter new information at any time. It then has to process that information, and the original goals become inheritance goals. This is like organizing an activity and constantly running into new situations that must be resolved immediately, otherwise the activity cannot go on. The director therefore calls a halt to other activities and deals with the immediate problem first; once it is resolved, the activity continues. In another situation, the director suddenly receives a new task during the process and, after weighing the pros and cons, decides to suspend the activity and deal with the new task first.

4. The machine performs the imitation tasks that are already executable while it decomposes other goals into more detailed ones. In other words, the machine thinks while it acts. This is because real situations vary widely, and the machine cannot know the external situation in advance and plan everything beforehand. The whole process is therefore one in which the environment and the machine interact to complete a goal.

At this point, the machine has completed the understanding and response to an information input. This process is a minimal cycle of interaction between the machine and the outside world, and it will be repeatedly used to accomplish greater goals.

8. Update the memory bank.

Updating the memory bank runs through all the steps. It is not a separate step but the realization of the relationship extraction mechanism. In step S1, low-level features are established mainly through the memory and forgetting mechanisms. Each time the machine finds a local feature through the local field of view, if a similar local feature already exists in the feature library, the memory value of that feature is increased according to the memory curve. If no similar local feature exists in the feature library, the new feature is stored in the feature library and given an initial memory value. All memory values in the feature library gradually decrease according to the forgetting curve as time or training time passes (training time increasing with the number of training samples). In the end, the simple features that are widely present in all kinds of things acquire high memory values and become the underlying feature maps.
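The feature library update in step S1 can be sketched as an insert-or-reinforce rule plus a global decay. This is an illustrative sketch only: features are modeled as sets of labels, similarity as Jaccard overlap, and the memory and forgetting curves as a saturating increase and a fixed retention factor. None of these choices, nor the class and parameter names, come from the present application.

```python
class FeatureLibrary:
    """Toy feature library: values rise on re-observation and fade over time."""

    def __init__(self, initial=0.2, gain=0.3, retention=0.9, min_similarity=0.75):
        self.values = {}  # feature (frozenset of labels) -> memory value
        self.initial = initial
        self.gain = gain
        self.retention = retention
        self.min_similarity = min_similarity

    @staticmethod
    def similarity(a, b):
        """Jaccard overlap, standing in for the machine's own comparison."""
        return len(a & b) / len(a | b)

    def observe(self, feature):
        """Reinforce the most similar stored feature, or store a new one."""
        feature = frozenset(feature)
        match = max(
            self.values,
            key=lambda known: self.similarity(known, feature),
            default=None,
        )
        if match is not None and self.similarity(match, feature) >= self.min_similarity:
            self.values[match] += self.gain * (1.0 - self.values[match])
        else:
            self.values[feature] = self.initial

    def tick(self):
        """One unit of time or training: every stored value decays."""
        self.values = {f: v * self.retention for f, v in self.values.items()}


library = FeatureLibrary()
for _ in range(3):
    library.observe({"edge", "vertical"})   # a widely occurring simple feature
    library.tick()
library.observe({"blob", "red"})            # seen only once
```

After this run the repeatedly observed feature holds a clearly higher memory value than the one seen once, which mirrors how widely present simple features come to dominate the library.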

In step S2, every time a low-level feature or feature map is found, if a similar low-level feature or feature map already exists in the temporary memory bank, the feature library, or memory, its memory value increases according to the memory curve. These values also follow the forgetting mechanism. In step S2, the machine first saves the environment space into the temporary memory bank. When the machine stores an environment space in the memory bank, it also stores the feature maps in that environment space together with their memory values. The initial memory values of these feature maps are positively correlated with their activation values at the time of storage. In steps S3, S4, S5 and S6, the memory values of the feature maps in the memory bank obey the memory and forgetting mechanism. Whenever a relationship in memory is used once, the feature maps involved in that relationship increase their memory values according to the memory curve, and all feature maps lose memory value according to the forgetting curve of the memory bank in which they are located.

9. An example of an interactive cycle.

Let us briefly illustrate an interaction cycle with an example. Suppose that, in a hotel room in an unfamiliar city, the machine receives an instruction from its owner to "go out and buy a bottle of beer and bring it back". In step S2, the machine extracts many low-level syllable inputs and many low-level features of the environment. After step S3, the focus points found by the machine may be: "room", "hotel", "go out", "buy", "a bottle", "beer", "take", "come back", "evening", "I'm running out of electricity", "pay the room fee", and so on (the room fee may be an inheritance goal left over from the machine's previous activities). The machine translates these feature maps into its underlying information processing form (a form outside of language). In step S4, the machine begins to understand this information. It assigns initial activation values to all the focus points (these can be uniform initial values, set by a preset program according to the machine's current demand state) and starts the chain activation process. After chain activation is completed, the machine selects memories: the 1 to N memories with the highest activation values, the memories with the largest number of activated feature maps, or simply the 1 to M (M being a natural number) memories with the largest sum of activation values. In these memories, the machine first searches for the parts related to dynamic features, namely "go out", "buy", "take", and "come back". These dynamic features are all moving images, and they can be connected with various static feature maps to form process features. The machine imitates the way dynamic and static feature maps are combined in these memories and combines them accordingly.
If a static feature map in memory does not match the static feature map in reality, the machine uses same-type analogy substitution (feature maps under the same concept) and replaces the static feature map in memory with the one in reality. This is generalization achieved through analogical thinking based on shared attributes.
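The same-type substitution can be sketched as a lookup through a shared concept. This is a minimal illustration, assuming that a remembered sequence is a list of (kind, feature map) pairs and that each static feature map is tagged with one concept; the function name, the tagging, and the sample items such as "cola" are hypothetical.

```python
def substitute(memory_sequence, concept_of, reality):
    """Replace static feature maps in a remembered sequence with real ones.

    memory_sequence: list of (kind, feature map), kind being "dynamic" or "static".
    concept_of: {feature map: concept it belongs to} (assumed tagging).
    reality: static feature maps present in the current input.
    A static feature map is replaced only if reality offers one with the same concept.
    """
    by_concept = {concept_of[item]: item for item in reality if item in concept_of}
    result = []
    for kind, item in memory_sequence:
        if kind == "static":
            item = by_concept.get(concept_of.get(item), item)
        result.append((kind, item))
    return result


remembered = [
    ("dynamic", "go out"),
    ("dynamic", "buy"),
    ("static", "cola"),
    ("dynamic", "come back"),
]
concept_of = {"cola": "drink", "beer": "drink"}
understanding = substitute(remembered, concept_of, reality=["beer", "hotel room"])
```

The dynamic feature maps and their order are kept from memory, while the remembered object is swapped for the real object under the same concept.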

After organizing the input information, the machine establishes one or more understanding sequences, which include the "go out" feature map, the "buy" feature map, the "take" feature map, the "come back" feature map, and the order in which the various objects are combined with these dynamic feature maps. The machine then re-inputs this understanding sequence into its own relational network and looks in memory for the responses it has made most often under similar inputs. These most frequently repeated responses indicate the owner's purpose. Here the machine can clearly understand that the owner's purpose is for the machine to carry out the instruction as the owner requires.

The machine begins by evaluating the instinctive response of "obeying the owner's arrangement, going out to buy a bottle of beer and bringing it back", and finds that it cannot pass the evaluation (because the battery is insufficient at this time), so the machine looks for other possible responses. It may find a memory of taking beer from the refrigerator to the owner. The machine therefore establishes a possible virtual output process: "take beer out of the refrigerator and give it to the owner". To evaluate this virtual output process, the machine again uses chain activation in the relational network to find relevant memories. All memories containing "open the refrigerator", "take beer", and "give to the owner" are activated, and memories related to "open...", "take...", and "give..." may be activated as well. Other feature maps in these memories are also activated, including all demand states and emotional states. One of the memories may be that the machine "opened the cabinet, did not find what was needed, and was scolded by the owner". The dynamic feature maps "open...", "take...", and "not found..." in this memory are related to the "loss" symbol in the same memory, so they pass activation to the "loss" symbol. Moreover, the feature map "not found..." may appear in several memories in which the machine was scolded by the owner. In these memories the memory values of "not found...", "scolding", and "loss" are relatively high, so they are closely connected to one another. When "not found..." is activated, it pushes up the cumulative activation value of the "loss" symbol by the time the whole chain activation is completed. If the value of the "loss" symbol is too high, this scheme may not pass the evaluation system.
The machine then needs to re-establish the possible output sequence again. In the process of re-establishing the response, one possible option is to improve on the existing response. Under the motive of "seeking advantages and avoiding disadvantages", the machine may be unwilling to give up this scheme (the gain value is very high), so the machine establishes a temporary goal for itself: "under this scheme, how to avoid losses".

Driven by this temporary goal, the machine analyzes the results obtained last time and finds that the loss clearly comes from one specific memory. With this memory removed, the evaluation result is very good. So the machine establishes a temporary goal for itself: "how to avoid the situation of opening... but finding that... is not there". The way the machine achieves this temporary goal is the same as the way it achieves any other goal:

1. Treat the target as a sequence of input information.

2. Perform chain activation in the relational network.

3. Assess the satisfaction of needs.

4. If the response passes, execute it. If it fails, re-establish the virtual response.

5. When re-establishing the virtual response, first try to add a limit on the scope of the response (add a target). If this removes the negative result and still yields a good positive result, it is the new virtual response. If limiting the scope of the response cannot exclude the negative results, remove some of the targets that bring them.

6. Go back to step 1.

In the process of realizing the temporary goal, the machine may go through many choices and finally imitate its previous experience in making similar decisions. The selected response is "first confirm the prerequisites, and then make other decisions according to the situation...". So the machine begins to achieve this temporary goal. It again does so by searching memory (many details in the relevant memories may have been forgotten, but the process feature "walk over... look..." has been imitated often, so it has a high memory value and is still remembered), and it expands the goal into a series of action feature map sequences such as "walk over and see whether there is beer in the refrigerator". This is the new virtual output.

The machine takes the new virtual output as input, activates these feature map sequences in the relational network again, and checks the result of the evaluation system again. It may find that this response also fails the evaluation, because there are several memories in which it turned to other targets, failed to respond to the owner's instructions in time, and was scolded, and all of them pass high activation values to the loss symbol. So the plan has to be revised again. As in the process above, under the motive of "seeking advantages and avoiding disadvantages", the machine may be unwilling to give up this scheme (its gain value is very high). Experience tells it that the scheme becomes a good plan once the loss-causing factors are eliminated. So the machine adds a further target to limit the scope of the response: "avoid being scolded by the owner".

Therefore, in the current state, the machine turns other goals into inheritance goals, and establishes a temporary goal to "avoid scolding by the master."

So the machine takes "avoiding the owner's scolding" as a virtual output process and runs chain activation. This is an encouraged behavior, so it immediately passes the evaluation system. The machine then begins to turn the goal of "avoiding the owner's scolding" into a concrete process. Searching similar memories related to "avoiding the owner's scolding", it finds that it was scolded less in the memories in which it first responded to the owner verbally. Comparing further among these memories of verbal responses, it finds that it was never scolded when its emotion was a smile and the dynamic mode of its intonation was "respectful". The evaluation results in these memories are the best. So, by searching the evaluation system and the related memories, the machine selects a response that passes the evaluation system: "Smile and give the owner a verbal response; adding 'sorry' to the wording works best...".

So the machine starts to execute this response. It smiles and says to the owner, "Master, I'm sorry, my battery is running low. Let me check if there is beer in the refrigerator. If there is, I will bring it to you. If not, I will charge up, then go out to buy you beer and pay the room fee on the way...". The organization of this language output is also realized step by step through tower-shaped expansion. Within it, "Master, I'm sorry..." is already a process feature because of frequent use: it has a high memory value in memory and is often found and used. "My battery is running low" is likewise a process feature formed by frequent use and has become a stock phrase. "Let me check if there is beer in the refrigerator. If there is, I will bring it to you. If not, I will charge up and go out to buy you beer..." imitates a language process feature often used in daily life: "I first..., and then...". Such sentence patterns have high memory values because of frequent use and are often found and imitated. "Check if there is beer in the refrigerator" applies the common sentence pattern "check... if there is...", which is also a process feature in language. By imitating this pattern and substituting the concepts "refrigerator" and "beer" into it, the machine can create an utterance such as "check if there is beer in the refrigerator". "If there is, I will bring it to you. If not, I will charge up and go out to buy you beer..." is also built from memories that were found: after removing the feature maps with low memory values and those unrelated to reality, what remains is the language process feature "if there is..., if not...". Actions and language are combined by the same method of substituting concepts with the same attributes.
In addition, the machine omits the word "beer" from "If there is beer, I will bring you the beer; if there is no beer, I...", because it imitates the human habit of leaving out such repeated information. The sentence "I will charge up and go out to buy you beer..." carries a lot of information. The first point is the use of the dynamic feature "going out". The space where the machine is currently located is a hotel room, "buy beer" is always linked in memory to a store, and the geographical connection between the two is missing. The machine therefore uses the language symbol that represents the process of moving from one space to another, "go", to connect its own location with the location of the store. Since the machine is in a closed space (the room) and the store is outside it, the machine chooses the word "out", which best matches the current situation, to indicate the process from the room to the store outside, even though neither place appears in the sentence. In addition, the sentence contains three dynamic processes: "charging", "going out", and "buying beer". The machine needs to look for memories related to these three processes to find their order and place the appropriate real static objects into them; this yields the expression "I will charge up and go out to buy you beer...". When the machine builds the dynamic image process of "going out to buy you beer", the image of the "hotel front desk" appears in it, because it is part of the memory of the way out. When the machine divides the script, the space in which the inheritance goal "pay the room fee" unfolds also contains the image of the "hotel front desk", so the machine places the realization of both goals in the same spatial sub-script.
In accordance with a dynamic pattern in memory, in which one goal is achieved on the way to another ("... on the way"), the machine uses the concept "on the way" (shun dao), which represents a dynamic relationship, to connect the two behaviors. After the machine organizes the information, it determines the process features of each pronunciation according to the dynamic intonation mode it has selected. Each utterance undergoes a tower-shaped expansion into multiple syllables. The pronunciation of the syllables is chosen under the "respectful" dynamic pronunciation mode. The pronunciation of each syllable is itself a dynamic process involving a large number of muscle movements, all of which come from experience.
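The concept replacement described above, in which a frequently used sentence pattern is recalled and its open positions are filled with concepts from the current situation, can be illustrated with a minimal Python sketch. The names `SentencePattern`, `pick_pattern` and `fill`, the numeric memory values and the threshold are all hypothetical and chosen only for illustration; the application does not prescribe any particular data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePattern:
    """A reusable language process feature with open concept slots (hypothetical)."""
    template: str        # e.g. "see if there is {thing} in the {place}"
    memory_value: float  # assumed scale: higher means used more often


def pick_pattern(patterns, min_memory_value=0.0):
    """Return the pattern with the highest memory value above the threshold."""
    candidates = [p for p in patterns if p.memory_value >= min_memory_value]
    if not candidates:
        raise LookupError("no sentence pattern is strong enough to imitate")
    return max(candidates, key=lambda p: p.memory_value)


def fill(pattern, **concepts):
    """Replace the slots of the pattern with concepts from the current situation."""
    return pattern.template.format(**concepts)


patterns = [
    SentencePattern("see if there is {thing} in the {place}", 0.9),
    SentencePattern("inspect the {place} for {thing}", 0.2),
]
sentence = fill(pick_pattern(patterns, 0.5), thing="beer", place="refrigerator")
print(sentence)
```

In this sketch the rarely used pattern is filtered out by the threshold, which loosely mirrors the removal of feature maps with low memory values.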

After the machine responds, it waits for feedback from the owner. Through its sensors it finds an image feature that is closely related to the concept of "nodding", and "nodding" is in turn connected to the concept of "consent", so the machine recognizes that the owner agrees to its plan. It therefore considers this temporary goal completed and returns to the upper-level goal (the inherited goal): "go to the refrigerator".

In the process of imitating "walking to the refrigerator", the machine needs to combine its own location, the refrigerator's location, and environmental information into one overall input, use a path planning program to plan the path, and use experience to adjust it. When imitating the process feature "walking to the refrigerator", the machine may find that the first lower-level goal in the tower-shaped decomposition is the dynamic feature "walking". When it tries to imitate "walking", the machine finds that it cannot, because "walking" starts from a standing posture and the machine is sitting on the sofa. The machine therefore needs to establish a temporary goal: "change from sitting to standing". The process by which the machine achieves this goal is the same as in the previous analysis. By imitating the process feature of "changing from sitting to standing" (the part shared by countless similar memories, whose memory value has grown through repeated imitation), it combines its real environment (the sofa) with similar experiences and issues driving commands to its various "muscles". The parameters of these commands come from the combination of environment and experience, and are themselves part of experience. The machine may carry out a series of more detailed goals such as "stretching the legs", "leaning forward", "maintaining balance", and "stretching out the hands for protection". Each goal corresponds to a set of muscle experience parameters. In this way the machine stands up, and then walks along the planned path.
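The tower-shaped decomposition in this example, including the temporary goal that is created when "walking" cannot be imitated from a sitting posture, can be sketched as follows. This is a simplified illustration only: the tables `DECOMPOSITION` and `PRECONDITIONS`, the function `expand` and the single `posture` state variable are assumptions made for the sketch, whereas in the application the decomposition comes from memory and experience rather than from fixed tables.

```python
# Hypothetical lookup tables standing in for what the machine recalls from memory.
DECOMPOSITION = {
    "go to the refrigerator": ["walk", "stop in front of the refrigerator"],
    "stand up": ["stretch legs", "lean forward", "keep balance"],
}
# goal -> (state key, required value, temporary goal that establishes it)
PRECONDITIONS = {"walk": ("posture", "standing", "stand up")}


def expand(goal, state):
    """Expand a goal depth-first into steps that can be executed directly.

    If the goal cannot be imitated in the current state, a temporary goal
    that establishes the missing condition is expanded first.
    """
    steps = []
    if goal in PRECONDITIONS:
        key, required, temporary_goal = PRECONDITIONS[goal]
        if state.get(key) != required:
            steps += expand(temporary_goal, state)
            state[key] = required  # assume the temporary goal succeeds
    subgoals = DECOMPOSITION.get(goal)
    if subgoals is None:
        steps.append(goal)  # lowest level: issue the driving commands
    else:
        for sub in subgoals:
            steps += expand(sub, state)
    return steps


plan_from_sofa = expand("go to the refrigerator", {"posture": "sitting"})
plan_standing = expand("go to the refrigerator", {"posture": "standing"})
print(plan_from_sofa)
```

When the machine is already standing, the same call yields only the walking steps, so the temporary goal appears only when the current state requires it.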

During this process, the machine may encounter a new situation: "an obstacle is found". Faced with this new input, the machine has to suspend its original goals and enter the process of handling the new information; the original goals become inherited goals. The machine may have to process the new input information, such as shape, size, texture, and color, starting from step S2. This information is the basis on which the machine subsequently looks for a solution. Using this information together with the relationship network and memory, the machine determines the attributes of the obstacle (such as its weight and whether it is safe) and then finds a solution (such as deciding whether it can be stepped over, or whether there is a place to put it after moving it).
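The way suspended goals become inherited goals and are resumed once the new situation has been handled can be pictured as a stack. The sketch below is an assumption about one possible bookkeeping scheme; the class `GoalStack` and its method names are hypothetical and do not appear in the application.

```python
class GoalStack:
    """Keeps the current goal on top; everything below it is an inherited goal."""

    def __init__(self, top_goal):
        self._stack = [top_goal]

    @property
    def current(self):
        return self._stack[-1] if self._stack else None

    @property
    def inherited(self):
        """Goals that are suspended while the current goal is processed."""
        return list(self._stack[:-1])

    def interrupt(self, new_goal):
        """New input arrives: suspend the current goal and handle the new one."""
        self._stack.append(new_goal)

    def complete(self):
        """Finish the current goal and return to the nearest inherited goal."""
        if not self._stack:
            raise IndexError("no goal left to complete")
        self._stack.pop()
        return self.current


goals = GoalStack("bring beer to the owner")
goals.interrupt("go to the refrigerator")
goals.interrupt("remove the obstacle")
suspended = goals.inherited
resumed = goals.complete()
print(suspended, resumed)
```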

After the machine has removed these obstacles, it arrives at the refrigerator. While the machine has its back to the owner, it knows, by analogy with images in its own memory viewed from a third-party perspective, that the owner cannot see its face. So, in order to save power, the machine stops smiling. After getting the beer and before turning around, the machine judges from experience that smiling at the owner is an activity that brings benefits, and that the value of this benefit exceeds the value of the loss caused by power consumption. So the machine puts on a smiling face again, takes the beer to the owner, and smiles...

10. A schematic diagram of a specific implementation scheme.

Figure 6 is a schematic diagram of the modules for realizing general machine intelligence. S600 is the machine's feature extraction module. This module selects the static and dynamic features of the data at different resolutions by comparing local similarity, and extracts the features of the data by similarity comparison, by a trained neural network, or by any other existing algorithm. Within it, S601 and S602 are modules that extract features from external input information at different resolutions. The machine may need to perform feature extraction on input data at multiple resolutions. In S601, the data of one sensor can be divided by preprocessing into multiple channels in order to extract different characteristics of the data. In S602, different preprocessing algorithms can again be applied at different resolutions to extract the data features at each resolution. After feature extraction, S603 can comprise two modules. One is a module dedicated to memory search and similarity comparison, which can be dedicated search hardware. Its purpose is to fix the memory search and similarity comparison algorithms in specialized hardware and thereby improve efficiency. The other is a module that combines memory information with information about reality, which is equivalent to software that performs data reorganization. This step mainly finds the dynamic processes in the relevant memories and then generalizes the experience through the generalization ability of the action features. S604 is the entire memory bank (including the quick search library established to improve search efficiency, which contains commonly used memory information. It also includes temporary memory banks, long-term memory banks, and possibly other memory banks). 
The memory bank is equivalent to storage space, but it also carries the life cycle (memory value) of each piece of information. The memory bank can use a dedicated memory value refresh module to maintain the memory values. S605 is a demand assessment system, which uses the demand values obtained in the S603 process to make logical judgments. S605 can be implemented in software. S606 is a segmented imitation process (a process of iterative concept development). This process requires constant calls to S603 and S604 and can be implemented in software. S607 is a logical judgment, which can be implemented in software. S608 is the storage process for new memories, which can be implemented in software or dedicated hardware. A new memory contains the machine's internal and external input information, its demand information, and its emotional information. These are first stored in the temporary memory bank. S609 is the state in which one information response cycle has been completed.

The embodiment of FIG. 6 is characterized by a separate memory search and similarity comparison module. Because the machine needs to perform memory search and similarity comparison frequently, the present application proposes using an independent hardware circuit to realize this function.
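A software analogue of the memory bank (S604), the memory search and similarity comparison module (S603) and the memory value refresh can be sketched in Python as follows. Several choices here are assumptions made purely for illustration and are not taken from the application: features are plain numeric tuples, similarity is measured by cosine similarity, recalled memories are reinforced by a fixed increment, and the refresh applies a multiplicative decay with a forgetting threshold. The names `Memory`, `MemoryBank`, `search` and `refresh` are likewise hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    label: str
    features: tuple
    memory_value: float = 1.0


def cosine_similarity(a, b):
    """Similarity comparison; cosine similarity is an assumed stand-in."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MemoryBank:
    def __init__(self, decay=0.9, forget_below=0.1):
        self.entries = []
        self.decay = decay                # assumed multiplicative decay per refresh
        self.forget_below = forget_below  # assumed threshold for forgetting

    def store(self, label, features, initial_value=1.0):
        self.entries.append(Memory(label, tuple(features), initial_value))

    def search(self, query, top_k=1):
        """Return the memories most similar to the query and reinforce them."""
        ranked = sorted(
            self.entries,
            key=lambda m: cosine_similarity(query, m.features),
            reverse=True,
        )
        hits = ranked[:top_k]
        for m in hits:
            m.memory_value += 1.0  # assumption: recall raises the memory value
        return hits

    def refresh(self):
        """Decay every memory value and drop memories that fall below the threshold."""
        for m in self.entries:
            m.memory_value *= self.decay
        self.entries = [m for m in self.entries if m.memory_value >= self.forget_below]


bank = MemoryBank(decay=0.5, forget_below=0.3)
bank.store("refrigerator", (1.0, 0.0, 0.2))
bank.store("sofa", (0.0, 1.0, 0.1), initial_value=0.5)
best = bank.search((0.9, 0.1, 0.2))[0]
bank.refresh()
remaining = [m.label for m in bank.entries]
print(best.label, remaining)
```

Keeping search and comparison behind one small interface is what would allow them to be moved to dedicated hardware without changing the callers.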

Claims

A method for establishing a relationship network, its characteristics include:

Two kinds of basic relationships are extracted to establish a relationship network: 1. the similarity relationship of information; 2. the environmental relationship of information.
The method according to claim 1, characterized in that it comprises:

When the machine stores information in the memory bank, it retains the original similarity relationships and environmental relationships between the pieces of information; the machine uses values or symbols, called memory values, to indicate how long each piece of information can exist in the memory bank; pieces of information in the same memory are related to each other; the strength of the relationship between any two pieces of information is related to the memory values of these two pieces of information.
A method of memory storage, its characteristics include:

When the machine stores memory, it not only stores the data given by internal sensors and external sensors, but also stores the demand data of the machine or the emotional data of the machine, or stores the demand data and the emotional data of the machine at the same time; and it stores these data in the same memory.
A method of memory storage, its characteristics include:

When the machine stores the memory, the initial memory value assigned by the machine to the stored information is related to the activation value when the storage occurs.
A method for selecting data features, which features include:

The machine selects data features by comparing local similarity; the machine selects data features at different resolutions, and for the same data, the features selected at different resolutions may be different; the resolution used by the machine includes time resolution; the data analyzed by the machine includes static data and dynamic data; the machine needs to apply different resolutions to the same data when performing feature selection.
The method according to claim 5, a method for extracting dynamic features, the features comprising:

The machine uses different spatial resolutions, uses one or more windows to represent the data within each window, and compares the similarity of two dynamic motions by comparing the motion trajectories of the windows; the machine compares the similarity of motion trajectories at the same spatial resolution; the machine uses time resolution to compare the rate of change of the motion trajectories in order to determine the dynamic rate; the machine compares the similarity of rates of change at the same time resolution; the machine needs to apply different resolutions to the data to perform repeated extractions.
A method for establishing a response to input information, the characteristics of which include:

The machine first finds one or more segments of the most relevant memories in the memory; these memories are past responses to similar input information, or past responses to multiple pieces of information that are locally similar to the input information; the machine searches for the process features in these responses and, according to their time and space relationships, composes these process features into one or more dynamic processes; the machine adopts the principle that the same attributes can be replaced under the same concept, uses the action-related objects in the input information to replace the corresponding action-related objects in the memory, and thereby establishes the response to the input information; the above process can be performed iteratively.
A method for a machine to evaluate the planned output information, its characteristics include:

The machine first finds one or more segments of the most relevant memories in the memory; these memories are the external feedback obtained when the machine produced similar output information, or partially similar output information, in the past; the machine recalls the memories containing this external feedback and accumulates the demand state information they contain, in order to estimate the possible consequences of actually outputting a specific response.
A method for realizing machine experience generalization, its characteristics include:

The machine first looks for the dynamic features in the experience; because a dynamic feature refers to a mode of movement, which has nothing to do with the specific objects that perform or receive the action, the machine can use the principle that the same attribute can be replaced under the same concept to generalize past dynamic experience to different objects.
A general machine intelligence realization method, its characteristics include:

It is based on two assumptions: 1. at a certain resolution, things that are similar in some attributes may also be similar in other attributes; 2. pieces of information that appear in the same environment are related to each other, and the strength of the relationship is positively correlated with the number of times they occur together.
The method according to claim 10, characterized by comprising:

The machine builds a response to information in accordance with an empirical generalization method, and selects the appropriate output from different responses in accordance with the principle of "seeking advantages and avoiding disadvantages".
A general machine intelligence realization method, its characteristics include:

The machine contains a memory search module and an information similarity comparison module. These two modules can also be combined into one module.
The method according to claim 12, wherein the features include:

The memory search module and the information similarity comparison module can be implemented by hardware separately, or can be implemented by hardware in combination.
A method to improve the search in the relational network, its characteristics include:

The machine extracts the relational network existing in memory to form a cognitive network that can improve search efficiency.
The method according to claim 14, characterized by comprising:

The machine extracts the local relationship network from each memory, and then connects these local network relationships into an overall relationship network through similar feature maps.
The method according to claim 15, characterized by comprising:

The machine extracts a local relationship network for each memory, using the feature map as the center and the connection relationship as the connecting line, and determining the connection value according to the function related to the memory value of the feature map on both sides of the connecting line, which represents the connection strength.
The method according to claim 16, characterized by comprising:

The machine normalizes the connection value sent by each feature map; in this way, the connection value between the two feature maps may be asymmetrical and have directionality.
A method of memory storage, its characteristics include:

The machine stores the direction of gravity in every memory.
A general machine intelligence realization method, its characteristics include:

Assign different demand types to the machine, and use different symbols to represent the different demand types; store the symbols representing the demands and the information that caused the demand state to change in memory; and use numbers or symbols to indicate the degree to which each demand is met.
A general machine intelligence realization method, its characteristics include:

Assign different types of emotions to the machine, and use different symbols to represent the different types of emotions; store the symbols that represent emotions and the information that causes the emotional state to change in memory; and use numbers or symbols to express the intensity of the emotions.
The method according to claim 20, characterized in that it comprises:

The emotion of the machine is controlled by the machine's needs and demand state through preset programs; at the same time, the machine can also adjust its own emotions as needed.
A method of memory storage, its characteristics include:

While the machine stores information, it also stores data representing how long it can exist in the database.
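As a purely illustrative note on the claims that derive connection values from memory values and then normalize the connection values sent by each feature map, the following Python sketch shows how such a normalization can make the connection between two feature maps asymmetric. The product of the two memory values is an assumed example of "a function related to the memory value of the feature map on both sides", and the function names `local_network` and `normalize_outgoing` are hypothetical.

```python
def local_network(memory_values):
    """Raw connection values between every ordered pair of feature maps in one memory.

    Assumption: the connection value is the product of the two memory values.
    """
    names = list(memory_values)
    return {
        src: {
            dst: memory_values[src] * memory_values[dst]
            for dst in names
            if dst != src
        }
        for src in names
    }


def normalize_outgoing(network):
    """Scale the connection values sent by each feature map so that they sum to 1."""
    normalized = {}
    for src, edges in network.items():
        total = sum(edges.values())
        normalized[src] = (
            {dst: value / total for dst, value in edges.items()} if total else dict(edges)
        )
    return normalized


raw = local_network({"A": 4.0, "B": 2.0, "C": 2.0})
net = normalize_outgoing(raw)
print(net["A"]["B"], net["B"]["A"])
```

The raw values are symmetric, but after normalization the value from A to B differs from the value from B to A, which gives the connection a direction.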