CN116774891A - Method and device for applying artificial intelligence - Google Patents
- Publication number
- CN116774891A (application CN202310472221.8A)
- Authority
- CN
- China
- Prior art keywords
- information
- large model
- artificial intelligence
- text
- output
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0484—Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/445—Program loading or initiating
- G06F9/44505—Configuring for program initiating, e.g. using registry, configuration files
Abstract
The main purpose of a large model is to handle different tasks with one unified model, so in the application field, developing a more general way to exploit the characteristics and capabilities of large models is very valuable work. Humans acquire information from electronic devices chiefly through display devices. Because multimodal large models are expensive, slow, and less mature than text-based artificial-intelligence large models, a method and device that achieve this goal with a text-based large model have practical significance. The invention aims to provide a method and a device for more general artificial-intelligence applications that take display information as the main input.
Description
Technical Field
The invention belongs to the field of large-model artificial intelligence, and particularly relates to a method and a device for applying artificial intelligence.
Background
Large-model artificial intelligence has become an important breakthrough in the revolution of social productivity, as exemplified by the GPT-4 large model introduced by OpenAI. These powerful models will continue to be developed and enhanced, and how to apply them quickly, further improving social productivity, has become an important research direction in many fields. The main purpose of a large model is to handle different tasks with one unified model, so in the application field, developing a more general way to exploit the characteristics and capabilities of large models is very valuable work.
Humans acquire information from electronic devices chiefly through display devices, which make various systems increasingly compatible with human habits. If artificial intelligence could acquire from the display device the same massive information that humans do, its field of application would be greatly enriched, and human work further lightened.
In existing work, the displayed information can be sent directly to a multimodal large model, which recognizes and processes the screen information, returns usable information, and can even control the device directly. However, because multimodal large models are expensive, slow, and less mature than text-based artificial-intelligence large models, a method and device that achieve the above goal with a text-based large model have practical significance.
Disclosure of Invention
In view of the foregoing, it is an object of the present invention to provide a method and apparatus for more versatile artificial-intelligence applications that use display information as the primary input.
To achieve this, the present invention uses a text-based artificial-intelligence large model as the main information-processing center, which requires that information be supplied as text.
In existing devices designed for human use, information is displayed with human readability in mind, typically on a display.
Text mapping is performed on the content of the display information. The mapping rule is: plain text information is mapped directly into words, while an operable module is mapped into a module description. For example, a displayed "ok" button must be mapped to "ok button" and cannot be mapped simply to "ok". The mapping rules may be modified as needed.
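The mapping rule above can be sketched as a small helper. This is a minimal illustration; the element kinds and the `map_element` name are assumptions for the sketch, not part of the patent.

```python
def map_element(kind: str, text: str) -> str:
    """Map one displayed element to its text form.

    Plain text is mapped directly into words; an operable module is
    mapped into a module description, so a displayed "ok" button
    becomes "ok button" rather than just "ok".
    """
    if kind == "text":
        return text
    # Operable modules (button, checkbox, ...) are described, not quoted.
    return f"{text} {kind}"

print(map_element("text", "Happy Spring Festival!"))  # mapped directly
print(map_element("button", "ok"))                    # -> ok button
```
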
The input information for the artificial-intelligence large model is:
the text information mapped from the display information; after this is input, the large model must also be told the task to execute, the format of the output information, and the location of the output.
The output information of the artificial-intelligence large model is:
after the required task is executed, the task result, output in the specified format to the specified location. The output data will be transmitted to the location designated by the application, including but not limited to cloud space, a server, or a client, for further processing and display.
The specific implementation comprises the following steps:
S1, extracting information from the display device, usually from the frame buffer of the display device or from an application window (surface) drawn by the operating system;
S2, generating the text mapping from this information at the designated sampling frequency;
S3, sending the text mapping to the large model, while designating the task requirements, data format, and output location;
S4, the large model returning information to the designated output location, where it awaits further processing by the application or system.
It should be noted that the returned information may be stored in the cloud or on a remote server, waiting for the application to pull it, or it may be pushed directly to the application or system, which is the default location.
To further improve the usability of the artificial-intelligence system, the text information in step S2 can be mapped together with its position within the display information; that is, the display content is mapped into text-and-position pairs. Based on the information returned by the large model, the system can then be operated in a human-like way, partially or completely replacing the human.
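The text-and-position pair mapping described above can be sketched as follows; the element tuple layout and the function name are illustrative assumptions.

```python
def map_with_positions(elements):
    """elements: list of (kind, text, x, y) tuples describing the screen.

    Each displayed element becomes a (text, (x, y)) pair that can later
    drive human-like operations on the system.
    """
    pairs = []
    for kind, text, x, y in elements:
        # Same rule as the plain text mapping: modules get a description.
        label = text if kind == "text" else f"{text} {kind}"
        pairs.append((label, (x, y)))
    return pairs

screen = [("button", "ok", 200, 200), ("text", "hello", 10, 40)]
print(map_with_positions(screen))  # [('ok button', (200, 200)), ('hello', (10, 40))]
```
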
The specific implementation comprises the following steps:
S5, adding to the text map generated in step S2 the position of each text item within the display information;
S6, converting the control-related text sequence returned by the large model, together with the position information, into control instructions through a control-instruction translator, and performing human-like operations on the system.
For example, if the text sequence returned by the large model is "click ok button", the control-instruction translator, using the position corresponding to the "ok button", translates it into a call to the system click operation. Operation types include, but are not limited to, click, long press, and slide. The control-instruction translator performs system input directly, in a way that imitates a human. Alternatively, the position-mapping information may be input into the large model, which then directly returns text containing position information, such as "click ok button (200,200)"; the control-instruction translator extracts the coordinates directly for clicking. The two ways of using position information are equivalent.
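A sketch of the control-instruction translator just described, under assumed names: it pairs a control-related text sequence from the model with the text-and-position map and emits a system-level instruction.

```python
def translate(text: str, positions: dict[str, tuple[int, int]]) -> tuple:
    """Turn e.g. 'click ok button' into ('click', x, y) using the map."""
    # Split off the operation type (click, long press, slide, ...).
    action, _, target = text.partition(" ")
    x, y = positions[target]
    return (action, x, y)

# Position map built in step S5 (coordinates are illustrative):
positions = {"ok button": (200, 200)}
print(translate("click ok button", positions))  # -> ('click', 200, 200)
```
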
Some human operations have a guiding effect and change the large model's attention, so this information can also be sent to the large model for processing.
S7, recording some human operations, generating their text mapping, and sending it to the large model.
For example, the user clicks the "refresh" button, but the page does not change at all. If the user's click on the refresh button is sent to the large model, the model knows that the page was refreshed at that moment yet its information is unchanged, and can decide accordingly. Without this operation information, the large model would need to refresh the page itself to confirm whether the page information has changed.
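Step S7 can be sketched as follows; the record format and function names are assumptions. The recorded operation is appended to the screen mapping so the model sees both the unchanged page and the refresh that preceded it.

```python
operation_log: list[str] = []

def record_operation(op: str, target: str) -> None:
    # S7: record a human operation as text.
    operation_log.append(f"user {op} the {target}")

def build_model_input(screen_text: str) -> str:
    # Send the screen mapping and the recorded operations together.
    ops = "; ".join(operation_log) or "none"
    return f"Screen: {screen_text}\nRecent operations: {ops}"

record_operation("clicked", "refresh button")
msg = build_model_input("page content unchanged")
```
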
The invention also relates to a device for applying artificial intelligence, comprising the following modules:
the D1 display-information acquisition module, which acquires the display image at a certain frequency; the image-acquisition area can be designated;
the D2 display-information text-mapping module, which maps the display information to text and positions;
the D3 application storage module, which stores the description of the task to be performed, the format requirements for the returned task information, and the required return location;
the D4 large-model input module, i.e. the input interface of the large model, which combines the information from modules D2 and D3 and inputs it into the large model;
the D5 large-model output module, which places the information output by the large model at the designated location, according to the return-location requirement of module D3;
and the D6 application processing module, which retrieves the information in module D5 and sorts and displays it.
Optional modules also include:
the D7 control-instruction translator, which translates the control-related text returned via module D5, together with the position information, into system control instructions, including but not limited to click, long press, slide, and text input.
As described above, the method and device of the invention have the following beneficial effects:
human behavior can be simulated more closely, making the system more like a personal assistant;
only a text-based artificial-intelligence large model is used.
Drawings
Fig. 1 shows a basic logic diagram of the method.
Fig. 2 shows display content for example information.
Description of the embodiments
Example: automatically replying to a holiday message. The invention has wide application prospects.
User A sends a holiday greeting to user B (see fig. 2).
On the information display interface, the display information is extracted (step S1) and its text mapping generated (step S2); the extracted information is:
+10001
1-21 14:31
The city committee and the city government wish you a happy Spring Festival! Setting off fireworks and firecrackers is prohibited throughout Beijing; please consciously comply.
The page contains an "add button" (121,865), a "text input box" (271,865), and a "send button" (961,865) (the position information comes from step S5).
The large model is now asked to do the following:
This is the content displayed on a message interface; draft a reply message for me (step S3), then tell me where on the page the text can be entered and where to click to send it.
The request uses the default output location, i.e. return to the client; the required output format is the reply content first, then the operations.
The return information of the large model is:
Thanks to the city committee and the city government for their blessing. We will voluntarily follow the citywide ban on fireworks and firecrackers and jointly maintain the city's peaceful and clean environment. Wishing you a happy Spring Festival! (step S4)
Click the text input box, input the text, click the send button.
According to the text "click the text input box, input the text, click the send button", the instruction translator generates instructions directly executable by the system (step S6):
click position (271,865);
input the text "Thanks to the city committee and the city government for their blessing. We will voluntarily follow the citywide ban on fireworks and firecrackers and jointly maintain the city's peaceful and clean environment. Wishing you a happy Spring Festival!";
click position (961,865).
The system then executes these three instructions, completing the automatic reply.
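The three instructions above can be sketched as calls into a stubbed system layer; the stub functions and the event log are illustrative assumptions, not the patent's API.

```python
events: list[str] = []

def click(x: int, y: int) -> None:
    # Stub for the system's click call.
    events.append(f"click {x},{y}")

def input_text(s: str) -> None:
    # Stub for the system's text-input call.
    events.append(f"input:{s}")

# The sequence produced by the instruction translator (step S6):
click(271, 865)      # click the text input box
input_text("Thanks to the city committee and the city government for their blessing.")
click(961, 865)      # click the send button
```
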
Claims (5)
1. A method for applying artificial intelligence, characterized in that the content of display information is mapped to text. The input to the artificial-intelligence large model is: the text information mapped from the display information; after this is input, the large model must also be told the task to execute, the format of the output information, and the location of the output. The output of the artificial-intelligence large model is: after the required task is executed, the task result, output in the specified format to the specified location. The output data will be transmitted to the location designated by the application, including but not limited to cloud space, a server, or a client, for further processing and display. The specific implementation comprises the following steps:
S1, extracting information from the display device, usually from the frame buffer of the display device or from an application window drawn by the operating system;
S2, generating the text mapping from this information at the designated sampling frequency;
S3, sending the text mapping to the large model, while designating the task requirements, data format, and output location;
S4, the large model returning information to the designated output location, where it awaits further processing by the application or system.
2. The method for applying artificial intelligence according to claim 1, wherein human-like operation can be performed according to position information, comprising the steps of:
S5, adding to the text map generated in step S2 the position of each text item within the display information;
S6, converting the control-related text sequence returned by the large model, together with the position information, into control instructions through a control-instruction translator, and performing human-like operations on the system.
3. The method for applying artificial intelligence according to claim 1, characterized in that certain human operations are recorded, mapped to text, and sent to the large model.
4. An apparatus for applying artificial intelligence, comprising the following modules:
the D1 display-information acquisition module, which acquires the display image at a certain frequency; the image-acquisition area can be designated;
the D2 display-information text-mapping module, which maps the display information to text and positions;
the D3 application storage module, which stores the description of the task to be performed, the format requirements for the returned task information, and the required return location;
the D4 large-model input module, i.e. the input interface of the large model, which combines the information from modules D2 and D3 and inputs it into the large model;
the D5 large-model output module, which places the information output by the large model at the designated location, according to the return-location requirement of module D3;
and the D6 application processing module, which retrieves the information in module D5 and sorts and displays it.
5. The apparatus for applying artificial intelligence according to claim 4, further comprising:
the D7 control-instruction translator, which translates the control-related text returned via module D5, together with the position information, into system control instructions, including but not limited to click, long press, slide, and text input.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310472221.8A CN116774891A (en) | 2023-04-27 | 2023-04-27 | Method and device for applying artificial intelligence |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116774891A true CN116774891A (en) | 2023-09-19 |
Family
ID=87993893
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310472221.8A Pending CN116774891A (en) | 2023-04-27 | 2023-04-27 | Method and device for applying artificial intelligence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116774891A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104599669A (en) * | 2014-12-31 | 2015-05-06 | 乐视致新电子科技(天津)有限公司 | Voice control method and device |
CN106448670A (en) * | 2016-10-21 | 2017-02-22 | 竹间智能科技(上海)有限公司 | Dialogue automatic reply system based on deep learning and reinforcement learning |
CN107195302A (en) * | 2017-06-02 | 2017-09-22 | 努比亚技术有限公司 | A kind of method of Voice command and corresponding system, terminal device |
CN109995642A (en) * | 2017-12-29 | 2019-07-09 | Tcl集团股份有限公司 | A kind of method and device automatically generating quickly revert, instant communicating system |
CN112511882A (en) * | 2020-11-13 | 2021-03-16 | 海信视像科技股份有限公司 | Display device and voice call-up method |
CN114840327A (en) * | 2022-06-29 | 2022-08-02 | 阿里巴巴达摩院(杭州)科技有限公司 | Multi-mode multi-task processing method, device and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||