CN116774891A - Method and device for applying artificial intelligence - Google Patents
- Publication number
- CN116774891A (application CN202310472221.8A)
- Authority
- CN
- China
- Prior art keywords
- information
- large model
- artificial intelligence
- text
- output
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0484—Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/445—Program loading or initiating
- G06F9/44505—Configuring for program initiating, e.g. using registry, configuration files
Abstract
The main purpose of a large model is to handle different tasks with one unified model, so in the application field, developing a more general way to exploit the characteristics and capabilities of large models is very valuable work. Humans acquire information from electronic devices chiefly through display devices. Because multimodal large models are expensive, slow, and less mature than text-based artificial-intelligence large models, a method and device that achieve this goal with a text-based large model have practical significance. The invention aims to provide a method and a device for more general artificial-intelligence applications that take display information as the main input.
Description
Technical Field
The invention belongs to the field of large-model artificial intelligence, and particularly relates to a method and a device for applying artificial intelligence.
Background
Large-model artificial intelligence has become an important breakthrough in the revolution of social productivity, as exemplified by the GPT-4 large model introduced by OpenAI. These powerful models will continue to be developed and enhanced, and how to apply them quickly, further improving social productivity, has become an important research direction in many fields. The main purpose of a large model is to handle different tasks with one unified model, so in the application field, developing a more general way to exploit the characteristics and capabilities of large models is very valuable work.
Humans acquire information from electronic devices chiefly through display devices, which make various systems increasingly compatible with human habits. If artificial intelligence could acquire from the display device the same massive information that humans do, its field of application would be greatly enriched, and human work further lightened.
In existing work, the displayed information can be sent directly to a multimodal large model, which recognizes and processes the screen information, returns usable information, and can even control the device directly. However, because multimodal large models are expensive, slow, and less mature than text-based artificial-intelligence large models, a method and device that achieve the above goal with a text-based large model have practical significance.
Disclosure of Invention
In view of the foregoing, it is an object of the present invention to provide a method and apparatus for more versatile artificial-intelligence applications that use display information as the primary input.
To achieve this, the present invention uses a text-based artificial-intelligence large model as the main information-processing center, which requires that information be supplied as text.
In existing devices designed for human use, information is displayed with human readability in mind, typically on a display.
Text mapping is performed on the content of the display information. The mapping rule is: plain text information is mapped directly into words, while an operable module is mapped into a module description. For example, a displayed "ok" button must be mapped to "ok button" and cannot be mapped simply to "ok". The mapping rules may be modified as needed.
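The mapping rule above can be sketched as a small helper. This is a minimal illustration; the element kinds and the `map_element` name are assumptions for the sketch, not part of the patent.

```python
def map_element(kind: str, text: str) -> str:
    """Map one displayed element to its text form.

    Plain text is mapped directly into words; an operable module is
    mapped into a module description, so a displayed "ok" button
    becomes "ok button" rather than just "ok".
    """
    if kind == "text":
        return text
    # Operable modules (button, checkbox, ...) are described, not quoted.
    return f"{text} {kind}"

print(map_element("text", "Happy Spring Festival!"))  # mapped directly
print(map_element("button", "ok"))                    # -> ok button
```
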
The input information for the artificial-intelligence large model is:
the text information mapped from the display information; after this is input, the large model must also be told the task to execute, the format of the output information, and the location of the output.
The output information of the artificial-intelligence large model is:
after the required task is executed, the task result, output in the specified format to the specified location. The output data will be transmitted to the location designated by the application, including but not limited to cloud space, a server, or a client, for further processing and display.
The specific implementation comprises the following steps:
S1, extracting information from the display device, usually from the frame buffer of the display device or from an application window (surface) drawn by the operating system;
S2, generating the text mapping from this information at the designated sampling frequency;
S3, sending the text mapping to the large model, while designating the task requirements, data format, and output location;
S4, the large model returning information to the designated output location, where it awaits further processing by the application or system.
It should be noted that the returned information may be stored in the cloud or on a remote server, waiting for the application to pull it, or it may be pushed directly to the application or system, which is the default location.
To further improve the usability of the artificial-intelligence system, the text information in step S2 can be mapped together with its position within the display information; that is, the display content is mapped into text-and-position pairs. Based on the information returned by the large model, the system can then be operated in a human-like way, partially or completely replacing the human.
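The text-and-position pair mapping described above can be sketched as follows; the element tuple layout and the function name are illustrative assumptions.

```python
def map_with_positions(elements):
    """elements: list of (kind, text, x, y) tuples describing the screen.

    Each displayed element becomes a (text, (x, y)) pair that can later
    drive human-like operations on the system.
    """
    pairs = []
    for kind, text, x, y in elements:
        # Same rule as the plain text mapping: modules get a description.
        label = text if kind == "text" else f"{text} {kind}"
        pairs.append((label, (x, y)))
    return pairs

screen = [("button", "ok", 200, 200), ("text", "hello", 10, 40)]
print(map_with_positions(screen))  # [('ok button', (200, 200)), ('hello', (10, 40))]
```
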
The specific implementation comprises the following steps:
S5, adding to the text map generated in step S2 the position of each text item within the display information;
S6, converting the control-related text sequence returned by the large model, together with the position information, into control instructions through a control-instruction translator, and performing human-like operations on the system.
For example, if the text sequence returned by the large model is "click ok button", the control-instruction translator, using the position corresponding to the "ok button", translates it into a call to the system click operation. Operation types include, but are not limited to, click, long press, and slide. The control-instruction translator performs system input directly, in a way that imitates a human. Alternatively, the position-mapping information may be input into the large model, which then directly returns text containing position information, such as "click ok button (200,200)"; the control-instruction translator extracts the coordinates directly for clicking. The two ways of using position information are equivalent.
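A sketch of the control-instruction translator just described, under assumed names: it pairs a control-related text sequence from the model with the text-and-position map and emits a system-level instruction.

```python
def translate(text: str, positions: dict[str, tuple[int, int]]) -> tuple:
    """Turn e.g. 'click ok button' into ('click', x, y) using the map."""
    # Split off the operation type (click, long press, slide, ...).
    action, _, target = text.partition(" ")
    x, y = positions[target]
    return (action, x, y)

# Position map built in step S5 (coordinates are illustrative):
positions = {"ok button": (200, 200)}
print(translate("click ok button", positions))  # -> ('click', 200, 200)
```
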
Some human operations have a guiding effect and change the large model's attention, so this information can also be sent to the large model for processing.
S7, recording some human operations, generating their text mapping, and sending it to the large model.
For example, the user clicks the "refresh" button, but the page does not change at all. If the user's click on the refresh button is sent to the large model, the model knows that the page was refreshed at that moment yet its information is unchanged, and can decide accordingly. Without this operation information, the large model would need to refresh the page itself to confirm whether the page information has changed.
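Step S7 can be sketched as follows; the record format and function names are assumptions. The recorded operation is appended to the screen mapping so the model sees both the unchanged page and the refresh that preceded it.

```python
operation_log: list[str] = []

def record_operation(op: str, target: str) -> None:
    # S7: record a human operation as text.
    operation_log.append(f"user {op} the {target}")

def build_model_input(screen_text: str) -> str:
    # Send the screen mapping and the recorded operations together.
    ops = "; ".join(operation_log) or "none"
    return f"Screen: {screen_text}\nRecent operations: {ops}"

record_operation("clicked", "refresh button")
msg = build_model_input("page content unchanged")
```
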
The invention also relates to a device for applying artificial intelligence, comprising the following modules:
the D1 display-information acquisition module, which acquires the display image at a certain frequency; the image-acquisition area can be designated;
the D2 display-information text-mapping module, which maps the display information to text and positions;
the D3 application storage module, which stores the description of the task to be performed, the format requirements for the returned task information, and the required return location;
the D4 large-model input module, i.e. the input interface of the large model, which combines the information from modules D2 and D3 and inputs it into the large model;
the D5 large-model output module, which places the information output by the large model at the designated location, according to the return-location requirement of module D3;
and the D6 application processing module, which retrieves the information in module D5 and sorts and displays it.
Optional modules also include:
the D7 control-instruction translator, which translates the control-related text returned via module D5, together with the position information, into system control instructions, including but not limited to click, long press, slide, and text input.
As described above, the method and device of the invention have the following beneficial effects:
human behavior can be simulated more closely, making the system more like a personal assistant;
only a text-based artificial-intelligence large model is used.
Drawings
Fig. 1 shows a basic logic diagram of the method.
Fig. 2 shows display content for example information.
Description of the embodiments
Example: automatically replying to a holiday message. The invention has wide application prospects.
User A sends a holiday greeting to user B (see fig. 2).
On the information display interface, the display information is extracted (step S1) and its text mapping generated (step S2); the extracted information is:
+10001
1-21 14:31
The city committee and the city government wish you a happy Spring Festival! Setting off fireworks and firecrackers is prohibited throughout Beijing; please consciously comply.
The page contains an "add button" (121,865), a "text input box" (271,865), and a "send button" (961,865) (the position information comes from step S5).
The large model is now asked to do the following:
This is the content displayed on a message interface; draft a reply message for me (step S3), then tell me where on the page the text can be entered and where to click to send it.
The request uses the default output location, i.e. return to the client; the required output format is the reply content first, then the operations.
The return information of the large model is:
Thanks to the city committee and the city government for their blessing. We will voluntarily follow the citywide ban on fireworks and firecrackers and jointly maintain the city's peaceful and clean environment. Wishing you a happy Spring Festival! (step S4)
Click the text input box, input the text, click the send button.
According to the text "click the text input box, input the text, click the send button", the instruction translator generates instructions directly executable by the system (step S6):
click position (271,865);
input the text "Thanks to the city committee and the city government for their blessing. We will voluntarily follow the citywide ban on fireworks and firecrackers and jointly maintain the city's peaceful and clean environment. Wishing you a happy Spring Festival!";
click position (961,865).
The system then executes these three instructions, completing the automatic reply.
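The three instructions above can be sketched as calls into a stubbed system layer; the stub functions and the event log are illustrative assumptions, not the patent's API.

```python
events: list[str] = []

def click(x: int, y: int) -> None:
    # Stub for the system's click call.
    events.append(f"click {x},{y}")

def input_text(s: str) -> None:
    # Stub for the system's text-input call.
    events.append(f"input:{s}")

# The sequence produced by the instruction translator (step S6):
click(271, 865)      # click the text input box
input_text("Thanks to the city committee and the city government for their blessing.")
click(961, 865)      # click the send button
```
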
Claims (5)
1. A method for applying artificial intelligence, characterized in that the content of display information is mapped to text. The input to the artificial-intelligence large model is: the text information mapped from the display information; after this is input, the large model must also be told the task to execute, the format of the output information, and the location of the output. The output of the artificial-intelligence large model is: after the required task is executed, the task result, output in the specified format to the specified location. The output data will be transmitted to the location designated by the application, including but not limited to cloud space, a server, or a client, for further processing and display. The specific implementation comprises the following steps:
S1, extracting information from the display device, usually from the frame buffer of the display device or from an application window drawn by the operating system;
S2, generating the text mapping from this information at the designated sampling frequency;
S3, sending the text mapping to the large model, while designating the task requirements, data format, and output location;
S4, the large model returning information to the designated output location, where it awaits further processing by the application or system.
2. The method for applying artificial intelligence according to claim 1, wherein human-like operation can be performed according to position information, comprising the steps of:
S5, adding to the text map generated in step S2 the position of each text item within the display information;
S6, converting the control-related text sequence returned by the large model, together with the position information, into control instructions through a control-instruction translator, and performing human-like operations on the system.
3. The method for applying artificial intelligence according to claim 1, characterized in that certain human operations are recorded, mapped to text, and sent to the large model.
4. An apparatus for applying artificial intelligence, comprising the following modules:
the D1 display-information acquisition module, which acquires the display image at a certain frequency; the image-acquisition area can be designated;
the D2 display-information text-mapping module, which maps the display information to text and positions;
the D3 application storage module, which stores the description of the task to be performed, the format requirements for the returned task information, and the required return location;
the D4 large-model input module, i.e. the input interface of the large model, which combines the information from modules D2 and D3 and inputs it into the large model;
the D5 large-model output module, which places the information output by the large model at the designated location, according to the return-location requirement of module D3;
and the D6 application processing module, which retrieves the information in module D5 and sorts and displays it.
5. The apparatus for applying artificial intelligence according to claim 4, further comprising:
the D7 control-instruction translator, which translates the control-related text returned via module D5, together with the position information, into system control instructions, including but not limited to click, long press, slide, and text input.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310472221.8A CN116774891A (en) | 2023-04-27 | 2023-04-27 | Method and device for applying artificial intelligence |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116774891A true CN116774891A (en) | 2023-09-19 |
Family
ID=87993893
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310472221.8A Pending CN116774891A (en) | 2023-04-27 | 2023-04-27 | Method and device for applying artificial intelligence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116774891A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104599669A (en) * | 2014-12-31 | 2015-05-06 | 乐视致新电子科技(天津)有限公司 | Voice control method and device |
CN106448670A (en) * | 2016-10-21 | 2017-02-22 | 竹间智能科技(上海)有限公司 | Dialogue automatic reply system based on deep learning and reinforcement learning |
CN107195302A (en) * | 2017-06-02 | 2017-09-22 | 努比亚技术有限公司 | A kind of method of Voice command and corresponding system, terminal device |
CN109995642A (en) * | 2017-12-29 | 2019-07-09 | Tcl集团股份有限公司 | A kind of method and device automatically generating quickly revert, instant communicating system |
CN112511882A (en) * | 2020-11-13 | 2021-03-16 | 海信视像科技股份有限公司 | Display device and voice call-up method |
CN114840327A (en) * | 2022-06-29 | 2022-08-02 | 阿里巴巴达摩院(杭州)科技有限公司 | Multi-mode multi-task processing method, device and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||