CN116774891A - Method and device for applying artificial intelligence - Google Patents

Method and device for applying artificial intelligence

Info

Publication number
CN116774891A
CN116774891A
Authority
CN
China
Prior art keywords
information
large model
artificial intelligence
text
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310472221.8A
Other languages
Chinese (zh)
Inventor
Name not disclosed at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Goose Factory Technology Co ltd
Original Assignee
Beijing Goose Factory Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Goose Factory Technology Co ltd filed Critical Beijing Goose Factory Technology Co ltd
Priority to CN202310472221.8A
Publication of CN116774891A
Legal status: Pending (current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0481 - Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 - Arrangements for executing specific programs
    • G06F 9/445 - Program loading or initiating
    • G06F 9/44505 - Configuring for program initiating, e.g. using registry, configuration files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The main purpose of a large model is to handle many different tasks with one unified model, so developing more general methods of exploiting the characteristics and capabilities of large models is very valuable work in the application field. Humans acquire information from electronic devices primarily through display devices. Because a multi-modal large model is expensive, slow, and matures more slowly than a text-based artificial intelligence large model, a method and a device that achieve this goal on the basis of a text-based large model have practical significance. The purpose of the invention is to provide a method and a device for more general artificial intelligence applications that take display information as the main input information.

Description

Method and device for applying artificial intelligence
Technical Field
The invention belongs to the field of large-model artificial intelligence, and particularly relates to a method and a device for applying artificial intelligence.
Background
Large-model artificial intelligence has become an important breakthrough in the transformation of social productivity, such as the GPT4.0 large model introduced by OpenAI. These powerful models will continue to be developed and enhanced, and how to apply them quickly so as to further improve social productivity has become an important research direction in many fields. The main purpose of a large model is to handle different tasks with one unified model, so developing more general methods of exploiting the characteristics and capabilities of large models is very valuable work in the application field.
Humans acquire information from electronic devices primarily through display devices. These display devices make various systems increasingly compatible with human habits. If artificial intelligence could acquire the same massive amount of information from the display device as humans do, the application field of artificial intelligence would be greatly enriched and human workloads could be further reduced.
In existing work, the displayed information can be sent directly to a multi-modal large model so that it recognizes and processes the screen information, returns usable information, and even controls the device directly. However, because a multi-modal large model is expensive, slow, and matures more slowly than a text-based artificial intelligence large model, a method and a device that achieve the above goal on the basis of a text-based large model have practical significance.
Disclosure of Invention
In view of the foregoing, it is an object of the present invention to provide a method and an apparatus for more general artificial intelligence applications that take display information as the primary input information.
To achieve this, the present invention uses a text-based artificial intelligence large model as the main information processing center, which requires that the input can be given as text.
In existing devices designed for human use, information is presented so that it is readable by humans, typically as information shown on a display.
The content information of the display information is mapped to text. The mapping rule is: plain text information is mapped directly to its words, while an operable module is mapped to a module description. For example, a displayed "ok" button must be mapped to "ok button" and cannot simply be mapped to "ok". The mapping rules may be modified as needed.
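A minimal sketch of this mapping rule follows (Python); the DisplayElement structure and the element kinds are assumptions made for illustration and are not defined in this disclosure.

    from dataclasses import dataclass

    @dataclass
    class DisplayElement:
        """One piece of content extracted from the display information."""
        text: str   # visible text, e.g. "ok"
        kind: str   # "text" for plain text, or a module type such as "button"

    def map_element_to_text(elem: DisplayElement) -> str:
        """Apply the mapping rule: plain text maps to its words, an operable module to a module description."""
        if elem.kind == "text":
            return elem.text                 # plain text is mapped directly to the words
        return f'"{elem.text} {elem.kind}"'  # e.g. the displayed "ok" button becomes "ok button"

    # Example: map_element_to_text(DisplayElement("ok", "button")) returns '"ok button"'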
The input information for the artificial intelligence large model is:
the text information mapped from the display information, together with the task to be executed on that information, the format of the output information, and the location of the output information, all of which must be specified.
The output information of the artificial intelligence large model is:
the task result, output in the specified format to the specified location after the required task has been executed. The output data results are transmitted to the location designated by the application, including but not limited to cloud space, a server, or the client, for subsequent further processing and display.
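A minimal sketch of assembling such an input is given below; the function name and the plain-text layout of the prompt are illustrative assumptions, not a format prescribed by this disclosure.

    def build_model_input(display_text: str, task: str,
                          output_format: str, output_location: str) -> str:
        """Combine the mapped display text with the task, the output format and the output location."""
        return (
            f"Displayed information:\n{display_text}\n\n"
            f"Task to execute: {task}\n"
            f"Output format: {output_format}\n"
            f"Output location: {output_location}\n"
        )

    # Example:
    # prompt = build_model_input(mapped_text,
    #                            "draft a reply to this message",
    #                            "reply content first, then the operations",
    #                            "client")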
The specific implementation method comprises the following steps; a minimal code sketch of these steps appears after the note below:
s1, extracting information in a display device, wherein the information is usually extracted from a frame buffer (frame buffer) of the display device or a window (surface) of an application program drawn by an operating system;
s2, generating character mapping according to the information and the designated sampling frequency;
s3, sending the text mapping to the large model, and simultaneously designating task requirements, data formats and output positions;
s4, returning information by the large model according to the designated output position, and waiting for further processing by the application program or the system.
It should be noted that the returned information may be stored in the cloud or on a remote server, waiting for the application to pull it, or it may be pushed directly to the application or the system, which is the default location.
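A minimal sketch of steps S1-S4 as a sampling loop, reusing the build_model_input sketch above; the four callables (capture, mapping, model call, delivery) are placeholders supplied by the caller rather than APIs named in this disclosure.

    import time

    def run_pipeline(capture_frame, map_to_text, call_large_model, deliver,
                     task, output_format, output_location, hz=1.0):
        """Steps S1-S4 as a sampling loop; all four callables are caller-supplied placeholders."""
        while True:
            frame = capture_frame()                          # S1: read the frame buffer or application surface
            display_text = map_to_text(frame)                # S2: generate the text mapping
            prompt = build_model_input(display_text, task,   # S3: attach task, data format and output location
                                       output_format, output_location)
            result = call_large_model(prompt)
            deliver(result, output_location)                 # S4: place the result where the application expects it
            time.sleep(1.0 / hz)                             # the specified sampling frequency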
To further improve the usability of the artificial intelligence system, the text mapping of step S2 can be extended by adding the position of each piece of text within the display information; that is, the display content is mapped into text-position pairs. This position information can then be used, according to the information returned by the large model, to perform human-like operations on the system and partially or completely replace a human.
The specific implementation method comprises the following steps; a sketch of the translator is given after the example below:
S5, adding, to the text map generated in step S2, the position within the display information of each piece of text;
S6, converting the control-related text sequence returned by the large model, together with the position information, into control instructions through a control instruction translator, and performing human-like operations on the system.
For example, if the text sequence returned by the large model is "click ok button", the control instruction translator translates that text, together with the position information corresponding to the "ok button", into a call to the system click operation. Operation types include but are not limited to click, long press, and slide. The control instruction translator can perform system input directly in a way that imitates a human. Alternatively, the position mapping information can be input directly into the large model, and the large model then returns text that already contains the position information, such as "click ok button (200, 200)", from which the control instruction translator directly extracts the coordinates to click. The two ways of using the position information are equivalent.
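A minimal sketch of such a control instruction translator; the tap and type_text helpers are hypothetical system-input functions supplied by the caller, and the command grammar shown is an assumption for illustration only.

    import re
    from typing import Callable, Dict, Tuple

    def make_translator(tap: Callable[[int, int], None],
                        type_text: Callable[[str], None]):
        """Build a control instruction translator from two placeholder system-input helpers."""
        def translate(command: str, positions: Dict[str, Tuple[int, int]]) -> None:
            command = command.strip()
            click = re.match(r"click (.+)", command, re.IGNORECASE)
            if click:
                x, y = positions[click.group(1).lower()]   # look up the position recorded in step S5
                tap(x, y)                                  # imitate a human click at that position
                return
            entry = re.match(r"input text (.+)", command, re.IGNORECASE | re.DOTALL)
            if entry:
                type_text(entry.group(1))                  # imitate human text input
        return translate

    # Example: make_translator(tap, type_text)("click ok button", {"ok button": (200, 200)})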
Some human operations have a guiding effect and change the attention of the large model, so this information can also be sent to the large model for processing.
S7, record some of the human operations, generate a text mapping of them, and send it to the large model.
For example, the user clicks the "refresh" button, but the page does not change at all before and after the refresh. If the operation of clicking the refresh button is sent to the large model, the model knows that a refresh has just occurred while the page information is unchanged, and it can make further decisions on that basis. If no operation information is sent, the large model has to refresh the page itself to confirm whether the page information has changed.
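A minimal sketch of step S7, recording a user operation as one extra line of the text map before it is sent to the model; the event wording and timestamp format are assumptions, not part of this disclosure.

    import time

    def record_operation(op_type: str, target: str) -> str:
        """Turn a recorded user operation into one line of text for the large model."""
        return f'[{time.strftime("%H:%M:%S")}] the user performed "{op_type}" on the "{target}"'

    # Example: append the recorded click before sending the text map in step S3.
    # display_text += "\n" + record_operation("click", "refresh button")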
The invention also relates to a device for applying artificial intelligence, which comprises the following modules; a skeleton sketch follows the optional module below:
the D1 display information acquisition module, responsible for acquiring the display image at a certain frequency; the image acquisition area can be designated;
the D2 display information text mapping module, responsible for mapping the display information to text and positions;
the D3 application storage module, which stores the description of the task to be performed, the format requirements for the task's return information, and the information return location requirements;
the D4 large model input module, i.e. the input interface of the large model, which combines the information from modules D2 and D3 and inputs it into the large model;
the D5 large model output module, which places the information output by the large model at the designated location according to the information return location requirement in module D3;
and the D6 application processing module, which retrieves the information from module D5 and organizes and displays it.
Optional modules also include:
the D7 control instruction translator, which translates the control-related text returned through module D5, together with the position information, into system control instructions, including but not limited to click, long press, slide, and text input.
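A minimal skeleton of the D1-D7 modules; every method name and signature is an assumption made for illustration rather than an interface defined by this disclosure.

    class ArtificialIntelligenceDevice:
        """Skeleton of modules D1-D7; each method body is a placeholder for an implementation."""

        def acquire_display(self, region=None):              # D1: capture the display image, optionally a region
            raise NotImplementedError

        def map_display_to_text(self, image):                # D2: map display information to text and positions
            raise NotImplementedError

        def load_task_config(self):                          # D3: task description, output format, return location
            raise NotImplementedError

        def build_model_input(self, text_map, task_config):  # D4: combine D2 and D3 output for the large model
            raise NotImplementedError

        def route_model_output(self, output, task_config):   # D5: place the model output at the designated location
            raise NotImplementedError

        def process_output(self, stored_output):             # D6: retrieve, organize and display the information
            raise NotImplementedError

        def translate_control(self, text, positions):        # D7 (optional): text plus positions to control instructions
            raise NotImplementedError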
As described above, the method and the device of the invention have the following beneficial effects:
human behavior can be simulated more closely, making the system behave more like a personal assistant;
only a text-based artificial intelligence large model is needed.
Drawings
Fig. 1 shows a basic logic diagram of the method.
Fig. 2 shows the display content used in the example.
Description of the embodiments
Example: automatically replying to a holiday message. The invention has broad application prospects.
User A sends a holiday blessing to user B (see Fig. 2).
On the message display interface (step S1), the display information is extracted and mapped to text (step S2). The extracted information is:
+10001
1-21 14:31
The municipal Party committee and the municipal government wish you a happy Spring Festival! Setting off fireworks and firecrackers is prohibited throughout the entire area of Beijing; please observe the regulation consciously.
The page contains an "add button" (121, 865), a "text input box" (271, 865), and a "send button" (961, 865) (the position information comes from step S5).
The large model is now asked to do the following:
This is the content displayed on a message interface; please draft a reply message for me (step S3), tell me where on the page the text can be entered, and then tell me where to click so that the text can be sent.
This request uses the default output location, i.e. return to the client; the required output format is to return the content of the reply message first and then the operations.
The information returned by the large model is:
Thank you for the blessing of the municipal Party committee and the municipal government. We will consciously observe the regulation banning fireworks and firecrackers throughout the city and jointly maintain the city's peaceful and clean environment. Wishing you a happy Spring Festival! (step S4)
Click the text input box, input the text, and click the send button.
The instruction translator generates instructions directly executable by the system from the text "click the text input box, input the text, click the send button" (step S6):
click position (271, 865);
input the text "Thank you for the blessing of the municipal Party committee and the municipal government. We will consciously observe the regulation banning fireworks and firecrackers throughout the city and jointly maintain the city's peaceful and clean environment. Wishing you a happy Spring Festival!";
click position (961, 865).
These three instructions are invoked through system calls, completing the task of automatically replying to the message; a minimal code sketch of this sequence follows.
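A minimal sketch of the three translated instructions as system-input calls for this example; tap and type_text are the same hypothetical helpers assumed in the translator sketch above, not functions defined by this disclosure.

    # The reply text drafted by the large model (step S4).
    reply = ("Thank you for the blessing of the municipal Party committee and the municipal government. "
             "We will consciously observe the regulation banning fireworks and firecrackers throughout the city "
             "and jointly maintain the city's peaceful and clean environment. Wishing you a happy Spring Festival!")

    tap(271, 865)      # click the text input box
    type_text(reply)   # input the reply text
    tap(961, 865)      # click the send button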

Claims (5)

1. A method for applying artificial intelligence, characterized in that the content information of the display information is mapped to text. The input information for the artificial intelligence large model is: the text information mapped from the display information, together with the task to be executed on that information, the format of the output information, and the location of the output information, all of which must be specified. The output information of the artificial intelligence large model is: the task result, output in the specified format to the specified location after the required task has been executed. The output data results are transmitted to the location designated by the application, including but not limited to cloud space, a server, or the client, for subsequent further processing and display. The specific implementation method comprises the following steps:
s1, extracting information in a display device, wherein the information is usually extracted from a frame buffer of the display device or a window of an application program drawn by an operating system;
s2, generating character mapping according to the information and the designated sampling frequency;
s3, sending the text mapping to the large model, and simultaneously designating task requirements, data formats and output positions;
s4, returning information by the large model according to the designated output position, and waiting for further processing by the application program or the system.
2. The method for applying artificial intelligence according to claim 1, characterized in that human-like operations can be performed according to position information, comprising the steps of:
S5, adding, to the text map generated in step S2, the position within the display information of each piece of text;
S6, converting the control-related text sequence returned by the large model, together with the position information, into control instructions through a control instruction translator, and performing human-like operations on the system.
3. The method for applying artificial intelligence according to claim 1, characterized in that certain human operations are recorded, a text mapping of them is generated, and it is sent to the large model.
4. An apparatus for applying artificial intelligence, comprising the following modules:
the D1 display information acquisition module, responsible for acquiring the display image at a certain frequency; the image acquisition area can be designated;
the D2 display information text mapping module, responsible for mapping the display information to text and positions;
the D3 application storage module, which stores the description of the task to be performed, the format requirements for the task's return information, and the information return location requirements;
the D4 large model input module, i.e. the input interface of the large model, which combines the information from modules D2 and D3 and inputs it into the large model;
the D5 large model output module, which places the information output by the large model at the designated location according to the information return location requirement in module D3;
and the D6 application processing module, which retrieves the information from module D5 and organizes and displays it.
5. The apparatus for applying artificial intelligence according to claim 4, further comprising the following module:
the D7 control instruction translator, which translates the control-related text returned through module D5, together with the position information, into system control instructions, including but not limited to click, long press, slide, and text input.
CN202310472221.8A 2023-04-27 2023-04-27 Method and device for applying artificial intelligence Pending CN116774891A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310472221.8A CN116774891A (en) 2023-04-27 2023-04-27 Method and device for applying artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310472221.8A CN116774891A (en) 2023-04-27 2023-04-27 Method and device for applying artificial intelligence

Publications (1)

Publication Number Publication Date
CN116774891A (en) 2023-09-19

Family

ID=87993893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310472221.8A Pending CN116774891A (en) 2023-04-27 2023-04-27 Method and device for applying artificial intelligence

Country Status (1)

Country Link
CN (1) CN116774891A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104599669A (en) * 2014-12-31 2015-05-06 乐视致新电子科技(天津)有限公司 Voice control method and device
CN106448670A (en) * 2016-10-21 2017-02-22 竹间智能科技(上海)有限公司 Dialogue automatic reply system based on deep learning and reinforcement learning
CN107195302A (en) * 2017-06-02 2017-09-22 努比亚技术有限公司 A kind of method of Voice command and corresponding system, terminal device
CN109995642A (en) * 2017-12-29 2019-07-09 Tcl集团股份有限公司 A kind of method and device automatically generating quickly revert, instant communicating system
CN112511882A (en) * 2020-11-13 2021-03-16 海信视像科技股份有限公司 Display device and voice call-up method
CN114840327A (en) * 2022-06-29 2022-08-02 阿里巴巴达摩院(杭州)科技有限公司 Multi-mode multi-task processing method, device and system

Similar Documents

Publication Publication Date Title
CN111095215B (en) Inter-application delivery format specific data objects
US11727200B2 (en) Annotation tool generation method, annotation method, electronic device and storage medium
CN111325020A (en) Event argument extraction method and device and electronic equipment
KR20210043493A (en) Methods, devices and devices for generating vector representations of knowledge graphs
CN111240669B (en) Interface generation method and device, electronic equipment and computer storage medium
CN112507101A (en) Method and device for establishing pre-training language model
CN110691028A (en) Message processing method, device, terminal and storage medium
Scott et al. Towards an interaction blueprint for mixed reality experiences in glam spaces: the augmented telegrapher at porthcurno museum
CN109598001A (en) A kind of information display method, device and equipment
Shaikh Augmented reality search to improve searching using augmented reality
CN112328088B (en) Image presentation method and device
CN112911266A (en) Implementation method and system of Internet of things practical training system based on augmented reality technology
CN116774891A (en) Method and device for applying artificial intelligence
Wilson et al. Enhanced interaction styles for user interfaces
CN114970666B (en) Spoken language processing method and device, electronic equipment and storage medium
US20240038223A1 (en) Speech recognition method and apparatus
EP3896614A2 (en) Method and apparatus for labeling data
Zaguia et al. Using multimodal fusion in accessing web services
Zhang Development and analysis of educational virtual reality system using static image
CN116383620B (en) Method and device for applying multi-mode artificial intelligence
US20230244325A1 (en) Learned computer control using pointing device and keyboard actions
JP2019133418A (en) Search device, search method, program, and database
CN117036652B (en) Layout information generation method, model training method, device and electronic equipment
Shankar et al. Collaborative Interactive Workspace Environment Using Augmented Reality
KR20210037634A (en) Event argument extraction method, event argument extraction apparatus and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination