CN117762422A - Code generation method, device, electronic equipment and storage medium - Google Patents

Code generation method, device, electronic equipment and storage medium

Info

Publication number
CN117762422A
CN117762422A (application CN202311814322.5A)
Authority
CN
China
Prior art keywords
target
data
code
cursor position
gesture information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311814322.5A
Other languages
Chinese (zh)
Inventor
雷超
于鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing SoundAI Technology Co Ltd
Original Assignee
Beijing SoundAI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing SoundAI Technology Co Ltd filed Critical Beijing SoundAI Technology Co Ltd
Priority to CN202311814322.5A priority Critical patent/CN117762422A/en
Publication of CN117762422A publication Critical patent/CN117762422A/en
Pending legal-status Critical Current

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

The invention provides a code generation method, a device, electronic equipment, and a storage medium, relating to the technical field of artificial intelligence. The method comprises the following steps: acquiring video data and first data corresponding to a user; performing line-of-sight recognition on the video data and determining a target cursor position in a code operation interface; and inputting the first data into a code generation model to generate, at the target cursor position, a target code corresponding to the first data. The code generation model comprises a data identification model corresponding to the first data and a large language model. The invention realizes simpler, more efficient, and more intelligent automatic code generation.

Description

Code generation method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a code generation method, a code generation device, an electronic device, and a storage medium.
Background
Code development refers to the process, during software development, of writing a series of instructions and statements according to certain rules, realizing the functions and behaviors of the software through program code.
In the prior art, a software engineer generally writes code by means of a mouse and a keyboard: confirmation clicks are made with the left mouse button, range selection is controlled by mouse movement, menu options are opened with the right mouse button, and operations such as line breaks, indentation, deletion, and undo are triggered by corresponding keys on the keyboard. The speed of code writing therefore depends entirely on the engineer's proficiency with the mouse and keyboard, and each operation incurs a delay, so the interaction efficiency and code-writing efficiency of the whole code development process are low.
Disclosure of Invention
The invention provides a code generation method, a device, electronic equipment, and a storage medium, which are used to overcome the defects of low interaction efficiency and low code-writing efficiency in the code development process in the prior art, and to realize simpler, more efficient, and more intelligent automatic code generation.
The invention provides a code generation method, which comprises the following steps:
acquiring video data and first data corresponding to a user;
performing line-of-sight recognition on the video data, and determining a target cursor position in a code operation interface;
inputting the first data into a code generation model, and generating a target code corresponding to the first data at the target cursor position; the code generation model comprises a data identification model corresponding to the first data and a large language model.
According to the code generation method provided by the invention, the inputting the first data into the code generation model and generating the target code corresponding to the first data at the target cursor position comprises the following steps:
inputting the first data into the data identification model, and determining text data corresponding to the first data;
and generating a target code corresponding to the first data at the target cursor position based on a matching result of the text data and the preset character.
According to the code generation method provided by the invention, the generating the target code corresponding to the first data at the target cursor position based on the matching result of the text data and the preset character comprises the following steps:
acquiring second data corresponding to the user under the condition that the text data is matched with the preset characters;
inputting the second data into the data identification model, and outputting prompt text corresponding to the second data;
inputting the prompt text into the large language model, and outputting a reply text corresponding to the second data;
extracting the target code from the reply text, and outputting the target code at the target cursor position.
According to the code generation method provided by the invention, the generating the target code corresponding to the first data at the target cursor position based on the matching result of the text data and the preset character further comprises the following steps:
and under the condition that the text data is not matched with the preset characters, extracting the target code corresponding to the first data from the text data, and outputting the target code at the target cursor position.
According to the code generation method provided by the invention, the video data is subjected to line-of-sight recognition, and the target cursor position in the code operation interface is determined, which comprises the following steps:
performing line-of-sight recognition on the video data, and determining the current cursor position to be operated by the user in the code operation interface;
determining target gesture information corresponding to the user based on the video data;
matching the target gesture information with all first gesture information in a preset mapping relation, and determining a target cursor movement strategy corresponding to the target gesture information; the preset mapping relation comprises a mapping relation between the first gesture information and a cursor movement strategy;
and updating the current cursor position to be the target cursor position based on the target cursor movement strategy.
According to the code generation method provided by the invention, the preset mapping relation also comprises a mapping relation between the second gesture information and the operation strategy of the code operation interface;
the method further comprises the steps of:
under the condition that the target gesture information is not matched with all first gesture information in the preset mapping relation, matching the target gesture information with all second gesture information in the preset mapping relation, and determining a target operation strategy corresponding to the target gesture information;
and updating the code operation interface based on the target operation strategy.
According to the code generation method provided by the invention, the preset mapping relation further comprises a mapping relation between the blink type and the operation instruction;
the updating the code operation interface based on the target operation strategy comprises the following steps:
determining a target blink type corresponding to the user based on the video data;
matching the target blink type with all blink types in the preset mapping relation, and determining a target operation instruction corresponding to the target blink type;
determining a target intention instruction corresponding to the user based on the target operation instruction and the target operation strategy;
and executing the target intention instruction and updating the code operation interface.
The invention also provides a code generation device, which comprises:
the acquisition module is used for acquiring video data and first data corresponding to a user;
the determining module is used for carrying out line-of-sight identification on the video data and determining the position of a target cursor in the code operation interface;
the generation module is used for inputting the first data into a code generation model and generating a target code corresponding to the first data at the target cursor position; the code generation model comprises a data identification model corresponding to the first data and a large language model.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a code generation method as described in any of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a code generation method as described in any of the above.
With the code generation method, device, electronic equipment, and storage medium provided by the invention, after the video data and the first data corresponding to the user are obtained, the target cursor position to be operated by the user is determined in the code operation interface through line-of-sight recognition on the video data, and the first data is input into the code generation model to output the target code corresponding to the first data at the target cursor position. Recognition of the video data replaces the user's frequent mouse operations, and processing of the first data determines the target code the user intends to input, replacing frequent keyboard operations. Interaction efficiency and code-writing efficiency are thus improved without peripherals such as a mouse and keyboard, and efficient, intelligent code generation is realized.
Drawings
In order to more clearly illustrate the technical solutions of the invention or of the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the invention, and other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a flow chart of a code generation method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a code generating device according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions of the invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
To address the problems of low interaction efficiency and low code-writing efficiency in the code development process in the prior art, an embodiment of the invention provides a code generation method. Fig. 1 is a schematic flow chart of the code generation method provided by the embodiment of the invention. As shown in Fig. 1, the method comprises the following steps:
step 110, obtaining video data and first data corresponding to a user.
Optionally, the video data may be collected by a camera installed in the electronic device corresponding to the code operation interface, or collected by a camera in the room where the electronic device is located and then sent to the electronic device. After the video data is obtained, frames can be extracted from it to obtain multiple static images. During extraction, the playback speed of the video data may first be reduced to the slowest setting, and frames may then be captured one by one during playback. Frames may be extracted at equal or unequal intervals, which is not limited by the embodiment of the invention.
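The equal- and unequal-interval frame extraction described above can be sketched as follows. This is a minimal illustration; the function names and the cycling interval scheme are assumptions for demonstration, not part of the disclosed method:

```python
def frame_indices(total_frames: int, interval: int) -> list:
    """Equal-interval sampling: keep every `interval`-th frame index."""
    return list(range(0, total_frames, interval))

def unequal_frame_indices(total_frames: int, intervals: list) -> list:
    """Unequal-interval sampling: advance by a cycling list of step sizes."""
    indices, pos, i = [], 0, 0
    while pos < total_frames:
        indices.append(pos)
        pos += intervals[i % len(intervals)]
        i += 1
    return indices
```

The indices returned here would then be used to grab the corresponding static images from the decoded video stream.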
Optionally, the first data may include voice data or mouth-shape video data corresponding to the user, and is used to determine the target code the user intends to input at the code operation interface.
Optionally, when the first data is voice data corresponding to a user, the application scenario of the code generation method provided by the embodiment of the invention may be a rapid prototype development scenario, a code refactoring scenario, or an assisted programming scenario. In a rapid prototype development scenario, when an idea or design needs to be verified quickly, automatic and efficient code generation free of peripherals such as a mouse and keyboard allows the idea or design to be evaluated and improved before actual code development. In a code refactoring scenario, existing code needs to be restructured as a software project grows and changes, to optimize performance and improve maintainability. In an assisted programming scenario, part of the code is generated automatically and efficiently during actual code development without peripherals such as a mouse and keyboard, which reduces the workload of manually writing code, speeds up development, and reduces possible writing errors.
Optionally, the voice data may be collected through a sound pickup device installed in the electronic device, such as a microphone, a recording unit, or a headset. After the voice data is obtained, preprocessing operations such as denoising, signal enhancement, framing, and windowing may be performed on it to improve the accuracy of the subsequent speech transcription. The language of the voice data may be Chinese, English, French, German, Russian, etc., which is not limited by the embodiment of the invention.
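The framing and windowing preprocessing mentioned above can be illustrated with a short sketch. The Hann window and the parameter names here are illustrative assumptions; the disclosure does not specify which window or frame/hop sizes are used:

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into overlapping frames and apply a Hann window."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        # Element-wise multiply each frame by the window to taper its edges.
        frames.append([s * w for s, w in
                       zip(samples[start:start + frame_len], window)])
    return frames
```

Each windowed frame would then feed the speech recognition model downstream.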
Optionally, when the first data is mouth-shape video data corresponding to the user, the application scenario of the code generation method provided by the embodiment of the invention may be a conference scenario, or a scenario involving users who cannot speak, such as patients or speech-impaired users. The mouth-shape video data may be collected by a camera installed in the electronic device, or by a camera in the room where the electronic device is located and then sent to the electronic device. After the mouth-shape video data is obtained, frames must be extracted from it to obtain multiple mouth-shape images. To ensure the accuracy of the subsequent mouth-shape recognition result, the frame extraction interval may be set to a small value, for example 1 frame.
Step 120, performing line-of-sight recognition on the video data, and determining the position of a target cursor in a code operation interface.
Specifically, after the video data is acquired, line-of-sight recognition can be performed on the video data first, and the target cursor position to be operated by the user in the code operation interface can be determined.
Further, the performing line-of-sight recognition on the video data to determine a target cursor position in a code operation interface includes:
performing line-of-sight recognition on the video data, and determining the current cursor position to be operated by the user in the code operation interface;
determining target gesture information corresponding to the user based on the video data;
matching the target gesture information with all first gesture information in a preset mapping relation, and determining a target cursor movement strategy corresponding to the target gesture information; the preset mapping relation comprises a mapping relation between the first gesture information and a cursor movement strategy;
and updating the current cursor position to be the target cursor position based on the target cursor movement strategy.
Specifically, after the video data is acquired, the current cursor position to be operated by the user can be determined in the code operation interface using an eye-tracking method. If no target gesture information of the user exists in the video data, the current cursor position can be determined as the target cursor position. If target gesture information exists in the video data, it is matched against the first gesture information in the preset mapping relation; if first gesture information matching the target gesture information exists, the cursor movement strategy corresponding to that first gesture information in the preset mapping relation is determined as the target cursor movement strategy, which is then executed at the current cursor position so that the current cursor position is updated to the target cursor position.
Optionally, when determining the current cursor position from the video data using an eye-tracking method, the video data may be processed by an eye-tracking model constructed with deep learning, specifically as follows. Target detection is performed on the static images extracted from the video data to determine the user's eye region, which is then segmented to obtain the user's eye data. The eye data is input into the eye-tracking model, which extracts target eye features from the eye images; these may include pupil size, pupil position, pupil shape, cornea position, cornea shape, eye size, eye position, and so on, where pupil size, pupil position, pupil shape, cornea position, and cornea shape are features that reflect the gaze direction. During training, the eye-tracking model learns the mapping between eye features and gaze directions, so it can predict the user's gaze direction from the extracted target eye features. After the user's predicted gaze direction is determined, the gaze point position on the code operation interface that the user is watching is determined, and the current cursor position in the code operation interface is then determined from the user's head pose and the gaze point position. After the current cursor position is determined, the cursor can be controlled to move to it.
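The step from a predicted gaze direction to a gaze point on the interface can be sketched as a ray-plane intersection. This is a simplified geometric model (eye position, gaze vector, and screen plane assumed to share one coordinate system; names hypothetical), not the disclosed implementation:

```python
def gaze_to_screen(eye_pos, gaze_dir, screen_z=0.0):
    """Intersect the gaze ray (eye position + direction) with the
    screen plane z = screen_z; return the (x, y) gaze point or None."""
    ex, ey, ez = eye_pos
    dx, dy, dz = gaze_dir
    if dz == 0:
        return None  # gaze is parallel to the screen plane
    t = (screen_z - ez) / dz
    if t < 0:
        return None  # screen plane lies behind the user
    return (ex + t * dx, ey + t * dy)
```

The resulting gaze point would then be combined with head pose to fix the current cursor position, as described above.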
Optionally, the eye-tracking model may be constructed based on a convolutional neural network (Convolutional Neural Network, CNN), a recurrent neural network (Recurrent Neural Network, RNN), a long short-term memory network (Long Short-Term Memory, LSTM), or the like. A large amount of eye image data and corresponding gaze directions in a natural state is collected to train an initial eye-tracking model, which learns the mapping between eye features and gaze directions to obtain the trained eye-tracking model.
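Learning the eye-feature to gaze-direction mapping can be illustrated, in drastically simplified form, with a one-dimensional linear fit by gradient descent. A real implementation would use a CNN/RNN/LSTM as described, so this is only a toy stand-in with hypothetical names:

```python
def train_gaze_mapper(features, angles, lr=0.1, epochs=1000):
    """Fit angle ~= w * feature + b by gradient descent on mean
    squared error: a 1-D stand-in for learning the mapping between
    an eye feature and a gaze angle."""
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in zip(features, angles)) / n
        grad_b = sum((w * x + b - y) for x, y in zip(features, angles)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

Training on synthetic pairs generated by angle = 2·feature + 1 recovers the underlying mapping, mirroring how the model in the text learns from collected eye-image/gaze-direction pairs.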
In addition, the pupil-cornea reflection method in eye tracking can be used to determine the predicted gaze direction. Compared with recognizing the video data with a deep-learning eye-tracking model, the difference is that when target eye features are extracted from the eye images, not only the pupil size, pupil position, pupil shape, cornea position, and cornea shape are extracted, but also the reflection signals of light at the pupil and cornea. It should be noted that, because the reflection signals at the pupil and cornea are weak, the accuracy requirement for the camera collecting the video data is higher. A support vector machine (Support Vector Machine, SVM) may also be used in the pupil-cornea reflection method to construct the eye-tracking model.
Further, after the current cursor position is determined, target detection and recognition can be performed on the static images extracted from the video data; if the user's hand is detected, the static images are further segmented to obtain multiple hand images. A gesture recognition model in the electronic device can then extract gesture features from the hand images, which may include the number of fingers, the shape of the fingers, and finger motion features such as the motion direction, motion displacement, and motion trajectory of the fingers. After the gesture features are extracted, the gesture recognition model recognizes them to obtain the user's target gesture information.
Alternatively, the gesture recognition model may employ a time-series based model, such as a hidden markov model (Hidden Markov Model, HMM), a support vector machine, and the like, to which embodiments of the present invention are not limited.
Optionally, the preset mapping relation may include different first gesture information and corresponding cursor movement strategies. For example: (1) The first gesture information is one finger moving once in a first target direction, and the corresponding cursor movement strategy is moving the cursor one character from the current cursor position in that direction. If the first target direction is horizontally left or right, the cursor moves one character horizontally left or right from the current cursor position; if the first target direction is vertically up or down, the cursor moves vertically from the current cursor position to the previous or next row. (2) The first gesture information is five fingers moving in a second target direction (horizontally left or right), and the corresponding cursor movement strategy is selecting several characters from the current cursor position in that direction, the cursor moving from the current cursor position to the far side of the last selected character. For example, with the current cursor position on the left side of character D, if the five fingers move horizontally 4 times to the right, 4 characters "DEFG" are selected to the right of the current cursor position, and the cursor moves from the left side of character D to the right side of character G. If the five fingers move horizontally 3 times to the left, 3 characters "ABC" are selected to the left of character D, and the cursor moves from the left side of character D to the left side of character A.
(3) The first gesture information is a fist, and the corresponding cursor movement strategy is an indent or space operation, i.e., the cursor is indented several characters from the current cursor position, or advanced by one character. (4) The second gesture information is a fist rotation, and the corresponding operation strategy is a carriage return, i.e., the cursor jumps from the current cursor position to the starting position of the next row. The embodiments of the invention are not limited in this regard.
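The mapping from first gesture information to cursor movement strategies described above can be sketched as a lookup table. The gesture encodings and the (column, row) coordinate convention below are hypothetical illustrations, not the disclosed data structure:

```python
# Hypothetical encoding: (gesture, direction) -> (column delta, row delta).
FIRST_GESTURE_MAP = {
    ("one_finger", "right"): (1, 0),   # one character right
    ("one_finger", "left"): (-1, 0),   # one character left
    ("one_finger", "up"): (0, -1),     # previous row
    ("one_finger", "down"): (0, 1),    # next row
}

def apply_cursor_strategy(cursor, gesture):
    """Return the updated (col, row) cursor position; an unmatched
    gesture leaves the current cursor position unchanged."""
    dx, dy = FIRST_GESTURE_MAP.get(gesture, (0, 0))
    return (cursor[0] + dx, cursor[1] + dy)
```

In the method above, an unmatched gesture would instead fall through to the second-gesture matching described next, rather than simply leaving the cursor in place.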
Further, the preset mapping relation also comprises a mapping relation between the second gesture information and an operation strategy of the code operation interface;
the method further comprises the steps of:
under the condition that the target gesture information is not matched with all first gesture information in the preset mapping relation, matching the target gesture information with all second gesture information in the preset mapping relation, and determining a target operation strategy corresponding to the target gesture information;
and updating the code operation interface based on the target operation strategy.
Specifically, if the target gesture information does not match any first gesture information in the preset mapping relation, it can be matched against all second gesture information in the preset mapping relation, and the operation strategy corresponding to the matching second gesture information is determined as the target operation strategy, where the target operation strategy performs operations on the code operation interface other than cursor control.
Optionally, the second gesture information and corresponding operation strategies in the preset mapping relation may include: (1) The second gesture information is two fingers together moving horizontally, and the corresponding operation strategy is a selection operation. (2) The second gesture information is five fingers moving vertically, and the corresponding operation strategy is turning or scrolling the page of the code operation interface: if the five fingers move vertically upward, the code operation interface turns to the previous page or scrolls upward; if the five fingers move vertically downward, the code operation interface turns to the next page or scrolls downward. (3) The second gesture information is two fingers separating, and the corresponding operation strategy is an undo operation, i.e., the last operation strategy is undone. The embodiments of the invention are not limited in this regard.
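The two-stage matching — first against the first-gesture (cursor movement) entries, then falling back to the second-gesture (interface operation) entries — can be sketched as follows, with hypothetical gesture and strategy names:

```python
def match_gesture(gesture, first_map, second_map):
    """Match target gesture info against first-gesture entries
    (cursor movement); on failure, fall back to second-gesture
    entries (interface operations). None means no match at all."""
    if gesture in first_map:
        return ("cursor", first_map[gesture])
    if gesture in second_map:
        return ("operation", second_map[gesture])
    return None

# Hypothetical encodings of the preset mapping relation.
CURSOR_STRATEGIES = {"one_finger_left": "move_left_one_char"}
OPERATION_STRATEGIES = {
    "five_finger_up": "page_up_or_scroll_up",
    "two_finger_apart": "undo_last_operation",
}
```

A "cursor" result updates the cursor position; an "operation" result updates the code operation interface as described above.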
Further, the preset mapping relation further comprises a mapping relation between the blink type and the operation instruction;
the updating the code operation interface based on the target operation strategy comprises the following steps:
determining a target blink type corresponding to the user based on the video data;
matching the target blink type with all blink types in the preset mapping relation, and determining a target operation instruction corresponding to the target blink type;
determining a target intention instruction corresponding to the user based on the target operation instruction and the target operation strategy;
and executing the target intention instruction and updating the code operation interface.
Specifically, the preset mapping relation further includes a mapping between blink types and operation instructions; that is, blink types replace mouse operations. For example, if the blink type is blinking the left eye, the corresponding operation instruction is a left mouse click; if the blink type is blinking the right eye, the corresponding operation instruction is a right mouse click. An operation instruction can be understood as an interface interaction instruction for interacting with the code operation interface. After the target operation strategy is determined, the eye images in the video data are recognized to determine the user's target blink type; the target blink type is matched against all blink types, and the target operation instruction is determined from the operation instruction corresponding to the matching blink type. The target operation instruction and the target operation strategy are combined to determine the user's target intention instruction, which is executed to update the code operation interface. For example, if the target operation instruction is a left mouse click and the target operation strategy is scrolling upward, combining the two is equivalent to generating a target intention instruction in which the left mouse button controls the code operation interface to scroll upward. After the target intention instruction is executed, the code operation interface is updated.
It should be noted that the blink types in the preset mapping relation are blinking the left eye or blinking the right eye. If the user blinks both eyes simultaneously in the video data, the blink is treated as a normal physiological response of the user, and no target operation instruction is output to the electronic device.
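The combination of a blink-derived operation instruction with a gesture-derived operation strategy into a target intention instruction, including the both-eye-blink exclusion just noted, can be sketched as follows (blink labels and strategy names are hypothetical):

```python
# Hypothetical preset mapping: blink type -> operation instruction.
BLINK_INSTRUCTIONS = {
    "left_eye": "mouse_left_button",
    "right_eye": "mouse_right_button",
}

def target_intent(blink_type, operation_strategy):
    """Combine the blink-derived operation instruction with the
    gesture-derived operation strategy into a target intention
    instruction. A simultaneous both-eye blink is treated as a
    normal physiological response and yields no instruction."""
    if blink_type == "both_eyes":
        return None
    instruction = BLINK_INSTRUCTIONS.get(blink_type)
    return None if instruction is None else (instruction, operation_strategy)
```

Executing the returned pair — e.g. left-button plus scroll-up — would then update the code operation interface as in the example above.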
Step 130, inputting the first data into a code generation model, and generating a target code corresponding to the first data at the target cursor position; the code generation model comprises a data identification model corresponding to the first data and a large language model.
Specifically, after the target cursor position is determined, the first data is first input into the data identification model in the code generation model, which recognizes the first data; the recognition result is used to judge whether the large language model needs to be awakened. If the large language model does not need to be awakened, the target code is determined directly from the recognition result of the first data and output at the target cursor position. If it does, the target code is further generated with the assistance of the large language model based on the recognition result of the first data, and output at the target cursor position.
Alternatively, the large language model may be, for example, a model such as Wenxin Yiyan (ERNIE Bot) or iFlytek Spark (Xinghuo). The code training data of the large language model is relatively clear and fixed, to ensure that the accuracy of the target code generated by the large language model is relatively high. The training data may include prompt words and the corresponding source code. The prompt words may include generating a class or interface, generating a constructor, generating get and set methods, overriding a method, generating a main or test class, generating loops, assisting in completing calls, generating the Mapper+DAO+Service corresponding to a table, generating interface documents, formatting code, simplifying code, code risk prediction, and so on. The source code may be determined based on open-source data sets such as The Pile and CodeParrot, and matched to the prompt words.
Further, the inputting the first data into a code generation model and generating the target code corresponding to the first data at the target cursor position includes:
inputting the first data into the data identification model, and determining text data corresponding to the first data;
and generating a target code corresponding to the first data at the target cursor position based on a matching result of the text data and the preset character.
Specifically, if the first data is voice data, the data identification model is a voice recognition model, which transcribes the voice data to obtain the text data corresponding to the first data. If the first data is mouth-shape video data, the data identification model is a mouth-shape recognition model, which recognizes the mouth-shape video data and determines the words and sentences corresponding to the user's mouth shapes, that is, the text data corresponding to the mouth-shape video data. After the text data corresponding to the first data is identified, the text data is matched against a preset character; the matching result determines whether the large language model needs to be woken, and the target code to be output at the target cursor position is then determined.
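The routing between the two data identification models can be sketched as below; the model callables are placeholders and the type labels ("voice", "mouth_video") are assumptions:

```python
def identify(first_data_type, first_data, speech_model, lip_model):
    """Route voice data to the speech recognition model and mouth-shape
    video data to the lip-reading model; both return text data."""
    if first_data_type == "voice":
        return speech_model(first_data)
    if first_data_type == "mouth_video":
        return lip_model(first_data)
    raise ValueError(f"unsupported first-data type: {first_data_type}")
```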
Further, the generating, based on the matching result of the text data and the preset character, the target code corresponding to the first data at the target cursor position includes:
acquiring second data corresponding to the user under the condition that the text data is matched with the preset characters;
inputting the second data into the data identification model, and outputting prompt text corresponding to the second data;
inputting the prompt text into the large language model, and outputting a reply text corresponding to the second data;
extracting the target code from the reply text, and outputting the target code at the target cursor position.
For example, take the preset character as the phrase 'whisper phrase' and the second data as the voice utterance 'generate the main method of the enterprise query system'. If the text data is 'whisper phrase', that is, the text data matches the preset character, the large language model is woken and the second data input by the user is acquired. After the second data is obtained, it is input into the voice recognition model, which transcribes it into the prompt text 'generate the main method of the enterprise query system'. The prompt text is then input into the large language model to obtain a reply text, which may contain both the target code and an explanation of the target code. The reply text is filtered to remove the explanation, only the target code is retained, and the target code is output at the target cursor position.
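The filtering step above can be sketched with a common heuristic: assume the LLM wraps code in Markdown fences and keep only the fenced content. This is an assumption about the reply format, not the patent's exact filtering algorithm:

```python
import re

def extract_code(reply_text: str) -> str:
    """Keep only the code from an LLM reply, dropping surrounding
    explanation, assuming code is wrapped in Markdown fences."""
    blocks = re.findall(r"```(?:\w+)?\n(.*?)```", reply_text, re.DOTALL)
    if blocks:
        return "\n".join(b.strip() for b in blocks)
    return reply_text.strip()  # fall back to the whole reply if no fences
```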
Optionally, the type of the second data may be the same as or different from the type of the first data, for example, when the first data is voice data, the second data may be voice data or mouth-shaped video data.
Further, the generating, based on the matching result of the text data and the preset character, the target code corresponding to the first data at the target cursor position includes:
and under the condition that the text data is not matched with the preset characters, extracting the target code corresponding to the first data from the text data, and outputting the target code at the target cursor position.
Specifically, if the text data does not match the preset character, the large language model does not need to be woken: the user has dictated the target code directly. However, the text data may contain both the target code and noise text transcribed from environmental noise. In that case, considering that the target code consists of English words, the noise text can be filtered out of the text data, only the target code retained, and the target code output at the target cursor position, thereby achieving efficient automatic code generation.
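The English-words-only filtering can be sketched as dropping tokens containing non-ASCII characters; this heuristic is an assumption standing in for the patent's unspecified filter:

```python
def filter_code_tokens(text_data: str) -> str:
    """Drop tokens with non-ASCII characters (e.g. transcribed environmental
    noise), keeping only ASCII tokens that can belong to the target code."""
    tokens = text_data.split()
    kept = [t for t in tokens if t.isascii()]
    return " ".join(kept)
```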
According to the code generation method provided by the embodiment of the present invention, after the video data and the first data corresponding to the user are acquired, line-of-sight recognition is performed on the video data to determine, in the code operation interface, the target cursor position the user intends to operate, and the first data is input into the code generation model to output the target code corresponding to the first data at the target cursor position. Recognizing the video data replaces the user's frequent mouse operations, and processing the first data to determine the target code to be input replaces the user's frequent keyboard operations. Interaction efficiency and code-writing efficiency are thus improved without peripherals such as a mouse and a keyboard, achieving efficient and intelligent code generation.
The code generation apparatus provided by the present invention is described below; the code generation apparatus described below and the code generation method described above may refer to each other correspondingly.
The embodiment of the present invention further provides a code generation device. Fig. 2 is a schematic structural diagram of the code generation device provided in the embodiment of the present invention. As shown in Fig. 2, the code generation device 200 includes: an acquisition module 210, a determination module 220, and a generation module 230, wherein:
an obtaining module 210, configured to obtain video data and first data corresponding to a user;
a determining module 220, configured to perform line-of-sight recognition on the video data, and determine a target cursor position in a code operation interface;
a generating module 230, configured to input the first data into a code generating model, and generate a target code corresponding to the first data at the target cursor position; the code generation model comprises a data identification model and a large language model corresponding to the first data.
According to the code generation device provided by the embodiment of the present invention, after the video data and the first data corresponding to the user are acquired, line-of-sight recognition is performed on the video data to determine, in the code operation interface, the target cursor position the user intends to operate, and the first data is input into the code generation model to output the target code corresponding to the first data at the target cursor position. Recognizing the video data replaces the user's frequent mouse operations, and processing the first data to determine the target code to be input replaces the user's frequent keyboard operations. Interaction efficiency and code-writing efficiency are thus improved without peripherals such as a mouse and a keyboard, achieving efficient and intelligent code generation.
Optionally, the determining module 220 is specifically configured to:
performing line-of-sight recognition on the video data, and determining the current cursor position to be operated by the user in the code operation interface;
determining target gesture information corresponding to the user based on the video data;
matching the target gesture information with all first gesture information in a preset mapping relation, and determining a target cursor movement strategy corresponding to the target gesture information; the preset mapping relation comprises a mapping relation between the first gesture information and a cursor movement strategy;
and updating the current cursor position to be the target cursor position based on the target cursor movement strategy.
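The gaze-plus-gesture cursor update above can be sketched as follows; the gesture names and movement deltas are illustrative assumptions standing in for the preset mapping between first gesture information and cursor movement strategies:

```python
# Hypothetical preset mapping: first gesture information -> cursor movement
# strategy, expressed as a (column delta, line delta) pair.
GESTURE_TO_MOVE = {
    "swipe_up": (0, -1),    # move cursor one line up
    "swipe_down": (0, 1),   # move cursor one line down
    "swipe_left": (-1, 0),  # move cursor one column left
    "swipe_right": (1, 0),  # move cursor one column right
}

def update_cursor(current, gesture):
    """current: (column, line) from line-of-sight recognition. Returns the
    target cursor position, or the unchanged position when the gesture
    matches no first gesture information in the preset mapping."""
    dx, dy = GESTURE_TO_MOVE.get(gesture, (0, 0))
    return (current[0] + dx, current[1] + dy)
```

In the described flow, an unmatched gesture would next be compared against the second gesture information (operation policies) rather than moving the cursor.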
Optionally, the preset mapping relationship further includes a mapping relationship between the second gesture information and an operation policy of the code operation interface.
Optionally, the determining module 220 is specifically configured to:
under the condition that the target gesture information is not matched with all first gesture information in the preset mapping relation, matching the target gesture information with all second gesture information in the preset mapping relation, and determining a target operation strategy corresponding to the target gesture information;
and updating the code operation interface based on the target operation strategy.
Optionally, the preset mapping relationship further includes a mapping relationship between the blink type and the operation instruction.
Optionally, the determining module 220 is specifically configured to:
determining a target blink type corresponding to the user based on the video data;
matching the target blink type with all blink types in the preset mapping relation, and determining a target operation instruction corresponding to the target blink type;
determining a target intention instruction corresponding to the user based on the target operation instruction and the target operation strategy;
and executing the target intention instruction and updating the code operation interface.
Optionally, the generating module 230 is specifically configured to:
inputting the first data into the data identification model, and determining text data corresponding to the first data;
and generating a target code corresponding to the first data at the target cursor position based on a matching result of the text data and the preset character.
Optionally, the generating module 230 is specifically configured to:
acquiring second data corresponding to the user under the condition that the text data is matched with the preset characters;
inputting the second data into the data identification model, and outputting prompt text corresponding to the second data;
inputting the prompt text into the large language model, and outputting a reply text corresponding to the second data;
extracting the target code from the reply text, and outputting the target code at the target cursor position.
Optionally, the generating module 230 is specifically configured to:
and under the condition that the text data is not matched with the preset characters, extracting the target code corresponding to the first data from the text data, and outputting the target code at the target cursor position.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in Fig. 3, the electronic device may include: a processor 310, a communication interface (Communications Interface) 320, a memory 330, and a communication bus 340, wherein the processor 310, the communication interface 320, and the memory 330 communicate with each other through the communication bus 340. The processor 310 may invoke logic instructions in the memory 330 to perform a code generation method comprising:
acquiring video data and first data corresponding to a user;
performing line-of-sight recognition on the video data, and determining a target cursor position in a code operation interface;
inputting the first data into a code generation model, and generating a target code corresponding to the first data at the target cursor position; the code generation model comprises a data identification model and a large language model corresponding to the first data.
Further, the logic instructions in the memory 330 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, in essence or as the part contributing to the prior art or as a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium. When executed by a processor, the computer program is capable of performing the code generation method provided by the methods described above, the method comprising:
acquiring video data and first data corresponding to a user;
performing line-of-sight recognition on the video data, and determining a target cursor position in a code operation interface;
inputting the first data into a code generation model, and generating a target code corresponding to the first data at the target cursor position; the code generation model comprises a data identification model and a large language model corresponding to the first data.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the code generation method provided by the above methods, the method comprising:
acquiring video data and first data corresponding to a user;
performing line-of-sight recognition on the video data, and determining a target cursor position in a code operation interface;
inputting the first data into a code generation model, and generating a target code corresponding to the first data at the target cursor position; the code generation model comprises a data identification model and a large language model corresponding to the first data.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus the necessary general hardware platform, or of course by means of hardware. Based on this understanding, the foregoing technical solution, in essence or as the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method described in the respective embodiments or in some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A code generation method, comprising:
acquiring video data and first data corresponding to a user;
performing line-of-sight recognition on the video data, and determining a target cursor position in a code operation interface;
inputting the first data into a code generation model, and generating a target code corresponding to the first data at the target cursor position; the code generation model comprises a data identification model and a large language model corresponding to the first data.
2. The code generation method according to claim 1, wherein the inputting the first data into a code generation model and generating the target code corresponding to the first data at the target cursor position comprises:
inputting the first data into the data identification model, and determining text data corresponding to the first data;
and generating a target code corresponding to the first data at the target cursor position based on a matching result of the text data and the preset character.
3. The code generating method according to claim 2, wherein the generating the target code corresponding to the first data at the target cursor position based on the matching result of the text data and the preset character includes:
acquiring second data corresponding to the user under the condition that the text data is matched with the preset characters;
inputting the second data into the data identification model, and outputting prompt text corresponding to the second data;
inputting the prompt text into the large language model, and outputting a reply text corresponding to the second data;
extracting the target code from the reply text, and outputting the target code at the target cursor position.
4. The code generation method according to claim 2, wherein the generating the target code corresponding to the first data at the target cursor position based on the matching result of the text data and a preset character comprises:
and under the condition that the text data is not matched with the preset characters, extracting the target code corresponding to the first data from the text data, and outputting the target code at the target cursor position.
5. The code generation method of any of claims 1-4, wherein the performing line-of-sight recognition on the video data to determine a target cursor position in a code operation interface comprises:
performing line-of-sight recognition on the video data, and determining the current cursor position to be operated by the user in the code operation interface;
determining target gesture information corresponding to the user based on the video data;
matching the target gesture information with all first gesture information in a preset mapping relation, and determining a target cursor movement strategy corresponding to the target gesture information; the preset mapping relation comprises a mapping relation between the first gesture information and a cursor movement strategy;
and updating the current cursor position to be the target cursor position based on the target cursor movement strategy.
6. The code generation method according to claim 5, wherein the preset mapping relationship further includes a mapping relationship between second gesture information and an operation policy of a code operation interface;
the method further comprises the steps of:
under the condition that the target gesture information is not matched with all first gesture information in the preset mapping relation, matching the target gesture information with all second gesture information in the preset mapping relation, and determining a target operation strategy corresponding to the target gesture information;
and updating the code operation interface based on the target operation strategy.
7. The code generation method of claim 6, wherein the preset mapping relationship further comprises a mapping relationship between blink type and operation instruction;
the updating the code operation interface based on the target operation strategy comprises the following steps:
determining a target blink type corresponding to the user based on the video data;
matching the target blink type with all blink types in the preset mapping relation, and determining a target operation instruction corresponding to the target blink type;
determining a target intention instruction corresponding to the user based on the target operation instruction and the target operation strategy;
and executing the target intention instruction and updating the code operation interface.
8. A code generating apparatus, comprising:
the acquisition module is used for acquiring video data and first data corresponding to a user;
the determining module is used for carrying out line-of-sight identification on the video data and determining the position of a target cursor in the code operation interface;
the generation module is used for inputting the first data into a code generation model and generating a target code corresponding to the first data at the target cursor position; the code generation model comprises a data identification model and a large language model corresponding to the first data.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the code generation method of any of claims 1-7 when the program is executed by the processor.
10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the code generation method according to any of claims 1-7.
CN202311814322.5A 2023-12-26 2023-12-26 Code generation method, device, electronic equipment and storage medium Pending CN117762422A (en)


Publications (1)

Publication Number Publication Date
CN117762422A 2024-03-26


