CN114155479B - Language interaction processing method and device and electronic equipment


Info

Publication number: CN114155479B
Application number: CN202210119958.7A
Authority: CN (China)
Legal status: Active (granted)
Prior art keywords: information, scene, target, user, determining
Other versions: CN114155479A (application publication)
Other languages: Chinese (zh)
Inventor: 赵宇 (Zhao Yu)
Current assignee: Zhao Yu
Original assignee: Zhongnong Polaris Tianjin Intelligent Agricultural Machinery Equipment Co., Ltd.
Application filed by Zhongnong Polaris Tianjin Intelligent Agricultural Machinery Equipment Co., Ltd.; priority to CN202210119958.7A; published as CN114155479A and granted as CN114155479B.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B 7/00 Electrically-operated teaching apparatus or devices working with questions and answers
    • G09B 7/02 Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student
    • G09B 7/04 Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student, characterised by modifying the teaching programme in response to a wrong answer, e.g. repeating the question, supplying a further explanation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Abstract

The application provides a language interaction processing method, a language interaction processing device, and an electronic device. The method comprises: first acquiring scene information of a user and determining a target scene according to the scene information; then determining first sentence information according to the target scene and outputting the first sentence information to the user; and detecting the voice information output by the user and continuing the language interaction with the user according to the detection result. In this application, language learning is started automatically according to the scene in which the user is located, the learning content changes as the user's scene changes, and sentence information is not output from a fixed teaching template, so that the user feels personally present in the scene throughout the learning process.

Description

Language interaction processing method and device and electronic equipment
Technical Field
The present application relates to the field of language processing technologies, and in particular, to a method and an apparatus for processing language interaction, and an electronic device.
Background
With the continuous development of artificial intelligence, various intelligent language learning methods have emerged.
Most existing intelligent language learning methods are limited to preset courseware with which students complete learning tasks on their own. Some intelligent language learning methods and devices use artificial intelligence in place of a traditional teacher to interact with students, but in essence they still amount to questioning and knowledge explanation based on fixed courses.
In such fixed-course intelligent language learning, the learning content is divorced from real life, which easily reduces students' interest in learning and results in low learning efficiency.
Disclosure of Invention
In view of this, an object of the present application is to provide a language interaction processing method, a language interaction processing apparatus, and an electronic device that enable immersive language learning from a first-person perspective, thereby effectively improving the user's learning interest and learning efficiency.
In a first aspect, the present application provides a language interaction processing method applied to a mobile terminal in which a shooting device, a voice detection device, and an information output device are configured. The method includes: acquiring scene information of a user through the shooting device, the scene information being a scene captured from the user's first-person perspective; determining, according to the scene information, a target scene matching the scene information from a preset scene library; determining first sentence information according to the target scene and outputting the first sentence information through the information output device; and performing voice detection on the user through the voice detection device and continuing to output language interaction information to the user through the information output device according to the detection result.
Further, the step of determining a target scene matching the scene information from preset scenes according to the scene information includes: extracting target object information of a target object from the scene information; the target object is an identifier included in a scene; and selecting a preset scene matched with the target object information from a preset scene library as a target scene.
Further, the target object information includes a type of the target object and a number of the target objects; the step of selecting the preset scene matched with the target object information from the preset scene library as the target scene includes: matching an object contained in each preset scene in a preset scene library with target object information to obtain a similarity score corresponding to the preset scene; and determining the preset scene with the similarity score higher than the similarity threshold value as the target scene.
Further, the step of selecting the preset scene matched with the target object information from the preset scene library as the target scene includes: and determining a target scene matched with the target object information through a neural network model for determining the scene matching degree.
Further, the step of determining the first sentence information according to the target scene includes: determining a target keyword according to a target scene; and determining first statement information according to the target keyword.
Further, there are a plurality of target keywords; the step of determining the first sentence information according to the target keywords includes: combining the target keywords to generate the first sentence information corresponding to the target keywords.
Further, the step of determining the first sentence information according to the target keyword includes: determining a first sentence matched with a target keyword from a preset semantic library through a semantic recognition neural network; the preset semantic library is stored in the mobile terminal or acquired through the mobile terminal.
Further, the step of outputting the language interaction information to the user continuously through the information output device according to the detection result includes: judging whether the detection result represents that the difficulty of the first statement information is too high; if yes, determining second statement information according to the target scene, and outputting the second statement information through an information output device; and the difficulty corresponding to the second statement information is lower than that corresponding to the first statement information.
Further, when any one of the following conditions is met, determining that the difficulty of the detection result representing the first statement information is too high: the detection result is that the voice information output by the user is not detected; voice information output by the user and contained in the detection result represents that the user does not understand the voice information; the correlation degree between the semantic meaning corresponding to the voice information output by the user and contained in the detection result and the first semantic meaning contained in the first sentence is lower than the correlation degree threshold value.
Further, when any one of the following conditions is met, it is determined that the difficulty corresponding to the second sentence information is lower than the difficulty corresponding to the first sentence information: the speed of speech of the second statement information is lower than that of the first statement information; the vocabulary contained in the second sentence information is smaller than the vocabulary contained in the first sentence information; the number of the uncommon words contained in the second sentence is less than that contained in the first sentence.
Further, the method further comprises: when the difficulty corresponding to the second statement information is lower than a preset difficulty threshold, determining scene introduction information corresponding to the target scene from a preset semantic library, and outputting the scene introduction information in a voice and/or text mode; the preset semantic library is stored in the mobile terminal or acquired through the mobile terminal.
Further, the method further comprises: according to the target scene, determining a target scheduled course matched with the target scene from a scheduled course library; the scheduled curriculum library is stored in the mobile terminal or acquired through the mobile terminal; the user is provided with an interface to initiate a targeted lesson to enable the user to begin language learning in accordance with the targeted lesson.
Further, the method further comprises: providing a learning platform selection interface for a user so that the user can select learning materials of different platforms; and responding to the selection operation of the user on a first learning platform in the learning platforms, and providing the learning materials corresponding to the first learning platform for the user.
In a second aspect, the present application further provides a language interaction processing apparatus, where the apparatus is configured with a shooting module and a voice detection module, and the apparatus includes: the information acquisition module is used for acquiring scene information of a user through the shooting module; the scene information is a scene obtained under a first visual angle of a user; the target scene determining module is used for determining a target scene matched with the scene information from a preset scene library according to the scene information; the first statement information determining module is used for determining first statement information according to the target scene and outputting the first statement information through the information output device; and the output module is used for carrying out voice detection on the user through the voice detection device and continuously outputting the language interaction information to the user through the information output device according to the detection result.
In a third aspect, the present application further provides an electronic device, which includes a processor and a memory, where the memory stores computer-executable instructions capable of being executed by the processor, and the processor executes the computer-executable instructions to implement the language interaction processing method of the first aspect.
In a fourth aspect, the present application also provides a computer-readable storage medium storing computer-executable instructions that, when invoked and executed by a processor, cause the processor to implement the language interaction processing method of the first aspect.
Compared with the prior art, the method has the following beneficial effects:
According to the language interaction processing method, the language interaction processing device, and the electronic device provided by the present application, scene information of a user is first acquired and a target scene is determined according to the scene information; first sentence information is then determined according to the target scene and output to the user; the voice information output by the user is detected, and language interaction with the user is carried out according to the detection result. In this application, language learning is started automatically according to the scene in which the user is located, the learning content changes as the user's scene changes, and sentence information is not output from a fixed teaching template, so that the user feels personally present in the scene throughout the learning process.
Additional features and advantages of the disclosure will be set forth in the description that follows, and in part will be apparent from the description or may be learned by practice of the disclosure.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the detailed description of the present application or the technical solutions in the prior art, the drawings needed to be used in the detailed description of the present application or the prior art description will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic structural diagram of an electronic system according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of a language interaction processing method according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating another language interaction processing method according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating another language interaction processing method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a language interaction processing apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
At present, although intelligent speech recognition technology is on the rise, intelligent interaction technology for immersive language learning from a first-person perspective remains scarce; most devices are limited to fixed-course learning, and audio- and video-courseware teaching cannot provide users with a good language environment. Based on this, the embodiments of the present application provide a language interaction processing method, a language interaction processing device, and an electronic device that enable immersive language learning from a first-person perspective, thereby effectively improving the user's learning interest and learning efficiency.
Referring to fig. 1, a schematic diagram of an electronic system 100 is shown. The electronic system can be used for realizing the language interaction processing method and device of the embodiment of the application.
As shown in fig. 1, an electronic system 100 includes one or more processing devices 102 and one or more memory devices 104. Optionally, electronic system 100 may also include input device 106, output device 108, and one or more voice capture devices 110, which may be interconnected via bus system 112 and/or other forms of connection mechanisms (not shown). It should be noted that the components and structure of the electronic system 100 shown in fig. 1 are exemplary only, and not limiting, and the electronic system may have some of the components in fig. 1, as well as other components and structures, as desired.
The processing device 102 may be a server, a smart terminal, or a device containing a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, may process data for other components in the electronic system 100, and may control other components in the electronic system 100 to perform language interaction processing functions.
Storage 104 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory can include, for example, Random Access Memory (RAM), cache memory (or the like). The non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on a computer-readable storage medium and executed by processing device 102 to implement the client functionality (implemented by the processing device) of the embodiments of the present application described below and/or other desired functionality. Various applications and various data, such as various data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, function keys, a gesture recognition device, a voice recognition device, a microphone, a touch screen, and the like.
The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.
The voice capture device 110 may retrieve voice information spoken by the user and store the voice information in the storage 104 for use by other components.
For example, the devices used for implementing the language interaction processing method, apparatus and electronic device according to the embodiments of the present application may be integrally disposed, or may be dispersedly disposed, such as integrally disposing the processing device 102, the storage device 104, the input device 106 and the output device 108, and disposing the voice collecting device 110 at a designated position where voice can be collected. When the above-described devices in the electronic system are integrally provided, the electronic system may be implemented as an intelligent terminal such as a camera, a smart phone, a tablet computer, a vehicle-mounted terminal, and the like.
Fig. 2 shows a language interaction processing method provided in an embodiment of the present application. The method is applied to a mobile terminal in which a shooting device, a voice detection device, and an information output device are configured, for example a smart phone, a smart tablet, or a dedicated language learning machine equipped with a camera and a voice detection module. The shooting device may be built into the electronic device, or the electronic device may be connected to an external shooting device; similarly, the voice detection device may be built in, or the electronic device may be connected to an external voice detection device, such as an earphone connected through wireless technologies such as Zigbee, Bluetooth, or Wi-Fi. The information output device is a voice and/or text output device. Specifically, the method comprises the following steps:
s202: acquiring scene information of a user through a shooting device; the scene information is a scene obtained under a first visual angle of a user;
the scene information is the information of the scene in which the user is located, i.e., immersive scene information taken from the user's first-person perspective. For example, if a user is facing a building and the picture is taken from the user's own perspective, the resulting scene image contains only the building; if the picture were taken from a third-person perspective, the scene image would also contain the user.
S204: according to the scene information, determining a target scene matched with the scene information from a preset scene library;
specifically, a preset scene library is prestored in the electronic device. The scenes in the library can be distinguished by the real objects each scene contains, for example a natural environment containing the Eiffel Tower, a classroom scene containing blackboards, desks, and chairs, or a restaurant scene containing dining tables and tableware. In a specific implementation, the matching target scene may be determined from the preset scene library according to the scene information: for example, the target scene may be determined according to the degree of matching between the scene information and the pictures in the preset scene library, or according to the degree of matching between the real objects contained in the scene information and the real objects contained in each preset scene; alternatively, features of the scene information may be extracted and the target scene determined through a neural network model.
S206: determining first statement information according to the target scene, and outputting the first statement information through an information output device;
for each scene, there may be one or more pieces of corresponding preset sentence information. For example, for a natural scene containing a landmark building, the sentences may include questions such as "What building is this?", "Which country is this building in?", or "In which year was this building started?", or a statement such as "This building is beautiful", in order to start a conversation between the mobile terminal and the user.
After the target scene is determined, the first sentence information may be randomly determined from the plurality of sentence information corresponding to the target scene, or the best matching first sentence information may be determined from the plurality of sentences according to the previous conversation characteristics and conversation habits of the user. The first sentence information includes a first language and a first semantic.
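As a rough illustration of this selection step, the following is a minimal Python sketch; the scene name, the candidate sentences, and the word-overlap heuristic standing in for "conversation characteristics and habits" are assumptions made for illustration only and are not part of the claimed method.

```python
import random
from typing import Dict, List, Optional

# Hypothetical mapping from a preset scene to its candidate opening sentences.
SCENE_SENTENCES: Dict[str, List[str]] = {
    "famous_building": [
        "What building is this?",
        "Which country is this building in?",
        "In which year was this building started?",
        "This building is beautiful.",
    ],
}

def pick_first_sentence(target_scene: str,
                        user_history: Optional[List[str]] = None) -> str:
    """Pick the first sentence information for the target scene.

    With no conversation history the choice is random; otherwise the
    candidate sharing the most words with the user's past utterances is
    taken as a crude stand-in for the user's conversation habits.
    """
    candidates = SCENE_SENTENCES[target_scene]
    if not user_history:
        return random.choice(candidates)
    history_words = set(" ".join(user_history).lower().split())
    return max(candidates,
               key=lambda s: len(set(s.lower().split()) & history_words))

print(pick_first_sentence("famous_building"))
print(pick_first_sentence("famous_building", ["I love this building"]))
```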
S208: performing voice detection on the user through the voice detection device, and continuing to output language interaction information to the user through the information output device according to the detection result.
After receiving the first sentence, the user outputs corresponding voice information; the mobile terminal detects the voice information output by the user through the voice detection device and then determines the subsequent sentence based on that voice information. It should be noted that the user may also fail to output voice information for a particular reason, for example because the user cannot understand the first sentence or hear it clearly. In this case the detection result is that the user has output no voice information, and corresponding measures may be taken according to the detection result, for example reducing the difficulty of the first sentence, selecting another sentence as the first sentence, or repeating the first sentence, so as to encourage the user to continue the language interaction.
The language interaction processing method provided by the embodiment of the application first acquires scene information of a user and determines a target scene according to the scene information, then determines first sentence information according to the target scene and outputs it to the user, detects the voice information output by the user, and carries out language interaction with the user according to the detection result. In this application, language learning is started automatically according to the scene in which the user is located, the learning content changes as the user's scene changes, and sentence information is not output from a fixed teaching template, so that the user feels personally present in the scene throughout the learning process.
In order to closely combine the learning content with the environment in which the user is located during learning and to improve the user's learning experience, on the basis of the embodiment shown in fig. 2, an embodiment of the present application further provides another language interaction processing method that focuses on the process of determining the target scene according to the scene in which the user is located. As shown in fig. 3, the method specifically includes the following steps:
s302: acquiring scene information of a user through a shooting device;
s304: extracting target object information of a target object from the scene information;
the target object information includes the type of each target object and the number of target objects of that type. For example, suppose the scene is a classroom containing 20 desks, 1 blackboard, and 20 pupils. The target object information corresponding to this scene then includes: a first target object, desks, whose number is 20; a second target object, blackboard, whose number is 1; and a third target object, pupils, whose number is 20. The set formed by these target objects and their numbers constitutes the target object information.
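As a rough illustration of how such target object information might be aggregated in practice, the following Python sketch collects the output of a generic object detector into a type-to-count mapping; the detector output format, the labels, and the 0.5 confidence threshold are assumptions for illustration and are not prescribed by this application.

```python
from collections import Counter
from typing import Dict, List

def extract_target_object_info(detections: List[dict],
                               min_score: float = 0.5) -> Dict[str, int]:
    """Aggregate raw detector output into {target object type: count}.

    `detections` is assumed to be a list of dicts such as
    {"label": "desk", "score": 0.93}, e.g. produced by any off-the-shelf
    object-detection model run on the first-person-view image.
    """
    counts = Counter(d["label"] for d in detections
                     if d.get("score", 0.0) >= min_score)
    return dict(counts)

# Example: a classroom image might yield 20 desks, 1 blackboard, 20 pupils.
detections = ([{"label": "desk", "score": 0.9}] * 20
              + [{"label": "blackboard", "score": 0.8}]
              + [{"label": "pupil", "score": 0.7}] * 20)
print(extract_target_object_info(detections))
# {'desk': 20, 'blackboard': 1, 'pupil': 20}
```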
S306: selecting a preset scene matched with the target object information from a preset scene library as a target scene;
the target scene can be selected by comparing objects contained in a plurality of alternative preset scenes with the target object, or by comparing the alternative preset scenes with scene information of the user as a whole. Thus, in some examples, the target scene may be determined by the following steps L31-L32:
l31: matching an object contained in each preset scene in a preset scene library with target object information to obtain a similarity score corresponding to the preset scene;
l32: and determining the preset scene with the similarity score higher than the similarity threshold value as the target scene.
For example, suppose that feature recognition determines that the scene information of the user contains object 1 (multiple tables), object 2 (a waiter), and object 3 (a stage), and that the preset scene library contains several preset scenes, for example scene 1 (a campus containing a teaching building, dormitories, and a playground), scene 2 (a restaurant containing tables, waiters, and multiple dining guests), and scene 3 (a concert hall containing a stage, multiple rows of seats, and actors). By comparing the objects, it is finally determined that the degree of matching between the objects contained in the user's scene information and the objects contained in scene 2 (the restaurant) is the highest, so it can be determined that the scene in which the user is located is a restaurant.
In practical application, the similarity score of each preset scene may be obtained by comparing the objects in the scene where the user is located with the objects in that preset scene one by one, the similarity score of the preset scene being the sum of the similarity scores of the objects it contains. Alternatively, the objects identified in the user's scene and the objects contained in each preset scene may be input into a pre-trained neural network that directly outputs the similarity score of each preset scene. The embodiments of the present application do not specifically limit the method of identifying objects or of determining the similarity score of a preset scene.
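To make the object-by-object matching in steps L31-L32 concrete, here is a minimal Python sketch of one possible scoring rule; the preset scene library contents, the count-ratio similarity, and the 0.5 threshold are illustrative assumptions only and are not prescribed by this application.

```python
from typing import Dict, Optional

# Hypothetical preset scene library: scene name -> objects expected in it.
PRESET_SCENES: Dict[str, Dict[str, int]] = {
    "campus":       {"teaching building": 1, "dormitory": 2, "playground": 1},
    "restaurant":   {"table": 10, "waiter": 2, "guest": 20},
    "concert hall": {"stage": 1, "seat": 200, "actor": 5},
}

def scene_similarity(target_info: Dict[str, int],
                     scene_objects: Dict[str, int]) -> float:
    """Sum of per-object similarity scores (step L31): each shared object
    type contributes the ratio of the smaller count to the larger count."""
    score = 0.0
    for obj, count in target_info.items():
        if obj in scene_objects:
            expected = scene_objects[obj]
            score += min(count, expected) / max(count, expected)
    return score

def match_target_scene(target_info: Dict[str, int],
                       threshold: float = 0.5) -> Optional[str]:
    """Step L32: keep the preset scene whose score exceeds the threshold."""
    scores = {name: scene_similarity(target_info, objs)
              for name, objs in PRESET_SCENES.items()}
    best_name, best_score = max(scores.items(), key=lambda kv: kv[1])
    return best_name if best_score > threshold else None

# Objects recognised from the user's first-person image.
print(match_target_scene({"table": 8, "waiter": 1, "guest": 15}))  # restaurant
```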
In other examples, the target scene matching the target object information may also be determined by a neural network model that determines the degree of scene matching.
The method provided by the embodiment of the application can accurately identify the scene where the user is located, and provides more targeted learning content and dialogue content by combining the scene, so that the learning process of the user is not boring any more, the interest of learning is improved, a more real language learning environment is provided for the user, and the language learning effect is improved.
After the target scene is acquired, the first statement information needs to be actively output according to the target scene, so on the basis of the foregoing embodiment, the embodiment of the present application further provides another language interaction processing method, which focuses on describing how to determine the first statement information according to the target scene, and as shown in fig. 4, the method includes:
s402: acquiring scene information of a user through a shooting device;
s404: according to the scene information, determining a target scene matched with the scene information from a preset scene library;
s406: determining a target keyword according to a target scene;
a scene often contains more than one piece of information; accordingly, the target scene corresponds to a plurality of target keywords.
S408: determining first statement information according to the target keyword;
the plurality of target keywords can simply be rearranged and combined to obtain a piece of sentence information; on this basis, the target keywords are combined to generate the first sentence information corresponding to them. For the specific way in which several words are formed into a sentence, reference may be made to sentence-generation methods in the related art, which this application does not limit.
In other possible embodiments, the first sentence matching the target keyword may be further determined from a preset semantic library through a semantic recognition neural network. The preset semantic library is stored in the mobile terminal, or is obtained by the mobile terminal, for example, one or more semantic libraries are obtained from the server by the mobile terminal.
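As a simple illustration of turning target keywords into an opening sentence, the following Python sketch fills a question template with the main keyword; the templates and the keyword ordering are illustrative assumptions, and in practice the first sentence could equally be retrieved from a preset semantic library by a semantic recognition neural network, as described above.

```python
import random
from typing import List

# Hypothetical question templates for the keyword-combination approach.
QUESTION_TEMPLATES = [
    "What is this {obj}?",
    "Which country is this {obj} in?",
    "In which year was this {obj} built?",
]

def first_sentence_from_keywords(keywords: List[str]) -> str:
    """Combine target keywords into a first sentence.

    Assumes the first keyword names the main object in the target scene;
    the remaining keywords could be used to pick or rank templates.
    """
    main_object = keywords[0]
    template = random.choice(QUESTION_TEMPLATES)
    return template.format(obj=main_object)

print(first_sentence_from_keywords(["Eiffel Tower", "Paris", "landmark"]))
```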
S410: and carrying out voice detection on the user through the voice detection device, and continuously outputting the language interaction information to the user through the information output device according to a detection result.
The method provided by this embodiment can actively and purposefully generate the first sentence according to the target scene, so that language interaction with the user can be carried out effectively, and the problem of the learning direction drifting far off course because the user inputs information irrelevant to the learning content is avoided.
After the first sentence is output to the user, in some possible embodiments the user outputs corresponding language information in response to it; based on that language information, the subsequent interactive content can be further determined. Steps L41-L47 below give a specific determination method:
l41: determining a statement type represented by the first statement information; wherein the statement types include: question sentences and statement sentences;
further, first feedback statement information corresponding to the statement type of the first statement information can be searched from a semantic library corresponding to the target scene; the semantic library is stored in the mobile terminal or acquired from a server through the mobile terminal.
L42: judging whether the sentence type is a question sentence; if so, executing L43-L44, otherwise executing steps L45-L47.
L43: determining an answer sentence corresponding to the first sentence information from a semantic library;
l44: first feedback sentence information is determined from the answer sentence.
In some examples, the answer sentence corresponding to the first sentence information may be determined from a semantic library, in particular, by a semantic recognition neural network.
L45: identifying a keyword in the first statement information;
l46: determining auxiliary knowledge information corresponding to the keywords from a semantic library;
l47: first feedback sentence information corresponding to the first semantics of the first sentence information is generated according to the auxiliary knowledge information.
How to determine the first feedback statement information corresponding to the first statement information is exemplified below with reference to a practical application scenario.
Suppose the user wants to start language learning under the Eiffel Tower and opens an electronic device provided with any of the above methods of the embodiments of the present application. The electronic device recognizes from the image taken from the user's perspective that the scene in which the user is located is the famous Eiffel Tower. Further, the electronic device detects through the voice detection device that the user speaks in English: "What is it". The electronic device first judges that the language the user wants to learn is English, and recognizes that the semantic meaning of the sentence is: what is this. Through language recognition, the electronic device determines that, for this question sentence, the Eiffel Tower needs to be explained; it therefore obtains a brief introduction to the Eiffel Tower from a knowledge base pre-stored in the electronic device or from a server over the network, and outputs the introduction to the user in English.
In some other scenarios, the detection device detects that the sentence output by the user is: "It is beautiful". The electronic device first judges that the sentence type output by the user is a statement sentence and recognizes that its semantic meaning is: it is very beautiful. Further, according to the keywords "Eiffel Tower" and "beautiful", the construction history of the Eiffel Tower, poems depicting the Eiffel Tower, and the like are determined from the knowledge base or obtained from a server over the network, and output to the user.
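The two branches just illustrated (a question sentence answered from the semantic library, a statement sentence followed up with auxiliary knowledge, steps L41-L47 above) can be sketched in a few lines of Python; the question-detection heuristic and the two dictionary "libraries" are illustrative stand-ins for the sentence-type recognition and the preset semantic library, not a definitive implementation.

```python
def first_feedback(sentence: str,
                   answer_library: dict,
                   knowledge_library: dict) -> str:
    """Choose feedback sentence information for a detected sentence.

    Question sentences (L42-L44) are answered from an answer library;
    statement sentences (L45-L47) are followed up with auxiliary knowledge
    keyed by keywords found in the sentence.
    """
    text = sentence.strip().lower()
    is_question = text.endswith("?") or text.startswith(
        ("what", "which", "where", "when", "who", "how", "why"))
    if is_question:
        # L43-L44: look up an answer sentence for the question.
        return answer_library.get(text.rstrip("?"),
                                  "Let me find that out for you.")
    # L45-L47: identify keywords and return auxiliary knowledge for them.
    for keyword, knowledge in knowledge_library.items():
        if keyword in text:
            return knowledge
    return "Tell me more about what you see."

answers = {"what is it": "It is the Eiffel Tower, completed in 1889."}
knowledge = {"beautiful": "Many poems have been written about the Eiffel Tower."}
print(first_feedback("What is it", answers, knowledge))
print(first_feedback("It is beautiful", answers, knowledge))
```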
Through the method provided by this embodiment, the user can experience the learning process in an immersive manner. Because the dialogue content in the learning process is fully fused with the target scene and the object information in that scene, the user forms a deeper impression of the learning content associated with the target scene, masters the learning content more efficiently, and learns more effectively.
In some possible embodiments, if the user does not reply for a long time, or the replied sentence indicates that the first sentence is not understood, the step of continuing to output the language interaction information to the user through the information output device according to the detection result may specifically include:
l51: judging whether the detection result represents that the difficulty of the first statement information is too high;
l52: if yes, determining second statement information according to the target scene, and outputting the second statement information through an information output device; and the difficulty corresponding to the second statement information is lower than that corresponding to the first statement information.
Specifically, when any one of the following conditions is met, it is determined that the difficulty of the detection result representing the first statement information is too high: the detection result is that the voice information output by the user is not detected; voice information output by the user and contained in the detection result represents that the user does not understand the voice information; the correlation degree between the semantic meaning corresponding to the voice information output by the user and contained in the detection result and the first semantic meaning contained in the first sentence is lower than the correlation degree threshold value.
In some examples, it is determined that the difficulty level corresponding to the second sentence information is lower than the difficulty level corresponding to the first sentence information when any one of the following conditions is satisfied: the speed of speech of the second statement information is lower than that of the first statement information; the vocabulary contained in the second sentence information is smaller than the vocabulary contained in the first sentence information; the number of the uncommon words contained in the second sentence is less than that contained in the first sentence.
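The two judgements above can be sketched in a few lines of Python; the structure of the detection result and of the candidate sentence records is assumed purely for illustration, and the thresholds are placeholders.

```python
from typing import List, Optional

def difficulty_too_high(detection: dict,
                        relevance_threshold: float = 0.3) -> bool:
    """Return True if the detection result indicates the first sentence was
    too difficult, using the three conditions listed above. `detection` is
    an assumed structure, e.g.:
    {"speech_detected": bool, "says_not_understood": bool, "relevance": float}
    """
    if not detection.get("speech_detected", False):
        return True                     # no voice information detected
    if detection.get("says_not_understood", False):
        return True                     # user expressed not understanding
    return detection.get("relevance", 1.0) < relevance_threshold

def pick_easier_sentence(candidates: List[dict], first: dict) -> Optional[dict]:
    """Pick second sentence information that is easier than the first along
    at least one of the listed dimensions."""
    for c in candidates:
        if (c["speech_rate"] < first["speech_rate"]
                or c["vocab_size"] < first["vocab_size"]
                or c["rare_words"] < first["rare_words"]):
            return c
    return None

first = {"speech_rate": 1.0, "vocab_size": 12, "rare_words": 3}
candidates = [{"speech_rate": 0.8, "vocab_size": 8, "rare_words": 1}]
print(difficulty_too_high({"speech_detected": True, "relevance": 0.1}))  # True
print(pick_easier_sentence(candidates, first))
```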
In some possible embodiments, if the difficulty of outputting the sentence is continuously reduced and the user still cannot understand the sentence, the interactive learning may be changed to the scene introduction learning, specifically, when the difficulty corresponding to the second sentence information is lower than a preset difficulty threshold, the scene introduction information corresponding to the target scene is determined from a preset semantic library, and the scene introduction information is output in a voice and/or text manner; the preset semantic library is stored in the mobile terminal or acquired through the mobile terminal.
In some scenarios, the user may not output a sentence for a long time, for example because the user forgets what to say or wishes to learn passively. Based on this, the language interaction processing method of the embodiment of the present application may also automatically search the semantic library for a sentence related to the target scene and output it to the user; specifically, the following method may be included:
(1) No first sentence information is detected within a preset time;
(2) determining third sentence information corresponding to the target scene from the semantic library according to the target scene; the semantic library is stored in the mobile terminal or acquired from the server through the mobile terminal;
the third sentence information may be a question issued for an object in the target scene, or a related knowledge introduction based on the target scene, or the like.
(3) And outputting the third sentence information by voice and/or words.
For example, the preset time may be 3 minutes. When the user has not output any voice information within 3 minutes after starting the language interaction processing function of the electronic device, the method provided in the embodiment of the present application may automatically detect that the target objects include the Eiffel Tower and, based on this object information, ask the user a question: "What is it", or output to the user an English introduction to the Eiffel Tower. It can be understood that the third sentence information may be output as voice, as text, or as voice with the text information synchronously displayed on the screen together with a Chinese translation of the English text, so as to further deepen the user's impression of what is learned and improve the learning effect.
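The timeout-and-fallback behaviour just described might look like the following Python sketch; the `listen_once` callable, the polling loop, and the sentence list are assumptions standing in for the voice detection device and the semantic library.

```python
import time
from typing import Callable, List, Optional

def wait_then_prompt(listen_once: Callable[[], Optional[str]],
                     scene_sentences: List[str],
                     timeout_s: float = 180.0,
                     poll_s: float = 1.0) -> str:
    """If the user says nothing within `timeout_s` (e.g. 3 minutes), fall
    back to outputting a third sentence related to the target scene.

    `listen_once` returns the user's utterance or None; `scene_sentences`
    stands in for sentences retrieved from the semantic library for the
    target scene.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        utterance = listen_once()
        if utterance:
            return utterance            # the user spoke; continue the dialogue
        time.sleep(poll_s)
    return scene_sentences[0]           # e.g. "What is it" about the Eiffel Tower

# Simulated use: the user never speaks, so the scene sentence is output.
print(wait_then_prompt(lambda: None, ["What is it"], timeout_s=2.0, poll_s=0.5))
```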
In order to further improve the learning effect of the user, specific teaching contents such as word reciting, grammar explanation and the like can be output to the user aiming at the target scene so as to deepen the impression of the user and improve the learning efficiency. Accordingly, the above method may further comprise:
(1) according to the target scene, determining a target scheduled course matched with the target scene from a scheduled course library; the scheduled curriculum library is stored in the mobile terminal or acquired through the mobile terminal;
(2) Providing the user with an interface for starting the target scheduled course, so that the user can begin language learning according to the target scheduled course.
In specific application, auxiliary courses corresponding to each preset scene can be prestored in the electronic equipment, and for each preset scene, auxiliary courses at different levels can be set.
Furthermore, the learning content such as vocabulary and grammar that the user learns in each target scene can be recorded and stored in the electronic device, or sent through the electronic device to the server for storage, so that learning-tracking content such as an error-prone word bank, a new-word bank, and a new-grammar bank can be generated. By continuously tracking learning across the various target scenes, a more systematic learning experience can be provided for the user and the user's learning effect improved.
The method provided by the embodiment of the application is applied to a mobile terminal, wherein the mobile terminal can store a plurality of learning platform interfaces, each interface corresponds to a learning platform provider, and specifically, the method further comprises the following steps:
(1) providing a learning platform selection interface for the user so that the user can select learning materials of different platforms;
(2) and responding to the selection operation of the user on a first learning platform in the learning platforms, and providing the learning materials corresponding to the first learning platform for the user.
For example, a training school provides interface A, a tutor provides interface B, and a foreign-language learning website provides interface C; when the user opens the mobile terminal, the user can browse interfaces A, B, and C and select one of the platforms for learning.
For convenience of understanding, another language interaction processing method provided by the embodiment of the present application is described below with reference to an actual application scenario:
step 1: acquiring environment information of a user;
video processing and analysis are performed on the image of the subject matter (using technologies such as preprocessing, edge detection, and image segmentation), and characteristic attributes of the subject matter such as category, color, shape, and name are recorded. An image conversion management system (using technical means such as a moving-object detection method) identifies the recorded characteristic attributes of the subject matter, converts them into a computer language, and uploads them to the application scene library.
step 2: converting the environment information into environment characteristics;
the same application scene is identified from the application scene library (which includes application scenes intelligently generated through a deep learning system).
And step 3: selecting an application scene;
the environment characteristics are output to a natural language generation system, and the scene derived from the images is output to the user side in different languages through the natural language generation system and the voice conversion system; that is, the first sentence is output.
And 4, step 4: collecting user output language and determining an intelligent scene;
the language output by the user is collected through the voice collection device; the collected language is understood, analyzed, and dialogue-managed by the language understanding and dialogue management system, and intelligent scene generation is performed through the application scene library.
And 5: completing voice interaction with the user;
the interaction with the user is then completed through the voice conversion system. If the user cannot reply within the intelligent scene of the voice system, the system automatically generates a related auxiliary knowledge explanation to help the user complete the learning of the language system.
Step 6: providing an auxiliary teaching system;
the system also comprises an auxiliary teaching system. After the environmental subject matter is identified, the auxiliary teaching system combines content related to the subject matter, including targeted auxiliary course dialogue on related vocabulary, grammar, and the like. The application scene library stores not only a large number of application scenes for language learning but also a large amount of language course teaching content, and the related courses are fed back to the user through technologies such as deep learning and intelligent detection and matching. The user's learning content is recorded, and an error-prone word bank, a new-word bank, a new-grammar bank, and the like are generated automatically; these contents are merged into daily communication for daily reinforcement training. At the same time, a vocabulary family map of the user's language is generated using big-data technology, and a learning summary of each stage is fed back to the user, so that the user's learning is more systematic.
Based on the foregoing method embodiment, an embodiment of the present application further provides a language interaction processing apparatus, as shown in fig. 5, where a shooting module and a voice detection module are configured in the apparatus, and the apparatus includes:
an information obtaining module 502, configured to obtain scene information of a user through a shooting module; the scene information is a scene obtained under a first visual angle of a user;
a target scene determining module 504, configured to determine, according to the scene information, a target scene matching the scene information from a preset scene library;
a first statement information determining module 506, configured to determine first statement information according to the target scene, and output the first statement information through the information output device;
and the output module 508 is configured to perform voice detection on the user through the voice detection device, and continue to output the language interaction information to the user through the information output device according to the detection result.
The language interaction processing device provided by the embodiment of the application first acquires scene information of a user and determines a target scene according to the scene information, then determines first sentence information according to the target scene and outputs it to the user, detects the voice information output by the user, and carries out language interaction with the user according to the detection result. In this application, language learning is started automatically according to the scene in which the user is located, the learning content changes as the user's scene changes, and sentence information is not output from a fixed teaching template, so that the user feels personally present in the scene throughout the learning process.
The process of determining the target scene matched with the scene information from the preset scene according to the scene information includes: extracting target object information of a target object from the scene information; the target object is an identifier included in a scene; and selecting a preset scene matched with the target object information from a preset scene library as a target scene.
The target object information comprises the type of the target object and the number of the target objects; the process of selecting the preset scene matched with the target object information from the preset scene library as the target scene includes: matching an object contained in each preset scene in a preset scene library with target object information to obtain a similarity score corresponding to the preset scene; and determining the preset scene with the similarity score higher than the similarity threshold value as the target scene.
The process of selecting the preset scene matched with the target object information from the preset scene library as the target scene includes: and determining a target scene matched with the target object information through a neural network model for determining the scene matching degree.
The process of determining the first statement information according to the target scenario includes: determining a target keyword according to a target scene; and determining first statement information according to the target keyword.
There are a plurality of target keywords; the process of determining the first sentence information according to the target keywords includes: combining the target keywords to generate the first sentence information corresponding to the target keywords.
The process of determining the first sentence information according to the target keyword includes: determining a first sentence matched with a target keyword from a preset semantic library through a semantic recognition neural network; the preset semantic library is stored in the mobile terminal or acquired through the mobile terminal.
The process of continuously outputting the language interaction information to the user through the information output device according to the detection result comprises the following steps: judging whether the detection result represents that the difficulty of the first statement information is too high; if yes, determining second statement information according to the target scene, and outputting the second statement information through an information output device; and the difficulty corresponding to the second statement information is lower than that corresponding to the first statement information.
When any one of the following conditions is met, determining that the difficulty of the detection result representing the first statement information is too high: the detection result is that the voice information output by the user is not detected; voice information output by the user and contained in the detection result represents that the user does not understand the voice information; the correlation degree between the semantic meaning corresponding to the voice information output by the user and contained in the detection result and the first semantic meaning contained in the first sentence is lower than the correlation degree threshold value.
When any one of the following conditions is met, determining that the difficulty corresponding to the second statement information is lower than the difficulty corresponding to the first statement information: the speed of speech of the second statement information is lower than that of the first statement information; the vocabulary contained in the second sentence information is smaller than the vocabulary contained in the first sentence information; the number of the uncommon words contained in the second sentence is less than that contained in the first sentence.
The above-mentioned device still includes: the scene introduction information determining module is used for determining scene introduction information corresponding to the target scene from a preset semantic library when the difficulty corresponding to the second sentence information is lower than a preset difficulty threshold, and outputting the scene introduction information in a voice and/or text mode; the preset semantic library is stored in the mobile terminal or acquired through the mobile terminal.
The above-mentioned device still includes: the scheduled course determining module is used for determining a target scheduled course matched with the target scene from a scheduled course library according to the target scene; the scheduled curriculum library is stored in the mobile terminal or acquired through the mobile terminal; and the scheduled course interface providing module is used for providing an interface for starting the target scheduled course for the user so as to enable the user to start language learning according to the target scheduled course.
The above-mentioned device still includes: the selection interface providing module is used for providing a learning platform selection interface for a user so that the user can select learning materials of different platforms; and the learning material providing module is used for responding to the selection operation of the user on a first learning platform in the learning platforms and providing the learning material corresponding to the first learning platform for the user.
The implementation principle and the generated technical effect of the language interaction processing device provided in the embodiment of the present application are the same as those of the foregoing method embodiment, and for the sake of brief description, no mention is made in the embodiment of the foregoing device, and reference may be made to the corresponding contents in the foregoing language interaction processing method embodiment.
An electronic device is further provided, as shown in fig. 6, which is a schematic structural diagram of the electronic device, where the electronic device includes a processor 1501 and a memory 1502, the memory 1502 stores computer-executable instructions that can be executed by the processor 1501, and the processor 1501 executes the computer-executable instructions to implement the language interaction processing method.
In the embodiment shown in fig. 6, the electronic device further comprises a bus 1503 and a communication interface 1504, wherein the processor 1501, the communication interface 1504 and the memory 1502 are connected by the bus 1503.
The Memory 1502 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The communication connection between the network element of the system and at least one other network element is implemented through at least one communication interface 1504 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, and the like may be used. The bus 1503 may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 1503 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one double-headed arrow is shown in FIG. 6, but that does not indicate only one bus or one type of bus.
The processor 1501 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 1501. The processor 1501 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable read-only memory, or a register. The storage medium is located in the memory, and the processor 1501 reads information from the memory and completes the steps of the language interaction processing method of the foregoing embodiments in combination with its hardware.
The embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions which, when invoked and executed by a processor, cause the processor to implement the above language interaction processing method; for specific implementation, reference may be made to the foregoing method embodiment, and details are not described herein again.
The computer program product of the language interaction processing method, the language interaction processing apparatus, and the electronic device provided in the embodiments of the present application includes a computer-readable storage medium storing program code, where the instructions included in the program code may be used to execute the method described in the foregoing method embodiments; for specific implementation, reference may be made to the method embodiments, and details are not described herein again.
Unless specifically stated otherwise, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present application.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application, or the portion thereof that substantially contributes to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In the description of the present application, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present application. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present application, used to illustrate the technical solutions of the present application rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the technical field can still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent substitutions of some technical features within the technical scope disclosed in the present application; such modifications, changes, or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application and shall all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (15)

1. A language interaction processing method, applied to a mobile terminal in which a shooting device, a voice detection device and an information output device are arranged, the language interaction processing method comprising the following steps:
acquiring scene information of a user through the shooting device; wherein the scene information is a scene captured from a first-person perspective of the user;
according to the scene information, determining a target scene matched with the scene information from a preset scene library;
determining first sentence information according to the target scene, and outputting the first sentence information through the information output device;
performing voice detection on the user through the voice detection device, and continuing to output language interaction information to the user through the information output device according to a detection result;
wherein the step of determining a target scene matched with the scene information from a preset scene library according to the scene information comprises the following steps: extracting target object information of a target object from the scene information, wherein the target object is an identifier included in the scene; and selecting a preset scene matched with the target object information from the preset scene library as the target scene.
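By way of a non-limiting illustration, the following Python sketch strings the claimed steps together; capture_frame, detect_objects, speak and listen are hypothetical stand-ins for the shooting device, an object recognizer, the information output device and the voice detection device, and the simple overlap-based scene selection and the sentence template are assumptions introduced for illustration only:

from typing import Callable, Dict, List

def run_language_interaction(
    preset_scene_library: Dict[str, List[str]],    # scene name -> objects typical of that scene
    capture_frame: Callable[[], object],           # stands in for the shooting device
    detect_objects: Callable[[object], List[str]], # stands in for an object recognizer
    speak: Callable[[str], None],                  # stands in for the information output device
    listen: Callable[[], str],                     # stands in for the voice detection device
) -> None:
    frame = capture_frame()                 # scene information, first-person view of the user
    objects = detect_objects(frame)         # target object information (identifiers in the scene)
    # choose the preset scene sharing the most objects with what was actually seen
    target_scene = max(preset_scene_library,
                       key=lambda s: len(set(preset_scene_library[s]) & set(objects)))
    first_sentence = f"I can see {', '.join(objects)}. Shall we talk about the {target_scene}?"
    speak(first_sentence)                   # output the first sentence information
    detection_result = listen()             # voice detection on the user
    if not detection_result:                # no reply: continue the interaction more simply
        speak(f"Let's start simply. This is a {target_scene}.")

For instance, run_language_interaction({"kitchen": ["stove", "pot"]}, lambda: None, lambda f: ["stove", "pot"], print, lambda: "") walks through one round of the flow with stubbed-in devices.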
2. The language interaction processing method according to claim 1, wherein the target object information includes a type of the target object and a number of the target objects;
the step of selecting a preset scene matched with the target object information from the preset scene library as the target scene comprises the following steps:
matching the objects contained in each preset scene in the preset scene library with the target object information to obtain a similarity score corresponding to the preset scene; and
determining a preset scene whose similarity score is higher than a similarity threshold as the target scene.
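As a non-limiting illustration of this similarity matching, the following Python sketch scores each preset scene against the target object information (type and count) and keeps the best scene above the threshold; the scoring rule, the default threshold and the function names are assumptions, not specified by the claim:

from typing import Dict, Optional

def scene_similarity(scene_objects: Dict[str, int], target_objects: Dict[str, int]) -> float:
    """Score how well a preset scene's expected objects (type -> count) match the
    target object information (type -> detected count)."""
    if not scene_objects:
        return 0.0
    matched = sum(min(count, target_objects.get(obj_type, 0))
                  for obj_type, count in scene_objects.items())
    return matched / sum(scene_objects.values())

def select_target_scene(preset_scene_library: Dict[str, Dict[str, int]],
                        target_objects: Dict[str, int],
                        similarity_threshold: float = 0.6) -> Optional[str]:
    """Return the preset scene scoring highest among those above the similarity
    threshold, or None when no scene qualifies."""
    best_scene, best_score = None, similarity_threshold
    for scene, objects in preset_scene_library.items():
        score = scene_similarity(objects, target_objects)
        if score > best_score:
            best_scene, best_score = scene, score
    return best_scene

For example, select_target_scene({"kitchen": {"stove": 1, "pot": 2}, "office": {"desk": 4}}, {"stove": 1, "pot": 1}) returns "kitchen".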
3. The language interaction processing method according to claim 1, wherein the step of selecting a preset scene matched with the target object information from the preset scene library as the target scene comprises:
and determining a target scene matched with the target object information through a neural network model for determining scene matching degree.
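The claim leaves the network architecture open; the following PyTorch sketch shows one possible matching-degree model, embedding the detected object types and a candidate scene and outputting a score between 0 and 1. The architecture, dimensions and names are assumptions for illustration, and such a model would still need to be trained on labeled object/scene pairs:

import torch
import torch.nn as nn

class SceneMatchingModel(nn.Module):
    """Hypothetical matching-degree network for target object information vs. a preset scene."""

    def __init__(self, num_object_types: int, num_scenes: int, embed_dim: int = 32):
        super().__init__()
        self.object_embed = nn.EmbeddingBag(num_object_types, embed_dim, mode="sum")
        self.scene_embed = nn.Embedding(num_scenes, embed_dim)
        self.score = nn.Sequential(
            nn.Linear(2 * embed_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, object_ids: torch.Tensor, scene_id: torch.Tensor) -> torch.Tensor:
        obj = self.object_embed(object_ids)   # (batch, embed_dim), summed object-type embeddings
        scn = self.scene_embed(scene_id)      # (batch, embed_dim)
        return torch.sigmoid(self.score(torch.cat([obj, scn], dim=-1))).squeeze(-1)

# Untrained example forward pass: three detected object types against candidate scene 0
model = SceneMatchingModel(num_object_types=100, num_scenes=10)
matching_degree = model(torch.tensor([[3, 17, 42]]), torch.tensor([0]))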
4. The language interaction processing method according to claim 1, wherein the step of determining first sentence information from the target scene comprises:
determining a target keyword according to the target scene;
and determining the first sentence information according to the target keyword.
5. The language interaction processing method according to claim 4, wherein there are a plurality of target keywords;
the step of determining the first sentence information according to the target keywords comprises:
combining the target keywords to generate the first sentence information corresponding to the target keywords.
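A minimal illustrative sketch of such a combination follows; the claim only requires combining the keywords, so the fixed English template and the function name build_first_sentence are assumptions:

from typing import List

def build_first_sentence(target_keywords: List[str]) -> str:
    """Combine the target keywords into first sentence information using a simple template."""
    if not target_keywords:
        return "What can you see around you?"
    if len(target_keywords) == 1:
        combined = target_keywords[0]
    else:
        combined = ", ".join(target_keywords[:-1]) + " and " + target_keywords[-1]
    return f"I can see {combined} here. Can you tell me more about them?"

For example, build_first_sentence(["apples", "a basket", "a scale"]) yields "I can see apples, a basket and a scale here. Can you tell me more about them?".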
6. The language interaction processing method according to claim 4, wherein the step of determining the first sentence information based on the target keyword comprises:
determining the first sentence information matched with the target keyword from a preset semantic library through a semantic recognition neural network; wherein the preset semantic library is stored in the mobile terminal or acquired by the mobile terminal.
7. The language interaction processing method according to claim 1, wherein the step of continuing to output language interaction information to the user through the information output device according to the detection result comprises:
judging whether the detection result indicates that the difficulty of the first sentence information is too high;
if yes, determining second sentence information according to the target scene, and outputting the second sentence information through the information output device; wherein the difficulty corresponding to the second sentence information is lower than the difficulty corresponding to the first sentence information.
8. The language interaction processing method according to claim 7, wherein it is determined that the detection result indicates that the first sentence information is too difficult when any one of the following conditions is satisfied:
the detection result indicates that no voice information output by the user is detected;
the voice information output by the user and contained in the detection result indicates that the user does not understand the first sentence information;
the correlation degree between the semantics corresponding to the voice information output by the user and contained in the detection result and the first semantics contained in the first sentence information is lower than a correlation degree threshold.
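As a non-limiting illustration, the following Python sketch evaluates the three conditions; the non-understanding phrase list, the default correlation threshold and the way the correlation value is supplied are assumptions, and computing that correlation (e.g., with a semantic model) is outside this sketch:

from typing import Optional

NOT_UNDERSTOOD_PHRASES = ("i don't understand", "i do not understand", "pardon", "what do you mean")

def first_sentence_too_difficult(user_speech: Optional[str],
                                 semantic_correlation: Optional[float],
                                 correlation_threshold: float = 0.4) -> bool:
    """True when the detection result indicates the first sentence information was too hard."""
    if not user_speech:                       # condition 1: no voice information detected
        return True
    lowered = user_speech.lower()
    if any(phrase in lowered for phrase in NOT_UNDERSTOOD_PHRASES):
        return True                           # condition 2: the user signals non-understanding
    return (semantic_correlation is not None
            and semantic_correlation < correlation_threshold)  # condition 3: weak correlation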
9. The language interaction processing method according to claim 7, wherein it is determined that the difficulty corresponding to the second sentence information is lower than the difficulty corresponding to the first sentence information when any one of the following conditions is satisfied:
the speech rate of the second sentence information is lower than the speech rate of the first sentence information;
the vocabulary size of the second sentence information is smaller than the vocabulary size of the first sentence information;
the number of uncommon words contained in the second sentence information is smaller than the number of uncommon words contained in the first sentence information.
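A brief illustrative check of these conditions is sketched below; each sentence is assumed to carry its playback speech rate, vocabulary size and uncommon-word count under the shown keys, which are names introduced here for illustration only:

from typing import Dict

def second_sentence_is_easier(first: Dict[str, float], second: Dict[str, float]) -> bool:
    """True when any claimed condition holds for the second sentence information."""
    return (second["speech_rate"] < first["speech_rate"]
            or second["vocabulary_size"] < first["vocabulary_size"]
            or second["uncommon_word_count"] < first["uncommon_word_count"])

For example, second_sentence_is_easier({"speech_rate": 1.0, "vocabulary_size": 12, "uncommon_word_count": 2}, {"speech_rate": 0.8, "vocabulary_size": 8, "uncommon_word_count": 0}) returns True.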
10. The language interaction processing method according to claim 7, further comprising:
when the difficulty corresponding to the second sentence information is lower than a preset difficulty threshold, determining scene introduction information corresponding to the target scene from a preset semantic library, and outputting the scene introduction information in a voice and/or text form; wherein the preset semantic library is stored in the mobile terminal or acquired by the mobile terminal.
11. The language interaction processing method according to claim 1, further comprising:
according to the target scene, determining a target scheduled course matched with the target scene from a scheduled course library; wherein the scheduled course library is stored in the mobile terminal or acquired by the mobile terminal;
providing the user with an interface for starting the target scheduled course, so that the user can start language learning according to the target scheduled course.
12. The language interaction processing method according to any one of claims 1 to 11, further comprising:
providing a learning platform selection interface for the user so that the user can select learning materials of different platforms;
and in response to a selection operation of the user on a first learning platform among the learning platforms, providing the user with the learning materials corresponding to the first learning platform.
13. A language interaction processing device, wherein a shooting module and a voice detection module are arranged in the language interaction processing device, the language interaction processing device comprising:
the information acquisition module is used for acquiring scene information of a user through the shooting module; wherein the scene information is a scene captured from a first-person perspective of the user;
the target scene determining module is used for determining a target scene matched with the scene information from a preset scene library according to the scene information;
the first sentence information determining module is used for determining first sentence information according to the target scene and outputting the first sentence information through an information output device;
the output module is used for performing voice detection on the user through the voice detection module, and continuing to output language interaction information to the user through the information output device according to a detection result;
wherein the target scene determining module is further configured to: extract target object information of a target object from the scene information, wherein the target object is an identifier included in the scene; and select a preset scene matched with the target object information from the preset scene library as the target scene.
14. An electronic device comprising a processor and a memory, the memory storing computer-executable instructions executable by the processor, the processor executing the computer-executable instructions to implement the language interaction processing method of any one of claims 1-12.
15. A computer-readable storage medium having stored thereon computer-executable instructions that, when invoked and executed by a processor, cause the processor to implement the language interaction processing method of any of claims 1-12.
CN202210119958.7A 2022-02-09 2022-02-09 Language interaction processing method and device and electronic equipment Active CN114155479B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210119958.7A CN114155479B (en) 2022-02-09 2022-02-09 Language interaction processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210119958.7A CN114155479B (en) 2022-02-09 2022-02-09 Language interaction processing method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN114155479A CN114155479A (en) 2022-03-08
CN114155479B true CN114155479B (en) 2022-04-26

Family

ID=80450063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210119958.7A Active CN114155479B (en) 2022-02-09 2022-02-09 Language interaction processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114155479B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110880324A (en) * 2019-10-31 2020-03-13 北京大米科技有限公司 Voice data processing method and device, storage medium and electronic equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228983B (en) * 2016-08-23 2018-08-24 北京谛听机器人科技有限公司 A kind of scene process method and system in man-machine natural language interaction
CN106530858A (en) * 2016-12-30 2017-03-22 武汉市马里欧网络有限公司 AR-based Children's English learning system and method
CN109977731B (en) * 2017-12-27 2021-10-29 深圳市优必选科技有限公司 Scene identification method, scene identification equipment and terminal equipment
CN108717853B (en) * 2018-05-09 2020-11-20 深圳艾比仿生机器人科技有限公司 Man-machine voice interaction method, device and storage medium
CN111723606A (en) * 2019-03-19 2020-09-29 北京搜狗科技发展有限公司 Data processing method and device and data processing device
CN111797215A (en) * 2020-06-24 2020-10-20 北京小米松果电子有限公司 Dialogue method, dialogue device and storage medium
CN112633064B (en) * 2020-11-19 2023-12-15 深圳银星智能集团股份有限公司 Scene recognition method and electronic equipment

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110880324A (en) * 2019-10-31 2020-03-13 北京大米科技有限公司 Voice data processing method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN114155479A (en) 2022-03-08

Similar Documents

Publication Publication Date Title
US20200327327A1 (en) Providing a response in a session
KR20210095208A (en) Video caption creation method, device and apparatus, and storage medium
CN111833672B (en) Teaching video display method, device and system
Griol et al. An architecture to develop multimodal educative applications with chatbots
CN110797010A (en) Question-answer scoring method, device, equipment and storage medium based on artificial intelligence
KR102418558B1 (en) English speaking teaching method using interactive artificial intelligence avatar, device and system therefor
CN108470188B (en) Interaction method based on image analysis and electronic equipment
CN112487139A (en) Text-based automatic question setting method and device and computer equipment
CN109801527B (en) Method and apparatus for outputting information
CN110569364A (en) online teaching method, device, server and storage medium
CN112685550B (en) Intelligent question-answering method, intelligent question-answering device, intelligent question-answering server and computer readable storage medium
CN108710653B (en) On-demand method, device and system for reading book
KR20200089914A (en) Expert automatic matching system in education platform
CN110796911A (en) Language learning system capable of automatically generating test questions and language learning method thereof
KR20190080314A (en) Method and apparatus for providing segmented internet based lecture contents
CN111524045A (en) Dictation method and device
US10380912B2 (en) Language learning system with automated user created content to mimic native language acquisition processes
CN111985282A (en) Learning ability training and evaluating system
CN101739852B (en) Speech recognition-based method and device for realizing automatic oral interpretation training
JP2019061189A (en) Teaching material authoring system
CN114155479B (en) Language interaction processing method and device and electronic equipment
CN111326030A (en) Reading, dictation and literacy integrated learning system, device and method
US11445244B1 (en) Context-aware question-answer for interactive media experiences
CN112800177B (en) FAQ knowledge base automatic generation method and device based on complex data types
JP6978543B2 (en) Smart reading equipment and its control method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220606

Address after: 300450 No. 601, block C1, building 5, West Financial Street, the Third Avenue of development, Binhai New Area, Tianjin

Patentee after: Zhao Yu

Address before: 300450 area b2-6f-083, animation building, No. 126, animation Middle Road, ecological city, Binhai New Area, Tianjin

Patentee before: Zhongnong Polaris (Tianjin) intelligent agricultural machinery equipment Co.,Ltd.
