CN117036150A - Image acquisition method, device, electronic equipment and readable storage medium - Google Patents

Image acquisition method, device, electronic equipment and readable storage medium

Info

Publication number
CN117036150A
Authority
CN
China
Prior art keywords
image
scene
user
model
tags
Prior art date
Legal status
Pending
Application number
CN202310596814.5A
Other languages
Chinese (zh)
Inventor
卞乐强
Current Assignee
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN202310596814.5A
Publication of CN117036150A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/0464 - Convolutional networks [CNN, ConvNet]
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application discloses an image acquisition method, an image acquisition apparatus, an electronic device, and a readable storage medium, belonging to the technical field of artificial intelligence. The method includes: acquiring scene description information input by a user; and outputting at least one scene image based on the scene description information, user preference information, and at least one model.

Description

Image acquisition method, device, electronic equipment and readable storage medium
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to an image acquisition method, an image acquisition device, electronic equipment and a readable storage medium.
Background
Currently, the following scenario may occur during the use of an electronic device: there is a shooting scene that the user has not yet visited, but the user wants a picture of that scene, so the user can only obtain such a picture by retouching.
For example, a user downloads a picture of the shooting scene from the network, uses professional image editing software to extract an image of himself or herself from other pictures, and then composites that image with the downloaded picture; the composited picture is the picture the user wants. In this process, the user also needs to be skilled in using the image editing software.
In the prior art, a user therefore has to perform complicated retouching operations to obtain a desired picture that has not actually been shot.
Disclosure of Invention
The embodiment of the application aims to provide an image acquisition method that can obtain a scene image from only the scene description information input by the user, combined with the user's preference information and at least one model, without requiring professional image editing software to edit the image, thereby improving the convenience of obtaining the image the user desires.
In a first aspect, an embodiment of the present application provides an image acquisition method, including: acquiring scene description information input by a user; at least one scene image is output based on the scene description information, the user preference information, and the at least one model.
In a second aspect, an embodiment of the present application provides an image acquisition apparatus, including: the acquisition module is used for acquiring scene description information input by a user; and the processing module is used for outputting at least one scene image based on the scene description information, the user preference information and the at least one model.
In a third aspect, an embodiment of the present application provides an electronic device comprising a processor and a memory storing a program or instructions executable on the processor, which when executed by the processor, implement the steps of the method as described in the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which when executed by a processor perform the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and where the processor is configured to execute a program or instructions to implement a method according to the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product stored in a storage medium, the program product being executable by at least one processor to implement the method according to the first aspect.
Thus, in the embodiment of the application, user preference information is collected while the user uses the electronic device, so that when the user inputs scene description information, for example a spoken sentence, that information is acquired and processed by at least one model in combination with the user preference information to finally output at least one scene image. Therefore, in the embodiment of the application, analysis of user data is combined with a simple user input to intelligently output the scene image the user desires; the whole process is simple to operate, the image does not need to be edited with professional image editing software, and the convenience of obtaining the image the user desires is improved.
Drawings
FIG. 1 is a flow chart of an image acquisition method provided by some embodiments of the application;
FIG. 2 is a schematic display of an electronic device provided in some embodiments of the application;
FIG. 3 is a schematic display of an electronic device provided in some embodiments of the application;
FIG. 4 is a schematic display of an electronic device provided by some embodiments of the application;
FIG. 5 is a block diagram of an image acquisition apparatus provided by some embodiments of the application;
FIG. 6 is a schematic diagram of a hardware architecture of an electronic device provided by some embodiments of the application;
FIG. 7 is a schematic diagram of a hardware structure of an electronic device according to some embodiments of the present application.
Detailed Description
The technical solutions of the embodiments of the present application will be clearly described below with reference to the accompanying drawings of the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which are obtained by a person skilled in the art based on the embodiments of the present application, fall within the scope of protection of the present application.
The terms first, second and the like in the description and in the claims, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged, as appropriate, such that embodiments of the present application may be implemented in sequences other than those illustrated or described herein, and that the objects identified by "first," "second," etc. are generally of a type, and are not limited to the number of objects, such as the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/", generally means that the associated object is an "or" relationship.
The execution subject of the image acquisition method provided by the embodiment of the application can be the image acquisition device provided by the embodiment of the application or the electronic equipment integrated with the image acquisition device, wherein the image acquisition device can be realized in a hardware or software mode.
The image acquisition method provided by the embodiment of the application is described in detail below through specific embodiments and application scenarios with reference to the accompanying drawings. Referring to FIG. 1, the method includes step 110 and step 120.
Step 110: acquire scene description information input by a user.
In this step, the user actively inputs the scene description information, and this input triggers the output of at least one scene image. The scene description information may be input through a first operation; the user input includes touch input made by the user on the screen, including but not limited to clicking, sliding, and dragging. The user input may also be a contactless input by the user, such as a gesture or a facial motion, and further includes an input to a physical key on the device, not limited to a press. Moreover, the user input includes one or more inputs, and the multiple inputs may be continuous or separated in time.
For example, referring to FIG. 2, the screen displays an interface of the album program in which thumbnails of a plurality of photos are displayed, one of which is thumbnail 201. An entry for inputting scene description information, such as control 202, is provided, and a text prompt such as "please input scene description information" is displayed on control 202. The user long-presses control 202 and speaks a sentence, and the received voice content is converted into text content that serves as the scene description information; the specific text content is: "watch the sunrise over the sea of clouds in the morning".
In some embodiments of the present application, the scene description information input by the user is text information; voice information input by the user can be converted into the corresponding text information.
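A minimal sketch of converting the user's spoken input into the text scene description, assuming the open-source SpeechRecognition package and its Google Web Speech backend; the patent does not name a specific speech-recognition engine, so both choices are assumptions.

```python
# Minimal sketch: convert a voice clip into scene description text.
# The SpeechRecognition package and the recognize_google backend are
# assumptions; the patent does not specify a particular ASR engine.
import speech_recognition as sr

def voice_to_scene_description(audio_path: str) -> str:
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio = recognizer.record(source)  # read the whole clip
    # Any ASR backend could be used here; recognize_google is just one option.
    return recognizer.recognize_google(audio, language="zh-CN")

# Usage: text = voice_to_scene_description("scene_request.wav")
```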
Step 120: at least one scene image is output based on the scene description information, the user preference information, and the at least one model.
The user preference information is information for describing user preference and user characteristics, and at least comprises the following two items: attribute information of the user and feature information of images liked by the user. Wherein the attribute information of the user includes, for example, the age, height, hobbies, etc. of the user. The feature information of the image includes, for example, a type, a photographed scene, a photographed object, a color, and the like.
In some embodiments of the present application, the album program randomly pushes network pictures or previously shot pictures to the user and displays a "like" option and a "dislike" option on each picture; after the user clicks the "like" option, feature information such as the type, shooting scene, shot subject, and color of that picture is acquired.
In some embodiments of the present application, when the user likes photos in social circles, the feature information such as the type, shooting scene, shot subject, and color of the liked photos is obtained.
In some embodiments of the present application, when a user shoots, saves, downloads, or screenshots a picture, feature information such as the type, shooting scene, shot subject, and color of the picture is obtained.
In some embodiments of the present application, when the user searches for pictures by keyword, the keyword may be recorded as a type of picture that the user likes.
Thus, based on the above two items of information, at least one scene image is output by further using at least one model.
In an embodiment of the application, the at least one model includes all models involved in the whole process of outputting the scene image.
The scene image is an image that the user has not yet taken but may take in the future.
In some embodiments of the present application, when a user expresses the desire for a scene image by inputting scene description information such as "sunrise over the sea", the scene description information is combined with the user preference information, and a sunrise-over-the-sea scene image is output through at least one model; the scene image presents the scene described by the user and is also processed according to the personal characteristics of the user, for example according to the colors the user likes.
Thus, in the embodiment of the application, user preference information is collected while the user uses the electronic device, so that when the user inputs scene description information, for example a spoken sentence, that information is acquired and processed by at least one model in combination with the user preference information to finally output at least one scene image. Therefore, in the embodiment of the application, analysis of user data is combined with a simple user input to intelligently output the scene image the user desires; the whole process is simple to operate, the image does not need to be edited with professional image editing software, and the convenience of obtaining the image the user desires is improved.
In an image acquisition method of another embodiment of the present application, the at least one model includes an image tag acquisition model;
in an embodiment of the present application, before step 120, the method further includes:
step A1: at least two reference images are input into the image tag acquisition model, and at least two groups of image feature tags of the at least two reference images are output.
In an embodiment of the application, a database of reference images is built prior to outputting the scene images, in which database a large number of reference images are stored, which reference images need to be input into an image tag acquisition model, so that image feature tags of the reference images are output based on the model.
Wherein a reference image corresponds to a set of image feature labels, the set of image feature labels including at least one image feature label.
In the embodiment of the application, an image feature tag of a reference image is a text tag, and one image feature tag represents one image feature of the reference image, including but not limited to a feature of the image content, the name of the image content, the number of instances of the same image content, the color of the image, and the type of the image.
For example, one reference image presents the sea, and the output set of image feature tags includes: "sea", "landscape", "soft", etc.; another reference image is a screenshot of a document whose content relates to an algorithm, and the output set of image feature tags includes: "algorithm", etc.; another reference image shows a person working in a room, and the output set of image feature tags includes: "person", "one boy", "working", "indoor", etc.
In the embodiment of the application, when training images for establishing the database are prepared, the training images may be uploaded to a cloud server by album users or obtained in other ways; the server builds a labeling system, each training image is labeled manually, and the correspondence between each training image and its labels is generated and used to train the image tag acquisition model. The image tag acquisition model adopts a convolutional neural network (Convolutional Neural Network, CNN); during training, the labeled training image data set is used for supervised learning of the model to improve its accuracy, while data augmentation and regularization techniques are used to prevent overfitting. Further, feature extraction is performed on the at least two input reference images by the trained image tag acquisition model to obtain a feature vector of each reference image, commonly called a "visual vector". Finally, the visual vector corresponding to each reference image is stored in an inverted index table. Here, the feature vector is referred to as an image feature tag.
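As an illustration of the training and indexing described above, the following sketch uses a ResNet-18 backbone from torchvision as the CNN, a small linear head for multi-label tags, and a dictionary as the inverted index; the tag vocabulary, threshold, and architecture are illustrative assumptions rather than the patent's exact design.

```python
# Sketch of an image tag acquisition model: a CNN backbone produces a "visual
# vector" per reference image, a classifier head predicts text tags, and the
# results are stored in an inverted index (tag -> image ids).
# Backbone choice, tag vocabulary and threshold are illustrative assumptions.
from collections import defaultdict
import torch
import torch.nn as nn
from torchvision import models

TAGS = ["sea", "landscape", "person", "sunrise", "indoor", "algorithm"]  # assumed vocabulary

class ImageTagModel(nn.Module):
    def __init__(self, num_tags: int):
        super().__init__()
        backbone = models.resnet18(weights=None)  # supervised training would learn/load weights
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # up to global pooling
        self.head = nn.Linear(512, num_tags)      # multi-label tag classifier

    def forward(self, x):
        vec = self.features(x).flatten(1)         # the per-image "visual vector"
        logits = self.head(vec)
        return vec, logits

def index_reference_images(model, images, image_ids, threshold=0.5):
    """Tag each reference image and build an inverted index: tag -> image ids."""
    inverted = defaultdict(list)
    model.eval()
    with torch.no_grad():
        for img, img_id in zip(images, image_ids):
            _, logits = model(img.unsqueeze(0))
            probs = torch.sigmoid(logits)[0]
            for tag, p in zip(TAGS, probs):
                if p > threshold:
                    inverted[tag].append(img_id)
    return inverted

# Usage (dummy data): two 224x224 RGB "reference images"
model = ImageTagModel(len(TAGS))
imgs = [torch.rand(3, 224, 224) for _ in range(2)]
index = index_reference_images(model, imgs, image_ids=["ref_001", "ref_002"])
```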
In an embodiment of the present application, before the scene images are output, at least two reference images need to be input into the image tag acquisition model to output the image feature tags of the reference images. The image feature tags describe the features of the reference images in a generalized way, so that the corresponding reference images can later be obtained by comparing the image feature tags with the scene tags, and the scene images are then output with the aid of those reference images.
In a flow of an image acquisition method of another embodiment of the present application, at least one model includes a preference tag acquisition model and a scene tag acquisition model;
Step 120 in the embodiment of the present application includes sub-steps B1, B2, B3 and B4.
Substep B1: user preference information is input to the preference tag acquisition model, and at least one user preference tag is output.
In some embodiments of the present application, the preference tag acquisition model is a neural network model; a recurrent neural network (Recurrent Neural Network, RNN), a long short-term memory network (Long Short-Term Memory, LSTM), a pre-trained language representation model (Bidirectional Encoder Representations from Transformers, BERT), or the like may be employed.
In the process of training the model, iteration of a plurality of rounds can improve generalization capability and accuracy of the model.
In some embodiments of the present application, a preference tag acquisition model is trained on a single user's data, so that the user preference tags of that user are output based on the model. When the amount of data for a certain user is small, the training result of the model may be poor due to incomplete data; in this case, a model can be trained on a large amount of user data from the same age group, so that the user preference tags of the user are output based on the model corresponding to the age group to which the user belongs.
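A minimal sketch of a preference tag acquisition model, assuming a multi-hot encoding of the user preference information and a small feed-forward multi-label classifier; the feature vocabulary, tag set, and architecture are illustrative assumptions, and an RNN/LSTM/BERT encoder could take the place of the linear layers as noted above.

```python
# Sketch of a preference tag acquisition model: user preference information
# (attributes + features of liked images) encoded as a multi-hot vector and
# mapped to user preference tags by a small multi-label classifier.
# Vocabulary, tag set and architecture are illustrative assumptions.
import torch
import torch.nn as nn

FEATURE_VOCAB = ["age:30s", "male", "likes:landscape", "likes:warm_colors", "frequent_traveler"]
PREFERENCE_TAGS = ["30 years old", "frequent trip", "landscape lover", "warm tones"]

def encode_user(info: list) -> torch.Tensor:
    """Multi-hot encoding of the user preference information."""
    return torch.tensor([1.0 if f in info else 0.0 for f in FEATURE_VOCAB])

class PreferenceTagModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(len(FEATURE_VOCAB), 32), nn.ReLU(),
            nn.Linear(32, len(PREFERENCE_TAGS)),
        )

    def forward(self, x):
        return self.net(x)

def predict_preference_tags(model, info, threshold=0.5):
    with torch.no_grad():
        probs = torch.sigmoid(model(encode_user(info)))
    return [t for t, p in zip(PREFERENCE_TAGS, probs) if p > threshold]

# Usage: tags = predict_preference_tags(PreferenceTagModel(), ["age:30s", "frequent_traveler"])
```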
The different models can also probe which preferences the user actually has by presenting candidate user preference tags to the user multiple times, so that the models are continuously optimized through user feedback.
In some embodiments of the present application, the user preference tags output by the model are transmitted and stored through secure means, and the user is informed of the purpose for which the user data is used, how it is processed and used, and the access and control rights over it, so that the user data is processed lawfully, properly and transparently.
In the whole process of acquiring, training and outputting user data, the privacy of the user needs to be protected, and the personal information safety of the user is ensured.
Substep B2: the scene description information and at least one user preference tag are input into a scene tag acquisition model, and at least one group of scene tags are output.
In an embodiment of the application, the scene tag acquisition model may use a chat-oriented generative pre-trained transformer or other natural language processing techniques to analyze the scene description information and the user preference tags and output at least one set of scene tags.
The chat-oriented generative pre-trained transformer is pre-trained on a large-scale dialogue corpus to learn the language rules and context information in dialogues. Currently, the corpora used to train such models mainly include open-domain dialogue corpora and special-domain dialogue corpora. Beam search is a search algorithm that selects an overall most likely sequence by keeping the k highest-probability options at each time step. In the chat-oriented generative pre-trained transformer, a beam search algorithm is used to generate the response.
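As an illustration of the beam search described above, the following is a minimal, generic sketch that keeps the k highest-probability partial sequences per step; the toy scoring function is an assumption for illustration only, standing in for the softmax output of a generative pre-trained transformer.

```python
# Generic beam search sketch: keep the k most probable partial sequences at
# each time step and extend them until an end token or a length limit.
# The toy next_token_probs function is an assumption for illustration only.
import math

def beam_search(next_token_probs, start_token, end_token, k=3, max_len=10):
    beams = [([start_token], 0.0)]            # (sequence, log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, logp in beams:
            for token, prob in next_token_probs(seq).items():
                candidates.append((seq + [token], logp + math.log(prob)))
        # keep only the k highest-probability candidates
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, logp in candidates[:k]:
            (finished if seq[-1] == end_token else beams).append((seq, logp))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])[0]

# Toy scorer: prefers "sunrise", then stops.
def toy_scorer(seq):
    if seq[-1] != "sunrise":
        return {"sunrise": 0.6, "climbing": 0.3, "<eos>": 0.1}
    return {"<eos>": 0.9, "companion": 0.1}

# Usage: beam_search(toy_scorer, "<bos>", "<eos>", k=2)
```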
In some embodiments of the application, a set of scene tags may constitute a phrase that is used to describe a scene.
In some embodiments of the present application, rules may be preset to define a set of scene tags such that a set of scene tags may be used for comparison with image feature tags of a reference image.
For example, if in the database the image feature tags of a reference image total about ten words on average, the total number of words of a set of scene tags may be set to be no more than ten.
In some embodiments of the present application, the scene tags are text tags, and one scene tag corresponds to one keyword.
In one example, the content of the scene description information includes: "at the weekend I want to go mountain climbing, hope to meet a new friend of the opposite sex, eat a fine Western meal in the evening, and watch the sunrise over the sea of clouds in the morning". Combined with user preference tags such as "30 years old", "frequent traveler", and "airplane", the following ten scenes are associated: climbing with a companion and going on a date, the magical sea of clouds of Huangshan, a romantic dinner in a Western restaurant, a social gathering and exchange, fine food in the high mountains, the folk customs of ancient deep lanes, warm sunshine and intimate interaction, a stroll through an indoor gallery, exploring the culture of a mysterious ancient village, and a low-altitude bird's-eye view of beautiful Huangshan. Thus, for any one of these scenes, a set of scene tags can be split out.
For example, for the scene "climbing with a companion on a date to watch the sunrise", a set of scene tags may be obtained including: "climbing", "companion", "date", "sunrise".
Substep B3: and comparing the at least one group of scene tags with the at least two groups of image feature tags to obtain N groups of image feature tags, wherein the first number of identical tags existing between each group of the N groups of image feature tags and the at least one group of scene tags is larger than a first threshold value, and N is a positive integer.
In this step, a set of scene tags is compared with image feature tags of a large number of reference images to obtain a reference image that best fits the scene.
For example, a set of scene labels is compared with image feature labels of a reference image, identical labels in the two sets of labels are found, and the number of identical labels is counted.
In some embodiments of the present application, for a set of scene tags, the single reference image with the largest number of identical tags may be obtained by comparison against the image feature tags of a large number of reference images. Correspondingly, the first threshold is related to the comparison result rather than being a fixed value, and can be set so that only the image ranked first by count has a first number greater than the first threshold.
In some embodiments of the present application, for a set of scene tags, several reference images with larger numbers of identical tags may be obtained by comparison against the image feature tags of a large number of reference images. Correspondingly, the first threshold is related to the comparison result rather than being a fixed value, and can be set so that, for example, the top three images by count satisfy the condition.
In some embodiments of the present application, for a set of scene tags, any number of reference images whose count of identical tags is greater than the first threshold may be obtained by comparison against the image feature tags of a large number of reference images. Correspondingly, the first threshold is a fixed value.
In some embodiments of the present application, at least one reference image is output for a set of scene tags.
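The comparison in sub-step B3 can be sketched as follows: count the identical tags (the "first number") between one set of scene tags and each reference image's feature tags, and keep the reference images whose count exceeds the first threshold; the data shapes are assumptions.

```python
# Sketch of sub-step B3: compare one set of scene tags against the image
# feature tags of many reference images, count identical tags, and keep the
# reference images whose count exceeds the first threshold.
def select_reference_images(scene_tags, image_feature_tags, first_threshold=1):
    """
    scene_tags: set of tags for one associated scene, e.g. {"climbing", "sunrise"}.
    image_feature_tags: dict mapping reference image id -> set of feature tags.
    Returns (image_id, count) pairs sorted by count, highest first.
    """
    scene = set(scene_tags)
    scored = []
    for image_id, tags in image_feature_tags.items():
        first_number = len(scene & set(tags))  # number of identical tags
        if first_number > first_threshold:
            scored.append((image_id, first_number))
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Usage
refs = {
    "ref_001": {"climbing", "companion", "sunrise", "landscape"},
    "ref_002": {"indoor", "algorithm"},
}
print(select_reference_images({"climbing", "companion", "date", "sunrise"}, refs))
# -> [("ref_001", 3)]
```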
Substep B4: and outputting N scene images according to N reference images corresponding to the N groups of image feature labels.
In this step, a scene image may be output based on the output of a reference image.
For example, for a reference image, image processing may be performed according to colors and the like that are liked by the user, thereby obtaining a scene image.
In the embodiment of the application, the user inputs "at the weekend I want to go mountain climbing, hope to meet a new friend of the opposite sex, eat a fine Western meal in the evening, and watch the sunrise over the sea of clouds in the morning". Combined with user preference tags such as "30 years old", "male", and "slightly overweight", the scene "climbing with a companion on a date to watch the sunrise" is associated, and a reference image is found that shows the silhouettes, seen from behind, of two people of opposite sexes facing the sea, with the sun rising over the sea; further, in combination with the user preference tags, the silhouette outline of the male figure and other details are fine-tuned, thereby outputting scene image 301, see FIG. 3.
In the embodiment of the application, the user preference tags and the scene description information are first combined to associate at least one shooting scene that may occur, and each shooting scene corresponds to a set of scene tags. Based on these scene tags, a comparison is made against the image feature tags of a large number of reference images to obtain the reference images that best match the scene tags, and the scene images are then output according to those reference images. It can be seen that, according to this embodiment, a suitable reference image can be obtained from the database through the comparison of tags, and after further related processing a scene image is output, so that an image the user may be interested in can be output based on the scene description information input by the user.
In the flow of the image capturing method according to another embodiment of the present application, step B4 includes sub-steps C1, C2 and C3.
Substep C1: and acquiring target character information in the scene description information, wherein the target character information comprises M character names, and M is a positive integer.
In some embodiments of the application, the persona names include "mom," "best friend," "Zhang Mou," and the like.
Substep C2: under the condition that M=1, replacing the image content of a face area in each of the N reference images with a face image corresponding to the name of the person; where, in the case of m=1, each reference image includes only one face region.
In one case, a number tag, such as "one person", is output according to the number of person names included in the scene description information, and a comparison is made in the database to find a reference image also having "one person". Further, the image content of the face area in the reference image is replaced to be replaced by a face image corresponding to the name of the person in the scene description information.
Substep C3: under the condition that M is more than 1, the image content of each face area in the reference image is replaced by a face image corresponding to the name of the person with the same face characteristic as the face area in the reference image; wherein, in the case of M > 1, each reference image includes M face regions.
In another case, a number tag, such as "two persons", is output according to the number of person names included in the scene description information, and a comparison is made in the database to find a reference image also having "two persons". Further, the image content of the face area in the reference image is replaced to be replaced by a face image corresponding to the name of the person in the scene description information.
In the step, in the replacing process, according to the face image corresponding to each person name, a face area with the same face characteristic is found in the reference image, so that the image content of the face area is replaced by the face image corresponding to the person name.
For example, the name of the person comprises father and mom, two face areas are included in the reference image, and the face characteristics of the female can be acquired based on the face image of mom, so that the face area with the face characteristics of the female is found in the reference image for replacement; similarly, based on the 'father' face image, the face features of the male can be acquired, and a face region with the same face features of the male is found in the reference image for replacement.
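A sketch of this matching logic for M > 1, assuming OpenCV's bundled Haar cascade for face detection and an externally supplied face-attribute classifier for the gender estimate; the classifier callable and the naive paste are placeholders, not a concrete library API.

```python
# Sketch of matching face images to face regions by a coarse facial
# characteristic (gender, as in the "dad"/"mom" example). Face detection uses
# OpenCV's bundled Haar cascade; estimate_gender is any attribute classifier
# supplied by the caller and is a placeholder here.
import cv2

def replace_matching_faces(reference_bgr, named_faces, estimate_gender):
    """
    reference_bgr: reference image as a BGR numpy array.
    named_faces: dict like {"dad": ("male", dad_face_bgr), "mom": ("female", mom_face_bgr)}.
    estimate_gender: callable taking a face patch and returning "male" or "female".
    """
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(reference_bgr, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
        region_gender = estimate_gender(reference_bgr[y:y + h, x:x + w])
        for name, (gender, face_img) in named_faces.items():
            if gender == region_gender:
                # naive paste; a deepfake-style blend (described later) looks better
                reference_bgr[y:y + h, x:x + w] = cv2.resize(face_img, (w, h))
                break
    return reference_bgr
```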
In some embodiments of the present application, the face images corresponding to the person names are extracted (matted out) in advance.
For example, in the album program, the user creates an album and names it, for example, "dad"; the face images in each picture of the album are extracted, the face that occurs most frequently is found, and that face image is matted out as the face image of "dad".
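A sketch of picking the most frequent face in a named album as that person's face image, assuming the open-source face_recognition library and a 0.6 encoding-distance threshold for grouping; the patent does not name a library, so these are illustrative choices.

```python
# Sketch: find the most frequently occurring face in an album (e.g. the "dad"
# album) and use it as that person's face image. The face_recognition library
# and the 0.6 distance threshold are assumptions for illustration.
import glob
import numpy as np
import face_recognition

def most_frequent_face(album_dir: str, threshold: float = 0.6):
    groups = []  # each group: {"center": encoding, "count": int, "sample": (path, location)}
    for path in glob.glob(f"{album_dir}/*.jpg"):
        image = face_recognition.load_image_file(path)
        locations = face_recognition.face_locations(image)
        for loc, enc in zip(locations, face_recognition.face_encodings(image, locations)):
            for g in groups:
                if np.linalg.norm(g["center"] - enc) < threshold:
                    g["count"] += 1
                    break
            else:
                groups.append({"center": enc, "count": 1, "sample": (path, loc)})
    if not groups:
        return None
    best = max(groups, key=lambda g: g["count"])
    path, (top, right, bottom, left) = best["sample"]
    return face_recognition.load_image_file(path)[top:bottom, left:right]  # cropped face
```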
In some embodiments of the present application, to protect user data, the user's classified images are obtained from the cloud or locally to extract face images only with the user's authorization. Before a face image is extracted, the user needs to be informed that its intended use is to output scene images and that it cannot be used for any other, unlawful purpose; it is also necessary to require the authorization of the persons concerned or to detect locally that the person himself or herself is operating the device. Furthermore, the extracted face images need to be protected by encryption measures and saved to a server or to the device terminal, for example by using encryption algorithms, access right restrictions, and the like.
In some embodiments of the present application, a deepfake (deep forgery) technique is used to replace the image content of the face region in the reference image with the face image corresponding to the person name.
In an embodiment of the application, the face image is replaced based on an open-source deepfake image generation technology.
The core of deepfake technology is deep-learning-based generative adversarial networks (Generative Adversarial Networks, GANs) and autoencoder technology, which combine deep learning models with image processing techniques so that facial features of a person can be seamlessly synthesized, modified, or exchanged.
Specifically, an implementation using deepfake technology includes the following steps:
Step one, data collection: dozens or hundreds of facial images of the person are collected from the album to serve as the training data set;
Step two, model training: the training data set is provided to a generative adversarial network or autoencoder model for training. With a large amount of training data, the model learns to remove the facial features of the target person in an image and replace them with other facial features;
Step three, facial feature extraction: key information such as the eyes, mouth, and nose is extracted from the face image of the target person using facial feature extraction techniques;
Step four, facial feature replacement: the extracted facial features are combined with features from other sources using the generative adversarial network or autoencoder, so that after an image synthesis algorithm the facial features are seamlessly fused with the target person;
Step five, adjustment and blending: at the final stage of image synthesis, some adjustment and blending are also required so that the generated image matches the rest of the environment. For example, the generated facial features may be further adapted to blend better with the background.
Based on deepfake technology, one person's face can be replaced with another person's face with a lifelike effect, and faces in video can be replaced as well.
Deepfake technology can also be used for image enhancement: by adjusting details in the image, it can improve color balance, exposure, contrast, and the like, so that the image becomes clearer, smoother, and more realistic.
For example, after the face is replaced, image enhancement is performed so that the replaced face is clearer and more natural; the image enhancement includes, for example, automatically adjusting the brightness, contrast, and color of the image.
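The open-source deepfake approach outlined in steps one to five is classically a shared encoder with one decoder per identity; the following minimal PyTorch sketch shows that structure and the swap step (encode a face of A, decode it with B's decoder). The layer sizes, 64x64 image size, and single training step are simplified assumptions, far from a production face-swap model.

```python
# Minimal sketch of the shared-encoder / per-identity-decoder autoencoder used
# in classic open-source deepfake face swapping: train decoder_a on person A's
# faces and decoder_b on person B's faces through one shared encoder;
# swapping = encode a face of A, decode it with decoder_b.
import torch
import torch.nn as nn

def make_decoder():
    return nn.Sequential(
        nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),   # 16 -> 32
        nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Sigmoid(),  # 32 -> 64
    )

class FaceSwapAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),  # 32 -> 16
        )
        self.decoder_a = make_decoder()
        self.decoder_b = make_decoder()

    def reconstruct(self, x, identity):
        decoder = self.decoder_a if identity == "a" else self.decoder_b
        return decoder(self.encoder(x))

    def swap_a_to_b(self, face_a):
        # the swap: A's face rendered with B's decoder
        return self.decoder_b(self.encoder(face_a))

# One illustrative training step per identity (reconstruction loss)
model = FaceSwapAutoencoder()
optim = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()
faces_a, faces_b = torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64)  # dummy batches
loss = loss_fn(model.reconstruct(faces_a, "a"), faces_a) + \
       loss_fn(model.reconstruct(faces_b, "b"), faces_b)
optim.zero_grad()
loss.backward()
optim.step()
swapped = model.swap_a_to_b(faces_a)  # faces of A with B's appearance
```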
In some embodiments of the present application, if the user subsequently modifies the scene image more than twice, indicating that this kind of face replacement is disliked, such scene images may no longer be output later.
This embodiment of the application reflects the diversity of the image acquisition method: through face replacement, the displayed scene image becomes more vivid, giving the user a feeling of being personally on the scene, and the application scenarios of image display become richer and more creative.
In the flow of the image acquisition method according to a further embodiment of the present application, after step 120, the method further includes steps D1 and D2.
Step D1: in the case of displaying a first scene image, a first input by a user to the first scene image is received.
The first input is an operation by the user on the first scene image, which triggers the output of a second scene image combined with a first image. The first input may be a first operation, including touch input made by the user on the screen, including but not limited to a click, a swipe, and a drag. The first input may also be a contactless input by the user, such as a gesture or a facial motion, and further includes an input to a physical key on the device, not limited to a press. Moreover, the first input includes one or more inputs, and the multiple inputs may be continuous or separated in time.
For example, the user inputs the scene description information "go to the seaside to watch the sunrise", and a first scene image is displayed whose content is the user watching the sunrise at the seaside, which may encourage the user to actually go to the seaside and take a photo. Further, after the user triggers display of the first scene image, for example by calling up a previously saved first scene image, a shooting option such as a camera icon is displayed on the first scene image; the user clicks the icon, a shooting preview interface is displayed, and the user shoots an image, namely the first image, so that a second scene image is displayed, the second scene image being formed by stitching the first scene image and the first image.
Step D2: and responding to the first input, performing image stitching on the first scene image and the first image shot by the camera, and outputting a second scene image.
In some embodiments of the present application, the first scene image and the first image are stitched in a certain arrangement order, so as to obtain the second scene image.
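A minimal sketch of the stitching in step D2 using Pillow, assuming a side-by-side arrangement with matched heights; the patent only specifies "a certain arrangement order", so the layout is an assumption.

```python
# Sketch of step D2: stitch the first scene image and the camera-shot first
# image into a second scene image. Side-by-side layout and height matching are
# illustrative assumptions ("a certain arrangement order").
from PIL import Image

def stitch_scene_images(first_scene: Image.Image, first_image: Image.Image) -> Image.Image:
    height = min(first_scene.height, first_image.height)
    left = first_scene.resize((int(first_scene.width * height / first_scene.height), height))
    right = first_image.resize((int(first_image.width * height / first_image.height), height))
    second_scene = Image.new("RGB", (left.width + right.width, height))
    second_scene.paste(left, (0, 0))
    second_scene.paste(right, (left.width, 0))
    return second_scene

# Usage:
# second = stitch_scene_images(Image.open("scene.jpg"), Image.open("camera.jpg"))
# second.save("second_scene.jpg")
```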
In some embodiments of the application, the user may send the second scene image to a chat group, post it to a social circle, and so on.
In some embodiments of the application, automatically generated special effects, such as watermark protection, are also included in the second scene image.
In some embodiments of the present application, user incentive measures, such as bonus points, medals, and coupons, are set to motivate users to take photos, check in, and share photos.
In the embodiment of the application, an application mode for scene images is provided that enhances the shooting and retouching functions of the electronic device and indirectly strengthens social contact between users: users can share the second scene image with others, take part in a contest, or create a unique visual story, so that the electronic device also plays a role in social contact and interpersonal relationships.
In the flow of the image acquisition method according to another embodiment of the present application, after step 120, the method further includes steps E1, E2 and E3.
Step E1: at least one scene image is displayed.
In some embodiments of the application, the at least one scene image is output in the form of a list.
For example, at least one scene image arranged in sequence is displayed on a screen.
Step E2: a second input is received for a third scene image of the at least one scene image.
The second input is an operation by the user on the third scene image, used to mark the preference degree of the third scene image as a target preference degree and to update the display order, within the at least one scene image, of the scene images of the same type as the third scene image. The second input may be a first operation, including touch input made by the user on the screen, including but not limited to a click, a swipe, and a drag. The second input may also be a contactless input by the user, such as a gesture or a facial motion, and further includes an input to a physical key on the device, not limited to a press. Moreover, the second input includes one or more inputs, and the multiple inputs may be continuous or separated in time.
For example, referring to FIG. 4, a "like" option 402 is provided on the third scene image 401, and the icon corresponding to the "like" option 402 has a white background; when the user clicks the "like" option 402, the icon changes from a white background to a red background.
In some embodiments of the present application, the second input may also be the user clicking a "favorite" option, clicking a "forward" option, or keeping the third scene image displayed on the screen for a duration of one minute.
Step E3: in response to the second input, marking the preference level of the third scene image as the target preference level and updating a display order of the scene images of the same type as the third scene image in the at least one scene image.
In an embodiment of the present application, the preference degree of the user for the third scene image includes at least the degree "like"; that is, the target preference degree indicates that the user likes the third scene image.
For example, referring to FIG. 4, when the icon corresponding to the "like" option 402 changes from a white background to a red background, marking the preference degree of the third scene image as the target preference degree is complete.
In some embodiments of the present application, if the user indicates through the second input that the third scene image is liked, the other scene images of the same type may be adjusted to be presented to the user preferentially.
In some embodiments of the application, the tags of a scene image include the image feature tags of the related reference image and the related set of scene tags; when the user likes the third scene image, the scene images having the same tags as the third scene image, i.e. scene images of the same type, can be found and adjusted to be presented to the user preferentially. The display order may be adjusted in descending order of the number of identical tags, that is, the more identical tags, the earlier the display position.
For example, the current display order is: third scene image, fourth scene image, fifth scene image, sixth scene image, and the third scene image is currently displayed. After the user indicates liking the third scene image, given that there are three identical tags between the fifth scene image and the third scene image and four identical tags between the sixth scene image and the third scene image, the display order is adjusted to: third scene image, sixth scene image, fifth scene image, fourth scene image, so that the sixth scene image is displayed after the user clicks the "next" option.
In an embodiment of the present application, the preference degree of the user for the third scene image also includes the degree "dislike"; that is, the target preference degree indicates that the user does not like the third scene image.
For example, referring to FIG. 4, when the icon corresponding to the "dislike" option 403 changes from a white background to a red background, marking the preference degree of the third scene image as the target preference degree is complete.
In some embodiments of the application, the second input may also be the third scene image remaining displayed on the screen for less than one minute.
In some embodiments of the present application, if the user indicates through the second input that the third scene image is disliked, the other scene images of the same type may be adjusted to be presented to the user later.
In some embodiments of the application, the tags of a scene image include the image feature tags of the related reference image and the related set of scene tags; when the user dislikes the third scene image, the scene images having the same tags as the third scene image, i.e. scene images of the same type, can be found and adjusted to be presented to the user later. The display order may be adjusted in ascending order of the number of identical tags, that is, the more identical tags, the later the display position.
For example, the current display order is: third scene image, fourth scene image, fifth scene image, sixth scene image, and the third scene image is currently displayed. After the user indicates disliking the third scene image, given that there are three identical tags between the fourth scene image and the third scene image and four identical tags between the fifth scene image and the third scene image, the display order is adjusted to: third scene image, sixth scene image, fourth scene image, fifth scene image, so that the sixth scene image is displayed after the user clicks the "next" option.
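A sketch of the reordering described in the preceding paragraphs: count the tags each queued scene image shares with the third scene image and sort so that, on a "like", images with more shared tags move earlier and, on a "dislike", later; the tag sets and data shapes are assumptions.

```python
# Sketch of updating the display order after the second input: scene images
# sharing more tags with the marked image move earlier on a "like" and later
# on a "dislike". Tag sets and the simple sort key are assumptions.
def update_display_order(queue, tags_by_image, marked_image, liked: bool):
    """
    queue: list of scene image ids still to be shown (excluding the marked one).
    tags_by_image: dict id -> set of tags (image feature tags + scene tags).
    """
    marked_tags = tags_by_image[marked_image]
    shared = lambda img: len(marked_tags & tags_by_image[img])
    # more shared tags -> earlier if liked, later if disliked
    return sorted(queue, key=shared, reverse=liked)

# Usage (mirrors the "like" example above)
tags = {
    "third": {"climbing", "sunrise", "companion", "date"},
    "fourth": {"indoor"},
    "fifth": {"climbing", "sunrise", "companion"},
    "sixth": {"climbing", "sunrise", "companion", "date"},
}
print(update_display_order(["fourth", "fifth", "sixth"], tags, "third", liked=True))
# -> ["sixth", "fifth", "fourth"]
```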
In the embodiment of the application, the behavior data of the user with respect to the scene images is acquired and the display order of the scene images is adjusted in real time, so that the scene images the user likes are displayed preferentially and presented to the user as early as possible, meeting the user's needs.
In some embodiments of the present application, after a scene image is output, the user's feedback on it is received. For example, while the scene image is displayed, a "like" option and a "dislike" option are also displayed and the user can click either option; in this way user input data are collected and the user preference information is updated, thereby improving the accuracy of pushing scene images.
In some embodiments of the application, user feedback on the scene image includes, but is not limited to: filling out questionnaires and leaving comments.
In some embodiments of the present application, after a scene image is output, the user's feedback on it is received; for example, while the scene image is displayed, a "like" option and a "dislike" option are also displayed and the user can click either option. The algorithm of at least one model is then updated according to the user feedback data, so that reference images with high popularity, high forwarding rates, and the like can be matched, thereby improving the accuracy of pushing scene images and increasing confidence in the pushing.
In some embodiments of the present application, after the model algorithm is updated, the updated model is compared with the model before the update. If the comparison shows that the user's liking of the scene images has increased after the update, for example the click rate of scene images has increased by 60% and the sharing rate by 30%, the updated model has a better pushing effect.
In some embodiments of the present application, virtual reality (VR) technology may also be used to display a scene image so that the user feels present in the scene, and three-dimensional projection technology can make the scene image more vivid.
In some embodiments of the present application, the cloud server may store the user preference tags, while the server implements business logic such as the interface for outputting scene images. Meanwhile, the cloud server adopts multiple security measures, including Secure Sockets Layer (SSL) encryption, common encryption algorithms, access-key-based access control, access log auditing, and the like. In addition, on the basis of ensuring the stability and security of the cloud server and the database, sufficient computing resources and load-balancing strategies are provided to avoid application service failures or stalls caused by heavy traffic.
In summary, through a series of data analyses and algorithmic calculations, the application lets the user see, from just a passage of text or speech, scene images that may happen to them in the future, bringing the user an immersive future scene and letting the user appreciate the beauty of the world without leaving home. It is not merely a display of scene images but feels more like an adventure through time and space, giving the user an unprecedented visual feast and a novel experience.
According to the image acquisition method provided by the embodiment of the application, the execution subject can be an image acquisition device. In the embodiment of the present application, an image acquisition apparatus is described by taking an example in which an image acquisition apparatus performs an image acquisition method.
Fig. 5 shows a block diagram of an image acquisition apparatus according to an embodiment of the present application, the apparatus comprising:
an acquisition module 10, configured to acquire scene description information input by a user;
the processing module 20 is configured to output at least one scene image based on the scene description information, the user preference information, and the at least one model.
Thus, in the embodiment of the application, user preference information is collected while the user uses the electronic device, so that when the user inputs scene description information, for example a spoken sentence, that information is acquired and processed by at least one model in combination with the user preference information to finally output at least one scene image. Therefore, in the embodiment of the application, analysis of user data is combined with a simple user input to intelligently output the scene image the user desires; the whole process is simple to operate, the image does not need to be edited with professional image editing software, and the convenience of obtaining the image the user desires is improved.
Optionally, the at least one model comprises an image tag acquisition model;
before the processing module 20 outputs at least one scene image based on the scene description information, the user preference information, and the at least one model, the processing module 20 is further configured to:
at least two reference images are input into the image tag acquisition model, and at least two groups of image feature tags of the at least two reference images are output.
Optionally, the at least one model includes a preference tag acquisition model and a scene tag acquisition model;
the processing module 20 is specifically configured to:
inputting the user preference information into a preference tag acquisition model, and outputting at least one user preference tag;
inputting the scene description information and at least one user preference label into a scene label acquisition model, and outputting at least one group of scene labels;
comparing at least one group of scene tags with at least two groups of image feature tags to obtain N groups of image feature tags, wherein the first number of identical tags existing between each group of the N groups of image feature tags and the at least one group of scene tags is larger than a first threshold value, and N is a positive integer;
and outputting N scene images according to N reference images corresponding to the N groups of image feature labels.
Optionally, the processing module 20 is specifically configured to:
acquiring target person information in the scene description information, wherein the target person information includes M person names, and M is a positive integer;
under the condition that M=1, replacing the image content of a face area in each of the N reference images with a face image corresponding to the name of the person; wherein, in the case of m=1, each reference image includes only one face region;
under the condition that M is more than 1, the image content of each face area in the reference image is replaced by a face image corresponding to the name of the person with the same face characteristic as the face area in the reference image; wherein, in the case of M > 1, each reference image includes M face regions.
Optionally, the at least one scene image comprises a first scene image; the apparatus further comprises:
the first receiving module is used for receiving a first input of a user on the first scene image under the condition that the first scene image is displayed;
the processing module 20 is further configured to, in response to the first input, perform image stitching on the first scene image and the first image captured by the camera, and output a second scene image.
Optionally, the apparatus further comprises:
the display module is used for displaying at least one scene image;
a second receiving module for receiving a second input of a third scene image of the at least one scene image;
the processing module 20 is further configured to mark the preference level of the third scene image as the target preference level in response to the second input, and update the display order of the scene images of the same type as the third scene image in the at least one scene image.
The image acquisition apparatus in the embodiment of the application may be an electronic device or a component in the electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal, or may be a device other than a terminal. By way of example, the electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a mobile internet device (Mobile Internet Device, MID), an augmented reality (AR)/virtual reality (VR) device, a robot, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA), and may also be a server, a network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine, a self-service machine, or the like, which is not specifically limited in the embodiments of the present application.
The image acquisition apparatus according to the embodiment of the present application may be a device having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiment of the application.
The image acquisition device provided by the embodiment of the application can realize each process realized by the embodiment of the method, and in order to avoid repetition, the description is omitted.
Optionally, as shown in fig. 6, the embodiment of the present application further provides an electronic device 100, including a processor 101, a memory 102, and a program or an instruction stored in the memory 102 and capable of running on the processor 101, where the program or the instruction implements each step of any one of the above embodiments of the image acquisition method when executed by the processor 101, and the steps achieve the same technical effects, and are not repeated herein.
The electronic device of the embodiment of the application includes the mobile electronic device and the non-mobile electronic device.
Fig. 7 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 1000 includes, but is not limited to: radio frequency unit 1001, network module 1002, audio output unit 1003, input unit 1004, sensor 1005, display unit 1006, user input unit 1007, interface unit 1008, memory 1009, processor 1010, camera 1011, and the like.
Those skilled in the art will appreciate that the electronic device 1000 may also include a power source (e.g., a battery) for powering the various components, which may be logically connected to the processor 1010 by a power management system to perform functions such as managing charge, discharge, and power consumption by the power management system. The electronic device structure shown in fig. 7 does not constitute a limitation of the electronic device, and the electronic device may include more or less components than shown, or may combine certain components, or may be arranged in different components, which are not described in detail herein.
The processor 1010 is configured to obtain scene description information input by a user; at least one scene image is output based on the scene description information, the user preference information, and the at least one model.
Thus, in the embodiment of the application, user preference information is collected while the user uses the electronic device, so that when the user inputs scene description information, for example a spoken sentence, that information is acquired and processed by at least one model in combination with the user preference information to finally output at least one scene image. Therefore, in the embodiment of the application, analysis of user data is combined with a simple user input to intelligently output the scene image the user desires; the whole process is simple to operate, the image does not need to be edited with professional image editing software, and the convenience of obtaining the image the user desires is improved.
Optionally, the at least one model includes an image tag acquisition model; the processor 1010 is further configured to input at least two reference images into the image tag acquisition model, and output at least two sets of image feature tags of the at least two reference images.
Optionally, the at least one model includes a preference tag acquisition model and a scene tag acquisition model; the processor 1010 is further configured to input the user preference information into the preference tag acquisition model and output at least one user preference tag; inputting the scene description information and the at least one user preference tag into the scene tag acquisition model, and outputting at least one group of scene tags; comparing the at least one group of scene tags with the at least two groups of image feature tags to obtain N groups of image feature tags, wherein a first number of identical tags existing between each group of the N groups of image feature tags and the at least one group of scene tags is larger than a first threshold value, and N is a positive integer; and outputting N scene images according to the N reference images corresponding to the N groups of image feature labels.
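The tag comparison step can be sketched as a simple set intersection, where a reference image is kept when the number of identical tags it shares with a group of scene tags exceeds the first threshold; the concrete threshold value below is an assumption used only for illustration.

```python
def select_matching_references(scene_tag_groups, image_tag_sets,
                               reference_images, first_threshold=3):
    """Return the reference images whose feature tags share more than
    `first_threshold` identical tags with at least one group of scene tags."""
    selected = []
    for ref_image, image_tags in zip(reference_images, image_tag_sets):
        for scene_tags in scene_tag_groups:
            first_number = len(set(image_tags) & set(scene_tags))
            if first_number > first_threshold:
                selected.append(ref_image)   # this reference image qualifies
                break
    return selected   # N reference images used to output N scene images
```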
Optionally, the processor 1010 is further configured to: obtain target character information in the scene description information, where the target character information includes M character names and M is a positive integer; in the case of M = 1, replace the image content of the face region in each of the N reference images with the face image corresponding to that character name, where each reference image includes only one face region; and in the case of M > 1, replace the image content of each face region in a reference image with the face image corresponding to the character name whose facial features are the same as those of the face region, where each reference image includes M face regions.
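The M = 1 and M > 1 branches can be sketched as follows. The face detection, face matching, and pixel replacement back-ends are injected as helper callables because they are not specified by the disclosure; they are assumptions for illustration.

```python
def replace_faces(reference_image, face_regions, person_faces,
                  paste_face, match_person_by_face_features):
    """Replace face regions in a reference image with stored face images.

    face_regions: detected face boxes in the reference image.
    person_faces: mapping of character name -> stored face image (M entries).
    paste_face / match_person_by_face_features: assumed helper callables.
    """
    if len(person_faces) == 1:
        # M = 1: the reference image contains exactly one face region.
        (region,) = face_regions
        (name,) = person_faces
        paste_face(reference_image, region, person_faces[name])
    else:
        # M > 1: match each face region to the character whose stored face has
        # the same facial features, then replace that region.
        for region in face_regions:
            name = match_person_by_face_features(region, person_faces)
            paste_face(reference_image, region, person_faces[name])
    return reference_image
```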
Optionally, the at least one scene image includes a first scene image; the user input unit 1007 is configured to receive a first input from the user on the first scene image while the first scene image is displayed; and the processor 1010 is further configured to, in response to the first input, perform image stitching on the first scene image and a first image captured by the camera, and output a second scene image.
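One possible reading of the stitching step is a simple side-by-side composition of the first scene image and the camera capture; the OpenCV-based sketch below assumes that interpretation, which is only one of many stitching strategies.

```python
import cv2


def stitch_scene_with_capture(first_scene_image, camera_image):
    """Output a second scene image by placing the first scene image and the
    freshly captured camera image side by side at a common height."""
    target_h = min(first_scene_image.shape[0], camera_image.shape[0])

    def resize_to_height(img, h):
        w = int(round(img.shape[1] * h / img.shape[0]))
        return cv2.resize(img, (w, h))   # cv2.resize expects (width, height)

    left = resize_to_height(first_scene_image, target_h)
    right = resize_to_height(camera_image, target_h)
    return cv2.hconcat([left, right])
```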
Optionally, the display unit 1006 is configured to display the at least one scene image; the user input unit 1007 is further configured to receive a second input on a third scene image of the at least one scene image; and the processor 1010 is further configured to, in response to the second input, mark the preference level of the third scene image as a target preference level and update the display order of scene images of the same type as the third scene image in the at least one scene image.
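The preference marking and reordering behaviour could be sketched as below, assuming each scene image is represented by a dictionary carrying an identifier, a type label, and a preference level; this data layout is an assumption, not part of the disclosure.

```python
def mark_preference_and_reorder(scene_images, third_image_id, target_level):
    """Mark the selected scene image with the target preference level and move
    scene images of the same type forward in the display order."""
    selected = next(img for img in scene_images if img["id"] == third_image_id)
    selected["preference_level"] = target_level

    # Stable sort: images sharing the selected image's type keep their relative
    # order but are displayed before images of other types.
    scene_images.sort(key=lambda img: 0 if img["type"] == selected["type"] else 1)
    return scene_images
```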
In summary, through a series of data analysis and algorithm calculation, the present application enables the user to see, based on a piece of text or a piece of speech, scene images that may happen to the user in the future, giving the user an immersive preview of those future scenes. Without leaving home, the user can feel the beauty of the world; more than a simple display of scene images, the experience resembles an adventure through time and space, offering the user an unprecedented visual feast and a novel experience.
It should be appreciated that, in the embodiment of the present application, the input unit 1004 may include a graphics processing unit (GPU) 10041 and a microphone 10042, and the graphics processor 10041 processes image data of still pictures or videos obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The display unit 1006 may include a display panel 10061, and the display panel 10061 may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 1007 includes at least one of a touch panel 10071 and other input devices 10072. The touch panel 10071, also referred to as a touch screen, may include two portions: a touch detection device and a touch controller. The other input devices 10072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail here.
The memory 1009 may be used to store software programs as well as various data. The memory 1009 may mainly include a first memory area storing programs or instructions and a second memory area storing data, wherein the first memory area may store an operating system, and application programs or instructions required for at least one function (such as a sound playing function, an image playing function, etc.). Further, the memory 1009 may include a volatile memory or a nonvolatile memory, or both. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), a static RAM (SRAM), a dynamic RAM (DRAM), a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced SDRAM (ESDRAM), a synchlink DRAM (SLDRAM), or a direct Rambus RAM (DRRAM). The memory 1009 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
The processor 1010 may include one or more processing units. Optionally, the processor 1010 integrates an application processor and a modem processor, where the application processor primarily handles operations involving the operating system, the user interface, application programs, and the like, and the modem processor, such as a baseband processor, primarily handles wireless communication signals. It will be appreciated that the modem processor may alternatively not be integrated into the processor 1010.
The embodiment of the application also provides a readable storage medium, on which a program or an instruction is stored, which when executed by a processor, implements each process of the above-mentioned image acquisition method embodiment, and can achieve the same technical effects, and in order to avoid repetition, the description is omitted here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the application further provides a chip, which comprises a processor and a communication interface, wherein the communication interface is coupled with the processor, and the processor is used for running programs or instructions to realize the processes of the embodiment of the image acquisition method, and can achieve the same technical effects, so that repetition is avoided, and the description is omitted here.
It should be understood that the chip referred to in the embodiments of the present application may also be called a system-level chip, a system chip, a chip system, or a system-on-chip chip.
Embodiments of the present application provide a computer program product stored in a storage medium, where the program product is executed by at least one processor to implement the respective processes of the above-described image acquisition method embodiments, and achieve the same technical effects, and for avoiding repetition, a detailed description is omitted herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed; depending on the functions involved, the functions may also be performed in a substantially simultaneous manner or in the reverse order. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware alone, although in many cases the former is the preferred implementation. Based on such an understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a computer software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, or optical disk) and comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods according to the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above embodiments, which are merely illustrative and not restrictive. Inspired by the present application, those of ordinary skill in the art can make many other forms without departing from the spirit of the present application and the scope protected by the claims, all of which fall within the protection of the present application.

Claims (14)

1. A method of image acquisition, the method comprising:
acquiring scene description information input by a user;
at least one scene image is output based on the scene description information, the user preference information, and the at least one model.
2. The method of claim 1, wherein the at least one model comprises an image tag acquisition model;
before the outputting of the at least one scene image based on the scene description information, the user preference information and the at least one model, the method further comprises:
and inputting at least two reference images into the image tag acquisition model, and outputting at least two groups of image feature tags of the at least two reference images.
3. The method of claim 2, wherein the at least one model comprises a preference tag acquisition model and a scene tag acquisition model;
the outputting at least one scene image based on the scene description information, the user preference information and the at least one model comprises:
inputting the user preference information into the preference tag acquisition model, and outputting at least one user preference tag;
inputting the scene description information and the at least one user preference tag into the scene tag acquisition model, and outputting at least one group of scene tags;
comparing the at least one group of scene tags with the at least two groups of image feature tags to obtain N groups of image feature tags, wherein a first number of identical tags existing between each group of the N groups of image feature tags and the at least one group of scene tags is larger than a first threshold value, and N is a positive integer;
and outputting N scene images according to the N reference images corresponding to the N groups of image feature labels.
4. A method according to claim 3, wherein outputting N scene images according to N reference images corresponding to the N sets of image feature labels comprises:
acquiring target character information in the scene description information, wherein the target character information comprises M character names, and M is a positive integer;
under the condition that M=1, replacing the image content of a face area in each of the N reference images with a face image corresponding to the person name; wherein, in the case of m=1, each reference image includes only one face region;
under the condition that M is more than 1, replacing the image content of each face area in the reference image with a face image corresponding to the name of the person with the same face characteristic as the face area in the reference image; wherein, in the case of M > 1, each reference image includes M face regions.
5. The method of claim 1, wherein the at least one scene image comprises a first scene image;
after outputting at least one scene image based on the scene description information, the user preference information, and the at least one model, the method further comprises:
receiving a first input of a user to a first scene image while the first scene image is displayed;
and responding to the first input, performing image stitching on the first scene image and the first image shot by the camera, and outputting a second scene image.
6. The method of claim 1, wherein after outputting at least one scene image based on the scene description information, user preference information, and at least one model, the method further comprises:
displaying the at least one scene image;
receiving a second input to a third scene image of the at least one scene image;
in response to the second input, marking a preference level of the third scene image as a target preference level and updating a display order of the same type of scene image as the third scene image in the at least one scene image.
7. An image acquisition apparatus, the apparatus comprising:
the acquisition module is used for acquiring scene description information input by a user;
and the processing module is used for outputting at least one scene image based on the scene description information, the user preference information and the at least one model.
8. The apparatus of claim 7, wherein the at least one model comprises an image tag acquisition model;
before the processing module outputs at least one scene image based on the scene description information, user preference information, and at least one model, the processing module is further configured to:
and inputting at least two reference images into the image tag acquisition model, and outputting at least two groups of image feature tags of the at least two reference images.
9. The apparatus of claim 8, wherein the at least one model comprises a preference tag acquisition model and a scene tag acquisition model;
the processing module is specifically configured to:
inputting the user preference information into the preference tag acquisition model, and outputting at least one user preference tag;
inputting the scene description information and the at least one user preference tag into the scene tag acquisition model, and outputting at least one group of scene tags;
comparing the at least one group of scene tags with the at least two groups of image feature tags to obtain N groups of image feature tags, wherein a first number of identical tags existing between each group of the N groups of image feature tags and the at least one group of scene tags is larger than a first threshold value, and N is a positive integer;
and outputting N scene images according to the N reference images corresponding to the N groups of image feature labels.
10. The apparatus according to claim 9, wherein the processing module is specifically configured to:
acquiring target character information in the scene description information, wherein the target character information comprises M character names, and M is a positive integer;
under the condition that M=1, replacing the image content of a face area in each of the N reference images with a face image corresponding to the person name; wherein, in the case of m=1, each reference image includes only one face region;
under the condition that M is more than 1, replacing the image content of each face area in the reference image with a face image corresponding to the name of the person with the same face characteristic as the face area in the reference image; wherein, in the case of M > 1, each reference image includes M face regions.
11. The apparatus of claim 7, wherein the at least one scene image comprises a first scene image; the apparatus further comprises:
a first receiving module for receiving a first input of a user to a first scene image in the case of displaying the first scene image;
the processing module is further used for responding to the first input, performing image stitching on the first scene image and the first image shot by the camera, and outputting a second scene image.
12. The apparatus of claim 7, wherein the apparatus further comprises:
the display module is used for displaying the at least one scene image;
a second receiving module for receiving a second input of a third scene image of the at least one scene image;
the processing module is further configured to, in response to the second input, mark a preference level of the third scene image as a target preference level, and update a display order of a scene image of the same type as the third scene image in the at least one scene image.
13. An electronic device comprising a processor and a memory storing a program or instructions executable on the processor, which when executed by the processor, implement the steps of the image acquisition method as claimed in any one of claims 1 to 6.
14. A readable storage medium, characterized in that the readable storage medium has stored thereon a program or instructions which, when executed by a processor, implement the steps of the image acquisition method according to any one of claims 1 to 6.
CN202310596814.5A 2023-05-24 2023-05-24 Image acquisition method, device, electronic equipment and readable storage medium Pending CN117036150A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310596814.5A CN117036150A (en) 2023-05-24 2023-05-24 Image acquisition method, device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310596814.5A CN117036150A (en) 2023-05-24 2023-05-24 Image acquisition method, device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN117036150A 2023-11-10

Family

ID=88601041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310596814.5A Pending CN117036150A (en) 2023-05-24 2023-05-24 Image acquisition method, device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN117036150A (en)

Similar Documents

Publication Publication Date Title
US11736756B2 (en) Producing realistic body movement using body images
US10950020B2 (en) Real-time AR content management and intelligent data analysis system
CN109729426B (en) Method and device for generating video cover image
WO2022001593A1 (en) Video generation method and apparatus, storage medium and computer device
CN107924414B (en) Personal assistance to facilitate multimedia integration and story generation at a computing device
JP6662876B2 (en) Avatar selection mechanism
US20220130115A1 (en) Side-by-side character animation from realtime 3d body motion capture
CN107977928B (en) Expression generation method and device, terminal and storage medium
US11763481B2 (en) Mirror-based augmented reality experience
CN113766296B (en) Live broadcast picture display method and device
CA3027414A1 (en) Combining faces from source images with target images based on search queries
US11790614B2 (en) Inferring intent from pose and speech input
KR102148151B1 (en) Intelligent chat based on digital communication network
CN110555896A (en) Image generation method and device and storage medium
CN113766168A (en) Interactive processing method, device, terminal and medium
US11430158B2 (en) Intelligent real-time multiple-user augmented reality content management and data analytics system
CN113610953A (en) Information processing method and device and computer readable storage medium
CN116229311B (en) Video processing method, device and storage medium
CN117036150A (en) Image acquisition method, device, electronic equipment and readable storage medium
KR20240036715A (en) Evolution of topics in messaging systems
CN114584824A (en) Data processing method and system, electronic equipment, server and client equipment
CN114500833B (en) Shooting method and device and electronic equipment
US20240087266A1 (en) Deforming real-world object using image warping
WO2024066549A1 (en) Data processing method and related device
US20240160673A1 (en) Searching augmented reality experiences using visual embeddings

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination