CapNet

A fully end-to-end trainable AI model that can recommend best position for captions/texts on images.

When we are making videos or posters, we need to decide where to place our texts. Human are good at this task because we have the sense of aesthetics. However, when facing large amount of images/videos, the efficiency will drop gradually due to fatigue. On the other hand, machines, though has no idea what is aesthetic, are good at doing repetitive tasks. We want to use machine learning methods to train a model, and perform as close to human as possible.

Results

Based on above-mentioned problem, we designed a novel, end-to-end machine learning model to predict the best region on an image to place the text/caption. Below are some of the results, the position is predicted and recommended by our model, and users can utilize that information to place their captions, titles and texts.

*Please do not share or use in any cases, images only for illustration purpose.

**Result: position is recommended by AI, with predifined text.**