The application now makes two requests to GPT-3.5, compared with four in Stage 1. Instead of breaking an object down into a list of visual components, this algorithm asks GPT to describe an illustration directly. In contrast to Stage 1, it is more of a GPT-to-GPT interface.
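The two-request pipeline above can be sketched as follows. This is a hypothetical illustration, not the project's actual code: the prompt wording, the helper names, and the assumption that the image is returned as SVG markup are all mine.

```python
# Hypothetical sketch of the two-request, GPT-to-GPT pipeline:
# request 1 asks the model to describe an illustration, request 2
# feeds that description back and asks for the image itself.
# Prompt text and the SVG output format are assumptions.

def describe_request(subject: str) -> list[dict]:
    """Build the first request: describe an illustration directly."""
    return [{
        "role": "user",
        "content": f"Describe a simple flat illustration of {subject} "
                   "in one short paragraph.",
    }]

def render_request(description: str) -> list[dict]:
    """Build the second request: turn the model's own description into SVG."""
    return [{
        "role": "user",
        "content": "Generate SVG markup for the following illustration. "
                   "Reply with SVG only.\n\n" + description,
    }]

# With the OpenAI chat API, the pipeline is then just two calls, e.g.:
#   desc = client.chat.completions.create(
#       model="gpt-3.5-turbo", messages=describe_request("a lighthouse"))
#   svg = client.chat.completions.create(
#       model="gpt-3.5-turbo",
#       messages=render_request(desc.choices[0].message.content))
```

Because the description never leaves the model family, no human-authored component list is needed between the two steps.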
Despite these changes, the quality of the generated images remains stable and roughly comparable to the previous stage, with about a 20% success rate. Generated illustrations are more likely to be complete objects without missing parts. On average, it takes approximately 1100 tokens to generate an image.
This is the first version that was published to the Internet. User-generated images begin after the "hacker news" images.
I am publishing 320 selected images, along with the descriptions generated for them during this stage.
The first few images begin with an "I cannot generate a visual content" preface. I modified the prompt shortly afterward to resolve the issue.