DALL-E 4 Released: After Testing for Three Days, I Found It's Not Just About Better Images

Honestly, OpenAI’s update this time surprised me a bit.

DALL-E 4 launched quietly on April 5th: no press conference, no pre-launch hype, just an update log posted on Twitter. But after actually using it for three days, I think this update may matter more than many people assume.

First, the most obvious change: text rendering. Text in AI-generated images used to be a disaster zone: misspelled, rendered in mangled fonts, or turned into alien script. DALL-E 4 really does “get text right” this time. I tested dozens of prompts covering road signs, posters, and book covers, and English text came out correct at least 90% of the time. Chinese still has room for improvement, but it is much better than in the previous generation.

What does this mean? It means AI image generation can finally be used for “serious stuff.” Advertising images, product packaging, and slide illustrations were too risky for AI before, because garbled text would give the image away. DALL-E 4 fixes what was the biggest shortcoming.

The second highlight is aspect ratio control. Previously, generated images were basically only 1:1. If you wanted landscape or portrait, you had to crop, and the composition would fall apart. Now DALL-E 4 supports any aspect ratio from 9:16 to 16:9, and it’s native generation, not post-cropping. I tried a 21:9 widescreen movie poster; the composition integrity was much better than the cropped version.
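
To put a number on what cropping costs, here is a minimal sketch that computes the largest crop of a square image for a target aspect ratio. The function name and the 1024x1024 starting size are my own choices for illustration; nothing here calls DALL-E or reflects how it generates images.

```python
def largest_crop(width, height, ratio_w, ratio_h):
    """Return (crop_w, crop_h): the largest crop of a width x height image
    that has aspect ratio ratio_w:ratio_h (sizes rounded down to whole pixels)."""
    if width * ratio_h > height * ratio_w:
        # Image is wider than the target ratio, so height is the limit.
        crop_h = height
        crop_w = height * ratio_w // ratio_h
    else:
        # Image is taller than (or equal to) the target ratio, so width is the limit.
        crop_w = width
        crop_h = width * ratio_h // ratio_w
    return crop_w, crop_h


def kept_fraction(width, height, ratio_w, ratio_h):
    """Fraction of the original pixels that survive the crop."""
    crop_w, crop_h = largest_crop(width, height, ratio_w, ratio_h)
    return (crop_w * crop_h) / (width * height)


if __name__ == "__main__":
    for ratio in [(16, 9), (9, 16), (21, 9)]:
        size = largest_crop(1024, 1024, *ratio)
        kept = kept_fraction(1024, 1024, *ratio)
        print(f"{ratio[0]}:{ratio[1]} -> {size}, keeps {kept:.0%} of the pixels")
```

Cropping a 1024x1024 image to 16:9 leaves 1024x576, a little over half of the original pixels, which is why a composition planned for a square frame tends to fall apart after cropping.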

The third feature is multi-round editing. Honestly, this one feels like a bit of a crisis for Photoshop. After generating an image, you can circle an area and say “change this person’s hat to red” or “remove the tree in the background,” and the AI modifies only the selected region and leaves everything else untouched. I tested it dozens of times; the edges blend quite naturally, with no obvious seams.
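
The idea of changing only the circled region can be illustrated with a generic masked composite. This is a toy sketch, not a description of how DALL-E 4 works internally: the function name and the tiny grid of labels standing in for pixels are assumptions for illustration, and it uses a hard mask with no edge blending.

```python
def apply_masked_edit(original, edited, mask):
    """Take the edited value where mask is True, keep the original elsewhere.

    All three arguments are 2D lists of the same shape.
    """
    return [
        [e if m else o for o, e, m in zip(o_row, e_row, m_row)]
        for o_row, e_row, m_row in zip(original, edited, mask)
    ]


if __name__ == "__main__":
    original = [["sky", "tree"], ["hat", "person"]]
    # A full regeneration changes everything, including parts we wanted to keep.
    edited = [["clouds", "bush"], ["red hat", "robot"]]
    # Only the hat cell is selected.
    mask = [[False, False], [True, False]]
    print(apply_masked_edit(original, edited, mask))
```

Only the selected cell changes; every cell outside the mask comes through exactly as it was.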

But my personal feeling is that DALL-E 4’s biggest breakthrough this time may not be that it draws prettier pictures but that it is more controllable.

Earlier AI art tools had a major problem: too much randomness. You write a prompt, it generates ten images, and maybe only one is usable. That uncertainty is fine for personal tinkering but a nightmare for commercial work: you can’t tell a client “let me generate 100 more, surely one of them will satisfy you.”

DALL-E 4 made two key improvements in controllability. First, prompt understanding is more precise. You write “a Shiba Inu wearing a red scarf,” and it really gives you a Shiba Inu with a red scarf, not a Husky with a blue scarf and a hat. Second, it added a “seed reproduction” feature; you can specify a seed value to generate almost identical images with the same prompt. This is a lifesaver for scenarios requiring batch generation of style-consistent assets.
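
Why a seed makes output repeatable can be shown with a toy stand-in for a generator. This is a minimal sketch under stated assumptions: `toy_generate` is a hypothetical function, not DALL-E 4's API, and it returns a few pseudo-random numbers in place of an image.

```python
import hashlib
import random


def toy_generate(prompt, seed, size=4):
    """Hypothetical stand-in for an image generator.

    Returns `size` pseudo-random 'pixel' values determined by prompt and seed.
    """
    # Mix the seed and the prompt into one integer so both influence the output.
    digest = hashlib.sha256(f"{seed}:{prompt}".encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [round(rng.random(), 3) for _ in range(size)]


if __name__ == "__main__":
    prompt = "a Shiba Inu wearing a red scarf"
    first = toy_generate(prompt, seed=42)
    second = toy_generate(prompt, seed=42)
    other = toy_generate(prompt, seed=43)
    print(first == second)  # same prompt and seed give the same output
    print(first == other)   # a different seed gives a different output
```

The toy is exactly repeatable because all of its randomness comes from the seed. A real service may only be “almost identical,” as described above, if other sources of variation are involved.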

Of course, DALL-E 4 isn’t without flaws.

Pricing is still on the expensive side. Officially, the cost per image is 30% lower than DALL-E 3, but do the math and a 1024x1024 image still costs $0.04. At 100 images a day, that’s $4 a day, or $120 over a 30-day month. For individual creators, that cost is worth weighing.
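
The arithmetic is easy to redo for your own volume. A minimal sketch, assuming the $0.04 per-image price quoted above and a 30-day month; the function and constant names are mine, and `Decimal` is used only to avoid floating-point rounding on currency.

```python
from decimal import Decimal

# USD per 1024x1024 image, as quoted in this article.
PRICE_PER_IMAGE = Decimal("0.04")


def generation_cost(images_per_day, days=30):
    """Total cost in USD for a steady daily volume over `days` days."""
    return PRICE_PER_IMAGE * images_per_day * days


if __name__ == "__main__":
    print(f"per day:   ${generation_cost(100, days=1)}")   # per day:   $4.00
    print(f"per month: ${generation_cost(100)}")           # per month: $120.00
```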

Then there’s competitive pressure. Midjourney V7 is still stronger in artistic stylization, and Stable Diffusion 4 has the advantage of its open-source ecosystem. It is still unrealistic to expect DALL-E 4 to dominate. And honestly, if all you need is to “draw a nice picture,” the gap between the three is no longer that big. The real competition may shift to who integrates best into workflows.

What I care more about is: what’s the next bottleneck for AI image generation?

Current models can already draw realistically, write text correctly, and control composition. But one problem remains unsolved: you still struggle to “precisely reproduce” the image in your head.

In your head you think “a golden retriever running at sunset on the beach with warm lighting.” What you write might just be “a golden retriever running on beach at sunset.” The AI will give you an image, but maybe the lighting isn’t warm enough, or the dog’s posture is wrong, or the beach atmosphere isn’t quite there. You have to repeatedly adjust the prompt and generate many times to get close to what you want.

This “gap from idea to image” may be the next hurdle AI image generation tools need to overcome.

DALL-E 4 can already help you “draw better,” but can’t yet “draw what you want.” Closing this gap may require not just model capability improvements but entirely new interaction methods.

So, back to the earlier question: now that AI image generation tools are this powerful, which capability do you think is still “impossible”?

My answer: it still can’t “read minds.” It can draw well, but doesn’t know what you really want. This “intent understanding” gap may be the next battlefield.