How to add correct examples for image-to-text task – Prompting


I’m currently using GPT-4 Vision to describe simple objects in short text, as you can see in the attached images.

The description includes the shape, color, and texture of objects.
The images are very simple; even so, GPT-4 Vision cannot answer correctly, and the performance has been seriously poor.

So I want to improve the performance from a prompt-engineering perspective; specifically, I want to add image–text paired examples to the prompt.
First of all, I want to find precedents. My question is: has anyone already tried this approach of adding image–text pairs as correct examples for an image-to-text task?

I actually tried this approach myself, but it didn’t work as I expected. Specifically:

The first example image shows a Letter-T shape.
The second example image shows a checkerboard texture.

Then I showed GPT-4 other T-shaped objects and a checkerboard object, but it still didn’t answer correctly.
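For reference, this is roughly how I structured the few-shot attempt. This is only a minimal sketch assuming the OpenAI Chat Completions vision message format (`image_url` content parts); the URLs and labels are placeholders, not my actual data:

```python
import json

def build_few_shot_messages(example_pairs, query_image_url):
    """Build a chat `messages` payload that interleaves image-text
    example pairs before the final query image (few-shot prompting
    for an image-to-text task)."""
    content = [{
        "type": "text",
        "text": ("Describe the shape, color, and texture of the object. "
                 "Here are labeled examples:"),
    }]
    # Each example is an image immediately followed by its correct description.
    for image_url, label in example_pairs:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
        content.append({"type": "text", "text": f"Correct description: {label}"})
    # The actual query image comes last.
    content.append({"type": "text", "text": "Now describe this object:"})
    content.append({"type": "image_url", "image_url": {"url": query_image_url}})
    return [{"role": "user", "content": content}]

# Placeholder URLs/labels; the payload would be passed as `messages` to
# client.chat.completions.create(...) with a vision-capable model.
messages = build_few_shot_messages(
    [("https://example.com/letter_t.png",
      "a gray, matte, Letter-T-shaped block"),
     ("https://example.com/checker.png",
      "a flat square with a black-and-white checkerboard texture")],
    "https://example.com/query_object.png",
)
print(json.dumps(messages, indent=2))
```

Even with the examples interleaved like this, directly before the query image, the model’s descriptions did not improve in my tests.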

So my question is: does anyone know of prior work that has tried this? I’d like to see a research paper on this concept.

I know OpenAI has already officially announced the limitations of GPT-4 Vision. Even so, I was surprised by how inaccurate the performance was.

Also, let me know how GPT-4 performs in your use case. Does it work well for you?

[Attached images: scene_front_7, scene_front_5]


