How to correctly reference a dataframe in a Gemini…


I’m running a prompt in Gemini Pro to analyze a dataset that includes entity sentiment analysis datapoints on user review content. The data types are mostly strings and numbers, plus a date timestamp. I want the model to analyze the data and write a summary similar to an example I include in the prompt.

The dataset is a pandas dataframe in the same Python notebook as the Gemini Pro prompt. Its shape is (166, 4), i.e. 166 rows and 4 columns, so I presume it isn’t too large, although I have no idea what ‘too large’ would be.

I’m using a structured prompt with Context, Examples, and Input sections. The Input section contains the instructions and the reference to the dataframe, in this line of code:

`Here is the data to be analyzed and summarized: {df_string}`
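A minimal sketch of how the Context, Examples, and Input sections might be joined into a single prompt, assuming the prompt is built as a Python f-string (the post doesn't show the surrounding code). The function name `build_prompt` and all section text are hypothetical. It also illustrates why the `f` prefix matters: without it, `{df_string}` is sent to the model as literal text and the data never reaches it.

```python
def build_prompt(context: str, examples: str, instructions: str, df_string: str) -> str:
    """Join hypothetical Context, Examples, and Input sections into one prompt.

    Each piece is an f-string, so {df_string} is replaced by the variable's
    value. A plain (non-f) string containing "{df_string}" would instead be
    sent literally, and the model would never see the data.
    """
    return (
        f"Context:\n{context}\n\n"
        f"Examples:\n{examples}\n\n"
        f"Input:\n{instructions}\n"
        f"Here is the data to be analyzed and summarized: {df_string}\n"
    )


# Hypothetical section text; the real Context and Examples are not in the post.
prompt = build_prompt(
    context="You are analyzing entity sentiment data extracted from user reviews.",
    examples="<example summary goes here>",
    instructions="Write a summary of the data in the style of the example.",
    df_string="date,entity,sentiment_score\n2024-01-01,battery,0.8",
)

# The same line without the f prefix keeps the placeholder as literal text.
unformatted = "Here is the data to be analyzed and summarized: {df_string}"

print(prompt)
print(unformatted)
```

Printing the final prompt before sending it is a cheap way to confirm the data was actually substituted.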

`df_string` is the dataframe converted to a string, which I’d read was necessary. I’ve also tried it without the conversion, replacing `df_string` with a direct reference to the dataframe. Both versions produce summaries with many, similar errors.
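When a dataframe is placed directly in an f-string, Python calls `str(df)`, and pandas renders that string according to its display options; with the default `display.max_rows` of 60, a 166-row frame is abbreviated to a few rows from the top and bottom. If `df_string` was built with `str(df)`, the same applies. The sketch below is hedged: the dataframe is a made-up stand-in with hypothetical column names (the post gives only the shape and rough data types), and it assumes pandas' default display settings. It compares what `str(df)`, `df.to_string()`, and `df.to_csv(index=False)` each contain.

```python
import pandas as pd

# Hypothetical stand-in for the (166, 4) dataframe; real column names are unknown.
n = 166
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=n, freq="D"),
    "entity": [f"entity_{i}" for i in range(n)],
    "sentiment_score": [round((i % 21) / 10 - 1, 1) for i in range(n)],
    "magnitude": [round((i % 7) / 2, 1) for i in range(n)],
})

# str(df), which is also what "{df}" in an f-string produces, follows the
# display options: over 60 rows, only head and tail rows appear, joined by "...".
truncated = str(df)

# to_string() with no arguments renders every row.
full_text = df.to_string()

# to_csv() also keeps every row, in a compact comma-separated form.
csv_text = df.to_csv(index=False)

# Line counts show how much of the data each version actually carries.
print(len(truncated.splitlines()), len(full_text.splitlines()), len(csv_text.splitlines()))
```

Checking `len(df_string)` or printing it before the call is a quick way to confirm the model receives every row rather than an abbreviated preview.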

The summaries contain so many factual errors and misstatements about the dataframe that I’m wondering whether either way of providing the data as context is technically correct for this type of analysis.

Any suggestions on how to approach this would be greatly appreciated.


