How to get unique values of an RDataFrame column – ROOT


Hi,
I am doing an analysis with RDataFrame and I want to get all the unique values from a dataframe column. For example, if I have a column with the elements (1,2,1,2,3,6,6), I would like to get only (1,2,3,6), i.e. the set of distinct elements it contains. The only way I know to do this is to go through NumPy, but I would prefer a solution that does not load the whole column into memory.
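
To make the goal concrete, here is a minimal sketch in plain Python (not RDataFrame code) of collecting the distinct values in a single pass over chunks of a column. Only the set of distinct values stays in memory, not the whole column. The function name `unique_values` and the chunked input are hypothetical and used for illustration only.

```python
from typing import Iterable, List


def unique_values(chunks: Iterable[Iterable[int]]) -> List[int]:
    """Return the sorted distinct values seen across all chunks."""
    seen = set()
    for chunk in chunks:
        # Only the set of distinct values grows; each chunk can be
        # discarded once it has been folded into the set.
        seen.update(chunk)
    return sorted(seen)


# The column from the example above, delivered in two hypothetical chunks.
column_chunks = [[1, 2, 1, 2], [3, 6, 6]]
print(unique_values(column_chunks))  # prints [1, 2, 3, 6]
```

The memory cost scales with the number of distinct values rather than with the number of entries, which is the property asked for here.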

Hi Marco,

Welcome to the ROOT community!

This is an interesting question. Let me ask for some additional context. I understand that you have an RDF and you want to extract a single column of individual numbers (one per entry) and eliminate the duplicates. Is that correct?
Or do you mean you have a column with numbers and you want to skip the processing of entries in case the same number in a column is encountered more than once?

Cheers,
D

Hi Danilo,
I would like to know which distinct values are in a column, so that I can extract a sub-dataframe for each of them. For example, if I have a dataset of the form:
col0 col1
1 5.4
2 6.8
1 2.5
5 7.9
I want to split it into separate dataframes, one for each distinct value of col0 (I don’t know its content a priori, so I need to extract this information), giving:
dataset1:
col0 col1
1 5.4
1 2.5

dataset2:
col0 col1
2 6.8

dataset3:
col0 col1
5 7.9
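
The split described above can be sketched in plain Python as follows. This is only an illustration of the desired result, not RDataFrame code: the names `split_by_key` and `rows` are hypothetical, and the rows are the ones from the example table. In RDataFrame, each group would presumably correspond to one `Filter` on col0 once the distinct values are known.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, float]  # (col0, col1)


def split_by_key(rows: List[Row]) -> Dict[int, List[Row]]:
    """Group rows by the value of col0, keeping first-seen key order."""
    groups: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        # row[0] is col0, the column whose distinct values define the split.
        groups[row[0]].append(row)
    return dict(groups)


rows = [(1, 5.4), (2, 6.8), (1, 2.5), (5, 7.9)]
for key, subset in split_by_key(rows).items():
    print(key, subset)
# prints:
# 1 [(1, 5.4), (1, 2.5)]
# 2 [(2, 6.8)]
# 5 [(5, 7.9)]
```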

Thank you,
Marco


