Its still crazy to me that Microsoft did not create an out of the box action to process a csv (or .xls files too 😣 ). Fortunatley there are a few ways to do it, 4 in fact, some cool, some easy, some crazy, but all interesting (well I think so 😎) so I want to show them:
- Plumsail
- Flow
- Office Script
- DataFlow
Plumsail (Expensive way)
Plumsail offers a payed connector to process any csv plumsail process csv, but obiously has a cost and thats just boring, so quickly moving on.
Flow (Crazy way)
So a csv is simply one long string with delimitas for rows and columns. The rows are identified by \r\n (for most, though some are just \n) and the column by a comma (,) (comma seperated values, its in the name 😎 ). So our flow is going to split into rows and then into columns, but we have a few issues to fix:
- Getting Column Headers
- Dealing with commas within values
The best way to show is to walk through the flow:
First we declare a ‘few’ variables
- aHeaders = this the array we will fill with our column headers
- rIndex = to record current row we are working on
- cIndex = to record current column we are working on
- aRows = array that saves transformed row data
- oRow = object we use to build the row before adding to aRows
After variables we need to grab the csv and split the rows. We use the below expression:
split(outputs('Get_file_content')?['body'],decodeUriComponent('%0D%0A'))
%0D%0A is the encoded version of \r\n
We use a Do until that checks against the rIndex to know when all rows complete (there is always a blank row at the end so we minus 1 row). Also this could easily be a Apply_to_each, I just have a soft spot for Do_Untils.
We use the interationindex to check if its the first row, if it is we split by a comma and then add to the aHeaders array.
For the main rows it is a little more complex. We are going to split the row again and then loop over the new column array. For each column we use the AddProperty expression, using the column index to find the column name (from the aHeaders) and column value.
addProperty(variables('oRow'),
variables('aHeaders')[iterationIndexes('Do_until_columns')]
,
outputs('SplitColumns')[iterationIndexes('Do_until_columns')]
)
When we addProperty we actually create a copy of the orgional object and add the property. So we need to now update the oRow variable with the value from the compose.
Finally we append the oRow object to the aRows array.
But did you spot the deliberate mistake…. currently there is no way I know of to process csv’s that contain commas inside a value. Back in the day Power Automate use to return the csv like this:
""David","1","TRUE""
so we could split on ‘”,’ but now it returns"David,1,TRUE"
So there is not way to identify a comma in a value to a comma sperator.
Office Script (Cool way)
We love a bit of pro-code, and good news Microsft have made it for us already (https://learn.microsoft.com/en-us/office/dev/scripts/resources/samples/convert-csv), but that is only to convert it to Excel, what if you want to covnert to a JSON to use in your flow. I created a script a while back in this blog 5 Scripts every Power Automate Developer Should Know, but that requires you to hardcode in the columns, what if you want them to be dynamic (ie work for every csv).
The problem is Office Scripts are based on TypeScript (so every objects structure has to be declared), and they have banned any (so we can’t even uses TypeScripts own workaround). Fortunatley there is a way, and thats to build our own JSON array as a string and get Power Automate to convert it back to a JSON.
function main(workbook: ExcelScript.Workbook, csv: string) {
let sJson: string = "[";
let aHeaders: string[] = []
csv = csv.replace(/\r/g, "");
let rows = csv.split("\n");
const csvRegex = /(?:,|\n|^)("(?:(?:"")*[^"]*)*"|[^",\n]*|(?:\n|$))/g
rows.forEach((value, index) => {
let rIndex=index;
if (value.length > 0) {
let row = value.match(csvRegex);
if (row[0].charAt(0) === ',') {
row.unshift("");
}
if (index != 0) { sJson += "{" }
row.forEach((cell, index) => {
row[index] = cell.indexOf(",") === 0 ? cell.substr(1) : cell;
if (rIndex == 0) {
aHeaders.push(row[index] .toString())
} else {
if (Number(row[index])){
sJson += '"' + aHeaders[index] + '":' + row[index] + ','
} else if (row[index] == "TRUE" || row[index] == "FALSE"){
sJson += '"' + aHeaders[index] + '":' + row[index].toLowerCase() + ','
}else{
sJson += '"' + aHeaders[index] + '":"' + row[index].trim() + '",'
}
}
});
if (index != 0) {
sJson = sJson.substring(0, sJson.length - 1);
sJson += "},"
}
}
});
sJson = sJson.substring(0, sJson.length - 1);
sJson += "]";
return (sJson);
}
The split is based on Microsfts, but we change a few things:
First we declare a string varibale called sJson and set it to ‘[‘ , opening our array. If its the first row (ie headers) we add them to a seperate array called aHeaders.
On the next row we open our object if (index != 0) { sJson += "{" }
and then loop over each column adding the corrisponding value from the aHeaders array: sJson += '"' + aHeaders[index] + '":"' + row[index].trim() + '",'
.
To make sure we convert any numbers/booleans from strings we add a little logic, so all together for each row we end up with:
if (rIndex == 0) {
aHeaders.push(row[index] .toString())
} else {
if (Number(row[index])){
sJson += '"' + aHeaders[index] + '":' + row[index] + ','
} else if (row[index] == "TRUE" || row[index] == "FALSE"){
sJson += '"' + aHeaders[index] + '":' + row[index].toLowerCase() + ','
}else{
sJson += '"' + aHeaders[index] + '":"' + row[index].trim() + '",'
}
}
Finally we do the closing, we remove the last comma sJson = sJson.substring(0, sJson.length - 1);
, and then close the object sJson += "},"
. After all the rows have been processed we repeat to close the array, but swap out ‘}’ to ‘]’ sJson += "]";
Quick note, the office script returns the array as a string so we use the json() expression to convert it back to a json
Dataflows (Easy way)
Dataflows are cool and still under used (didn’t help that for long time they were not solution aware). I have done a full blog a while back here, but in a nut shell its Power Query (the exact same you see in Excel and Power BI). So we can use a lovely UI to convert our text file (as thats what csv’s really are) into proper data.
To create one you head over to make.powerapps.com and select Dataflows from left menu, create.
Select text/csv file type (see I told you so).
Then create some connections and select the file. As its a csv Dataflows automatically transforms the data into rows/columns and sets types.
You can then add filters and calculated columns if you like (See my previous blog for how), but as we are just grabbing the csv we can leave as is and hit next.
Next we have to decide where we store the data, and this is the big draw back, we have to save to Dataverse, so its premium functionality.
We can use exsiting table or create a new table, and then we can download and use the data as needed.
Dataflows give us a few options, we can run on Power Automate trigger (when file gets update), or we can schedule the Dataflow to update, and then when it finishes use that to trigger a flow.
This example waits with a timer for the Dataflow to finish and lists table, its not the way to go as its a hard coded wait but you get the idea 😎
Hopefully one of the solutions will work for you (Im also looking at creating a lowcode plugin, will update if I do). All the flows can be found here and script here to download and look at.