Group the values in selected columns of a data file using ‘Group by’

The Group by function allows you to group the values in selected columns by the values in other columns. A common use is with time series data, where you want to split a long list of values into a column for each year or each month.
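To make the idea concrete, the following is a minimal sketch of the same kind of reshaping done with pandas. It is not how the Data Wrangler is implemented; the column names “timestamp” and “flow” and the sample values are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical long-format time series: one row per observation.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2020-01-15", "2020-06-15", "2021-01-15", "2021-06-15"]
    ),
    "flow": [1.0, 2.0, 3.0, 4.0],
})

# Derive the grouping key (the year) from the date-time column.
long = df.assign(year=df["timestamp"].dt.year)

# Number the values within each year so they line up row by row.
long["position"] = long.groupby("year").cumcount()

# Spread the single "flow" column into one column per year.
wide = long.pivot(index="position", columns="year", values="flow")
print(wide)
```

Here the long list of “flow” values ends up as one column for 2020 and one for 2021, which mirrors the time series use described above.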

Step 1: Select the data file that you want to manipulate from the file tree on the left or from the “files” tab.

Step 2: Once you select the file, it opens in the “Data Wrangler”.

Step 3: Click on “Group by”; a popup will appear.

Step 4: Under “Select data”, click on the columns whose values you want to group.

Step 5: After selecting the columns you want to group, use the drop-down in the method area to select the column that you wish to base the groups on. Depending on the grouping column that you select from the drop-down, you will be given a series of grouping options.

  1. Date-time column: your options for grouping are year, month, day of the week or hour of the day.
  2. Numeric column: you can choose the number of bins to group the data into. The bins are of equal width: the range of values in the column is divided into the chosen number of equal intervals.
  3. Text column: each unique text string will be used to form a group. For example, if the column contains names, then the output will have a new column for each person.
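The three kinds of grouping column can be sketched with pandas as follows. This is an illustration of the concepts only, not the tool's own code; the column names (“timestamp”, “depth”, “site”, “value”) and the data are hypothetical, and the exact bin edges used by the Data Wrangler may differ from those of `pd.cut`.

```python
import pandas as pd

# Hypothetical data with one column of each kind.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-03-01 08:00", "2021-03-02 14:00",
        "2022-07-04 08:00", "2022-07-05 20:00",
    ]),
    "depth": [0.0, 2.5, 7.5, 10.0],
    "site": ["north", "south", "north", "south"],
    "value": [10, 20, 30, 40],
})

# 1. Date-time column: group by a component of the timestamp.
values_by_year = df.groupby(df["timestamp"].dt.year)["value"].apply(list).to_dict()
values_by_hour = df.groupby(df["timestamp"].dt.hour)["value"].apply(list).to_dict()

# 2. Numeric column: split the range of values into equal-width bins.
#    labels=False returns the bin number (0, 1, ...) for each row.
depth_bin = pd.cut(df["depth"], bins=2, labels=False)
values_by_bin = df.groupby(depth_bin)["value"].apply(list).to_dict()

# 3. Text column: each unique string forms its own group.
values_by_site = df.groupby("site")["value"].apply(list).to_dict()

print(values_by_year, values_by_hour, values_by_bin, values_by_site, sep="\n")
```

With two bins over a depth range of 0 to 10, the first bin covers roughly 0 to 5 and the second 5 to 10, so each bin spans the same width regardless of how many rows fall into it.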

Note: You can create more complex groups by combining multiple group by columns (by clicking ‘and’), but the number of groups can grow quickly when you do so.
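How quickly the number of groups grows can be estimated before running the operation. The sketch below uses pandas with hypothetical “site” and “year” columns; it assumes that combining columns with ‘and’ forms one group per distinct combination of values, which is how a multi-column group by behaves in pandas.

```python
import pandas as pd

# Hypothetical data: 3 sites observed in 2 years.
df = pd.DataFrame({
    "site": ["north", "north", "south", "south", "east", "east"],
    "year": [2020, 2021, 2020, 2021, 2020, 2021],
    "value": [1, 2, 3, 4, 5, 6],
})

# Grouping by one column gives one group per unique value.
groups_by_site = df.groupby("site").ngroups

# Grouping by two columns gives one group per observed combination.
groups_by_site_and_year = df.groupby(["site", "year"]).ngroups

# Upper bound: the product of the unique counts of each column.
upper_bound = df["site"].nunique() * df["year"].nunique()

print(groups_by_site, groups_by_site_and_year, upper_bound)
```

Each extra grouping column multiplies the possible number of groups by its number of unique values (or bins), so a grouping on a text column with many distinct strings is the most likely to produce a very wide output.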

Step 6: By default, the “Result” is to overwrite the original file with the updated file.

To change this, select what you want to save as your output file from the “output” drop-down menu.

If you want to keep the original file along with your updated file, click on “Keep all columns” or “only keep selected columns” under “create new file”. Click “run”. The new file will be saved in the same folder as the original file.