opencv - How to aggregate CSV data with group by in Java?


Assume I have the following CSV file containing an application's action log. The CSV may contain 3-4 million rows.

    company, actionstype, action
    abc, downloaded, tutorial 1
    abc, watched, tutorial 2
    pqr, subscribed, tutorial 1
    abc, watched, tutorial 2
    pqr, subscribed, tutorial 3
    xyz, subscribed, tutorial 1
    xyz, watched, tutorial 3
    pqr, downloaded, tutorial 1

Is there any way, using Java, to aggregate the data by grouping on the company name and showing one counter column per action type, as shown below?

    company, downloaded, watched, subscribed
    abc, 1, 2, 0
    pqr, 1, 0, 2
    xyz, 0, 1, 1

I thought of loading the CSV file into a list using OpenCSV, but is that efficient for a CSV file with millions of rows?

Loading everything into a list is inefficient if all you're trying to do is aggregate the data. You should check out MapReduce for aggregating large data sets.
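As an illustration of the map/reduce idea on a single machine (this uses java.util.stream, not Hadoop MapReduce), the sketch below maps each data line to its columns and then reduces by grouping on company and action type with Collectors.groupingBy and Collectors.counting(). Only the per-group counts are kept in memory, not the rows. It assumes the first line is a header and that fields contain no quoted commas; the class name GroupByStreams is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupByStreams {

    // "Map" step: split each data line into its columns.
    // "Reduce" step: group by company, then by action type, and count the rows in each group.
    public static Map<String, Map<String, Long>> countActions(BufferedReader reader) {
        return reader.lines()
                .skip(1)                                  // assumes the first line is the header
                .map(line -> line.split(","))
                .filter(cols -> cols.length >= 2)         // ignore blank or malformed lines
                .collect(Collectors.groupingBy(
                        (String[] cols) -> cols[0].trim(),    // outer key: company
                        TreeMap::new,                         // keep companies sorted by name
                        Collectors.groupingBy(
                                (String[] cols) -> cols[1].trim(), // inner key: action type
                                Collectors.counting())));
    }

    public static void main(String[] args) throws IOException {
        String csv = "company, actionstype, action\n"
                + "abc, downloaded, tutorial 1\n"
                + "abc, watched, tutorial 2\n"
                + "pqr, subscribed, tutorial 1\n"
                + "abc, watched, tutorial 2\n"
                + "pqr, subscribed, tutorial 3\n"
                + "xyz, subscribed, tutorial 1\n"
                + "xyz, watched, tutorial 3\n"
                + "pqr, downloaded, tutorial 1\n";

        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            System.out.println("company, downloaded, watched, subscribed");
            for (Map.Entry<String, Map<String, Long>> e : countActions(reader).entrySet()) {
                Map<String, Long> c = e.getValue();
                // an action type that never occurred for a company has no entry, so default to 0
                System.out.println(e.getKey()
                        + ", " + c.getOrDefault("downloaded", 0L)
                        + ", " + c.getOrDefault("watched", 0L)
                        + ", " + c.getOrDefault("subscribed", 0L));
            }
        }
    }
}
```

On the sample data this prints the same table as the desired output in the question, with companies in alphabetical order because of the TreeMap.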

Here's a solution without MapReduce:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.StringReader;
    import java.util.HashMap;

    public class CsvMapper {

        public String transformCsv(String csvFile) {
            return csvMapToString(getCsvMap(csvFile));
        }

        private HashMap<String, Integer[]> getCsvMap(String csvFile) {
            // <K,V> := <company, [downloaded, watched, subscribed]>
            HashMap<String, Integer[]> csvMap = new HashMap<String, Integer[]>();
            BufferedReader reader = new BufferedReader(new StringReader(csvFile));
            String csvLine;

            // create map
            try {
                while ((csvLine = reader.readLine()) != null) {
                    String[] csvColumns = csvLine.split(",");
                    // skip lines that lack both a company and an action type
                    if (csvColumns.length < 2) {
                        continue;
                    }
                    String company = csvColumns[0].trim();
                    String actionsType = csvColumns[1].trim();
                    // skip the header row
                    if (company.equals("company")) {
                        continue;
                    }
                    Integer[] columnValues = csvMap.get(company);
                    if (columnValues == null) {
                        columnValues = new Integer[] {0, 0, 0};
                    }
                    columnValues[0] = columnValues[0] + (actionsType.equals("downloaded") ? 1 : 0);
                    columnValues[1] = columnValues[1] + (actionsType.equals("watched")    ? 1 : 0);
                    columnValues[2] = columnValues[2] + (actionsType.equals("subscribed") ? 1 : 0);
                    csvMap.put(company, columnValues);
                }
            } catch (IOException e) {
                // TODO: handle IOException
            }
            return csvMap;
        }

        private String csvMapToString(HashMap<String, Integer[]> csvMap) {
            StringBuilder newCsvFile = new StringBuilder();

            newCsvFile.append("company, downloaded, watched, subscribed\n");
            // HashMap iteration order is unspecified, so companies may print in any order
            for (String company : csvMap.keySet()) {
                Integer[] columnValues = csvMap.get(company);

                newCsvFile.append(company +
                                  ", " + Integer.toString(columnValues[0]) +
                                  ", " + Integer.toString(columnValues[1]) +
                                  ", " + Integer.toString(columnValues[2]) + "\n");
            }
            return newCsvFile.toString();
        }

        public static void main(String[] args) {
            String csvFile = "company, actionstype, action\n" +
                             "abc, downloaded, tutorial 1\n" +
                             "abc, watched, tutorial 2\n" +
                             "pqr, subscribed, tutorial 1\n" +
                             "abc, watched, tutorial 2\n" +
                             "pqr, subscribed, tutorial 3\n" +
                             "xyz, subscribed, tutorial 1\n" +
                             "xyz, watched, tutorial 3\n" +
                             "pqr, downloaded, tutorial 1";

            System.out.println((new CsvMapper()).transformCsv(csvFile));
        }
    }
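The code above takes the whole CSV as one String, so for a 3-4 million row log the entire file would first have to be read into memory. A minimal variant, assuming the log lives in a file whose first line is a header and whose fields contain no quoted commas, reads it line by line with Files.newBufferedReader and keeps one int[] counter per company in a TreeMap, so memory grows with the number of distinct companies rather than with the number of rows. FileCsvAggregator, aggregate and toCsv are hypothetical names. Unlike the answer's code, a line with an unrecognized action type is skipped entirely.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FileCsvAggregator {

    // Output column order; index i of each counter array corresponds to ACTIONS.get(i).
    static final List<String> ACTIONS = Arrays.asList("downloaded", "watched", "subscribed");

    // Streams the file one line at a time; only the per-company counters stay in memory.
    public static Map<String, int[]> aggregate(Path csvPath) throws IOException {
        Map<String, int[]> counts = new TreeMap<>(); // sorted by company name
        try (BufferedReader reader = Files.newBufferedReader(csvPath, StandardCharsets.UTF_8)) {
            String line = reader.readLine(); // discard the header row
            while ((line = reader.readLine()) != null) {
                String[] cols = line.split(",");
                if (cols.length < 2) {
                    continue; // blank or malformed line
                }
                int idx = ACTIONS.indexOf(cols[1].trim());
                if (idx < 0) {
                    continue; // unrecognized action type
                }
                counts.computeIfAbsent(cols[0].trim(), k -> new int[ACTIONS.size()])[idx]++;
            }
        }
        return counts;
    }

    // Renders the counters in the "company, downloaded, watched, subscribed" layout.
    public static String toCsv(Map<String, int[]> counts) {
        StringBuilder sb = new StringBuilder("company, " + String.join(", ", ACTIONS) + "\n");
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            sb.append(e.getKey());
            for (int c : e.getValue()) {
                sb.append(", ").append(c);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the action-log CSV file
        System.out.print(toCsv(aggregate(Paths.get(args[0]))));
    }
}
```

Because each row only increments one counter, the run time is a single linear pass over the file; if that is still too slow for your data volumes, that is the point where a MapReduce-style distributed job becomes worth considering.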
