Compare CSV files with bash/awk/shell


I wrote a script that compares CSV files. I still have one problem: each output line needs to be

5 values - space - 5 values

The problem is that some lines contain only 4 values, and I need to add a space column in place of the missing value.

Input:

file1:

    1,1,1,1
    3,3,3,3,3

file2:

    2,2,2,2
    4,4,4,4,4

Right now the result is:

    1,1,1,1, ,2,2,2,2
    3,3,3,3,3, ,4,4,4,4,4

I need the result to be this (the first line ends with a space standing in for the missing fifth value):

    1,1,1,1, , ,2,2,2,2, 
    3,3,3,3,3, ,4,4,4,4,4
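The padding rule described above can be expressed as a small awk filter that runs on each file before anything is joined. This is a minimal sketch, assuming plain comma-separated lines without quoted commas; the function name pad_fields is hypothetical.

```shell
# pad_fields N FILE...: print every comma-separated line of FILE(s) with
# fields 1..N present, using a single space for each missing or empty field.
# (pad_fields is a hypothetical helper name.)
pad_fields() {
    local n=$1
    shift
    awk -v n="$n" 'BEGIN { FS = OFS = "," }
    {
        # Reading a field past NF yields "", and assigning to it extends the record.
        for (f = 1; f <= n; f++)
            if ($f == "") $f = " "
        print
    }' "$@"
}
```

For example, pad_fields 5 file1 should print 1,1,1,1, (with a trailing space after the last comma) followed by 3,3,3,3,3, which is the left half of each line in the desired result.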

This is my code:

    #!/bin/bash
    #------------------------------------------------------------------------------
    #
    # Description: joins files vertically based on file extensions.
    #
    # Usage      : ./joinfile directory1 directory2
    #
    #------------------------------------------------------------------------------

    #---- Variables ---------------------------------------------------------------

    resultfile="resultfile.csv"

    #---- Main --------------------------------------------------------------------

    # Check that 2 arguments were provided; if not, display usage info and exit.
    if [ "$#" -ne 2 ]; then
        echo "Usage: $0 directory1 directory2"
        exit 1
    fi

    # Check whether either of the arguments is not a directory.
    if [ ! -d "$1" ] || [ ! -d "$2" ]; then
        if [ ! -d "$1" ]; then
            echo "Error: $1 is not a valid directory"
        fi
        if [ ! -d "$2" ]; then
            echo "Error: $2 is not a valid directory"
        fi
        exit 1
    fi

    # Remove the trailing slash from the arguments, if the user provided one.
    dir1=$(echo "$1" | sed 's/\/$//')
    dir2=$(echo "$2" | sed 's/\/$//')

    # Create an array of the files that have ^ in their names.
    filearr=( $(ls "$dir1"/*^* "$dir2"/*^*) )

    # Get the filearr length.
    filearrlen=${#filearr[@]}

    # Create an array of extensions (everything after the last ^).
    for (( i=0; i<filearrlen; i++ )); do
        extarr+=( "${filearr[i]##*^}" )
    done

    # Strip the last extension and remove duplicates from extarr.
    oldifs="$IFS"
    IFS=$'\n'
    newextarr=( $(for i in "${extarr[@]}"; do echo "$i" | sed 's/\.[^.]*$//'; done | sort -du) )
    IFS="$oldifs"

    # Get the newextarr length.
    newextarrlen=${#newextarr[@]}

    # Remove the previous output file, if it exists.
    if [ -e "$resultfile" ]; then
        rm "$resultfile"
    fi

    # Join files vertically based on extensions.
    for (( i=0; i<newextarrlen; i++ )); do
        ext="${newextarr[i]}"
        echo "Handling ==> $ext"

        # Get the files with this extension.
        joinfiles=( $(for j in "${filearr[@]}"; do echo "$j" | grep "\^$ext"; done) )

        # Get the joinfiles array length.
        joinfileslen=${#joinfiles[@]}

        # Build the list of files to paste and extract their tagged lines.
        for (( k=0; k<joinfileslen; k++ )); do
            pastefiles+="${joinfiles[k]} "
            dos2unix "${joinfiles[k]}" 2>/dev/null
            grep "^[[:blank:]]*([0-9]* [0-9]*)," "${joinfiles[k]}" | sed 's/^[[:blank:]]*//' | sort -t, -k1 | cut -d',' -f1- > ".ext_${k}_tags.csv"
        done

        # Compare the two tag files on their first field.
        echo "==> ${ext}" >> "$resultfile"
        awk 'BEGIN { FS = "," }
        {
            if (FNR == NR) { a[$1] = $0 } else { b[$1] = $0 }
        }
        END {
            # Build the comparison once, after both files have been read.
            for (i in a) {
                if (i in b) {
                    c[i] = a[i] ", ," b[i]
                    if (a[i] == b[i]) { c[i] = "true," c[i] } else { c[i] = "false," c[i] }
                } else {
                    c[i] = "false," a[i] ", ," i ",missing-missing-missing"
                }
            }
            for (i in b) {
                if (!(i in a)) { c[i] = "false," i ",missing-missing-missing, ," b[i] }
            }
            for (i in c) { print c[i] }
        }' ".ext_0_tags.csv" ".ext_1_tags.csv" | sort -t, -k1 >> "$resultfile"

        rm -f ".ext_0_tags.csv" ".ext_1_tags.csv"
    done

    #---- End ---------------------------------------------------------------------
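Inside the script, the padding has to happen before the records of the two files are joined. Below is a minimal sketch of the script's awk comparison step as a standalone function with that padding folded in. The name compare_keyed is hypothetical, and like the script it assumes the key is the first field and that the first input file is not empty (otherwise FNR == NR also holds for the second file).

```shell
# compare_keyed FILE1 FILE2: hypothetical standalone version of the script's
# awk comparison step, with a space added as the fifth value when it is missing.
compare_keyed() {
    awk 'BEGIN { FS = OFS = "," }
    {
        # Pad a missing or empty fifth value before the record is stored.
        if ($5 == "") $5 = " "
        # FNR == NR holds only while the first (non-empty) file is read.
        if (FNR == NR) a[$1] = $0; else b[$1] = $0
    }
    END {
        for (k in a) {
            if (k in b) {
                status = (a[k] == b[k]) ? "true" : "false"
                print status, a[k] ", ," b[k]
            } else {
                print "false", a[k] ", ," k ",missing-missing-missing"
            }
        }
        for (k in b)
            if (!(k in a)) print "false", k ",missing-missing-missing, ," b[k]
    }' "$1" "$2" | sort -t, -k1
}
```

In the loop, the awk call on the two tag files could then be replaced by compare_keyed ".ext_0_tags.csv" ".ext_1_tags.csv" >> "$resultfile". Because OFS is set to a comma, print status, ... produces the same "true,"/"false," prefix the script builds by string concatenation.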

Here's one way you can solve the problem:

    awk -F, '{a[FNR]=a[FNR] sprintf("%s,%s,%s,%s,%s%s",$1,$2,$3,$4,($5==""?" ":$5),(NR==FNR?", ,":""))} END{for(i=1;i<=FNR;++i)print a[i]}' file1.txt file2.txt

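This joins the two files using an array indexed by line number (FNR). The fifth %s in the sprintf statement takes either the value of the fifth column, or a space if the fifth column is empty. The final %s is replaced by ", ," (a comma, a space and a comma) while the first file is being processed, since NR==FNR is only true for the first file, and by an empty string otherwise. Once all of the records have been processed, the elements of the array are printed.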
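For readability, the same one-liner can be spread over several lines with comments. This sketch should behave identically; wrapping it in a shell function called join_padded is only for illustration, and the name is hypothetical.

```shell
# join_padded FILE1 FILE2: the one-liner above, split up and commented.
join_padded() {
    awk -F, '
    {
        # Use a single space when the fifth column is empty.
        fifth = ($5 == "") ? " " : $5
        # NR == FNR only while reading the first file: append the ", ," separator.
        sep = (NR == FNR) ? ", ," : ""
        # Append this record to whatever is already stored for the same line number.
        a[FNR] = a[FNR] sprintf("%s,%s,%s,%s,%s%s", $1, $2, $3, $4, fifth, sep)
    }
    END {
        # FNR still holds the line count of the last file read.
        for (i = 1; i <= FNR; ++i) print a[i]
    }' "$1" "$2"
}
```

Running join_padded file1.txt file2.txt on the sample files should print the same two lines as the one-liner.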

A couple of assumptions are made here: I've assumed that only the fifth column can be empty, and that there is an equal number of records in both files.
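If either assumption does not hold, a variant along these lines could be used. It is a sketch, not part of the original answer: the function name join_padded_n and its optional third argument (the number of columns, default 5) are hypothetical. It replaces every missing or empty field with a space, and when one file has more lines than the other, it pairs the extra lines with an all-space record. It tells the files apart with FILENAME == ARGV[1] instead of NR == FNR, so an empty first file is also handled.

```shell
# join_padded_n FILE1 FILE2 [N]: hypothetical generalization of the one-liner.
# Pads every missing or empty field (up to N, default 5) with a space and
# pairs extra lines of the longer file with an all-space record.
join_padded_n() {
    local n=${3:-5}
    awk -F, -v n="$n" '
    # Build fields 1..n of the current record, using " " for empty fields.
    function pad(out, f) {
        out = ($1 == "") ? " " : $1
        for (f = 2; f <= n; f++)
            out = out "," (($f == "") ? " " : $f)
        return out
    }
    BEGIN {
        nleft = nright = 0
        # A record made only of spaces, used when one file runs out of lines.
        blank = " "
        for (f = 2; f <= n; f++) blank = blank ", "
    }
    # Lines of the first file go into left[], lines of the second into right[].
    FILENAME == ARGV[1] { left[FNR] = pad(); nleft = FNR; next }
    { right[FNR] = pad(); nright = FNR }
    END {
        max = (nleft > nright) ? nleft : nright
        for (i = 1; i <= max; i++) {
            l = (i in left) ? left[i] : blank
            r = (i in right) ? right[i] : blank
            print l ", ," r
        }
    }' "$1" "$2"
}
```

On the sample files it should give the same output as the one-liner. Note the design change: unlike the one-liner, an empty field in any of the first N columns becomes a space, not just the fifth.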

Output:

    1,1,1,1, , ,2,2,2,2, 
    3,3,3,3,3, ,4,4,4,4,4
