Compare CSV files with bash/awk/shell
I wrote a script that compares CSV files, but I still have a problem. Each output line needs to be 5 values, a blank (space) column, then 5 values.
The problem is that some lines contain only 4 values. I need to add a space column in place of the missing value.
Input:
file1:
1,1,1,1
3,3,3,3,3
file2:
2,2,2,2
4,4,4,4,4
The current result is:
1,1,1,1, ,2,2,2,2
3,3,3,3,3, ,4,4,4,4,4
I need this result (the first line ends with a comma followed by a single space):
1,1,1,1, , ,2,2,2,2, 
3,3,3,3,3, ,4,4,4,4,4
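The target layout can be checked mechanically: every joined line should split into exactly 5 + 1 + 5 = 11 comma-separated fields. A minimal sketch of such a check, run on the current and desired lines from the question (check_line is a hypothetical helper name, not part of the original script):

```bash
#!/bin/bash
# Hypothetical helper: report whether a joined line has the expected
# 5 values + 1 blank separator + 5 values = 11 comma-separated fields.
check_line() {
    printf '%s\n' "$1" | awk -F, '{ print (NF == 11 ? "ok" : "bad: " NF " fields") }'
}

check_line '1,1,1,1, ,2,2,2,2'        # current output -> bad: 9 fields
check_line '1,1,1,1, , ,2,2,2,2, '    # desired output -> ok
check_line '3,3,3,3,3, ,4,4,4,4,4'    # already correct -> ok
```

Note that the trailing space in the desired first line matters: without it the line would still have 11 fields, but the last one would be empty rather than a space.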
This is the code:
#!/bin/bash
#------------------------------------------------------------------------------
#
# Description: joins files vertically based on file extensions.
#
# Usage      : ./joinfile directory1 directory2
#
#------------------------------------------------------------------------------

#---- variables ---------------------------------------------------------------
resultfile="resultfile.csv"

#---- main --------------------------------------------------------------------

# Check that 2 arguments were provided; if not, display usage info and exit.
if [ "$#" -ne 2 ]; then
    echo "usage: $0 directory1 directory2"
    exit 1
fi

# Check whether either argument is not a directory.
if [ ! -d "$1" ] || [ ! -d "$2" ]; then
    if [ ! -d "$1" ]; then
        echo "error: $1 is not a valid directory"
    fi
    if [ ! -d "$2" ]; then
        echo "error: $2 is not a valid directory"
    fi
    exit 1
fi

# Remove a trailing slash from the arguments, if the user provided one.
dir1=$(echo "$1" | sed 's/\/$//')
dir2=$(echo "$2" | sed 's/\/$//')

# Create an array of the files that have ^ in their filenames.
filearr=( $(ls "$dir1"/*^* "$dir2"/*^*) )

# Get the filearr length.
filearrlen=${#filearr[@]}

# Create the array of extensions (everything after the ^).
for (( i=0; i<filearrlen; i++ )); do
    extarr+=("${filearr[i]##*^}")
done

# Remove duplicates and strip the last extension from each entry of extarr.
oldifs="$IFS"
IFS=$'\n'
newextarr=($(for i in "${extarr[@]}"; do echo "$i" | sed 's/\.[^.]*$//'; done | sort -du))
IFS="$oldifs"

# Get the newextarr length.
newextarrlen=${#newextarr[@]}

# Remove the previous output file, if it exists.
if [ -e "$resultfile" ]; then
    rm "$resultfile"
fi

# Join the files vertically based on extensions.
for (( i=0; i<newextarrlen; i++ )); do
    ext="${newextarr[i]}"
    echo "handling ==> $ext"

    # Get the files that share this extension.
    joinfiles=($(for j in "${filearr[@]}"; do echo "$j" | grep "\^$ext"; done))

    # Get the joinfiles array length.
    joinfileslen=${#joinfiles[@]}

    # Build the list of files to be pasted and extract their tag lines.
    # [[:blank:]] is used because "\t" inside a grep bracket expression
    # matches a backslash and a "t", not a tab.
    for (( k=0; k<joinfileslen; k++ )); do
        pastefiles+="${joinfiles[k]} "
        dos2unix "${joinfiles[k]}" 2>/dev/null
        grep "^[[:blank:]]*([0-9]* [0-9]*)," "${joinfiles[k]}" | sed 's/^[[:blank:]]*//' | sort -t, -k1 | cut -d',' -f1- > ".ext_${k}_tags.csv"
    done

    # Compare the two tag files, keyed on the first field.
    echo "==> ${ext}" >> "$resultfile"
    awk 'BEGIN { FS = "," }
        # First file: store each record under its first field.
        FNR == NR { a[$1] = $0; next }
        # Second file.
        { b[$1] = $0 }
        # Build the joined lines once both files have been read.
        END {
            for (i in a) {
                if (i in b) {
                    c[i] = (a[i] == b[i] ? "true," : "false,") a[i] ", ," b[i]
                } else {
                    c[i] = "false," a[i] ", ," i ",missing-missing-missing"
                }
            }
            for (i in b) {
                if (!(i in a)) {
                    c[i] = "false," i ",missing-missing-missing, ," b[i]
                }
            }
            for (i in c) print c[i]
        }' ".ext_0_tags.csv" ".ext_1_tags.csv" | sort -t, -k1 >> "$resultfile"
    rm -f ".ext_0_tags.csv" ".ext_1_tags.csv"
done
#---- end ---------------------------------------------------------------------
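One way to get the missing blank column inside the script itself is to pad each stored record to five fields before the ", ," separator is inserted. The sketch below assumes the same keying on the first field and the same true/false prefix as the script's awk program; compare_tags and pad5 are hypothetical names, and the sample keys k1/k2 are made-up data. LC_ALL=C is an added assumption to make the sort order locale-independent.

```bash
#!/bin/bash
# Sketch: the script's comparison step, with every record padded to 5 fields.
# compare_tags and pad5 are hypothetical names (not in the original script).
compare_tags() {
    awk 'BEGIN { FS = "," }
        # Append ", " (one blank field) until rec has 5 fields, e.g. 4 -> 5.
        function pad5(rec,    n, tmp) {
            n = split(rec, tmp, ",")
            while (n++ < 5) rec = rec ", "
            return rec
        }
        FNR == NR { a[$1] = pad5($0); next }   # first file
                  { b[$1] = pad5($0) }         # second file
        END {
            for (i in a) {
                if (i in b)
                    c[i] = (a[i] == b[i] ? "true," : "false,") a[i] ", ," b[i]
                else
                    c[i] = "false," a[i] ", ," i ",missing-missing-missing"
            }
            for (i in b)
                if (!(i in a))
                    c[i] = "false," i ",missing-missing-missing, ," b[i]
            for (i in c) print c[i]
        }' "$1" "$2" | LC_ALL=C sort -t, -k1
}

# Usage with made-up keyed data:
f1=$(mktemp); f2=$(mktemp)
printf '%s\n' 'k1,1,1,1' 'k2,3,3,3,3' > "$f1"
printf '%s\n' 'k1,2,2,2' 'k2,4,4,4,4' > "$f2"
compare_tags "$f1" "$f2"
# false,k1,1,1,1, , ,k1,2,2,2,    (this line ends with a space)
# false,k2,3,3,3,3, ,k2,4,4,4,4
rm -f "$f1" "$f2"
```

Like the original, this relies on FNR == NR to detect the first file, so it misbehaves if the first file is empty.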
Here's one way you can solve the problem:
awk -F, '{a[FNR]=a[FNR] sprintf("%s,%s,%s,%s,%s%s",$1,$2,$3,$4,($5==""?" ":$5),(NR==FNR?", ,":""))} END{for(i=1;i<=FNR;++i)print a[i]}' file1.txt file2.txt
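This joins the two files line by line using an array indexed by FNR. The fifth %s in the sprintf statement takes either the value of the column, or a space if the fifth column is empty. The final %s is replaced with the ", ," separator while the first file is being processed, and with an empty string for the second file. Once all of the records have been processed, the elements of the array are printed; in the END block FNR holds the number of records in the second file, so the loop stops there.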
A number of assumptions have been made here: I have assumed that only the fifth column can be empty, and that there is an equal number of records in both files.
Output:
1,1,1,1, , ,2,2,2,2, 
3,3,3,3,3, ,4,4,4,4,4
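If those two assumptions do not hold, a variant of the same idea can relax them. The sketch below (join5 is a hypothetical name) replaces any empty field with a space, pads short records to five fields, and pads a missing line in the shorter file with five blank fields. It still assumes that a short record is missing its trailing values (a 4-value line lacks its 5th value, not its 2nd), since that cannot be detected from the data. It uses FILENAME == ARGV[1] instead of NR == FNR so that an empty first file is still handled.

```bash
#!/bin/bash
# Sketch: join two CSV files line by line as "5 values, blank, 5 values".
# join5 is a hypothetical name; missing values are assumed to be trailing.
join5() {
    awk '
        # Rebuild rec with at least 5 fields, using " " for empty or missing ones.
        function pad(rec,    n, f, i, m, out) {
            n = split(rec, f, ",")
            m = (n > 5) ? n : 5
            out = ""
            for (i = 1; i <= m; i++)
                out = out (i > 1 ? "," : "") ((i <= n && f[i] != "") ? f[i] : " ")
            return out
        }
        FILENAME == ARGV[1] { left[FNR] = $0; nleft = FNR; next }
                            { right[FNR] = $0; nright = FNR }
        END {
            max = (nleft > nright) ? nleft : nright
            for (i = 1; i <= max; i++)
                print pad(left[i]) ", ," pad(right[i])
        }' "$1" "$2"
}

# Usage with the question's data plus one extra line in file1:
f1=$(mktemp); f2=$(mktemp)
printf '%s\n' '1,1,1,1' '3,3,3,3,3' '5,5,5,5,5' > "$f1"
printf '%s\n' '2,2,2,2' '4,4,4,4,4' > "$f2"
join5 "$f1" "$f2"
# 1,1,1,1, , ,2,2,2,2,       (ends with a space)
# 3,3,3,3,3, ,4,4,4,4,4
# 5,5,5,5,5, , , , , ,       (ends with a space)
rm -f "$f1" "$f2"
```

Records with more than five fields are passed through unchanged rather than truncated.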