extending join function #109

Open: wants to merge 1 commit into base: master

@johnlak commented Mar 6, 2022

Hi

This is a proposal for one change and one new feature in the DataFrame join.

  1. Modified joinOn to work with columns that have the same name but sit at different positions in the two data frames

Test code joining on column name "AA":

// Left frame: the join column "AA" is at position 1.
DataFrame<String> df1 = new DataFrame<>();
df1.add("DD","AA","BB");
df1.append(Arrays.asList("d1","a1","b1"));
df1.append(Arrays.asList("d2","a2","b2"));
df1.append(Arrays.asList("d3","a3","b4"));

// Right frame: the join column "AA" is at position 0.
DataFrame<String> df2 = new DataFrame<>();
df2.add("AA","CC");
df2.append(Arrays.asList("a1","c1"));
df2.append(Arrays.asList("a2","c2"));
df2.append(Arrays.asList("a4","c4"));

// Outer join on the column name; println calls toString() implicitly.
System.out.println(df1.joinOn(df2, DataFrame.JoinType.OUTER, "AA").resetIndex());

output of the released version 1.10:
no rows are matched, because "AA" is at position 1 in df1 and position 0 in df2

  	DD	AA_left	BB	AA_right	CC
 0	d1	a1     	b1	        	  
 1	d2	a2     	b2	        	  
 2	d3	a3     	b4	        	  
 3	  	       	  	a1      	c1
 4	  	       	  	a2      	c2
 5	  	       	  	a4      	c4

output with proposed change:

  	DD	AA_left	BB	AA_right	CC
 0	d1	a1     	b1	a1      	c1
 1	d2	a2     	b2	a2      	c2
 2	d3	a3     	b4	        	  
 3	  	       	  	a4      	c4
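
The idea behind this change can be illustrated without the library: look up the join column by name in each frame separately, so the two positions are allowed to differ. The following is a minimal sketch in plain Java, not the code in this pull request. The class `JoinByNameSketch` and its methods are hypothetical names, tables are modelled as lists of rows, and key values are assumed to be unique within each table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the DataFrame API.
class JoinByNameSketch {

    /** Outer-joins two tables on a column that is looked up by name in each table. */
    static List<List<String>> outerJoin(List<String> leftCols, List<List<String>> leftRows,
                                        List<String> rightCols, List<List<String>> rightRows,
                                        String key) {
        // Resolve the key once per table; the two positions may differ.
        int leftKey = leftCols.indexOf(key);
        int rightKey = rightCols.indexOf(key);
        if (leftKey < 0 || rightKey < 0) {
            throw new IllegalArgumentException("column not found in both tables: " + key);
        }

        // Index the right rows by key value (assumes unique keys for brevity).
        Map<String, List<String>> rightByKey = new LinkedHashMap<>();
        for (List<String> row : rightRows) {
            rightByKey.put(row.get(rightKey), row);
        }

        List<List<String>> joined = new ArrayList<>();
        for (List<String> row : leftRows) {
            List<String> out = new ArrayList<>(row);
            // Remove the match so that only unmatched right rows remain afterwards.
            List<String> match = rightByKey.remove(row.get(leftKey));
            out.addAll(match != null ? match : blanks(rightCols.size()));
            joined.add(out);
        }
        // Right rows without a partner are kept and padded on the left side.
        for (List<String> row : rightByKey.values()) {
            List<String> out = blanks(leftCols.size());
            out.addAll(row);
            joined.add(out);
        }
        return joined;
    }

    /** Returns a mutable list of n empty strings used as padding. */
    static List<String> blanks(int n) {
        return new ArrayList<>(Collections.nCopies(n, ""));
    }

    public static void main(String[] args) {
        List<List<String>> result = outerJoin(
            Arrays.asList("DD", "AA", "BB"),
            Arrays.asList(Arrays.asList("d1", "a1", "b1"),
                          Arrays.asList("d2", "a2", "b2"),
                          Arrays.asList("d3", "a3", "b4")),
            Arrays.asList("AA", "CC"),
            Arrays.asList(Arrays.asList("a1", "c1"),
                          Arrays.asList("a2", "c2"),
                          Arrays.asList("a4", "c4")),
            "AA");
        for (List<String> row : result) {
            System.out.println(String.join("\t", row));
        }
    }
}
```

With the df1 and df2 data from the test code, the sketch pairs a1 and a2, keeps d3 with empty right-hand cells, and appends a4 as an unmatched right row.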

  2. Added joinOn for columns with different names

Test code joining df3 column "AA1" with df4 column "AA2" (using the proposed change):

// Left frame: the join column is named "AA1".
DataFrame<String> df3 = new DataFrame<>();
df3.add("DD","AA1","BB");
df3.append(Arrays.asList("d1","a1","b1"));
df3.append(Arrays.asList("d2","a2","b2"));
df3.append(Arrays.asList("d3","a3","b4"));

// Right frame: the join column is named "AA2".
DataFrame<String> df4 = new DataFrame<>();
df4.add("AA2","CC");
df4.append(Arrays.asList("a1","c1"));
df4.append(Arrays.asList("a2","c2"));
df4.append(Arrays.asList("a4","c4"));

// Proposed overload: left key names, then right key names, then the join type.
System.out.println(df3.joinOn(df4, new String[] {"AA1"}, new String[] {"AA2"}, DataFrame.JoinType.OUTER).resetIndex());

output with proposed change:

  	DD	AA1	BB	AA2	CC
 0	d1	a1 	b1	a1 	c1
 1	d2	a2 	b2	a2 	c2
 2	d3	a3 	b4	   	  
 3	  	   	  	a4 	c4
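
One way to think about joining on differently named columns is that the two name arrays are paired by position, each name is resolved against its own frame's header, and rows match when the resulting key values are equal. The following is a minimal sketch of that pairing in plain Java, not the code in this pull request. `KeyColumnPairsSketch`, `checkPaired`, `resolve`, and `keyOf` are hypothetical names, and the requirement that both arrays have the same length is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the DataFrame API.
class KeyColumnPairsSketch {

    /** Assumed rule: the i-th left name is paired with the i-th right name. */
    static void checkPaired(String[] leftNames, String[] rightNames) {
        if (leftNames.length != rightNames.length) {
            throw new IllegalArgumentException("key column lists must have the same length");
        }
    }

    /** Resolves each name to its position in the given header; fails on unknown names. */
    static int[] resolve(List<String> header, String[] names) {
        int[] positions = new int[names.length];
        for (int i = 0; i < names.length; i++) {
            positions[i] = header.indexOf(names[i]);
            if (positions[i] < 0) {
                throw new IllegalArgumentException("unknown column: " + names[i]);
            }
        }
        return positions;
    }

    /** Builds the (possibly composite) join key of one row from the resolved positions. */
    static List<String> keyOf(List<String> row, int[] positions) {
        List<String> key = new ArrayList<>();
        for (int p : positions) {
            key.add(row.get(p));
        }
        return key;
    }

    public static void main(String[] args) {
        String[] leftNames = {"AA1"};
        String[] rightNames = {"AA2"};
        checkPaired(leftNames, rightNames);

        int[] leftPos = resolve(Arrays.asList("DD", "AA1", "BB"), leftNames);
        int[] rightPos = resolve(Arrays.asList("AA2", "CC"), rightNames);

        // Two rows are join partners when their key values are equal,
        // regardless of what the key columns are called.
        List<String> leftKey = keyOf(Arrays.asList("d1", "a1", "b1"), leftPos);
        List<String> rightKey = keyOf(Arrays.asList("a1", "c1"), rightPos);
        System.out.println("rows match: " + leftKey.equals(rightKey));
    }
}
```

Because the keys are compared as lists, the same sketch extends to multi-column keys by passing more than one name per side.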

Kind regards
John

- added join on columns with the same name but different positions
- added join on columns with different names