Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: --find-renames for discovering renames based on content #320

Merged
merged 5 commits into from
Jan 22, 2022

Conversation

fwiesel
Copy link
Contributor

@fwiesel fwiesel commented Dec 31, 2021

If we rename objects, they will appear as
add and remove actions, which makes it difficult
to assess the semantic delta, which may only be
a simple renaming.

The current implementation creates potentially
a diff between every add/remove item, which
means O(n^2) runtime. So it should only be
enabled if such changes are rare

@fwiesel
Copy link
Contributor Author

fwiesel commented Dec 31, 2021

I do not see a way around the O(n^2) runtime, except with some heuristics. But I think in most use-cases that is acceptable.
What I would like to change before considering it ready for review is:

  • max-content-delta: That should be more properly explained, and possibly replaced with something more intuitive.
  • Missing Test: At least a simple test case for the new feature should be implemented

@fwiesel fwiesel force-pushed the content_diff branch 2 times, most recently from fcf340c to 3537cb4 Compare January 3, 2022 14:57
@fwiesel fwiesel marked this pull request as ready for review January 3, 2022 15:00
The test case contains blank lines with white-spaces
and many editors do trim those, so we adopt the test
as well as the code to not indent blank lines
The three steps are called in three places, we better
combine them into a single function.
By encapsulating the parameters into a struct and
a corresponding cmd.options package helper functions,
we need only to change the code in two places when adding
a new parameter instead of four (for each subcommand).
If we rename objects, they will appear as
add and remove actions, which makes it difficult
to assess the semantic delta, which may only be
a simple renaming.

The current implementation creates a diff
between every added & removeed item of the same kind,
which means at worst O(m*n) runtime.
So it should only be enabled if such changes are rare
@fwiesel
Copy link
Contributor Author

fwiesel commented Jan 11, 2022

When implementing the tests, I was running into some minor issues I couldn't explain, so I split up the changes into several commits and refactored the code to limit the change required to implement the functionality.
I hope, it makes it better understandable what changed and why.

@fwiesel fwiesel requested a review from mumoshu January 20, 2022 10:10
Copy link
Collaborator

@mumoshu mumoshu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks a lot for your efforts!

@mumoshu mumoshu changed the title Discover renames based on content --find-renames to discover renames based on content Jan 22, 2022
@mumoshu mumoshu changed the title --find-renames to discover renames based on content feat: --find-renames to discover renames based on content Jan 22, 2022
@mumoshu mumoshu changed the title feat: --find-renames to discover renames based on content feat: --find-renames for discovering renames based on content Jan 22, 2022
@mumoshu mumoshu merged commit 9086688 into databus23:master Jan 22, 2022
@fwiesel fwiesel deleted the content_diff branch January 23, 2022 12:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants