Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hold for payment 2024-04-17] [$500] Improve performance of getPersonalDetailByEmail #38960

Closed
rlinoz opened this issue Mar 25, 2024 · 47 comments
Assignees
Labels
Bug Something is broken. Auto assigns a BugZero manager. Daily KSv2 External Added to denote the issue can be worked on by a contributor Reviewing Has a PR in review

Comments

@rlinoz
Copy link
Contributor

rlinoz commented Mar 25, 2024

Problem

We sometimes have the need to get personal data based on an email, currently the way to do it is calling getPersonalDetailByEmail which will loop over every value in our personalDetails array and check which one matches with the email we want, making this a O(n) search. The problem is that this can be really slow for policies with too many users creating a bad user experience.

Solution

Use the data we have stored in personalDetails and create a cached map keyed by email so we can use that map in getPersonalDetailByEmail making the search O(1). We should also update the cache whenever personalDetails change.

Upwork Automation - Do Not Edit
  • Upwork Job URL: https://www.upwork.com/jobs/~014394e09fe6cfc357
  • Upwork Job ID: 1772362853753180160
  • Last Price Increase: 2024-04-01
  • Automatic offers:
    • rojiphil | Reviewer | 0
    • abzokhattab | Contributor | 0
@rlinoz rlinoz added Daily KSv2 Bug Something is broken. Auto assigns a BugZero manager. labels Mar 25, 2024
Copy link

melvin-bot bot commented Mar 25, 2024

Triggered auto assignment to @joekaufmanexpensify (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details.

@rlinoz rlinoz added Help Wanted Apply this label when an issue is open to proposals by contributors External Added to denote the issue can be worked on by a contributor labels Mar 25, 2024
@melvin-bot melvin-bot bot changed the title Improve performance of getPersonalDetailByEmail [$500] Improve performance of getPersonalDetailByEmail Mar 25, 2024
Copy link

melvin-bot bot commented Mar 25, 2024

Job added to Upwork: https://www.upwork.com/jobs/~014394e09fe6cfc357

Copy link

melvin-bot bot commented Mar 25, 2024

Triggered auto assignment to Contributor-plus team member for initial proposal review - @rojiphil (External)

@FitseTLT
Copy link
Contributor

FitseTLT commented Mar 25, 2024

Proposal

Please re-state the problem that we are trying to solve in this issue.

Improve performance of getPersonalDetailByEmail

What is the root cause of that problem?

Improve performance

What changes do you think we should make in order to solve the problem?

We will first set the map here where we set allPersonalDetails by making the key of the map the login of the personal detail

allPersonalDetails = val;
},

        allPersonalDetailsMap = new Map();
        Object.values(val ?? {}).map((value) => allPersonalDetailsMap.set(value?.login, value));

And we return personal detail result by using get on the map here

return (Object.values(allPersonalDetails ?? {}) as PersonalDetails[]).find((detail) => detail?.login === email);
}

    return allPersonalDetailsMap.get(email);

What alternative solutions did you explore? (Optional)

@ghost
Copy link

ghost commented Mar 25, 2024

Proposal

Please re-state the problem that we are trying to solve in this issue.

Improve performance of getPersonalDetailByEmail

What is the root cause of that problem?

Here:

function getPersonalDetailByEmail(email: string): PersonalDetails | undefined {
return (Object.values(allPersonalDetails ?? {}) as PersonalDetails[]).find((detail) => detail?.login === email);
}

as the issue suggest, we have to loop over all the items of personalDetails array to get the email that we want.

What changes do you think we should make in order to solve the problem?

we can have 2 options i.e.

  1. Change Data Structure: Ensure allPersonalDetails is structured as a hash map where each key is an email address, and the value is the corresponding PersonalDetails object.

  2. Modify the Function: Adjust the function to directly access the PersonalDetails object using the email address as the key.

For implementing hashmap we can do something like this -

Define PersonalDetails and Initialize Cache

First, define the structure of PersonalDetails using type and initialize your cache. The cache should be a Map where keys are emails (strings) and values are PersonalDetails.

type PersonalDetails = {
  login: string;
  // Add other personal detail fields as needed
};

// Initialize an empty cache
let cache: Map<string, PersonalDetails> = new Map();

Populate and Maintain the Cache

Create a function to populate or update the cache from your existing personalDetails data. This cache needs to be kept in sync with any changes to personalDetails.

// Assuming personalDetails is an array of PersonalDetails
const personalDetails: PersonalDetails[] = [
  // Populate with initial data
];

// Function to populate or update the cache
function updateCache(details: PersonalDetails[]) {
  cache.clear(); // Clear existing cache to prevent stale data
  details.forEach(detail => {
    if (detail.login) {
      cache.set(detail.login, detail);
    }
  });
}

// Initial population of the cache
updateCache(personalDetails);

// Call updateCache(personalDetails) whenever personalDetails changes

Implement getPersonalDetailByEmail Using the Cache

Adjust getPersonalDetailByEmail to utilize the cache for O(1) lookup, leveraging the Map's .get() method.

function getPersonalDetailByEmail(email: string): PersonalDetails | undefined {
  return cache.get(email);
}

Handling Updates to personalDetails

Make sure that any modifications to personalDetails are reflected in the cache. This is crucial for additions, deletions, and updates.

// Example: Adding a new PersonalDetail
function addPersonalDetail(newDetail: PersonalDetails) {
  personalDetails.push(newDetail); // Add to your main data structure
  updateCache(personalDetails); // Refresh the cache to keep it in sync
}

// Implement similar functions for updating or removing details

What alternative solutions did you explore? (Optional)

N/A

@abzokhattab
Copy link
Contributor

abzokhattab commented Mar 25, 2024

Proposal

Please re-state the problem that we are trying to solve in this issue.

Improve performance of getPersonalDetailByEmail

What is the root cause of that problem?

getPersonalDetailByEmail runs in O(n) every time we search for a contact using the email

What changes do you think we should make in order to solve the problem?

we can use a hashmap and have the email as a key and the personaldetail as a value

let emailToDetailsCache: Record<string, PersonalDetails> = {};

and define buildCache function:

function buildCache() {
    emailToDetailsCache = personalDetails.reduce((acc: Record<string, PersonalDetails>, detail) => {
        if (detail && detail.login) {
            acc[detail.login.toLowerCase()] = detail;
        }
        return acc;
    }, {});
}

then inside the oynx.connect we should call the prev function:

Onyx.connect({
    key: ONYXKEYS.PERSONAL_DETAILS_LIST,
    callback: (val) => {
        personalDetails = Object.values(val ?? {});
        allPersonalDetails = val;
        buildCache(); // Rebuild the cache whenever personalDetails changes
    },
});

finally we should use the new object:

function getPersonalDetailByEmail(email) {
    return emailToDetailsCache[email.toLowerCase()];
}

@ghost
Copy link

ghost commented Mar 25, 2024

updated proposal with explaination of 1st method

@FitseTLT
Copy link
Contributor

Updated to add code implementation example

@abzokhattab
Copy link
Contributor

abzokhattab commented Mar 25, 2024

Updated Proposal:

Not a major change just fixed typescript errors.

@nayabatir1
Copy link

nayabatir1 commented Mar 25, 2024

Proposal

Please re-state the problem that we are trying to solve in this issue.

Improve performance of getPersonalDetailByEmail

What is the root cause of that problem?

Currently getPersonalDetailByEmail loop over all personal details to get data. Which makes it making this a O(n) search. We can leverage Map to store data in key value pair, making search O(1)

What changes do you think we should make in order to solve the problem?

updating Oynx.connect to store data in Map

const personalDetailsMap = new Map<string, PersonalDetails>();
Onyx.connect({
    key: ONYXKEYS.PERSONAL_DETAILS_LIST,
    callback: (val) => {
        personalDetails = Object.values(val ?? {});
        allPersonalDetails = val;
        personalDetails.map((p) => p?.login && personalDetailsMap.set(p.login, p));
    },
});

update getPersonalDetailByEmail to get data in constant time

function getPersonalDetailByEmail(email: string): PersonalDetails | undefined {
    return personalDetailsMap.get(email);
}

What alternative solutions did you explore? (Optional)

@ahmedGaber93
Copy link
Contributor

ahmedGaber93 commented Mar 26, 2024

Proposal

Please re-state the problem that we are trying to solve in this issue.

Improve performance of getPersonalDetailByEmail

What is the root cause of that problem?

getPersonalDetailByEmail loop over every value in our personalDetails array and check which one matches with the email we want, making this a O(n) search.

What changes do you think we should make in order to solve the problem?

I think create emailToAccountID rather than emailToPersonalDetails will improve the performance, and is better for memory which is also important as we define them in global scope.

let personalDetails: Array<PersonalDetails | null> = [];
let allPersonalDetails: OnyxEntry<PersonalDetailsList> = {};
+const emailToAccountIDMap: Record<string, number> = {};
Onyx.connect({
    key: ONYXKEYS.PERSONAL_DETAILS_LIST,
    callback: (val) => {
        personalDetails = Object.values(val ?? {});
        allPersonalDetails = val;
+      for (const personalDetail of personalDetails) {
+         if (personalDetail?.login){
+             emailToAccountIDMap[personalDetail.login] = personalDetail?.accountID;
+         }
+      }
    },
});

And getPersonalDetailByEmail will be

function getPersonalDetailByEmail(email: string): PersonalDetails | undefined {
    if (!emailToAccountIDMap[email]){
        return;
    }

    return (allPersonalDetails?.[emailToAccountIDMap[email]] ?? {}) as PersonalDetails;
}

What alternative solutions did you explore? (Optional)

@joekaufmanexpensify
Copy link
Contributor

Pending proposal review

@ikevin127
Copy link
Contributor

Proposal

Please re-state the problem that we are trying to solve in this issue

We aim to optimize the retrieval of personal data by email, overcoming the inefficiency of the current linear search method within a potentially large array.

What is the root cause of that problem?

The linear search approach has a time complexity of O(n), which leads to slow performance as the array's size increases.

What changes do you think we should make in order to solve the problem?

We can implement a caching mechanism using a hash map that stores each email as a unique key and its corresponding personal details as the value.

Changed code
+    // Hash map to maintain a cache of email to personal details
+    const emailToDetailsCache = new Map<string, PersonalDetails>();

+    // Function to populate the cache
+    function buildEmailToDetailsCache(allPersonalDetails: OnyxEntry<PersonalDetailsList>) {
+        // Update the hash map whenever there is a change
+        emailToDetailsCache.clear();

+        Object.values(allPersonalDetails ?? {}).forEach((detail) => {
+            if (!detail?.login) {
+                return;
+            }

+            emailToDetailsCache.set(detail.login, detail);
+        });
+    }

    let personalDetails: Array<PersonalDetails | null> = [];
    let allPersonalDetails: OnyxEntry<PersonalDetailsList> = {};
    Onyx.connect({
        key: ONYXKEYS.PERSONAL_DETAILS_LIST,
        callback: (val) => {
+            buildEmailToDetailsCache(val);
            personalDetails = Object.values(val ?? {});
            allPersonalDetails = val;
        },
    });

    function getPersonalDetailByEmail(email: string): PersonalDetails | undefined {
-        return (Object.values(allPersonalDetails ?? {}) as PersonalDetails[]).find((detail) => detail?.login === email);
+        return emailToDetailsCache.get(email);
    }

On a HT account with a allPersonalDetails lenght of 4301 accounts, I console.time'd the results of calling getPersonalDetailByEmail at the top of WorkspaceWorkflowsPage and these are the results:

Average time before: 1.435 ms
Average time after 1: 0.5275 ms
Average time after 2: 0.185 ms
Average time after 3: 0.165 ms

Overall performance improvement from before to after:

  • Improvement after 1: 63.2%
  • Improvement after 2: 87.1%
  • Improvement after 3: 88.5%

Result screenshots (before / after)

MacOS: Chrome (console)
  • before
before
  • after 1
after-1
  • after 2
after-2
  • after 3
after-3

@rojiphil
Copy link
Contributor

Update: I could not review the proposals today. Tomorrow, I will prioritize the proposal review and share the feedback.

@flodnv
Copy link
Contributor

flodnv commented Mar 26, 2024

On a HT account with a allPersonalDetails lenght of 4301 accounts

Please time this with at least 50,000 accounts

@ikevin127
Copy link
Contributor

@flodnv I would if I could, but I don't have access to such an account.
If you know how I could do that - I'd be happy to re-test ✌

@nayabatir1
Copy link

nayabatir1 commented Mar 26, 2024

@flodnv I've tested my approach with ~500K details, and average response time is under 0.01ms.

For testing purpose Map key is random unique string instead of actual user email and value is of type PersonalDetails.

Sample logs image

@rojiphil
Copy link
Contributor

rojiphil commented Mar 27, 2024

Thanks all for your proposals.
@ahmedGaber93 Your proposal could have advantage of less memory consumption but this would result in O(1)+O(1) time complexity which is less efficient than O(1)
@godofoutcasts94 Your proposal focusses on a generic implementation but lacks clarity on how it integrates with the existing code base. e.g. We do not have to define PersonalDetails as the type already exists.
@FitseTLT @abzokhattab @nayabatir1 @ikevin127 Your proposals have similar implementations with O(1) time complexity.

@abzokhattab proposal LGTM as that was the first one to have a more complete workable solution than others. As an optimization, we can populate emailToDetailsCache within Onyx.connect callback itself. But, we can take this up at the PR stage.
🎀👀🎀 C+ reviewed

Copy link

melvin-bot bot commented Mar 27, 2024

Triggered auto assignment to @luacmartins, see https://stackoverflow.com/c/expensify/questions/7972 for more details.

@ghost
Copy link

ghost commented Mar 27, 2024

Thanks all for your proposals. @ahmedGaber93 Your proposal could have advantage of less memory consumption but this would result in O(1)+O(1) time complexity which is less efficient than O(1) @godofoutcasts94 Your proposal focusses on a generic implementation but lacks clarity on how it integrates with the existing code base. e.g. We do not have to define PersonalDetails as the type already exists. @FitseTLT @abzokhattab @nayabatir1 @ikevin127 Your proposals have similar implementations with O(1) time complexity.

@abzokhattab proposal LGTM as that was the first one to have a more complete workable solution than others. As an optimization, we can populate emailToDetailsCache within Onyx.connect callback itself. But, we can take this up at the PR stage. 🎀👀🎀

Yes @rojiphil I know we don't have to define PersonalDetails as it already exists, I forgot to mention it my bad. But still if you want or if you can consider my proposal then I can do some changes in it and show you that how we can integrate in our codebase and also I would like to state that Map increases speed and efficiency but reduces time complexity as well.

Copy link

melvin-bot bot commented Apr 1, 2024

📣 It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue? 💸

@melvin-bot melvin-bot bot removed the Help Wanted Apply this label when an issue is open to proposals by contributors label Apr 1, 2024
Copy link

melvin-bot bot commented Apr 1, 2024

📣 @rojiphil 🎉 An offer has been automatically sent to your Upwork account for the Reviewer role 🎉 Thanks for contributing to the Expensify app!

Offer link
Upwork job

Copy link

melvin-bot bot commented Apr 1, 2024

📣 @abzokhattab 🎉 An offer has been automatically sent to your Upwork account for the Contributor role 🎉 Thanks for contributing to the Expensify app!

Offer link
Upwork job
Please accept the offer and leave a comment on the Github issue letting us know when we can expect a PR to be ready for review 🧑‍💻
Keep in mind: Code of Conduct | Contributing 📖

@joekaufmanexpensify
Copy link
Contributor

@abzokhattab is there an ETA for PR here?

@abzokhattab
Copy link
Contributor

The PR will be ready today. thanks Joe

@melvin-bot melvin-bot bot added Reviewing Has a PR in review Weekly KSv2 and removed Daily KSv2 labels Apr 3, 2024
@flodnv flodnv added Daily KSv2 and removed Weekly KSv2 labels Apr 3, 2024
@joekaufmanexpensify
Copy link
Contributor

PR is out on staging

@joekaufmanexpensify joekaufmanexpensify added Weekly KSv2 and removed Daily KSv2 labels Apr 8, 2024
@joekaufmanexpensify
Copy link
Contributor

PR on production. Automation didn't work but payment due in 2 days (2024-04-17)

@joekaufmanexpensify joekaufmanexpensify changed the title [$500] Improve performance of getPersonalDetailByEmail [hold for payment 2024-04-17] [$500] Improve performance of getPersonalDetailByEmail Apr 15, 2024
@joekaufmanexpensify joekaufmanexpensify added Daily KSv2 and removed Weekly KSv2 labels Apr 18, 2024
@joekaufmanexpensify
Copy link
Contributor

All set to issue payment here. We need to pay:

Noting that payment here is $500 each since this issue was opened prior to baseline being set at $250.

@joekaufmanexpensify
Copy link
Contributor

@rojiphil please accept the upwork offer so we can pay you

@joekaufmanexpensify
Copy link
Contributor

@abzokhattab $500 sent and contract ended!

@rojiphil
Copy link
Contributor

please accept the upwork offer so we can pay you

@joekaufmanexpensify Accepted offer. Thanks

@joekaufmanexpensify
Copy link
Contributor

@rojiphil $500 sent and contract ended!

@joekaufmanexpensify
Copy link
Contributor

Upwork job closed.

@joekaufmanexpensify
Copy link
Contributor

All set, thanks everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something is broken. Auto assigns a BugZero manager. Daily KSv2 External Added to denote the issue can be worked on by a contributor Reviewing Has a PR in review
Projects
No open projects
Archived in project
Development

No branches or pull requests