Redesign the public dashboard to be more compact and to support error bars #86
Comments
This article explains why the pie chart is not the preferred visualization tool: https://www.businessinsider.com/pie-charts-are-the-worst-2013-6. Before exploring alternatives to the pie chart, it's important to understand the requirements driving the change. |
That is indeed the argument for using stacked bar charts. What is your point here?
I am not sure what "show the proportion to the whole representation." means. We do want to represent the uncertainty, and we want to give people a quick glance of the relative proportions displayed. That is why non-stacked bar charts are not a great option.
So what is your suggestion? Note that a cursory search may not give you the perfect answer, particularly for a negative result (i.e. "error bars on stacked bar charts don't work"). All that means is that the person who responded didn't know how to make it work.
Proving a negative takes a lot more than 2 SO links. |
I was leaning more toward using plain bar charts instead of stacked bar charts, but then we would not be able to show the relative proportions, which are equally important.
Combining values and limiting the number of different labels might be a solution. Let us consider two scenarios:
|
One possible solution could be to use a tooltip, so that the user can hover over the Others section and see the distribution of labels. This would represent the information better. |
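To make the "combine values and limit the number of labels" idea above concrete, here is a minimal sketch in plain Python. The function name `combine_small_entries`, the 5% threshold, and the sample counts are all assumptions for illustration; they are not taken from the existing dashboard code.

```python
def combine_small_entries(counts, min_share=0.05, other_label="Other"):
    """Merge every label whose share of the total is below min_share
    into a single bucket (hypothetical helper, not the dashboard's code)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    combined = {}
    other = 0
    for label, count in counts.items():
        if count / total < min_share:
            other += count  # too small to show on its own
        else:
            combined[label] = count
    if other > 0:
        # Add to any existing "Other" entry instead of overwriting it
        combined[other_label] = combined.get(other_label, 0) + other
    return combined


# Made-up counts, only to show the behavior
mode_counts = {"walk": 50, "bike": 30, "car": 15, "bus": 3, "train": 2}
combined_counts = combine_small_entries(mode_counts)
print(combined_counts)
```

With these made-up counts, `bus` and `train` fall below the threshold and are folded into a single `Other` entry, while the larger labels keep their own segments.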
In relation to the design presented in the below link: The proposed design is to bin different charts into one on the basis of some common parameter. I understand you have a better understanding of how these charts are used, but I am trying to understand whether this is what the end users of the public dashboard would like to see. Would it be appropriate to involve them, or gather their input, in the redesign process? |
We’re using date-specific snapshots of charts, meaning we have a chart for a particular date with the associated metrics. So binning multiple charts into a single one with a stacked bar chart representation might make it harder to compare the individual charts. This might involve quite a few changes, and I am not sure how relevant it would be to the end users' requirements, but we could provide a timeline of changes, for example using a stream chart as showcased here. The users might then be able to compare variations of a metric over a period of time more easily. So the question is: is it more relevant for users to compare two similar chart metrics, or to understand the change in an individual metric over a period of time? |
I couldn't find the actual implementation of stacked bar charts in #78. I looked into the plots.py file for the different chart implementations, and there are three variations of bar charts:
But I could not see any implementation of stacked bar charts. Could you please tell me if I am looking in the wrong file or place? |
Check the notebooks. There should be a new notebook that is not in master. |
wrt questions like
Figure it out! Read up on visualization materials, including any classes that you took, and make a couple of proposals, write out their pros and cons and then we can pick between them. |
We cannot use tooltips. The plots are static images. Please remember the software architecture. |
This is the same behavior as the current pie chart, so I don't see any issue with it.
The end-users are relying on us as technical/visualization experts, so you should figure out what is recommended from a visualization perspective (not easiest to implement, but most correct from a visualization perspective) and then we can see if we have any feedback. We should probably start with feedback from the internal team (UI team) and then can maybe ask others in the group.
We do have timeseries plots right now to see the variance over time, but they are typically for a single metric, which makes them not super confusing, and allows us to support error bars. The problem with the stream chart is that it can get very confusing very quickly. We had people try multiple superimposed timeseries (with and without error bars, #49) before and it was very very messy. We may want to include those as well although again as a stacked chart, but I think it would be too confusing for most of our partners. |
Found it!
|
Summary of my investigation into part-to-whole charts (composition charts): there are three categories, based on how each category of data is represented.
As mentioned in the previous design suggestion, Stacked Bar Charts sound like a good candidate to represent part of a whole relationship, while also representing the uncertainty with error bars. References:
|
I really don't understand this comment. wrt
I don't see this. The proportion of
Why? If we are comparing the "parts of the whole" for two bars, then 20% should be represented the same way, just like it is in the pie chart.
What is the wrong inference? The reason that it is "clear that the total sample of data is 22,554 on the left, while it is 4655 on the right" is that we include the numbers in addition to the percentage. A trivial fix would be to include the number along with the % in the stacked bar chart as well.
This just indicates that there is an implementation to work from, not that it is perfect, or that it cannot be changed. |
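As a rough illustration of the "include the number along with the %" fix mentioned above, the sketch below builds one text label per segment. The helper name `share_labels` and the counts are hypothetical; the point is only that two bars with the same percentage remain distinguishable once the raw count is shown.

```python
def share_labels(counts):
    """Build a 'percent (count)' label per segment (hypothetical helper)."""
    total = sum(counts.values())
    labels = {}
    for label, count in counts.items():
        pct = 100 * count / total if total else 0.0
        labels[label] = f"{pct:.1f}% ({count})"
    return labels


# Made-up data: same proportions, very different sample sizes
large_sample = share_labels({"walk": 200, "drive": 800})
small_sample = share_labels({"walk": 20, "drive": 80})
print(large_sample["walk"])  # 20.0% (200)
print(small_sample["walk"])  # 20.0% (20)
```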
There are a few issues with #78 |
@iantei you need to look through the notebook more carefully. The |
Here's the execution of
Here is the aggregated representation of both the above charts into a single one:
From the above data and the aggregate chart, I had the following concern: since we are representing both of these bars together, the disparity between the % representation and the actual numbers might give the end user the wrong impression at a glance. However, as you suggested, "including number along with the % in the stacked bar chart" would be a probable fix. And as shown above, it would be ideal for the end user to compare charts for similar metrics if we bucket them together into one chart, rather than creating two stacked bar charts and placing them next to each other for comparison. |
The approach I have taken for this implementation is:
|
wrt: #86 (comment)
Are you recommending that we use stacked bar charts, or not? I will also note that the graph looks quite ugly. There is a lot of space between the bars, and the numbers are essentially not visible. Were you able to use the modified graphs created by @Abby-Wheelis? |
Our goal is to identify charts that satisfy our two requirements:
Given these two requirements, the 100% stacked bar chart seemed like the ideal candidate. Adding to part of my previous comment, I think a 100% stacked bar chart that labels each segment with its number alongside its percentage makes it sufficiently clear that a percentage in one bar is not directly comparable to a percentage in the other. For instance in the below example, Comparing
The end user could get a wrong impression, accounting the percentage for Moreover, this 100% stacked bar chart representation gives the end user a good way to compare which mode of commute is more popular across each of these trips.
I modified the width of the bars and changed the font size; it looks better now.
I took reference from the same notebook |
@iantei @shankari from our discussion today, for handling the design concern in the comment above: non-graphical options:
graphical/visual options:
Other concerns: Next steps:
This is what lots of modes with low counts might look like (example from data analysis work for usaid-laos-ev). A limit has been placed such that modes with a low count (which is most of them in this case) do not have count and % labels, but you can use the legend to see visually that a mode has a low count (in an ideal world colors would not repeat, but time was limited when I made this): |
An interesting observation for the above representation. Dataset used - usaid-laos-ev
Though informative, for 'All' Date, the chart that shows the exploded version of the 'Other' category from the "Total Trips" bar chart has many labels with small proportions, which leads to an inelegant representation. There will also be cases where there is no 'Other' label at all, so there will be just a single stacked bar chart. This would make the charts non-uniform, with some showing two bar charts and others just one. |
Good point about the case where there are no "Other" modes - we should be sure whatever solution we come up with handles this as elegantly as possible! I would be very curious to see what other people think about the "With All Modes" option. Since we don't need error bars, the fact that many modes show up as a small sliver seems like it might not be too much of an issue. Exploding the "Other" category is also very representative, and I think there would be a way to make sure charts with no "Other" have a blank space next to them, rather than the current widening. However, if we are still going to have similar cases with slivers for modes with a count of only 1-2, I question whether that representation is "worth" the complication of exploding bars and separating the charts. I look forward to hearing the opinions of the rest of the team in our meeting this afternoon! |
@Abby-Wheelis we also discussed today that the exploded graphs here display the percentages for the explosion, which is incorrect. IMHO, we should instead have it as the percentage of the original, or at least see what that looks like. |
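To spell out the difference between "percentage of the explosion" and "percentage of the original", here is a small worked sketch. The function name `exploded_other_percentages` and the counts are assumptions made up for illustration only.

```python
def exploded_other_percentages(all_counts, other_modes):
    """For each mode folded into 'Other', compute its share of the 'Other'
    subtotal and its share of the original total (hypothetical helper)."""
    total = sum(all_counts.values())
    other_total = sum(all_counts[mode] for mode in other_modes)
    if other_total == 0:
        return {}
    result = {}
    for mode in other_modes:
        count = all_counts[mode]
        result[mode] = {
            "pct_of_other": 100 * count / other_total,  # what the exploded bar shows now
            "pct_of_total": 100 * count / total,        # percentage of the original
        }
    return result


# Made-up counts: 'bus' and 'train' are the modes that were combined into 'Other'
all_counts = {"walk": 60, "car": 30, "bus": 6, "train": 4}
exploded = exploded_other_percentages(all_counts, ["bus", "train"])
print(exploded["bus"])  # {'pct_of_other': 60.0, 'pct_of_total': 6.0}
```

In this made-up case, labeling `bus` as 60% in the exploded bar would overstate it by a factor of ten relative to its 6% share of all trips.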
As per the discussion in the afternoon meeting, we will proceed with the option of binning charts based on a certain category, while keeping the expanded version of Other and also giving the user a dropdown to select a table with each label's count and proportion. |
Since we are proceeding with the solution proposed above, I would like to understand a few considerations that we need to make.
Bin 1: Based on trip count
Bin 2: Based on trip count (sensed)
Bin 3: Based on purpose (mode specific)
Bin 4: Based on replaced mode (mode specific)
Bin 5: Based on mileage
Bin 6: Based on purpose
Bin 7: Based on mileage (sensed)
|
Notes from meeting today: Proposed combinations - keep the same metrics together and show up to three bars: total (sensed mode), inferred, and labeled
Steps:
|
Design considerations for text: Text for these chart/table pairs will need to consider total number of trips, number of trips with inferences, and number of trips with labels, likely alongside rates. Bar labels - goal to be clear on source of labels
example:
Concerns about this volume of text:
Concerns of obscuring this text:
Compromise options:
Depending on how early mocking and tests go, we can make a final decision on where to put the text about counts of trips and users for each bar. |
Following this design approach, I will merge all three notebooks On the other hand, I do see the downside of creating a super notebook: we lose the ability to execute a single notebook for one specific case. For example, if I just wanted to run for |
I think that keeping the notebook organized and maybe including some markdown text to separate sections and make notes for people running the notebook could help with this concern. If we were to decide that it is an important feature for testing, we could think about ways to control what metrics are displayed - for example have a way for all of the charts to run, but only show the "sensed" bars. We do want to get automated tests implemented at some point though, so this may be a good argument for automated testing - if we have testing then it may be less of a concern that we can't perform tests one at a time ourselves. |
Changes with the usage of subplots instead of a consolidated data frame to validate the proposed change.
Note:
@shankari Does this change adhere to your proposal of using subplots? |
@iantei compare the charts - they are very similar except that the x axis is repeated. Do you think that looks good? I don't 😄
The "same color choice" is not due to the color changes not being incorporated, since the sensed mode charts are not affected by color changes anyway. It is because the two bar charts are generated separately. While incorporating the color changes, I would suggest using the basemode mapping to ensure that the colors are consistent. Similar to the phone code, we should use the same color palette for I am fine with deferring that change if it is too complex to include now. Ideally, the change would be in |
Future work relevant to this project as discussed in our meeting today based on #123 and the review of it:
|
The color selection of
This might be addressed easily once the base-mode color implementation is incorporated. |
For now, I think it's alright if the colors repeat but are in different legends, that is much better than if they repeat within the same legend, which we should make sure to avoid. I think you're right that the color-mode mapping would make the color maps a little easier to manage (and perhaps more intuitive to look at) |
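One way the base-mode color mapping discussed above could work is sketched below. All names and values here (`MODE_TO_BASE_MODE`, `BASE_MODE_COLORS`, the mode labels, and the hex colors) are hypothetical placeholders, not the mapping used by the phone code.

```python
# Hypothetical mapping from user-facing mode labels to base modes
MODE_TO_BASE_MODE = {
    "walk": "WALKING",
    "bike": "BICYCLING",
    "e-bike": "BICYCLING",
    "drove_alone": "CAR",
    "shared_ride": "CAR",
}

# Hypothetical palette: one fixed color per base mode
BASE_MODE_COLORS = {
    "WALKING": "#1f77b4",
    "BICYCLING": "#2ca02c",
    "CAR": "#d62728",
    "OTHER": "#7f7f7f",
}


def colors_for_labels(labels):
    """Return one color per label; unknown labels fall back to OTHER."""
    return [
        BASE_MODE_COLORS[MODE_TO_BASE_MODE.get(label, "OTHER")]
        for label in labels
    ]


print(colors_for_labels(["walk", "e-bike", "scooter"]))
```

Because the color depends only on the label, separately generated bar charts would end up with the same color for the same mode. In this simple form, labels that share a base mode also share a color, so distinct shades per label would still be needed to avoid repeats within one legend.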
At a high level, each plot that we generate in the public dashboard has the following steps:
Plotting involves some internal pre-processing to make the results meaningful. We convert everything to percentages, combine small entries for the charts, and save both the charts and the related text. All of this is standard and should be encapsulated into a standard function, and was already encapsulated by @iantei.
The challenge comes with the filtering and aggregating preprocessing. @iantei had structured those as
Right now, given that all the preprocessing fits that template, I pass in some preprocessing templates, and handle both preprocessing and plotting in the same function. However, as we expand the public dashboard codebase to handle surveys, we may need more complex preprocessing that goes beyond
Concretely
can be transformed into
Or, if the reset_index/set_axis/sort_values calls recur every time, as
This still makes it clear what we are doing here (plotting the mode_confirm property as counts) without having to look at the implementation of a library function, and keeping the plot code nicely encapsulated. @Abby-Wheelis I am now checking what kind of preprocessing you do for the survey code... |
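The original snippets from this comment are not reproduced in the thread, so the following is not the code that was proposed. It is only a rough, dependency-free sketch of the general idea of keeping preprocessing separate from plotting. The function names `count_by_key` and `prepare_stacked_bar` and the trip records are assumptions, and the plotting side is a stand-in that returns what it would draw instead of calling a plotting library.

```python
from collections import Counter


def count_by_key(trips, key):
    """Preprocessing step: count trips per value of `key`, largest first,
    skipping trips where the key is missing or None."""
    counts = Counter(t[key] for t in trips if t.get(key) is not None)
    ordered = counts.most_common()
    labels = [label for label, _ in ordered]
    values = [value for _, value in ordered]
    return labels, values


def prepare_stacked_bar(labels, values):
    """Plot-side step (stand-in for drawing): convert values to percentages
    and return (label, count, percent) tuples for each segment."""
    total = sum(values)
    return [
        (label, value, 100 * value / total if total else 0.0)
        for label, value in zip(labels, values)
    ]


# Made-up trip records; only the mode_confirm property matters here
trips = [
    {"mode_confirm": "walk"},
    {"mode_confirm": "bike"},
    {"mode_confirm": "walk"},
    {"mode_confirm": None},
]
labels, values = count_by_key(trips, "mode_confirm")
segments = prepare_stacked_bar(labels, values)
print(labels, values)
```

The call site still shows what is being plotted (the `mode_confirm` property as counts), while the plot-side function only ever sees labels and values.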
@iantei note that the two options to split pre-processing and plotting will get us back to the original template for the plot call, which took in labels and values instead of a dataframe. There's a reason we used that structure in the first place 😄 |
To be consistent with e-mission#86 (comment)
Closing this since it has moved to production. |
The public dashboard (e.g. https://open-access-openpath.nrel.gov/public/ or https://durham-openpath.nrel.gov/public/) currently represents mode and mileage shares as pie charts. Apparently, per visualization best-practices, pie charts are bad. I am not 100% convinced about this, but I am not a visualization expert either.
In the future, we also want to be able to represent uncertainty in the metrics (as part of the "count every trip" project), presumably through something like error bars. We want to redesign the metrics to meet these dual goals.
One option is to use stacked bar charts, which are widely promoted as the replacement for pie charts (although we would need to think about how to represent error bars in that case). I am open to other visualizations, and would like to see a simple comparison of potential replacements and their pros and cons.
I will also point out that there is an existing implementation of the metrics using stacked bar charts in #78 so if we do choose to go with that, the implementation should be pretty simple and not take much time.
This was originally tracked in #83 but we are moving the discussion here because that one got too unwieldy.
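On the open question above of how error bars could be represented on a stacked bar, one possible layout (a sketch only, not a decided design) is to anchor each segment's error bar at the segment's right edge, which is the running total of the shares. The function name `stacked_segments_with_errors`, the shares, and the error values are made up for illustration.

```python
def stacked_segments_with_errors(shares, errors):
    """Compute where each segment of a horizontal stacked bar starts and
    where its error bar would be centered (hypothetical layout helper)."""
    segments = []
    left = 0.0
    for label, share in shares.items():
        right = left + share
        segments.append({
            "label": label,
            "left": left,                     # where the segment starts
            "width": share,                   # length of the segment
            "error_at": right,                # error bar centered on the right edge
            "error": errors.get(label, 0.0),  # half-width of the error bar
        })
        left = right  # the next segment starts where this one ends
    return segments


# Made-up percentage shares and uncertainties
shares = {"walk": 50.0, "car": 30.0, "bike": 20.0}
errors = {"walk": 5.0, "car": 3.0}
layout = stacked_segments_with_errors(shares, errors)
for seg in layout:
    print(seg)
```

If the charts are drawn with matplotlib, the `left` and `width` values map onto the `left` and `width` arguments of `Axes.barh`, and the `error` values could be passed as `xerr`. Whether overlapping error bars on adjacent segments remain readable is exactly the kind of pro/con that would need to be compared.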