Host DataBalanceAnalysis-AdultCensusIncome cell outputs in blob instead of inline, use Interpretability-Image Explainers as outstanding notebook in features/responsible_ai/
microsoft · Nov 5, 2021 · 7565de5
1 parent 70ee581
commit 7565de5

Showing 7 changed files with 29 additions and 65 deletions.
diff --git a/notebooks/DataBalanceAnalysis - Adult Census Income.ipynb b/notebooks/DataBalanceAnalysis - Adult Census Income.ipynb
diff --git a/website/docs/examples/responsible_ai/DataBalanceAnalysis - Adult Census Income.md b/website/docs/examples/responsible_ai/DataBalanceAnalysis - Adult Census Income.md
@@ -158,9 +158,7 @@ fig.tight_layout()
 plt.show()
 ```
 
-
-![png](DataBalanceAnalysis-AdultCensusIncome_files/DataBalanceAnalysis-AdultCensusIncome_13_0.png)
-
+![Demographic Parity of Races in Adult Dataset](https://mmlspark.blob.core.windows.net/graphics/responsible_ai/DataBalanceAnalysis_AdultCensusIncome_RacesDP.png)
 
 #### Interpret Feature Balance Measures
 
@@ -273,9 +271,7 @@ fig.tight_layout()
 plt.show()
 ```
 
-
-![png](DataBalanceAnalysis-AdultCensusIncome_files/DataBalanceAnalysis-AdultCensusIncome_18_0.png)
-
+![Distribution Balance Measures of Sex and Race in Adult Dataset](https://mmlspark.blob.core.windows.net/graphics/responsible_ai/DataBalanceAnalysis_AdultCensusIncome_DistributionMeasures.png)
 
 #### Interpret Distribution Balance Measures
 

diff --git a/...Analysis-AdultCensusIncome_files/DataBalanceAnalysis-AdultCensusIncome_13_0.png b/...Analysis-AdultCensusIncome_files/DataBalanceAnalysis-AdultCensusIncome_13_0.png
diff --git a/...Analysis-AdultCensusIncome_files/DataBalanceAnalysis-AdultCensusIncome_18_0.png b/...Analysis-AdultCensusIncome_files/DataBalanceAnalysis-AdultCensusIncome_18_0.png
diff --git a/website/docs/features/responsible_ai/Data Balance Analysis.md b/website/docs/features/responsible_ai/Data Balance Analysis.md
@@ -175,22 +175,22 @@ This involves under-sampling from majority class and over-sampling from minority
   1. Under-sampling may remove valuable information.  
   2. Over-sampling may cause overfitting and poor generalization on test set.
 
-![Bar chart undersampling and oversampling](https://mmlspark.blob.core.windows.net/graphics/exploratory/DataBalanceAnalysis_SamplingBar.png)
+![Bar chart undersampling and oversampling](https://mmlspark.blob.core.windows.net/graphics/responsible_ai/DataBalanceAnalysis_SamplingBar.png)
 
 There are smarter techniques to under-sample and over-sample in literature and implemented in Python’s [imbalanced-learn](https://imbalanced-learn.org/stable/) package.  
 
 For example, we can cluster the records of the majority class, and do the under-sampling by removing records from each cluster, thus seeking to preserve information.  
 
 One technique of under-sampling is use of Tomek Links. Tomek links are pairs of very close instances but of opposite classes. Removing the instances of the majority class of each pair increases the space between the two classes, facilitating the classification process. A similar way to under-sample majority class is using Near-Miss. It first calculates the distance between all the points in the larger class with the points in the smaller class. When two points belonging to different classes are very close to each other in the distribution, this algorithm eliminates the datapoint of the larger class thereby trying to balance the distribution.
 
-![Tomek Links](https://mmlspark.blob.core.windows.net/graphics/exploratory/DataBalanceAnalysis_TomekLinks.png)
+![Tomek Links](https://mmlspark.blob.core.windows.net/graphics/responsible_ai/DataBalanceAnalysis_TomekLinks.png)
 
 In over-sampling, instead of creating exact copies of the minority class records, we can introduce small variations into those copies, creating more diverse synthetic samples. This technique is called SMOTE (Synthetic Minority Oversampling Technique). It randomly picks a point from the minority class and computes the k-nearest neighbors for this point. The synthetic points are added between the chosen point and its neighbors.
 
-![Synthetic Samples](https://mmlspark.blob.core.windows.net/graphics/exploratory/DataBalanceAnalysis_SyntheticSamples.png)
+![Synthetic Samples](https://mmlspark.blob.core.windows.net/graphics/responsible_ai/DataBalanceAnalysis_SyntheticSamples.png)
 
 ### Reweighting
 
 There is an expected and observed value in each table cell. The weight is essentially expected / observed value. This is easy to extend to multiple features with more than 2 groups. The weights are then incorporated in loss function of model training.  
 
-![Reweighting](https://mmlspark.blob.core.windows.net/graphics/exploratory/DataBalanceAnalysis_Reweight.png)
+![Reweighting](https://mmlspark.blob.core.windows.net/graphics/responsible_ai/DataBalanceAnalysis_Reweight.png)
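
Note on the Reweighting text in the hunk above: a minimal sketch of the "expected / observed" weight it describes. The doc does not define the expected value, so this sketch assumes it is the cell count expected if the feature group and the label were independent (row total × column total / N). The function name `reweighting_weights` and the toy data are hypothetical and are not part of this commit or of the library.

```python
from collections import Counter


def reweighting_weights(groups, labels):
    """Return {(group, label): expected / observed} for every observed cell.

    Assumption: 'expected' is the count under independence of group and label,
    i.e. count(group) * count(label) / N.
    """
    n = len(groups)
    group_counts = Counter(groups)               # row totals
    label_counts = Counter(labels)               # column totals
    cell_counts = Counter(zip(groups, labels))   # observed count per table cell

    weights = {}
    for (group, label), observed in cell_counts.items():
        expected = group_counts[group] * label_counts[label] / n
        weights[(group, label)] = expected / observed
    return weights


# Toy data: group "A" is over-represented among positive labels.
groups = ["A", "A", "A", "B", "B", "B", "B", "B"]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
weights = reweighting_weights(groups, labels)

# Per-record weights that could be passed to a trainer accepting sample weights.
sample_weights = [weights[(g, y)] for g, y in zip(groups, labels)]
```

Cells that are over-represented relative to independence get a weight below 1 and under-represented cells get a weight above 1, so the weighted cell counts match the expected counts.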
diff --git a/...ai/Interpretability - Image Explainers.md → ...ai/Interpretability - Image Explainers.md b/...ai/Interpretability - Image Explainers.md → ...ai/Interpretability - Image Explainers.md
diff --git a/website/notebookconvert.py b/website/notebookconvert.py
@@ -14,23 +14,10 @@ def add_header_to_markdown(folder, md):
         f.close()
 
 
-def convert_notebook_to_markdown(folder, nb, outputdir):
-    file_path = os.path.join(folder, nb)
+def convert_notebook_to_markdown(file_path, outputdir):
     print(f"Converting {file_path} into markdown")
-
-    # If the notebook contains cell outputs such as figures, a folder containing cell output images is generated alongside the markdown file
-    # By default, both the folder and files contain the notebook name. But spaces in the notebook name create linking errors in the generated markdown
-    # Therefore, we first generate the markdown file, output folder, and output files with no spaces
-    nb_no_spaces = nb.replace(" ", "").replace(".ipynb", "")
-
-    convert_cmd = f'jupyter nbconvert --output-dir="{outputdir}" --NbConvertApp.output_base="{nb_no_spaces}" --to markdown "{file_path}"'
+    convert_cmd = f'jupyter nbconvert --output-dir="{outputdir}" --to markdown "{file_path}"'
     os.system(convert_cmd)
-
-    # Afterwards, we rename the generated markdown file to ensure that the markdown file has the same name as notebook
-    md_no_spaces = os.path.join(outputdir, f"{nb_no_spaces}.md")
-    md_final = os.path.join(outputdir, nb.replace(".ipynb", ".md"))
-    print(f"Renaming {md_no_spaces} to {md_final}")
-    os.rename(md_no_spaces, md_final)
     print()
 
 
@@ -42,7 +29,10 @@ def convert_allnotebooks_in_folder(folder, outputdir):
         "CognitiveServices": os.path.join(outputdir, "examples", "cognitive_services"),
         "DataBalanceAnalysis": os.path.join(outputdir, "examples", "responsible_ai"),
         "DeepLearning": os.path.join(outputdir, "examples", "deep_learning"),
-        "Interpretability": os.path.join(outputdir, "examples", "responsible_ai"),
+        "Interpretability - Image Explainers": os.path.join(outputdir, "features", "responsible_ai"),
+        "Interpretability - Explanation Dashboard": os.path.join(outputdir, "examples", "responsible_ai"),
+        "Interpretability - Tabular SHAP explainer": os.path.join(outputdir, "examples", "responsible_ai"),
+        "Interpretability - Text Explainers": os.path.join(outputdir, "examples", "responsible_ai"),
         "ModelInterpretability": os.path.join(outputdir, "examples", "responsible_ai"),
         "Regression": os.path.join(outputdir, "examples", "regression"),
         "TextAnalytics": os.path.join(outputdir, "examples", "text_analytics"),
@@ -70,7 +60,7 @@ def convert_allnotebooks_in_folder(folder, outputdir):
             if os.path.exists(os.path.join(finaldir, md)):
                 os.remove(os.path.join(finaldir, md))
 
-            convert_notebook_to_markdown(folder, nb, finaldir)
+            convert_notebook_to_markdown(os.path.join(folder, nb), finaldir)
             add_header_to_markdown(finaldir, md)
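
Note on `convert_notebook_to_markdown` above: the simplified function still builds a shell string and runs it with `os.system`, so notebook paths containing spaces rely on the embedded quotes. A possible alternative, shown only as a sketch and not as what this commit does, is to pass an argument list to `subprocess.run`, which avoids shell quoting altogether. The helper `build_nbconvert_command` is a hypothetical name introduced for illustration; the `jupyter nbconvert` flags are the same ones used in the diff.

```python
import subprocess


def build_nbconvert_command(file_path, outputdir):
    """Build the nbconvert invocation as an argument list (no shell quoting needed)."""
    return [
        "jupyter",
        "nbconvert",
        f"--output-dir={outputdir}",
        "--to",
        "markdown",
        file_path,
    ]


def convert_notebook_to_markdown(file_path, outputdir):
    print(f"Converting {file_path} into markdown")
    # check=True raises CalledProcessError if nbconvert exits with a non-zero status,
    # whereas os.system only returns the exit status and the script carries on.
    subprocess.run(build_nbconvert_command(file_path, outputdir), check=True)
    print()
```

Separating command construction from execution also lets the command be inspected without having Jupyter installed.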