Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Input output swap #2239

Merged
merged 13 commits into from
Jan 29, 2019
Merged

Input output swap #2239

merged 13 commits into from
Jan 29, 2019

Conversation

sfilipi
Copy link
Member

@sfilipi sfilipi commented Jan 25, 2019

Fixes issue #2064

1- Change the order of the parameters from inputColumn, outputColumn to outputColumnName, sourceColumnName.

2 - Changing the "source" parameter name and field in the Columninfo classes, to be "sourceColumnName", as suggested.

3- Changing the "name" parameter to "outputColumnName" in the:

  • estimator extension APIs
  • estimator ctors
  • column pairs expressed through tuples, because in context it reads better than name.

Note: in the columnInfo classes i left it to "name" because "outputColumnName" makes no sense.

4 - Nit on standardizing the XML comments.
5 - Arranging the order of the parameters to be: outputColumnName, required parameters, nullable sourceColumnName.
6 - fixed some bugs i bumped into.

@codecov
Copy link

codecov bot commented Jan 25, 2019

Codecov Report

Merging #2239 into master will increase coverage by 5.7%.
The diff coverage is 77.8%.

@@            Coverage Diff             @@
##           master    #2239      +/-   ##
==========================================
+ Coverage   65.43%   71.13%    +5.7%     
==========================================
  Files         766      779      +13     
  Lines      133371   140271    +6900     
  Branches    15669    16046     +377     
==========================================
+ Hits        87269    99785   +12516     
+ Misses      41889    36034    -5855     
- Partials     4213     4452     +239
Flag Coverage Δ
#Debug 71.13% <77.8%> (+5.7%) ⬆️
#production 67.56% <70%> (+5.27%) ⬆️
#test 85.05% <95.14%> (+3.52%) ⬆️

/// - Each such metadata column is itself compatible with the input metadata column.
/// </summary>
[BestFriend]
internal bool IsCompatibleWith(Column inputColumn)
internal bool IsCompatibleWith(Column source)
Copy link
Member Author

@sfilipi sfilipi Jan 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Column source) [](start = 43, length = 14)

unnecessary change.. #Resolved

@@ -385,9 +385,9 @@ private static void GetColTypesAndIndex(IHostEnvironment env, IDataView inputDat

for (int i = 0; i < columns.Length; i++)
{
var col = inputSchema.GetColumnOrNull(columns[i].Input);
var col = inputSchema.GetColumnOrNull(columns[i].Source);
Copy link
Member Author

@sfilipi sfilipi Jan 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Source [](start = 65, length = 6)

SourceColumnName #Resolved

public readonly string Input;
public readonly string Output;
public readonly string Name;
public readonly string Source;
Copy link
Member Author

@sfilipi sfilipi Jan 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Source [](start = 35, length = 6)

SourceColumnName #Resolved

string output = null, ImagePixelExtractorTransformer.ColorBits colors = ImagePixelExtractorTransformer.Defaults.Colors,
public ImagePixelExtractingEstimator(IHostEnvironment env,
string name,
string source = null,
Copy link
Member Author

@sfilipi sfilipi Jan 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

source [](start = 19, length = 6)

sourceColumnName #Resolved

public ImagePixelExtractingEstimator(IHostEnvironment env, string input,
string output = null, ImagePixelExtractorTransformer.ColorBits colors = ImagePixelExtractorTransformer.Defaults.Colors,
public ImagePixelExtractingEstimator(IHostEnvironment env,
string name,
Copy link
Member Author

@sfilipi sfilipi Jan 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name [](start = 19, length = 4)

outputColumnName #Resolved

public readonly string Input;
public readonly string Output;
public readonly string Name;
public readonly string Source;
Copy link
Member Author

@sfilipi sfilipi Jan 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Source [](start = 35, length = 6)

SourceColumName #Resolved

public DnnImageFeaturizerEstimator(IHostEnvironment env, Func<DnnImageFeaturizerInput, EstimatorChain<ColumnCopyingTransformer>> modelFactory, string inputColumn, string outputColumn)
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="sourceColumnName"/>.</param>
/// <param name="sourceColumnName">Name of column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
public DnnImageFeaturizerEstimator(IHostEnvironment env, Func<DnnImageFeaturizerInput, EstimatorChain<ColumnCopyingTransformer>> modelFactory, string outputColumnName, string sourceColumnName = null)
Copy link
Member Author

@sfilipi sfilipi Jan 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IHostEnvironment env, Func<DnnImageFeaturizerInput, EstimatorChain> modelFactory, [](start = 43, length = 107)

outputColumnName should come first. #Resolved

/// <param name="gpuDeviceId">Optional GPU device ID to run execution on. Null for CPU.</param>
/// <param name="fallbackToCpu">If GPU error, raise exception or fallback to CPU.</param>
public OnnxTransformer(IHostEnvironment env, string modelFile, string inputColumn, string outputColumn, int? gpuDeviceId = null, bool fallbackToCpu = false)
public OnnxTransformer(IHostEnvironment env, string modelFile, string outputColumnName, string sourceColumnName = null, int? gpuDeviceId = null, bool fallbackToCpu = false)
Copy link
Member Author

@sfilipi sfilipi Jan 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string outputColumnName, [](start = 71, length = 24)

outputColumName should come first #Resolved

/// <param name="gpuDeviceId">Optional GPU device ID to run execution on. Null for CPU.</param>
/// <param name="fallbackToCpu">If GPU error, raise exception or fallback to CPU.</param>
public OnnxTransformer(IHostEnvironment env, string modelFile, string[] inputColumns, string[] outputColumns, int? gpuDeviceId = null, bool fallbackToCpu = false)
public OnnxTransformer(IHostEnvironment env, string modelFile, string[] outputColumnNames, string[] sourceColumnNames, int? gpuDeviceId = null, bool fallbackToCpu = false)
Copy link
Member Author

@sfilipi sfilipi Jan 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string[] outputColumnNames [](start = 70, length = 27)

outputColumnNames should come first #Resolved

/// <param name="gpuDeviceId">Optional GPU device ID to run execution on. Null for CPU.</param>
/// <param name="fallbackToCpu">If GPU error, raise exception or fallback to CPU.</param>
public OnnxScoringEstimator(IHostEnvironment env, string modelFile, string[] inputColumns, string[] outputColumns, int? gpuDeviceId = null, bool fallbackToCpu = false)
: this(env, new OnnxTransformer(env, modelFile, inputColumns, outputColumns, gpuDeviceId, fallbackToCpu))
public OnnxScoringEstimator(IHostEnvironment env, string modelFile, string[] names, string[] sources, int? gpuDeviceId = null, bool fallbackToCpu = false)
Copy link
Member Author

@sfilipi sfilipi Jan 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

names [](start = 85, length = 5)

outputColumnName, and it also should be the first param. #Resolved

public ColumnConcatenatingEstimator (IHostEnvironment env, string outputColumn, params string[] inputColumns)
/// <param name="name">The name of the resulting column.</param>
/// <param name="sources">The columns to concatenate together.</param>
public ColumnConcatenatingEstimator (IHostEnvironment env, string name, params string[] sources)
Copy link
Member Author

@sfilipi sfilipi Jan 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sources [](start = 96, length = 7)

sourceColumnNames #Resolved

{
}

public ColumnCopyingEstimator(IHostEnvironment env, params (string input, string output)[] columns)
public ColumnCopyingEstimator(IHostEnvironment env, params (string name, string sourceColumnName)[] columns)
Copy link
Member Author

@sfilipi sfilipi Jan 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name [](start = 75, length = 4)

outputColumnName #Resolved

public readonly string Input;
public readonly string Output;
public readonly string Name;
public readonly string Source;
Copy link
Member Author

@sfilipi sfilipi Jan 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Source [](start = 35, length = 6)

SourceColumnName #Resolved

@@ -385,9 +385,9 @@ public bool Equals(Reconciler other)
{
Contracts.Assert(toOutput.Length == 1);

var pairs = new List<(string[] inputs, string output)>();
var pairs = new List<(string name, string[] sources)>();
Copy link
Member Author

@sfilipi sfilipi Jan 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name [](start = 45, length = 4)

outputColumn #Resolved

@@ -48,13 +48,13 @@ public sealed class Arguments : TransformInputBase
/// The names of the model inputs.
/// </summary>
[Argument(ArgumentType.Multiple | ArgumentType.Required, HelpText = "The names of the model inputs", ShortName = "inputs", SortOrder = 1)]
public string[] InputColumns;
public string[] Sources;
Copy link
Member Author

@sfilipi sfilipi Jan 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sources [](start = 28, length = 7)

Revert #Resolved

/// <param name="sources">The name of the input data columns. Must match model's input names.</param>
/// <param name="names">The output columns to generate. Names must match model specifications. Data types are inferred from model.</param>
public TensorFlowTransformer(IHostEnvironment env, TensorFlowModelInfo tfModelInfo, string[] names ,string[] sources)
: this(env, tfModelInfo.Session, names, sources, TensorFlowUtils.IsSavedModel(env, tfModelInfo.ModelPath) ? tfModelInfo.ModelPath : null, false)
Copy link
Member Author

@sfilipi sfilipi Jan 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outputColumnName #Resolved

@@ -1088,13 +1088,13 @@ public sealed class TensorFlowEstimator : IEstimator<TensorFlowTransformer>
private readonly ColumnType[] _outputTypes;
private TensorFlowTransformer _transformer;

public TensorFlowEstimator(IHostEnvironment env, string modelLocation, string[] inputs, string[] outputs)
: this(env, TensorFlowUtils.LoadTensorFlowModel(env, modelLocation), inputs, outputs)
public TensorFlowEstimator(IHostEnvironment env, string modelLocation, string[] names, string[] sources)
Copy link
Member Author

@sfilipi sfilipi Jan 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

names [](start = 88, length = 5)

outputColumnNames #Resolved

{
}

public TensorFlowEstimator(IHostEnvironment env, TensorFlowModelInfo tensorFlowModel, string[] inputs, string[] outputs)
: this(env, CreateArguments(tensorFlowModel, inputs, outputs), tensorFlowModel)
public TensorFlowEstimator(IHostEnvironment env, TensorFlowModelInfo tensorFlowModel, string[] names, string[] sources)
Copy link
Member Author

@sfilipi sfilipi Jan 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sources [](start = 119, length = 7)

sourceColumnNames #Resolved

@sfilipi
Copy link
Member Author

sfilipi commented Jan 25, 2019

The deltas are suspicious, because i didn't add/remove any lines of code. I am only renaming parameter.

I wander if it is because of libmf being on a different hash. Will revert that.


In reply to: 457427507 [](ancestors = 457427507)

@sfilipi sfilipi changed the title WIP - Input output swap Input output swap Jan 27, 2019
@sfilipi sfilipi self-assigned this Jan 28, 2019
@@ -48,14 +48,14 @@ public static void FeatureSelectionTransform()
// In this example we define a CountFeatureSelectingEstimator, that selects slots in a feature vector that have more non-default
// values than the specified count. This transformation can be used to remove slots with too many missing values.
var countSelectEst = ml.Transforms.FeatureSelection.SelectFeaturesBasedOnCount(
inputColumn: "Features", outputColumn: "FeaturesCountSelect", count: 695);
outputColumnName: "FeaturesCountSelect", sourceColumnName: "Features", count: 695);
Copy link
Contributor

@TomFinley TomFinley Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were we preferring the name "input" compared to "source" for the column names? Or is this exceptional for some reason? #Resolved

Copy link
Member Author

@sfilipi sfilipi Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we agreed to sourceColumnName..


In reply to: 251532233 [](ancestors = 251532233)

Copy link
Contributor

@TomFinley TomFinley Jan 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we agreed to sourceColumnName..

Are you sure @sfilipi ? Perhaps I am misinterpreting your summary of #2064, but it seems fairly clear to me. is this different for some other reasons?

the name of the column resulting from the transformation will be "outputColumnName" and the name of the column/columns to apply the transformation too would be "inputColumnName".
#Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HA! this is how deep you had planted the "source" name in my mind!

changing it to be truthful to the discussion summary.


In reply to: 251678189 [](ancestors = 251678189)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HA! this is how deep you had planted the "source" name in my mind!

😄 Now that I know I have this ability, I solemnly swear to use my special powers only for good.

(My own good, that is.)

@sfilipi
Copy link
Member Author

sfilipi commented Jan 28, 2019

@wschin, @zeahmed would you guys have time to take a look today?
thanks. #Resolved

@TomFinley
Copy link
Contributor

TomFinley commented Jan 28, 2019

Note that I added two tests from a recent commit whenever you sync again @sfilipi , CategoricalOneHotEncodingFromSideData and ValueToKeyFromSideData, that might get auto-merged, but you'll want to pay attention since they had things in the original order. #Resolved

@@ -52,10 +51,10 @@ public void CategoricalHashWorkout()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[](start = 75, length = 2)

nit

string inputColumn,
string outputColumn = null,
string name,
string source = null,
Copy link
Member Author

@sfilipi sfilipi Jan 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix #Resolved

string[] inputColumns,
string outputColumn,
string name,
string[] sources,
Copy link
Member Author

@sfilipi sfilipi Jan 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix #Resolved

@@ -1144,15 +1147,15 @@ internal static class Defaults
/// <paramref name="invertHash"/> specifies the upper bound of the number of distinct input values mapping to a hash that should be retained.
/// <value>0</value> does not retain any input values. <value>-1</value> retains all input values mapping to each hash.</param>
public NgramHashingEstimator(IHostEnvironment env,
(string[] inputs, string output)[] columns,
(string name, string[] sources)[] columns,
Copy link
Member Author

@sfilipi sfilipi Jan 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(string name, string[] sourc [](start = 11, length = 29)

fix #Resolved

}

private static IDataView UnaliasIfNeeded(IHostEnvironment env, IDataView input, KeyValuePair<string, string>[] hiddenNames)
{
if (Utils.Size(hiddenNames) == 0)
return input;

input = new ColumnCopyingTransformer(env, hiddenNames.Select(x => (Input: x.Key, Output: x.Value)).ToArray()).Transform(input);
input = new ColumnCopyingTransformer(env, hiddenNames.Select(x => (Name: x.Key, Source: x.Value)).ToArray()).Transform(input);
Copy link
Member Author

@sfilipi sfilipi Jan 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Source [](start = 92, length = 6)

fix #Resolved

@@ -178,13 +178,13 @@ private sealed class Mapper : OneToOneMapperBase
private sealed class ColInfo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outputColumnName [](start = 38, length = 16)

name?

…classes, to be "sourceColumnName", as suggested.

Changing the "name" parameter to "outputColumnName" in the:
- estimator extension APIs
- estimator ctors
- column pairs expressed through tuples, because in context it reads better than name.

Note: in the columnInfo classes i left it to "name" because "outputColumnName" makes no sense.

2 - Nit on standartizing the XML comments.
3 - Arranging the order of the parameters to be: outputColumnName, required parameters, nullable sourceColumnName.
reverting update to libmf
fix typo in missingValueIndicatorTransformer
a few more name, source pairs changed
/// <param name="inputColumns">The input columns</param>
/// <param name="outputColumn">The output column</param>
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnNames"/>.</param>
/// <param name="inputColumnNames">Name of the column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
Copy link
Contributor

@zeahmed zeahmed Jan 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

column [](start = 55, length = 6)

Columns?? #Resolved

/// <param name="inputColumns">The columns containing text to compute bag of word vector.</param>
/// <param name="outputColumn">The column containing output tokens.</param>
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnNames"/>.</param>
/// <param name="inputColumnNames">Name of the column to transform.</param>
Copy link
Contributor

@zeahmed zeahmed Jan 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

column [](start = 55, length = 6)

Columns?? #Resolved

/// <param name="inputColumns">The columns containing text to compute bag of word vector.</param>
/// <param name="outputColumn">The column containing output tokens.</param>
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnNames"/>.</param>
/// <param name="inputColumnNames">Name of the column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
Copy link
Contributor

@zeahmed zeahmed Jan 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

column [](start = 55, length = 6)

Columns?? #Resolved

/// <param name="inputColumns">Name of input columns containing tokenized text.</param>
/// <param name="outputColumn">Name of output column, will contain the ngram vector.</param>
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnNames"/>.</param>
/// <param name="inputColumnNames">Name of the column to transform.</param>
Copy link
Contributor

@zeahmed zeahmed Jan 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

column [](start = 55, length = 6)

Columns?? #Resolved

Copy link
Contributor

@zeahmed zeahmed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks good. I have left a few comments.

@sfilipi sfilipi merged commit e383091 into dotnet:master Jan 29, 2019
@sfilipi sfilipi deleted the inputOutputSwap branch January 29, 2019 21:27
@ghost ghost locked as resolved and limited conversation to collaborators Mar 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants