-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Input output swap #2239
Input output swap #2239
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2239 +/- ##
==========================================
+ Coverage 65.43% 71.13% +5.7%
==========================================
Files 766 779 +13
Lines 133371 140271 +6900
Branches 15669 16046 +377
==========================================
+ Hits 87269 99785 +12516
+ Misses 41889 36034 -5855
- Partials 4213 4452 +239
|
/// - Each such metadata column is itself compatible with the input metadata column. | ||
/// </summary> | ||
[BestFriend] | ||
internal bool IsCompatibleWith(Column inputColumn) | ||
internal bool IsCompatibleWith(Column source) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Column source) [](start = 43, length = 14)
unnecessary change.. #Resolved
@@ -385,9 +385,9 @@ private static void GetColTypesAndIndex(IHostEnvironment env, IDataView inputDat | |||
|
|||
for (int i = 0; i < columns.Length; i++) | |||
{ | |||
var col = inputSchema.GetColumnOrNull(columns[i].Input); | |||
var col = inputSchema.GetColumnOrNull(columns[i].Source); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Source [](start = 65, length = 6)
SourceColumnName #Resolved
public readonly string Input; | ||
public readonly string Output; | ||
public readonly string Name; | ||
public readonly string Source; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Source [](start = 35, length = 6)
SourceColumnName #Resolved
string output = null, ImagePixelExtractorTransformer.ColorBits colors = ImagePixelExtractorTransformer.Defaults.Colors, | ||
public ImagePixelExtractingEstimator(IHostEnvironment env, | ||
string name, | ||
string source = null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
source [](start = 19, length = 6)
sourceColumnName #Resolved
public ImagePixelExtractingEstimator(IHostEnvironment env, string input, | ||
string output = null, ImagePixelExtractorTransformer.ColorBits colors = ImagePixelExtractorTransformer.Defaults.Colors, | ||
public ImagePixelExtractingEstimator(IHostEnvironment env, | ||
string name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name [](start = 19, length = 4)
outputColumnName #Resolved
public readonly string Input; | ||
public readonly string Output; | ||
public readonly string Name; | ||
public readonly string Source; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Source [](start = 35, length = 6)
SourceColumName #Resolved
public DnnImageFeaturizerEstimator(IHostEnvironment env, Func<DnnImageFeaturizerInput, EstimatorChain<ColumnCopyingTransformer>> modelFactory, string inputColumn, string outputColumn) | ||
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="sourceColumnName"/>.</param> | ||
/// <param name="sourceColumnName">Name of column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param> | ||
public DnnImageFeaturizerEstimator(IHostEnvironment env, Func<DnnImageFeaturizerInput, EstimatorChain<ColumnCopyingTransformer>> modelFactory, string outputColumnName, string sourceColumnName = null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IHostEnvironment env, Func<DnnImageFeaturizerInput, EstimatorChain> modelFactory, [](start = 43, length = 107)
outputColumnName should come first. #Resolved
/// <param name="gpuDeviceId">Optional GPU device ID to run execution on. Null for CPU.</param> | ||
/// <param name="fallbackToCpu">If GPU error, raise exception or fallback to CPU.</param> | ||
public OnnxTransformer(IHostEnvironment env, string modelFile, string inputColumn, string outputColumn, int? gpuDeviceId = null, bool fallbackToCpu = false) | ||
public OnnxTransformer(IHostEnvironment env, string modelFile, string outputColumnName, string sourceColumnName = null, int? gpuDeviceId = null, bool fallbackToCpu = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string outputColumnName, [](start = 71, length = 24)
outputColumName should come first #Resolved
/// <param name="gpuDeviceId">Optional GPU device ID to run execution on. Null for CPU.</param> | ||
/// <param name="fallbackToCpu">If GPU error, raise exception or fallback to CPU.</param> | ||
public OnnxTransformer(IHostEnvironment env, string modelFile, string[] inputColumns, string[] outputColumns, int? gpuDeviceId = null, bool fallbackToCpu = false) | ||
public OnnxTransformer(IHostEnvironment env, string modelFile, string[] outputColumnNames, string[] sourceColumnNames, int? gpuDeviceId = null, bool fallbackToCpu = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string[] outputColumnNames [](start = 70, length = 27)
outputColumnNames should come first #Resolved
/// <param name="gpuDeviceId">Optional GPU device ID to run execution on. Null for CPU.</param> | ||
/// <param name="fallbackToCpu">If GPU error, raise exception or fallback to CPU.</param> | ||
public OnnxScoringEstimator(IHostEnvironment env, string modelFile, string[] inputColumns, string[] outputColumns, int? gpuDeviceId = null, bool fallbackToCpu = false) | ||
: this(env, new OnnxTransformer(env, modelFile, inputColumns, outputColumns, gpuDeviceId, fallbackToCpu)) | ||
public OnnxScoringEstimator(IHostEnvironment env, string modelFile, string[] names, string[] sources, int? gpuDeviceId = null, bool fallbackToCpu = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
names [](start = 85, length = 5)
outputColumnName, and it also should be the first param. #Resolved
public ColumnConcatenatingEstimator (IHostEnvironment env, string outputColumn, params string[] inputColumns) | ||
/// <param name="name">The name of the resulting column.</param> | ||
/// <param name="sources">The columns to concatenate together.</param> | ||
public ColumnConcatenatingEstimator (IHostEnvironment env, string name, params string[] sources) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sources [](start = 96, length = 7)
sourceColumnNames #Resolved
{ | ||
} | ||
|
||
public ColumnCopyingEstimator(IHostEnvironment env, params (string input, string output)[] columns) | ||
public ColumnCopyingEstimator(IHostEnvironment env, params (string name, string sourceColumnName)[] columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name [](start = 75, length = 4)
outputColumnName #Resolved
public readonly string Input; | ||
public readonly string Output; | ||
public readonly string Name; | ||
public readonly string Source; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Source [](start = 35, length = 6)
SourceColumnName #Resolved
@@ -385,9 +385,9 @@ public bool Equals(Reconciler other) | |||
{ | |||
Contracts.Assert(toOutput.Length == 1); | |||
|
|||
var pairs = new List<(string[] inputs, string output)>(); | |||
var pairs = new List<(string name, string[] sources)>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name [](start = 45, length = 4)
outputColumn #Resolved
@@ -48,13 +48,13 @@ public sealed class Arguments : TransformInputBase | |||
/// The names of the model inputs. | |||
/// </summary> | |||
[Argument(ArgumentType.Multiple | ArgumentType.Required, HelpText = "The names of the model inputs", ShortName = "inputs", SortOrder = 1)] | |||
public string[] InputColumns; | |||
public string[] Sources; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sources [](start = 28, length = 7)
Revert #Resolved
/// <param name="sources">The name of the input data columns. Must match model's input names.</param> | ||
/// <param name="names">The output columns to generate. Names must match model specifications. Data types are inferred from model.</param> | ||
public TensorFlowTransformer(IHostEnvironment env, TensorFlowModelInfo tfModelInfo, string[] names ,string[] sources) | ||
: this(env, tfModelInfo.Session, names, sources, TensorFlowUtils.IsSavedModel(env, tfModelInfo.ModelPath) ? tfModelInfo.ModelPath : null, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outputColumnName #Resolved
@@ -1088,13 +1088,13 @@ public sealed class TensorFlowEstimator : IEstimator<TensorFlowTransformer> | |||
private readonly ColumnType[] _outputTypes; | |||
private TensorFlowTransformer _transformer; | |||
|
|||
public TensorFlowEstimator(IHostEnvironment env, string modelLocation, string[] inputs, string[] outputs) | |||
: this(env, TensorFlowUtils.LoadTensorFlowModel(env, modelLocation), inputs, outputs) | |||
public TensorFlowEstimator(IHostEnvironment env, string modelLocation, string[] names, string[] sources) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
names [](start = 88, length = 5)
outputColumnNames #Resolved
{ | ||
} | ||
|
||
public TensorFlowEstimator(IHostEnvironment env, TensorFlowModelInfo tensorFlowModel, string[] inputs, string[] outputs) | ||
: this(env, CreateArguments(tensorFlowModel, inputs, outputs), tensorFlowModel) | ||
public TensorFlowEstimator(IHostEnvironment env, TensorFlowModelInfo tensorFlowModel, string[] names, string[] sources) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sources [](start = 119, length = 7)
sourceColumnNames #Resolved
The deltas are suspicious, because i didn't add/remove any lines of code. I am only renaming parameter. I wander if it is because of libmf being on a different hash. Will revert that. In reply to: 457427507 [](ancestors = 457427507) |
@@ -48,14 +48,14 @@ public static void FeatureSelectionTransform() | |||
// In this example we define a CountFeatureSelectingEstimator, that selects slots in a feature vector that have more non-default | |||
// values than the specified count. This transformation can be used to remove slots with too many missing values. | |||
var countSelectEst = ml.Transforms.FeatureSelection.SelectFeaturesBasedOnCount( | |||
inputColumn: "Features", outputColumn: "FeaturesCountSelect", count: 695); | |||
outputColumnName: "FeaturesCountSelect", sourceColumnName: "Features", count: 695); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were we preferring the name "input" compared to "source" for the column names? Or is this exceptional for some reason? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we agreed to sourceColumnName..
Are you sure @sfilipi ? Perhaps I am misinterpreting your summary of #2064, but it seems fairly clear to me. is this different for some other reasons?
the name of the column resulting from the transformation will be "outputColumnName" and the name of the column/columns to apply the transformation too would be "inputColumnName".
#Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HA! this is how deep you had planted the "source" name in my mind!
changing it to be truthful to the discussion summary.
In reply to: 251678189 [](ancestors = 251678189)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HA! this is how deep you had planted the "source" name in my mind!
😄 Now that I know I have this ability, I solemnly swear to use my special powers only for good.
(My own good, that is.)
Note that I added two tests from a recent commit whenever you sync again @sfilipi , CategoricalOneHotEncodingFromSideData and ValueToKeyFromSideData, that might get auto-merged, but you'll want to pay attention since they had things in the original order. #Resolved |
@@ -52,10 +51,10 @@ public void CategoricalHashWorkout() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[](start = 75, length = 2)
nit
string inputColumn, | ||
string outputColumn = null, | ||
string name, | ||
string source = null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix #Resolved
string[] inputColumns, | ||
string outputColumn, | ||
string name, | ||
string[] sources, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix #Resolved
@@ -1144,15 +1147,15 @@ internal static class Defaults | |||
/// <paramref name="invertHash"/> specifies the upper bound of the number of distinct input values mapping to a hash that should be retained. | |||
/// <value>0</value> does not retain any input values. <value>-1</value> retains all input values mapping to each hash.</param> | |||
public NgramHashingEstimator(IHostEnvironment env, | |||
(string[] inputs, string output)[] columns, | |||
(string name, string[] sources)[] columns, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(string name, string[] sourc [](start = 11, length = 29)
fix #Resolved
} | ||
|
||
private static IDataView UnaliasIfNeeded(IHostEnvironment env, IDataView input, KeyValuePair<string, string>[] hiddenNames) | ||
{ | ||
if (Utils.Size(hiddenNames) == 0) | ||
return input; | ||
|
||
input = new ColumnCopyingTransformer(env, hiddenNames.Select(x => (Input: x.Key, Output: x.Value)).ToArray()).Transform(input); | ||
input = new ColumnCopyingTransformer(env, hiddenNames.Select(x => (Name: x.Key, Source: x.Value)).ToArray()).Transform(input); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Source [](start = 92, length = 6)
fix #Resolved
@@ -178,13 +178,13 @@ private sealed class Mapper : OneToOneMapperBase | |||
private sealed class ColInfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outputColumnName [](start = 38, length = 16)
name?
…classes, to be "sourceColumnName", as suggested. Changing the "name" parameter to "outputColumnName" in the: - estimator extension APIs - estimator ctors - column pairs expressed through tuples, because in context it reads better than name. Note: in the columnInfo classes i left it to "name" because "outputColumnName" makes no sense. 2 - Nit on standartizing the XML comments. 3 - Arranging the order of the parameters to be: outputColumnName, required parameters, nullable sourceColumnName.
reverting update to libmf fix typo in missingValueIndicatorTransformer
a few more name, source pairs changed
d219291
to
8006282
Compare
/// <param name="inputColumns">The input columns</param> | ||
/// <param name="outputColumn">The output column</param> | ||
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnNames"/>.</param> | ||
/// <param name="inputColumnNames">Name of the column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
column [](start = 55, length = 6)
Columns?? #Resolved
/// <param name="inputColumns">The columns containing text to compute bag of word vector.</param> | ||
/// <param name="outputColumn">The column containing output tokens.</param> | ||
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnNames"/>.</param> | ||
/// <param name="inputColumnNames">Name of the column to transform.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
column [](start = 55, length = 6)
Columns?? #Resolved
/// <param name="inputColumns">The columns containing text to compute bag of word vector.</param> | ||
/// <param name="outputColumn">The column containing output tokens.</param> | ||
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnNames"/>.</param> | ||
/// <param name="inputColumnNames">Name of the column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
column [](start = 55, length = 6)
Columns?? #Resolved
/// <param name="inputColumns">Name of input columns containing tokenized text.</param> | ||
/// <param name="outputColumn">Name of output column, will contain the ngram vector.</param> | ||
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnNames"/>.</param> | ||
/// <param name="inputColumnNames">Name of the column to transform.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
column [](start = 55, length = 6)
Columns?? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it looks good. I have left a few comments.
Fixes issue #2064
1- Change the order of the parameters from inputColumn, outputColumn to outputColumnName, sourceColumnName.
2 - Changing the "source" parameter name and field in the Columninfo classes, to be "sourceColumnName", as suggested.
3- Changing the "name" parameter to "outputColumnName" in the:
Note: in the columnInfo classes i left it to "name" because "outputColumnName" makes no sense.
4 - Nit on standardizing the XML comments.
5 - Arranging the order of the parameters to be: outputColumnName, required parameters, nullable sourceColumnName.
6 - fixed some bugs i bumped into.