-
Notifications
You must be signed in to change notification settings - Fork 842
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: fix cognitive service errors #1176
Changes from 6 commits
946d73d
081797f
d856987
280122f
e75cfba
e667728
8bb1f52
b0d3be6
c124660
4a7553e
7dd0e9f
af052b1
b330bb3
50a574b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -38,6 +38,8 @@ trait HasTextInput extends HasServiceParams { | |
|
||
def setText(v: Seq[String]): this.type = setScalarParam(text, v) | ||
|
||
def setText(v: String): this.type = setScalarParam(text, Seq(v)) | ||
|
||
def getTextCol: String = getVectorParam(text) | ||
|
||
def setTextCol(v: String): this.type = setVectorParam(text, v) | ||
|
@@ -76,9 +78,15 @@ trait TextAsOnlyEntity extends HasTextInput with HasCognitiveServiceInput { | |
|
||
override protected def prepareEntity: Row => Option[AbstractHttpEntity] = { | ||
r => | ||
Some(new StringEntity( | ||
getValueOpt(r, text) | ||
.map(x => x.map(y => Map("Text" -> y))).toJson.compactPrint, ContentType.APPLICATION_JSON)) | ||
val textVal = getValueOpt(r, text) | ||
if (textVal.nonEmpty) { | ||
val content = textVal.get.getClass.getName match { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Chris' second issue was caused by having nulls within a batch. Perhaps we should add that case as a test and ensure this new func can handle. I think in my TA batching logic it had to get pretty hairy to handle that unfortunately. Perhaps we can use similar logic. Also we might want to consider a better solution that just batching elements into single arrays. But we can push that to a later PR as that is current behavior in TA There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For nulls in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the error comes from when there are nulls of text which would ordinarily be skipped but they are in a batch so the mess up the whole batch |
||
case "java.lang.String" => Seq(Map("Text" -> textVal.get.asInstanceOf[String])).toJson.compactPrint | ||
case _ => textVal.get.map(x => Map("Text" -> x)).toJson.compactPrint | ||
} | ||
Some(new StringEntity(content, ContentType.APPLICATION_JSON)) | ||
} | ||
else Some(new StringEntity(Map("Text" -> "").toJson.compactPrint, ContentType.APPLICATION_JSON)) | ||
} | ||
} | ||
|
||
|
@@ -162,6 +170,38 @@ class Translate(override val uid: String) extends TextTranslatorBase(uid) | |
|
||
def urlPath: String = "translate" | ||
|
||
override protected def prepareUrl: Row => String = { | ||
val urlParams: Array[ServiceParam[Any]] = | ||
getUrlParams.asInstanceOf[Array[ServiceParam[Any]]]; | ||
|
||
// This semicolon is needed to avoid argument confusion | ||
def replaceName(s: String): String = { | ||
if (s == "fromLanguage") { | ||
"from" | ||
} else if (s == "toLanguage") { | ||
"to" | ||
} else { | ||
s | ||
} | ||
} | ||
{ row: Row => | ||
val base = getUrl + "?api-version=3.0" | ||
val appended = if (!urlParams.isEmpty) { | ||
"&" + URLEncodingUtils.format(urlParams.flatMap(p => | ||
getValueOpt(row, p).map { | ||
v => | ||
if (p.name == "toLanguage" & v.getClass.getName == "java.lang.String") | ||
replaceName(p.name) -> p.toValueString(Seq(v)) | ||
else replaceName(p.name) -> p.toValueString(v) | ||
} | ||
).toMap) | ||
} else { | ||
"" | ||
} | ||
base + appended | ||
} | ||
} | ||
|
||
val toLanguage = new ServiceParam[Seq[String]](this, "toLanguage", "Specifies the language of the output" + | ||
" text. The target language must be one of the supported languages included in the translation scope." + | ||
" For example, use to=de to translate to German. It's possible to translate to multiple languages simultaneously" + | ||
|
@@ -171,6 +211,8 @@ class Translate(override val uid: String) extends TextTranslatorBase(uid) | |
|
||
def setToLanguage(v: Seq[String]): this.type = setScalarParam(toLanguage, v) | ||
|
||
def setToLanguage(v: String): this.type = setScalarParam(toLanguage, Seq(v)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nice! We might need to add this to python methods too There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess this will automatically work? We're calling There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Youre right! |
||
|
||
def setToLanguageCol(v: String): this.type = setVectorParam(toLanguage, v) | ||
|
||
val fromLanguage = new ServiceParam[String](this, "fromLanguage", "Specifies the language of the input" + | ||
|
@@ -186,8 +228,8 @@ class Translate(override val uid: String) extends TextTranslatorBase(uid) | |
val textType = new ServiceParam[String](this, "textType", "Defines whether the text being" + | ||
" translated is plain text or HTML text. Any HTML needs to be a well-formed, complete element. Possible values" + | ||
" are: plain (default) or html.", { | ||
case Left(_) => true | ||
case Right(s) => Set("plain", "html")(s) | ||
case Left(s) => Set("plain", "html")(s) | ||
case Right(_) => true | ||
}, isURLParam = true) | ||
|
||
def setTextType(v: String): this.type = setScalarParam(textType, v) | ||
|
@@ -206,8 +248,8 @@ class Translate(override val uid: String) extends TextTranslatorBase(uid) | |
val profanityAction = new ServiceParam[String](this, "profanityAction", "Specifies how" + | ||
" profanities should be treated in translations. Possible values are: NoAction (default), Marked or Deleted. ", | ||
{ | ||
case Left(_) => true | ||
case Right(s) => Set("NoAction", "Marked", "Deleted")(s) | ||
case Left(s) => Set("NoAction", "Marked", "Deleted")(s) | ||
case Right(_) => true | ||
}, isURLParam = true) | ||
|
||
def setProfanityAction(v: String): this.type = setScalarParam(profanityAction, v) | ||
|
@@ -216,8 +258,8 @@ class Translate(override val uid: String) extends TextTranslatorBase(uid) | |
|
||
val profanityMarker = new ServiceParam[String](this, "profanityMarker", "Specifies how" + | ||
" profanities should be marked in translations. Possible values are: Asterisk (default) or Tag.", { | ||
case Left(_) => true | ||
case Right(s) => Set("Asterisk", "Tag")(s) | ||
case Left(s) => Set("Asterisk", "Tag")(s) | ||
case Right(_) => true | ||
}, isURLParam = true) | ||
|
||
def setProfanityMarker(v: String): this.type = setScalarParam(profanityMarker, v) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will never throw false if the error is in here. What I'm thinking is that we look at the required params in the transformSchema function to ensure all required params are set with either a value or a column. We can then call transformSchema before we transform to add validation there too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Should be an IllegalArgumentException
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In CognitiveServiceBase, transformSchema and transform both call
getInternalTransformer
first, and in this function we createnew SimpleHTTPTransformer
andsetInputParser(getInternalInputParser(schema))
. And ingetInternalInputParser
we callinputFunc
, which callsshouldSkip
, and that's why I add it here. I know it will never throw false if the error happens here, so do we want to silent the error to catch it later and return null?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could have the logic in getInternalTransformer. I think the logic should live in the "control-plane" which executes on the head node rather than code inside a mapPartitions like shouldSkip