🐛 Destination Postgres: fix \u0000 (NULL) value processing #5336
Commits: 379d318, 0932f15, 3657e61, 13a944a, b0f13a4, c75dacd, 792d89b, 9cf43bb, 3771a3b, c2b18a2, 3edefb7, 2175999, d374cbc
@@ -0,0 +1,36 @@ (new file)

```java
/*
 * MIT License
 *
 * Copyright (c) 2020 Airbyte
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 */

package io.airbyte.integrations.destination.buffered_stream_consumer;

import io.airbyte.protocol.models.AirbyteMessage;

/**
 * Allows specifying transformation logic from Airbyte Json to String.
 */
public interface StreamDateFormatter {

  String getFormattedDate(AirbyteMessage airbyteMessage);

}
```
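The PR adds only the interface, not an implementation. A minimal sketch of one, assuming the formatter is meant to turn a message's emitted-at timestamp (the `getEmittedAt()` value used elsewhere in this PR) into a date string. To keep it compilable without Airbyte dependencies, `MessageStub` is a hypothetical stand-in for `AirbyteMessage`, `DateFormatterSketch` mirrors the interface's single method, and the `yyyy/MM/dd` UTC pattern is an arbitrary choice.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical stand-in for io.airbyte.protocol.models.AirbyteMessage:
// it carries only an emitted-at timestamp in epoch milliseconds.
final class MessageStub {
  final long emittedAtMillis;

  MessageStub(long emittedAtMillis) {
    this.emittedAtMillis = emittedAtMillis;
  }
}

// Mirrors StreamDateFormatter's single method, typed on the stub instead of AirbyteMessage.
interface DateFormatterSketch {
  String getFormattedDate(MessageStub message);
}

// Hypothetical implementation: formats the emitted-at instant as a UTC calendar date.
final class UtcDateFormatter implements DateFormatterSketch {

  // withZone(...) lets the formatter format an Instant directly.
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

  @Override
  public String getFormattedDate(MessageStub message) {
    return FORMAT.format(Instant.ofEpochMilli(message.emittedAtMillis));
  }
}
```

Because the interface has a single abstract method, a lambda would work as well as a named class.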
@@ -0,0 +1,89 @@ (new file)

```java
/*
 * MIT License
 *
 * Copyright (c) 2020 Airbyte
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 */

package io.airbyte.integrations.destination.jdbc;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.node.ObjectNode;
import java.util.function.Function;
import java.util.function.Predicate;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DataAdapter {

  private static final Logger LOGGER = LoggerFactory.getLogger(DataAdapter.class);

  private final Predicate<JsonNode> filterValueNode;
  private final Function<JsonNode, JsonNode> valueNodeAdapter;

  /**
   * A data adapter applies destination-specific data rules. For example, the Postgres destination
   * can't process text values containing the \u0000 character. You describe a filter condition for
   * value nodes and a function that adapts the value nodes matching it.
   *
   * @param filterValueNode - filter condition that decides which value nodes should be adapted
   * @param valueNodeAdapter - transformation function that returns the adapted value node
   */
  public DataAdapter(
                     Predicate<JsonNode> filterValueNode,
                     Function<JsonNode, JsonNode> valueNodeAdapter) {
    this.filterValueNode = filterValueNode;
    this.valueNodeAdapter = valueNodeAdapter;
  }

  public void adapt(JsonNode messageData) {
    if (messageData != null) {
      adaptAllValueNodes(messageData);
    }
  }

  private void adaptAllValueNodes(JsonNode rootNode) {
    adaptValueNodes(null, rootNode, null);
  }

  /**
   * Recursively inspects a JSON node. If it is a value node (a scalar such as text, a number or a
   * boolean) that matches {@code filterValueNode}, it is replaced in its parent object by the result
   * of {@code valueNodeAdapter}. If it is an array or an object, the method recurses into its
   * elements or fields. A matching value node without a field name (an array element or the root
   * node) is rejected with a RuntimeException.
   *
   * @param fieldName name of the JSON node in its parent object, or null for array elements and the root
   * @param node JSON node to inspect
   * @param parentNode parent JSON node, or null for the root
   */
  private void adaptValueNodes(String fieldName, JsonNode node, JsonNode parentNode) {
    if (node.isValueNode() && filterValueNode.test(node)) {
      if (fieldName != null) {
        var adaptedNode = valueNodeAdapter.apply(node);
        ((ObjectNode) parentNode).set(fieldName, adaptedNode);
      } else {
        throw new RuntimeException("Unexpected value node without fieldName. Node: " + node);
      }
    } else if (node.isArray()) {
      // Array elements have no field name, so null is passed down.
      node.elements().forEachRemaining(arrayNode -> adaptValueNodes(null, arrayNode, node));
    } else {
      // Objects: recurse into every field. Non-matching value nodes have no fields, so this is a no-op for them.
      node.fields().forEachRemaining(stringJsonNodeEntry -> adaptValueNodes(
          stringJsonNodeEntry.getKey(), stringJsonNodeEntry.getValue(), node));
    }
  }

}
```
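The traversal can be sketched without Jackson to make the control flow explicit. In this hypothetical `MapDataAdapter`, a JSON object is modelled as a `Map<String, Object>`, an array as a `List<Object>`, and anything else as a leaf value; the NUL-stripping rule mirrors the Postgres motivation in the constructor Javadoc. One deliberate difference from the PR code: array elements and a leaf root are replaced directly (via `List.replaceAll` and the return value) instead of requiring a field name, so a matching string inside an array is adapted rather than triggering the "Unexpected value node without fieldName" exception.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical, dependency-free sketch of DataAdapter's traversal.
// Objects are Map<String, Object>, arrays are List<Object>, everything else is a leaf.
final class MapDataAdapter {

  private final Predicate<Object> filterValue;
  private final Function<Object, Object> valueAdapter;

  MapDataAdapter(Predicate<Object> filterValue, Function<Object, Object> valueAdapter) {
    this.filterValue = filterValue;
    this.valueAdapter = valueAdapter;
  }

  /** Adapts containers in place and returns the (possibly replaced) node. */
  Object adapt(Object node) {
    if (node instanceof Map) {
      @SuppressWarnings("unchecked")
      Map<String, Object> object = (Map<String, Object>) node;
      // replaceAll rewrites each value in place; the key set is unchanged.
      object.replaceAll((key, child) -> adapt(child));
      return object;
    }
    if (node instanceof List) {
      @SuppressWarnings("unchecked")
      List<Object> array = (List<Object>) node;
      // Elements are replaced by position, so no field name is needed.
      array.replaceAll(this::adapt);
      return array;
    }
    // Leaf value: replace it only when the filter matches.
    return filterValue.test(node) ? valueAdapter.apply(node) : node;
  }

  /** Hypothetical Postgres-style rule: strip the NUL character from text values. */
  static MapDataAdapter stripNul() {
    return new MapDataAdapter(
        value -> value instanceof String && ((String) value).indexOf('\u0000') >= 0,
        value -> ((String) value).replace("\u0000", ""));
  }
}
```

With Jackson, the analogous change for arrays would be an index-based loop that calls `ArrayNode.set(int, JsonNode)` on matching elements.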
```diff
@@ -113,12 +113,11 @@ private static RecordWriter recordWriterFunction(Map<AirbyteStreamNameNamespaceP
     return (AirbyteStreamNameNamespacePair pair, List<AirbyteRecordMessage> records) -> {
       for (AirbyteRecordMessage recordMessage : records) {
         var id = UUID.randomUUID();
-        var data = Jsons.serialize(recordMessage.getData());
-        if (sqlOperations.isValidData(data)) {
+        if (sqlOperations.isValidData(recordMessage.getData())) {
           // TODO Truncate json data instead of throwing whole record away?
           // or should we upload it into a special rejected record folder in s3 instead?
           var emittedAt = Timestamp.from(Instant.ofEpochMilli(recordMessage.getEmittedAt()));
-          pairToCopier.get(pair).write(id, data, emittedAt);
+          pairToCopier.get(pair).write(id, Jsons.serialize(recordMessage.getData()), emittedAt);
         } else {
           pairToIgnoredRecordCount.put(pair, pairToIgnoredRecordCount.getOrDefault(pair, 0L) + 1L);
         }
```

Review thread on the `write(...)` line:

Comment: This is a problem, we are not going to do

Reply: Currently, it's designed in a way that doesn't leave us many options. The original implementation does an additional serialization for all destinations. Here we have one additional serialization for one destination, and only in the Copy flow. So this is already a significant improvement.

Reply: Cool! Please create a follow-up issue to resolve this.
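The reordering in this hunk (validate the in-memory record first, serialize only what is written) can be illustrated with a dependency-free sketch. `CopyLoopSketch` and its parameters are hypothetical: a `Predicate` stands in for `sqlOperations.isValidData`, a `Function` for `Jsons.serialize`, and a `List<String>` for the copier. The sketch also swaps the `getOrDefault(...) + 1L` counter for `Map.merge`, which has the same effect. Whether a particular destination's `isValidData` serializes internally is outside the sketch; the review thread notes that one destination still does.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch of the copy loop: validate first, serialize only valid records.
final class CopyLoopSketch {

  static <R> void writeAll(
      String stream,
      List<R> records,
      Predicate<R> isValidData,       // stands in for sqlOperations.isValidData(...)
      Function<R, String> serialize,  // stands in for Jsons.serialize(...)
      List<String> sink,              // stands in for pairToCopier.get(pair).write(...)
      Map<String, Long> ignoredCounts) {
    for (R record : records) {
      if (isValidData.test(record)) {
        // The loop serializes here, only for records that will actually be written.
        sink.add(serialize.apply(record));
      } else {
        // Same effect as put(stream, getOrDefault(stream, 0L) + 1L).
        ignoredCounts.merge(stream, 1L, Long::sum);
      }
    }
  }
}
```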
Review thread on `adaptValueNodes` in `DataAdapter`:

Comment: I'm not sure I follow the logic of this method. When can `node.isValueNode()` be true? Also, I don't like the fact that `fieldName` can be null and we don't have a null check.

Reply: If an element contains a value, it's a value node. An element might also be an array or an object.

Reply: @DoNotPanicUA can you put a comment on this method explaining how it works, so that it's clear to anyone reading this code for the first time?