Update dumper to allow not force format #19
Comments
@akariv
def test_force_temporal_format():
    import datetime
    from dataflows import Flow, load, update_resource, dump_to_path

    # Dump
    Flow(
        load('data/temporal.csv', name='temporal'),
        update_resource(['temporal'], **{
            'schema': {
                'fields': [
                    {'name': 'date', 'type': 'date', 'format': '%Y-%m-%d', 'outputFormat': '%d/%m/%y'},
                    {'name': 'event', 'type': 'string'},
                ]
            }
        }),
        dump_to_path('data/force_temporal_format', force_temporal_format=False)
    ).process()

    # Load
    flow = Flow(
        load('data/force_temporal_format/datapackage.json')
    )
    data, dp, stats = flow.results()

    # Assert
    assert dp.descriptor == {
        'profile': 'data-package',
        'resources': [{
            'dialect': {
                'caseSensitiveHeader': False,
                'delimiter': ',',
                'doubleQuote': True,
                'header': True,
                'lineTerminator': '\r\n',
                'quoteChar': '"',
                'skipInitialSpace': False
            },
            'encoding': 'utf-8',
            'format': 'csv',
            'name': 'temporal',
            'path': 'temporal.csv',
            'profile': 'tabular-data-resource',
            'schema': {
                'fields': [
                    {'format': '%d/%m/%y', 'name': 'date', 'type': 'date'},
                    {'format': 'default', 'name': 'event', 'type': 'string'}
                ],
                'missingValues': ['']
            }
        }]
    }
    assert data == [[
        {'date': datetime.date(2015, 1, 2), 'event': 'start'},
        {'date': datetime.date(2016, 6, 25), 'event': 'finish'}
    ]]
I think we should make a distinction between dataflows and datapackage-pipelines here. On datapackage-pipelines, with the yaml restrictions, we could have a few 'preset' configurations - e.g. by using the force_temporal_format parameter. wdyt?
It makes sense 👍 @cschloer is on vacation now but when he's back we can take a look at it from the "api client" perspective.
@cschloer |
Hey, sorry for the late reply! From my understanding of what you said, you could add a flag to dump_to_path that uses the format specified in The only place where we currently interact with the
Thanks @cschloer So let me verify how I see it:
(1) we're going to add a config option to the DPP's
- run: dump.to_path
  parameters:
    resources: [duplicate]
    out-path: 'output'
    pretty-descriptor: true
    datetime-format-field: outputFormat
(2) Then we have two options:
(3) It will resolve #19 and #22 altogether
cc @akariv
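One way to read the datetime-format-field parameter above: it names the key in each field descriptor that holds the format to write with. The sketch below is only an illustration of that idea, not the datapackage-pipelines implementation. `apply_format_field` is a hypothetical helper, and moving the named key into `format` is an assumption based on the descriptor asserted in the test earlier in this thread.

```python
import copy

# Table Schema field types that carry a temporal format.
TEMPORAL_TYPES = {'date', 'time', 'datetime'}


def apply_format_field(schema, format_field='outputFormat'):
    """Return a copy of `schema` in which each temporal field's `format`
    is replaced by the value stored under `format_field`, when present.

    Hypothetical helper: the name and behavior are assumptions.
    """
    result = copy.deepcopy(schema)  # leave the caller's schema untouched
    for field in result.get('fields', []):
        if field.get('type') in TEMPORAL_TYPES and format_field in field:
            # The custom key becomes the format recorded in the dumped descriptor.
            field['format'] = field.pop(format_field)
    return result


schema = {
    'fields': [
        {'name': 'date', 'type': 'date', 'format': '%Y-%m-%d', 'outputFormat': '%d/%m/%y'},
        {'name': 'event', 'type': 'string'},
    ]
}
dumped_schema = apply_format_field(schema, 'outputFormat')
```

Here the dumped schema records `%d/%m/%y` as the `format` of the date field, while the input schema keeps its original `%Y-%m-%d`, so a later load parses the file with the format it was actually written in.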
Ok, thanks, I'll focus on finishing it then
It's done in
Currently the dump processors force a particular format for some values (datetime values, for example). We need the ability to use the format value from the schema (saved in the datapackage.json) as the format used to dump datetime values. The suggestion was to just have a flag in dump_to_* that, when set to true, uses whatever is in the schema rather than a forced format.

This is higher priority than the rest, as we currently keep datetime fields as strings because their format is forced.
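The difference between a forced format and the schema's format can be sketched with the standard library alone. This is a rough illustration, not the dataflows code: `serialize_temporal` and `FORCED_FORMATS` are hypothetical names, the forced patterns are assumed values, and the flag name `force_temporal_format` is borrowed from the test in this thread.

```python
import datetime

# Assumed forced patterns per field type, for illustration only.
FORCED_FORMATS = {
    'date': '%Y-%m-%d',
    'time': '%H:%M:%S',
    'datetime': '%Y-%m-%d %H:%M:%S',
}


def serialize_temporal(value, field, force_temporal_format=True):
    """Format a date/time/datetime value for dumping.

    With the flag set, the fixed pattern for the field type is used.
    With the flag unset, the field's own `format` is used, unless it is
    'default' or 'any', which are not strftime patterns.
    """
    forced = FORCED_FORMATS[field['type']]
    if force_temporal_format:
        return value.strftime(forced)
    fmt = field.get('format', 'default')
    if fmt in ('default', 'any'):
        fmt = forced
    return value.strftime(fmt)


field = {'name': 'date', 'type': 'date', 'format': '%d/%m/%y'}
day = datetime.date(2015, 1, 2)

forced_text = serialize_temporal(day, field)
schema_text = serialize_temporal(day, field, force_temporal_format=False)

# Reading the value back with the schema's format recovers the original date.
parsed = datetime.datetime.strptime(schema_text, field['format']).date()
```

With the flag unset, the written text follows the schema (`02/01/15` here) and can be parsed back with the same `format`, which is what lets the field stay typed as a date instead of being kept as a string.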