Timeseries object for Astropy (APE9) #12
Pinging myself so I can follow development and discussion. |
I am very interested in this. Thanks for putting this together @Cadair. |
and thanks to everyone who participated in the discussion at PyAstro16 :P |
Abstract
--------

The goal of a timeseries object in astropy is to provide a core set of
remove the goal of sharing. that's a general astropy goal.
I think it would be helpful to list a number of examples of analyses that we would want to perform on these. |
I am interested in this feature, and think it would be a great addition. GWpy would likely move to using/inheriting this object, rather than the current |
I am in support of this APE! And I'm available to help with testing. |
I like it and would use it. |
I love it! |
I would like for @StingraySoftware to move towards using a TimeSeries object (or at least basing our |
APE9.rst
Outdated
This APE proposes that we place the following restrictions on binned data:

#. Contiguous bins, i.e. the start of the (i+1)th bin is the end of the ith bin.
I think we decided this was not going to be necessary.
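For reference, the contiguity restriction under discussion can be written as a small check. This is only a sketch using the standard library; the helper name `bins_are_contiguous` and the start/end representation of bins are assumptions made for illustration, not part of the APE or of any astropy API.

```python
from datetime import datetime, timedelta


def bins_are_contiguous(starts, ends):
    """Return True if every bin starts exactly where the previous one ends."""
    if len(starts) != len(ends):
        raise ValueError("starts and ends must have the same length")
    # Compare the end of bin i with the start of bin i+1.
    return all(
        next_start == prev_end
        for prev_end, next_start in zip(ends[:-1], starts[1:])
    )


t0 = datetime(2016, 3, 22, 12, 30, 31)

# Three back-to-back 3-second bins.
starts = [t0 + timedelta(seconds=3 * i) for i in range(3)]
ends = [t0 + timedelta(seconds=3 * (i + 1)) for i in range(3)]
contiguous = bins_are_contiguous(starts, ends)

# Two 3-second bins separated by a 2-second gap.
gappy_starts = [t0, t0 + timedelta(seconds=5)]
gappy_ends = [t0 + timedelta(seconds=3), t0 + timedelta(seconds=8)]
gappy = bins_are_contiguous(gappy_starts, gappy_ends)
```

Dropping the restriction, as suggested in the comment above, would mean that both of these bin layouts are accepted.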
Many different areas of astrophysics have to deal with 1D timeseries data, i.e.
either sampling a continuous variable at fixed times or counting some events
binned into time windows. These types of applications require some basic
functionality:
How many of these features already exist in QTable?
Some, but all would need to be customised for timeseries data (i.e. to preserve the index correctly.)
Mainly examples on the difference in construction between binned and sampled data.
APE9.rst
Outdated
functionality:

#. Extending timeseries with extra rows
#. Concatenating multiple timeseries objects
Re-word to clarify that concatenation is extra columns.
Initialize a ``SampledTimeSeries`` with a time and a data column::

    ts = SampledTimeSeries(time=['2016-03-22T12:30:31', '2016-03-22T12:30:32', '2016-03-22T12:30:40'],
Should we be able to construct from a start time and a list of "deltas"?
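A rough sketch of what "a start time and a list of deltas" could expand to, using only the standard library. The helper `times_from_deltas` is hypothetical, and it assumes each delta is the gap to the next sample rather than an offset from the start; the other reading is equally plausible and would need to be settled in the API.

```python
from datetime import datetime, timedelta
from itertools import accumulate


def times_from_deltas(start, deltas_s):
    """Build sample times from a start time and successive gaps in seconds.

    The first sample is at `start`; each delta is the gap to the next sample.
    """
    # Running sum of the gaps gives each sample's offset from the start.
    offsets = accumulate(deltas_s, initial=0.0)
    return [start + timedelta(seconds=offset) for offset in offsets]


start = datetime.fromisoformat("2016-03-22T12:30:31")
times = times_from_deltas(start, [1.0, 8.0])
stamps = [t.isoformat() for t in times]
```

With gaps of 1 s and 8 s this reproduces the three time stamps used in the quoted ``SampledTimeSeries`` example.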
API. Some examples of constructing these objects are given below.


Initialize a ``SampledTimeSeries`` with a time and a data column::
This section should also demonstrate constructing these objects with Time and TimeDelta so that it's clear that's a thing.
APE9.rst
Outdated
#. Slicing / selecting time ranges (indexing)
#. Re-binning and re-sampling timeseries
#. Interpolating to different time stamps.
#. Masking
and missing value (NaN)
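To make the distinction concrete, here is a minimal standard-library sketch in which a sample is skipped either because it is explicitly masked or because its value is NaN. The function name `valid_mean` and the convention that `True` means "masked" are assumptions for illustration only.

```python
import math


def valid_mean(values, mask=None):
    """Mean of the samples that are neither masked (mask is True) nor NaN."""
    if mask is None:
        mask = [False] * len(values)
    kept = [
        value
        for value, masked in zip(values, mask)
        if not masked and not math.isnan(value)
    ]
    if not kept:
        # Nothing usable: report a missing value rather than raising.
        return math.nan
    return sum(kept) / len(kept)


flux = [1.0, float("nan"), 5.0, 3.0]
mask = [False, False, True, False]  # third sample flagged as bad
mean_flux = valid_mean(flux, mask)
```

Here the NaN sample and the masked sample are both ignored, so only the first and last values contribute to the mean.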
I'm late to this, but have a couple things to add:
|
Good morning, I just read through this thread and find the discussion very useful and important. I have implemented, over the past 12 years, a bunch of packages for time domain analysis in Java, and have had to make all the choices that are being discussed above in terms of how to represent time-related data.

Because my background is high energy astrophysics, it was clear from the very start that I needed to have EventList and TimeSeries as different objects. As was mentioned above, an EventList can have other characteristics in addition to arrivalTime, such as x, y, energy, and a quality flag, for example. In the TimeSeries this is not the case. But there can be fundamentally different kinds of time series, like what I've called CodedMaskTimeSeries, which carries a lot more information than just intensity as a function of time, because the time series is actually reconstructed from images, and the errors depend on the angle at which the source was in the field of view, and on and on.

So, I have a package for event lists and a package for time series, and in the time series package I have an interface ITimeSeries, an abstract class AbstractTimeSeries, and two concrete classes, TimeSeries and CodedMaskTimeSeries, going from the general to the more specific, each class inheriting from the previous and each implementing the interface. I also have an interface ITimeSeriesFileReader with specific file readers that implement it for different kinds of input files (fits, ascii, qdp), such that the TimeSeriesReader just loops through to try each implemented type when given an input file. Similarly, there are several TimeSeriesFileWriters. The similarity between an EventList and a TimeSeries in terms of temporal information carrying capacity is used in the factory classes that can handle time data: the TimeSeriesMaker and PeriodogramMaker. These have methods to deal with event lists and with time series as inputs.

For the periodograms, it is again an AbstractPeriodogram that is used as the base, with different types that extend it (FFTPeriodogram, RayleighPeriodogram, LombScarglePeriodogram, ZPeriodogram, ModifiedRayleighPeriodgram, LikelihoodPeriodogram, as well as AggregatePeriodogram to combine periodograms and AveragePeriodogram to express the result). AveragePeriodogram is the only periodogram that has uncertainties on the powers; they are not used in any other one, since the power at each frequency in a periodogram is a point estimate.

Lastly, I have recently been re-writing the resampling of time series in order to make it more manageable, because dealing with resampling of a binned time series in a general way is very complicated. My time series are allowed to have both irregularly sized bins and irregular spaces in between them, and everything is written to handle that. So, I have been working on decomposing the time series into a collection of intensity bins, each made up of (composition) an Intensity and a Bin, each of which carries specific information that defines everything we need to know. In this way, we can handle each IntensityBin one at a time. This is ongoing and I'm still sorting out difficulties like how to deal with different kinds of intensities (absolute quantity or density, for example) in a general manner.

In conclusion, I think that firstly, having EventList and TimeSeries objects makes sense, because an EventList is not really a kind of TimeSeries even though its arrival times can be used to make one. And secondly, it is very important to look down the line and consider all the kinds of things that will be done with the EventList and TimeSeries (e.g. periodograms or resampling) so that it doesn't become too difficult to do them when we get there. Please let me know if you want more details about anything addressed in this comment. I'd be happy to help further. Unfortunately, I haven't yet started coding in Python.
Maybe it's time to start ;) I also need to update my repos on GitHub, which I promise to do soon. |
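The Intensity/Bin composition described in the comment above might look roughly like this in Python. This is a guess at the shape of the idea, not a port of the Java code: the field names, the `is_density` flag and the `total` method are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bin:
    """A time bin; times are seconds since an arbitrary reference epoch."""
    start: float
    stop: float

    def __post_init__(self):
        if self.stop <= self.start:
            raise ValueError("a bin must have positive width")

    @property
    def width(self):
        return self.stop - self.start


@dataclass(frozen=True)
class Intensity:
    """A measured value, either an absolute quantity or a density (per second)."""
    value: float
    is_density: bool


@dataclass(frozen=True)
class IntensityBin:
    """Composition of a Bin and the Intensity measured in it."""
    bin: Bin
    intensity: Intensity

    def total(self):
        """Absolute quantity collected in this bin."""
        if self.intensity.is_density:
            return self.intensity.value * self.bin.width
        return self.intensity.value


# Bins of different widths with a gap between them (2 s to 5 s is empty).
series = [
    IntensityBin(Bin(0.0, 2.0), Intensity(3.0, is_density=True)),
    IntensityBin(Bin(5.0, 6.0), Intensity(4.0, is_density=False)),
]
totals = [ib.total() for ib in series]
```

Because each `IntensityBin` is self-describing, irregular widths and gaps need no special handling at the series level, which seems to be the point of the decomposition.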
@gbelanger - thank you for your insight - it's always really helpful to know that people have gone through similar design questions in the past and see how they have solved them. I think it would be really helpful if you take a look at what we have so far (once ready to try out) and see if you have any suggestions (and it would be great if you were interested in contributing!) |
I've now had a chance to digest all the suggestions above and have now updated the prototype implementation. I've also now put the implementation in a standalone package to make it easy to try out. The repository and docs are here:
It would be really helpful if people here can take it for a spin to see if you like the current API. The way I've set up this package is such that once we open a PR to astropy core, all commits will be preserved. Therefore, feel free to open pull requests, or issues! If you'd like to work on some of the open issues, just leave a comment! You can easily install the package with:
and note that things will work better if you have the latest developer version of Astropy (or the 3.1rc1). A few updates based on the above discussion:
Let me know if you have any feedback! |
I'm a little late to this party, but having looked at and tried the prototype I have a suggestion which might make the If I imagine my use cases for a class, it is to handle collections of data, which have a matching time attribute. Typically, data has values, uncertainty, and perhaps a mask indicating data quality - just like the It would also be incredibly powerful if arithmetic on

    flux_1 = NDDataArray(data=y_1, uncertainty=StdDevUncertainty(yerr_1), flags=flags_1)
    flux_2 = NDDataArray(data=y_2, uncertainty=StdDevUncertainty(yerr_2), flags=flags_2)
    ts_1 = TimeSeries(time=t, data={'flux': flux_1})
    ts_2 = TimeSeries(time=t, data={'flux': flux_2})
    rel_ts = ts_1/ts_2

...and the Such behaviour is compatible with the APE as it stands, but can't be achieved if a Sorry if this suggestion comes after many people have done a lot of hard work; I hadn't realised this issue until I started to play with the implementation. Feel free to ignore this, or point out glaring errors |
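As an illustration of what the `ts_1/ts_2` arithmetic might be expected to do with the uncertainties, here is a plain-Python sketch of a point-by-point ratio with standard first-order error propagation. It assumes the two series share the same time stamps and have uncorrelated standard-deviation errors; the function name is hypothetical, flags are left out, and this does not reflect how NDData or the prototype actually behave.

```python
import math


def divide_with_uncertainty(y1, s1, y2, s2):
    """Point-by-point ratio y1/y2 with propagated standard deviations.

    Assumes uncorrelated errors:
    sigma_r = |r| * sqrt((s1/y1)**2 + (s2/y2)**2)
    """
    ratio = []
    sigma = []
    for a, sa, b, sb in zip(y1, s1, y2, s2):
        r = a / b
        ratio.append(r)
        sigma.append(abs(r) * math.sqrt((sa / a) ** 2 + (sb / b) ** 2))
    return ratio, sigma


y_1, yerr_1 = [10.0, 8.0], [1.0, 0.8]
y_2, yerr_2 = [5.0, 4.0], [0.5, 0.4]
rel, rel_err = divide_with_uncertainty(y_1, yerr_1, y_2, yerr_2)
```

The formula breaks down when a numerator value is exactly zero, so a real implementation would need to treat that case separately.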
@StuartLittlefair - thanks for the suggestion! I agree that using NDData might work well for some use cases. I'd like to make two suggestions:
I think both of these can be done of course, we don't have to choose. I'll also mention that for NDData, we changed the API a few times because we ended up making the base class try and be too smart, and some functionality is best left to sub-classes. I think we will need to do the same here - that is, functionality like I'll try and work on an NDData mix-in class implementation to see how much work it might be and whether it would suit your use case. I'll also see if I can re-use some of your prototype code. |
Should we merge this APE since astropy.timeseries is merged? |
@gully - I would rather wait until 4.0, and ask everyone (especially folks with non single band optical photometry use cases) here to give the current version a good try and report back issues or missing functionalities. |
Thank you @bsipocz , indeed learning from the existing timeseries implementation should feed back to the long term vision. |
(well, it's not my call anyway to merge this, but the feedback I've heard so far is that the current API is very much Kepler-type light curve centric. But it's very, very difficult to do anything about it without actual issues being opened...) |
Interesting, I should try out the timeseries class more to see what feedback I can give. I am curious about multichannel and sparse applications too, despite being Kepler-centric much of the time. I'm especially interested in use cases that combine sparse ground-based observations with high-cadence space-based observations. By "combine", I don't necessarily mean in the same "flat" timeseries object, though that's conceivable. I just mean making sparse and high-cadence data interoperate more easily would unlock some good science. The different bandpasses come up a lot too. I re-read the APE governance protocol on the README and appreciate that a lot goes into formalizing the APE, so my use of "merge" wasn't the right choice. I had thought an APE was a prerequisite to making new functionality, but I see now that making an experimental implementation further informs use cases and guides the functionality in the longer term. |
@gully - make sure you're playing with the dev version as there are some substantial improvements in the underlying classes that were necessary to be able to extend the functionalities (though none of those are reflected in the narrative docs yet) |
Hi, I am working on the TolTEC project and I am writing Python code for handling the time stream data collected from an array of bolometers that scans in time. As already pointed out in the previous comments, since each time sample contains tightly packed data (in our case, a vector of size ~4000, matching the total number of detectors in the array), it would make most sense to have an NDData as the storage and have a time-axis mixin attached to one of the axes. I saw that #16 is still open and I was wondering what the status of that is? |
So... astropy/astropy#8540 was merged and released in v3.2. Where are we going with this in 2022? |
Pinging again in 2023. Are we just going to wrap this up since timeseries is already in core and all that? |
We should have a "Superseded by events" status for APEs 😅 (jk!) |
So is this APE complete now that |
Would have to ping @astrofrog and @Cadair again |
... years ago.
@eteq Please follow through with the Zenodo steps when they are online again (they had trouble minting DOIs the last few days). |
This is still in very early draft.