python - Strange behavior of pandas resampling
I'm experiencing rather strange behavior of the resampling function for pandas time series (Python). I use the latest version of pandas (0.12.0).
Take the following time series:
    from datetime import datetime
    import numpy as np
    from pandas import Series

    dates = [datetime(2011, 1, 2, 1), datetime(2011, 1, 2, 2),
             datetime(2011, 1, 2, 3), datetime(2011, 1, 2, 4),
             datetime(2011, 1, 2, 5), datetime(2011, 1, 2, 6)]
    ts = Series(np.arange(6.), index=dates)

Then I try resampling to 66 minutes and to 65 minutes. These are the results I get:
    In [45]: ts.resample('66min')
    Out[45]:
    2011-01-02 01:00:00    0.5
    2011-01-02 02:06:00    2.0
    2011-01-02 03:12:00    3.0
    2011-01-02 04:18:00    4.0
    2011-01-02 05:24:00    5.0
    Freq: 66T, dtype: float64

    In [46]: ts.resample('65min')
    Out[46]:
    2011-01-02 01:00:00      0
    2011-01-02 02:05:00    NaN
    2011-01-02 03:10:00    NaN
    2011-01-02 04:15:00    NaN
    2011-01-02 05:20:00    NaN
    2011-01-02 06:25:00    NaN
    Freq: 65T, dtype: float64

I understand the behavior when resampling to 66 minutes: it takes the mean (the default) of the values in the respective interval. I do not understand the behavior for 65 minutes, and I don't know how to influence it.
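To make "the mean of the values in the respective interval" concrete, here is a minimal sketch that bins the series by hand, with left-closed intervals anchored at the first timestamp. The helper name `mean_by_interval` is hypothetical, and this is not how pandas implements `resample`; it only illustrates the result one would expect from that description, for 65 minutes as well as for 66.

```python
from datetime import datetime

import numpy as np
import pandas as pd

dates = [datetime(2011, 1, 2, hour) for hour in range(1, 7)]
ts = pd.Series(np.arange(6.), index=dates)


def mean_by_interval(series, minutes):
    """Average values over left-closed bins of `minutes`, anchored at the first timestamp.

    Hypothetical helper for illustration only; empty bins are simply absent
    from the result instead of being filled with NaN.
    """
    start = series.index[0]
    width = pd.Timedelta(minutes=minutes)
    # Integer number of whole bins between the start and each timestamp.
    bin_number = np.asarray((series.index - start) // width)
    # Label each observation with the left edge of the bin it falls into.
    labels = start + pd.to_timedelta(bin_number * minutes, unit="min")
    return series.groupby(labels).mean()


means_66 = mean_by_interval(ts, 66)
means_65 = mean_by_interval(ts, 65)
print(means_66)
print(means_65)
```

Under this definition both widths give the same means, because the observations at 01:00 and 02:00 share the first bin in either case and every later observation sits alone in its own bin.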
This is a simplified version of the problem. The background is a more complex data correction process that involves resampling.
Any ideas?
Perhaps you want to interpolate instead of resample. Here's one way:
    In [53]: index = pd.date_range(freq='66T', start=ts.first_valid_index(), periods=5)

    In [54]: ts.reindex(set(ts.index).union(index)).sort_index().interpolate('time').ix[index]
    Out[54]:
    2011-01-02 01:00:00    0.0
    2011-01-02 02:06:00    1.1
    2011-01-02 03:12:00    2.2
    2011-01-02 04:18:00    3.3
    2011-01-02 05:24:00    4.4
    Freq: 66T, dtype: float64

    In [55]: index = pd.date_range(freq='65T', start=ts.first_valid_index(), periods=5)

    In [56]: ts.reindex(set(ts.index).union(index)).sort_index().interpolate('time').ix[index]
    Out[56]:
    2011-01-02 01:00:00    0.000000
    2011-01-02 02:05:00    1.083333
    2011-01-02 03:10:00    2.166667
    2011-01-02 04:15:00    3.250000
    2011-01-02 05:20:00    4.333333
    Freq: 65T, dtype: float64

That said, it seems resample could be improved. At first glance, the behavior you've demonstrated is mysterious and, I agree, unhelpful. It's worth discussing.
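For readers on a later pandas release, where the `.ix` indexer no longer exists, the same interpolation idea can be written with `.loc` and `Index.union`. This is a sketch under that assumption, not a replacement for the session above; the helper name `interpolate_onto` is hypothetical.

```python
from datetime import datetime

import numpy as np
import pandas as pd

dates = [datetime(2011, 1, 2, hour) for hour in range(1, 7)]
ts = pd.Series(np.arange(6.), index=dates)


def interpolate_onto(series, freq, periods):
    """Linearly interpolate `series` in time onto a regular grid.

    Hypothetical helper: the grid starts at the first timestamp of `series`
    and has `periods` points spaced `freq` apart.
    """
    target = pd.date_range(start=series.index[0], freq=freq, periods=periods)
    # Add the target timestamps to the index (sorted, duplicates dropped);
    # the new rows hold NaN until they are interpolated.
    combined = series.reindex(series.index.union(target))
    # method="time" weights by the actual time gaps between points.
    return combined.interpolate(method="time").loc[target]


on_66 = interpolate_onto(ts, "66min", 5)
on_65 = interpolate_onto(ts, "65min", 5)
print(on_66)
print(on_65)
```

The `"66min"` and `"65min"` aliases are used here in place of `'66T'` and `'65T'` because the `min` spelling is accepted by both older and newer pandas versions.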