<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Timeseries |</title><link>https://yu-cheng.co/tags/timeseries/</link><atom:link href="https://yu-cheng.co/tags/timeseries/index.xml" rel="self" type="application/rss+xml"/><description>Timeseries</description><generator>HugoBlox Kit (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Mon, 13 Feb 2017 00:42:30 +0000</lastBuildDate><image><url>https://yu-cheng.co/media/icon_hu_87a968e0c4fc153c.png</url><title>Timeseries</title><link>https://yu-cheng.co/tags/timeseries/</link></image><item><title>Fill in the missing data using Python pandas</title><link>https://yu-cheng.co/blog/pandas_missing_value/</link><pubDate>Mon, 13 Feb 2017 00:42:30 +0000</pubDate><guid>https://yu-cheng.co/blog/pandas_missing_value/</guid><description>&lt;p&gt;One of the many advantages of Python is its abundant and often powerful libraries. For my research, besides plotting maps, I often work with time series. When it comes to manipulating and plotting time series, it is hard to beat pandas.&lt;/p&gt;
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;At the core of pandas are its data structures: &lt;em&gt;Series&lt;/em&gt;, &lt;em&gt;DataFrame&lt;/em&gt; and &lt;em&gt;Panel&lt;/em&gt; (&lt;em&gt;Panel&lt;/em&gt; has since been removed from pandas). The ones I use the most are the first two. A &lt;em&gt;Series&lt;/em&gt; is a one-dimensional labeled array, in my case labeled with timestamps, and a &lt;em&gt;DataFrame&lt;/em&gt; is a table whose columns are &lt;em&gt;Series&lt;/em&gt; sharing one index. In a real-world use case, I use pandas to generate a time axis, which I then attach to my Agulhas leakage time series. After that, the value at a specific timestep can be retrieved with &lt;code&gt;Series['timestamp']&lt;/code&gt;, and plotting the whole time series is as simple as &lt;code&gt;Series.plot()&lt;/code&gt;.&lt;/p&gt;
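&lt;p&gt;As a minimal sketch of that workflow, the snippet below builds a monthly time axis, attaches it to a &lt;em&gt;Series&lt;/em&gt;, and looks up values by timestamp. The dates, the values, and the names &lt;code&gt;time_axis&lt;/code&gt; and &lt;code&gt;leakage&lt;/code&gt; are made up for illustration; they are not my actual Agulhas leakage data.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;import numpy as np
import pandas as pd

# Hypothetical monthly time axis and placeholder values, for illustration only.
time_axis = pd.date_range("2000-01-01", periods=12, freq="MS")
leakage = pd.Series(np.arange(12, dtype=float), index=time_axis)

# Look up a single timestep by its timestamp label.
march_value = leakage["2000-03-01"]

# Partial-string slicing selects a range of months, endpoints included.
first_quarter = leakage.loc["2000-01":"2000-03"]

# leakage.plot() would draw the whole series (it needs matplotlib installed).
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;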
&lt;p&gt;To see the key statistics of a &lt;em&gt;DataFrame&lt;/em&gt; with many columns, simply use &lt;code&gt;DataFrame.describe()&lt;/code&gt;. It returns a table with the count, mean, standard deviation, minimum, maximum, and percentiles of each column. To compare multiple time series visually, simply call &lt;code&gt;DataFrame.plot()&lt;/code&gt;.&lt;/p&gt;
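&lt;p&gt;Here is a small sketch of what &lt;code&gt;describe()&lt;/code&gt; gives back. The column names &lt;code&gt;shallow&lt;/code&gt; and &lt;code&gt;deep&lt;/code&gt; and their values are hypothetical, chosen only so the statistics are easy to check by eye.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;import pandas as pd

# Hypothetical column names and placeholder values, for illustration only.
time_axis = pd.date_range("2000-01-01", periods=4, freq="MS")
df = pd.DataFrame(
    {"shallow": [1.0, 2.0, 3.0, 4.0], "deep": [10.0, 20.0, 30.0, 40.0]},
    index=time_axis,
)

# describe() returns a DataFrame: one row per statistic, one column per data column.
summary = df.describe()
print(summary)

# Individual statistics can be picked out by label.
mean_shallow = summary.loc["mean", "shallow"]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;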
&lt;h4 id="working-with-missing-data"&gt;Working with missing data&lt;/h4&gt;
&lt;p&gt;Recently, I have been calculating the Atlantic Ocean Heat Content (OHC).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;#headers=[&amp;#39;date&amp;#39;,&amp;#39;OHC2000&amp;#39;,&amp;#39;OHC300&amp;#39;,&amp;#39;OHC700&amp;#39;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;OHC_multilevels&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;OHC_HRC07_1951-2002.csv&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# If it&amp;#39;s pandas generated, this is much easier.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;OHC_multilevels&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;figure&gt;&lt;img src="https://yu-cheng.co/img/output_32_1.png" width="600"&gt;&lt;figcaption&gt;
&lt;h4&gt;Atlantic OHC in multiple layers 1951-2002&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Obviously, something fishy happened near 1952 and again in 1971. Several months have values close to zero, which is physically unlikely. Going back to the data, I confirmed that the temperature and salinity fields for those months are missing. To clean up the time series, I first assigned &lt;code&gt;None&lt;/code&gt; to those months (pandas stores it as &lt;code&gt;NaN&lt;/code&gt;), then interpolated linearly from the neighboring months. All three time series in the same &lt;em&gt;DataFrame&lt;/em&gt; are processed by the following two lines.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;OHC_multilevels&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;OHC_multilevels&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;OHC_multilevels&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interpolate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;figure&gt;&lt;img src="https://yu-cheng.co/img/output_33_1.png" width="600"&gt;&lt;figcaption&gt;
&lt;h4&gt;Missing data filled with linear interpolation&lt;/h4&gt;
&lt;/figcaption&gt;
&lt;/figure&gt;
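&lt;p&gt;To make the two steps easier to follow, here is a self-contained sketch on made-up numbers. The values and the six-month window are invented for illustration; only the threshold of 100 and the &lt;code&gt;OHC700&lt;/code&gt; column name come from the code above. It uses &lt;code&gt;DataFrame.mask&lt;/code&gt; in place of the boolean assignment, which I assume is equivalent here but leaves the original &lt;em&gt;DataFrame&lt;/em&gt; untouched.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;import pandas as pd

# Made-up monthly values; two months are near zero to mimic the missing input fields.
time_axis = pd.date_range("1952-01-01", periods=6, freq="MS")
ohc = pd.DataFrame(
    {"OHC700": [500.0, 510.0, 0.5, 0.2, 540.0, 550.0]},
    index=time_axis,
)

# Step 1: replace implausibly small values with NaN (threshold of 100, as above).
masked = ohc.mask(ohc.lt(100))

# Step 2: fill each gap linearly from the valid values on either side.
filled = masked.interpolate(method="linear")

print(filled)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The default &lt;code&gt;method="linear"&lt;/code&gt; treats the rows as equally spaced, which is reasonable for monthly data. For an irregularly spaced time axis, &lt;code&gt;method="time"&lt;/code&gt; weights by the actual timestamps instead.&lt;/p&gt;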
&lt;p&gt;This is just a glimpse of the awesomeness of pandas. More details can be found in the pandas documentation.&lt;/p&gt;</description></item></channel></rss>