ASKSAGE: Sage Q&A Forum - RSS feed
https://ask.sagemath.org/questions/
Q&A Forum for Sage
Copyright Sage, 2010. Some rights reserved under creative commons license.
Sat, 06 Jun 2020 01:47:33 +0200

Selection of a subset for ticks labels
https://ask.sagemath.org/question/51756/selection-of-a-subset-for-ticks-labels/

I have loaded a long series of interest rates from `https://data.oecd.org/interest/short-term-interest-rates.htm#indicator-chart`. Then I loaded pandas to manipulate those interest rates.

    import pandas as pd
    int_rates = pd.read_csv(r"C:\data\interetsmensuelsocde.csv")
    int_rates

As some columns are of no interest, I applied

    ir3 = int_rates.loc[23522:29300, 'TIME':'Value']

So far I have had no problems. But now I want to plot my series

    import matplotlib.pyplot as plt
    ir3.plot(x="TIME", y="Value")

The problem is that x gives too much information. How can I use only a subset of x for labelling the ticks?
Thu, 04 Jun 2020 12:36:10 +0200

Answer by dan_fulea
https://ask.sagemath.org/question/51756/selection-of-a-subset-for-ticks-labels/?answer=51774#post-id-51774

The line

    import matplotlib.pyplot as plt

is "magic": pandas draws its plots with matplotlib, so each pandas plot will go "into `plt`".

So we are using a **pandas plot** in the line with `ir3.plot(x="TIME", y="Value")`. *This is not related to Sage at all.* (The construction of `ir3` depends on a web page that I needed to visit in order to save some file under a specified name, and then to pick some specified rows from it. However, there are no `TIME` and no `Value` strings inside, so the original post cannot be reproduced on my machine... The effort to get started is much bigger than typing the lines that answer the question.) The "ticks" are unclear in this context; I suppose they correspond to the lines in the file I do not have.

Anyway, just apply a filter on these lines to isolate a subset.
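
One way to read "apply a filter" for the tick labels themselves: keep every row in the plot, but label only every n-th one. The sketch below is an illustration under assumptions, not the OP's data: the monthly values are made up, `step = 12` is an arbitrary choice, and the `Agg` backend is selected only so that the code runs without a display. It uses the plain matplotlib calls `set_xticks` and `set_xticklabels` on string-valued `TIME` entries.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

# Made-up monthly series standing in for the OECD file (hypothetical values).
months = [f"{2015 + i // 12}-{i % 12 + 1:02d}" for i in range(36)]
ir3 = pd.DataFrame({"TIME": months, "Value": [1.0 + 0.01 * i for i in range(36)]})

fig, ax = plt.subplots()
positions = list(range(len(ir3)))      # one x position per row
ax.plot(positions, ir3["Value"])       # all 36 points are drawn

step = 12                              # assumption: label every 12th row only
tick_positions = positions[::step]
tick_labels = ir3["TIME"].iloc[tick_positions].tolist()

ax.set_xticks(tick_positions)          # where the ticks go
ax.set_xticklabels(tick_labels, rotation=45)  # what they say
fig.tight_layout()
# plt.show() would display the figure in an interactive session
```

All points stay in the curve; only the labelling is thinned out. With a real datetime column, matplotlib's date locators are usually a more convenient route.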

I am answering this post only to show some problems that may occur when pandas intersects Sage.

Here is a minimal example with data that is easy to see and understand:

    import pandas as pd
    from io import StringIO
    data = """TIME;Value
    2019-03-01;234.32489
    2019-04-11;231.23
    2019-04-22;233.11
    2019-06-03;235.965
    2019-07-11;234.118456
    2020-01-06;220.3782"""
    ir3 = pd.read_csv(StringIO(data), sep=';')
    ir3

This gives in the **sage ipython** console:

    sage: ir3
                                            TIME                     Value
    0  Integer(2019)-Integer('03')-Integer('01')   RealNumber('234.32489')
    1    Integer(2019)-Integer('04')-Integer(11)      RealNumber('231.23')
    2    Integer(2019)-Integer('04')-Integer(22)      RealNumber('233.11')
    3  Integer(2019)-Integer('06')-Integer('03')     RealNumber('235.965')
    4    Integer(2019)-Integer('07')-Integer(11)  RealNumber('234.118456')
    5  Integer(2020)-Integer('01')-Integer('06')    RealNumber('220.3782')

In a native **ipython3** console, by contrast, we get:

    In [2]: ir3
    Out[2]:
             TIME       Value
    0  2019-03-01  234.324890
    1  2019-04-11  231.230000
    2  2019-04-22  233.110000
    3  2019-06-03  235.965000
    4  2019-07-11  234.118456
    5  2020-01-06  220.378200

There are clear differences. At any rate, if we want to pick (in **ipython3**) the entries in the locations $1,3,4$, then we just ask for:

    In [11]: ir3.loc[ [1,3,4] ]
    Out[11]:
             TIME       Value
    1  2019-04-11  231.230000
    3  2019-06-03  235.965000
    4  2019-07-11  234.118456

The same extraction also works in the Sage interpreter. But note how strange the result is: the numbers in the pasted data were wrapped in `Integer(...)` and `RealNumber(...)`, and the "integers" starting with a zero were converted to strings...

    sage: ir3.loc[[1,3,4]]
                                            TIME                     Value
    1    Integer(2019)-Integer('04')-Integer(11)      RealNumber('231.23')
    3  Integer(2019)-Integer('06')-Integer('03')     RealNumber('235.965')
    4    Integer(2019)-Integer('07')-Integer(11)  RealNumber('234.118456')
    sage:

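
A side remark on `.loc`: in the toy data the row labels happen to be 0 to 5, so `ir3.loc[[1,3,4]]` looks positional. `.loc` selects by index label, though, and a frame cut out of a larger one (like the slice starting at 23522 in the question) keeps its original labels. The following minimal sketch uses a made-up frame whose labels start at 100 to show the difference from the positional `.iloc`; the values and labels are hypothetical.

```python
import pandas as pd

# Hypothetical frame whose index labels do not start at 0,
# as happens after slicing rows out of a larger frame.
frame = pd.DataFrame({"Value": [10.0, 11.0, 12.0, 13.0, 14.0]},
                     index=range(100, 105))

by_label = frame.loc[[101, 103]]     # rows whose index labels are 101 and 103
by_position = frame.iloc[[1, 3]]     # second and fourth row, whatever their labels

# Both selections pick the same two rows here.
same_rows = by_label.equals(by_position)
```

On the OP's `ir3`, positional picks such as "every 12th row" would therefore be written with `.iloc`, for instance `ir3.iloc[::12]`.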
We may want to make the first column explicitly a datetime column...

    In [36]: ir3
    Out[36]:
             TIME       Value
    0  2019-03-01  234.324890
    1  2019-04-11  231.230000
    2  2019-04-22  233.110000
    3  2019-06-03  235.965000
    4  2019-07-11  234.118456
    5  2020-01-06  220.378200

    In [37]: ir3.dtypes
    Out[37]:
    TIME      object
    Value    float64
    dtype: object

    In [38]: ir3['TIME'] = pd.to_datetime(ir3['TIME'], format='%Y-%m-%d')

    In [39]: ir3
    Out[39]:
            TIME       Value
    0 2019-03-01  234.324890
    1 2019-04-11  231.230000
    2 2019-04-22  233.110000
    3 2019-06-03  235.965000
    4 2019-07-11  234.118456
    5 2020-01-06  220.378200

    In [40]: ir3.dtypes
    Out[40]:
    TIME     datetime64[ns]
    Value           float64
    dtype: object

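
Once `TIME` is a real datetime column, a subset can also be isolated by date rather than by row label. A small sketch on the same toy data; the cut-off dates are arbitrary choices for illustration:

```python
import pandas as pd
from io import StringIO

data = """TIME;Value
2019-03-01;234.32489
2019-04-11;231.23
2019-04-22;233.11
2019-06-03;235.965
2019-07-11;234.118456
2020-01-06;220.3782"""
ir3 = pd.read_csv(StringIO(data), sep=";")
ir3["TIME"] = pd.to_datetime(ir3["TIME"], format="%Y-%m-%d")

# Rows from 2019-04-01 (inclusive) up to 2019-07-01 (exclusive).
mask = (ir3["TIME"] >= "2019-04-01") & (ir3["TIME"] < "2019-07-01")
second_quarter = ir3.loc[mask]

# Calendar fields are available through the .dt accessor.
only_2019 = ir3[ir3["TIME"].dt.year == 2019]
```

Here `second_quarter` holds the rows labelled 1, 2 and 3, and `only_2019` drops the single 2020 entry.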
Now we want to plot. In the ipython3 world the entries YYYY-MM-DD are now datetimes and the plot works.

    In [42]: import matplotlib.pyplot as plt
    In [43]: ir3.plot(x='TIME', y='Value')
    Out[43]: <matplotlib.axes._subplots.AxesSubplot at 0x7fb0ceeae850>
    In [44]: plt.show()

A pop-up window then shows the data as wanted. In general, it is not a good idea to implement one's own selection when the data set is large; one can let `pandas` do the job (aggregate). There are many tutorials on the net showing how to do this, for instance:
[https://geo-python.github.io/2017/lessons/L7/pandas-plotting.html](https://geo-python.github.io/2017/lessons/L7/pandas-plotting.html)
They show, for instance, how to make `TIME` the index column with `ir3.set_index('TIME')`, and how to `resample()`.
(Subplots are also shown.)
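
A minimal sketch of those two steps on the toy data from this answer, assuming that monthly means are the wanted aggregate (the frequency alias `"MS"`, month start, is my choice here, not something the tutorial prescribes):

```python
import pandas as pd
from io import StringIO

data = """TIME;Value
2019-03-01;234.32489
2019-04-11;231.23
2019-04-22;233.11
2019-06-03;235.965
2019-07-11;234.118456
2020-01-06;220.3782"""
ir3 = pd.read_csv(StringIO(data), sep=";")
ir3["TIME"] = pd.to_datetime(ir3["TIME"], format="%Y-%m-%d")

# Use the dates as index, then aggregate to one mean value per month.
series = ir3.set_index("TIME")["Value"]
monthly = series.resample("MS").mean()

# Months without any observation come out as NaN and can be dropped.
monthly_observed = monthly.dropna()
```

The two April readings collapse into one monthly mean, so the resampled series has fewer points, and hence fewer tick labels, than the raw one. `monthly_observed.plot()` would then draw it.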
Note: Next time, please give a minimal example, mention what the connection to Sage is, and try to share the effort in solving the problem. Please describe in detail *all steps* needed to get started. In this case, I could not find any "huge" downloadable file on the site, which is irritating. If some pieces of code involve "programming magic", please take the time to explain them; people on this site have a rather mathematical background. You might also take some time to answer some questions on this site, thus contributing to the community from the other side as well. It would be a fair share of experience after years of questions.
Sat, 06 Jun 2020 00:41:33 +0200

Comment by Juanjo
https://ask.sagemath.org/question/51756/selection-of-a-subset-for-ticks-labels/?comment=51777#post-id-51777

I would add, @Cyrille, that you should also revisit many of your questions and either accept the answers that really solve the problem you had or, at least, say why the answers are unsatisfactory. In the latter case, your comments may lead to edits and new solutions. In both cases, you'll help improve the overall quality of this site.
Sat, 06 Jun 2020 01:47:33 +0200