Commit aed7b9cf, authored by TRHX

Site updated: 2020-08-07 09:16:17

Parent 39168dc7
......@@ -733,7 +733,7 @@
<h6 class="tags">
<a class="tag" href="/tags/Github-Pages/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;Github Pages</a> <a class="tag" href="/tags/Hexo/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;Hexo</a>
<a class="tag" href="/tags/Hexo/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;Hexo</a> <a class="tag" href="/tags/Github-Pages/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;Github Pages</a>
</h6>
</span>
......
......@@ -747,7 +747,7 @@
<div class="new-meta-item meta-tags"><a class="tag" href="/tags/Github-Pages/" rel="nofollow"><i class="fas fa-tags" aria-hidden="true"></i>&nbsp;<p>Github Pages</p></a></div> <div class="new-meta-item meta-tags"><a class="tag" href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags" aria-hidden="true"></i>&nbsp;<p>Hexo</p></a></div>
<div class="new-meta-item meta-tags"><a class="tag" href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags" aria-hidden="true"></i>&nbsp;<p>Hexo</p></a></div> <div class="new-meta-item meta-tags"><a class="tag" href="/tags/Github-Pages/" rel="nofollow"><i class="fas fa-tags" aria-hidden="true"></i>&nbsp;<p>Github Pages</p></a></div>
......
......@@ -736,7 +736,7 @@
<h6 class="tags">
<a class="tag" href="/tags/Hexo/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;Hexo</a> <a class="tag" href="/tags/主题个性化/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;主题个性化</a> <a class="tag" href="/tags/Material-X/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;Material X</a> <a class="tag" href="/tags/spfk/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;spfk</a>
<a class="tag" href="/tags/主题个性化/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;主题个性化</a> <a class="tag" href="/tags/Hexo/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;Hexo</a> <a class="tag" href="/tags/Material-X/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;Material X</a> <a class="tag" href="/tags/spfk/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;spfk</a>
</h6>
</span>
......@@ -756,7 +756,7 @@
<h6 class="tags">
<a class="tag" href="/tags/Github-Pages/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;Github Pages</a> <a class="tag" href="/tags/Hexo/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;Hexo</a>
<a class="tag" href="/tags/Hexo/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;Hexo</a> <a class="tag" href="/tags/Github-Pages/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;Github Pages</a>
</h6>
</span>
......
......@@ -916,7 +916,7 @@
<div class="new-meta-item meta-tags"><a class="tag" href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags" aria-hidden="true"></i>&nbsp;<p>Hexo</p></a></div> <div class="new-meta-item meta-tags"><a class="tag" href="/tags/主题个性化/" rel="nofollow"><i class="fas fa-tags" aria-hidden="true"></i>&nbsp;<p>主题个性化</p></a></div> <div class="new-meta-item meta-tags"><a class="tag" href="/tags/Material-X/" rel="nofollow"><i class="fas fa-tags" aria-hidden="true"></i>&nbsp;<p>Material X</p></a></div> <div class="new-meta-item meta-tags"><a class="tag" href="/tags/spfk/" rel="nofollow"><i class="fas fa-tags" aria-hidden="true"></i>&nbsp;<p>spfk</p></a></div>
<div class="new-meta-item meta-tags"><a class="tag" href="/tags/主题个性化/" rel="nofollow"><i class="fas fa-tags" aria-hidden="true"></i>&nbsp;<p>主题个性化</p></a></div> <div class="new-meta-item meta-tags"><a class="tag" href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags" aria-hidden="true"></i>&nbsp;<p>Hexo</p></a></div> <div class="new-meta-item meta-tags"><a class="tag" href="/tags/Material-X/" rel="nofollow"><i class="fas fa-tags" aria-hidden="true"></i>&nbsp;<p>Material X</p></a></div> <div class="new-meta-item meta-tags"><a class="tag" href="/tags/spfk/" rel="nofollow"><i class="fas fa-tags" aria-hidden="true"></i>&nbsp;<p>spfk</p></a></div>
......
......@@ -817,7 +817,7 @@
<h6 class="tags">
<a class="tag" href="/tags/Hexo/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;Hexo</a> <a class="tag" href="/tags/主题个性化/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;主题个性化</a> <a class="tag" href="/tags/Material-X/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;Material X</a> <a class="tag" href="/tags/spfk/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;spfk</a>
<a class="tag" href="/tags/主题个性化/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;主题个性化</a> <a class="tag" href="/tags/Hexo/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;Hexo</a> <a class="tag" href="/tags/Material-X/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;Material X</a> <a class="tag" href="/tags/spfk/"><i class="fas fa-tags fa-fw" aria-hidden="true"></i>&nbsp;spfk</a>
</h6>
</span>
......
......@@ -638,7 +638,7 @@
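<p>When grouping by multiple keys, the first element of each tuple is itself a tuple of key values, and the second element is the corresponding chunk of data:</p>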
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">&gt;&gt;&gt; </span><span class="keyword">import</span> pandas <span class="keyword">as</span> pd</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span><span class="keyword">import</span> numpy <span class="keyword">as</span> np</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>data = &#123;<span class="string">'key1'</span> : [<span class="string">'a'</span>, <span class="string">'b'</span>, <span class="string">'a'</span>, <span class="string">'b'</span>, 
<span class="string">'a'</span>, <span class="string">'b'</span>, <span class="string">'a'</span>, <span class="string">'a'</span>],</span><br><span class="line"> <span class="string">'key2'</span> : [<span class="string">'one'</span>, <span class="string">'one'</span>, <span class="string">'two'</span>, <span class="string">'three'</span>, <span class="string">'two'</span>, <span class="string">'two'</span>, <span class="string">'one'</span>, <span class="string">'three'</span>],</span><br><span class="line"> <span class="string">'data1'</span>: np.random.randn(<span class="number">8</span>),</span><br><span class="line"> <span class="string">'data2'</span>: np.random.randn(<span class="number">8</span>)&#125;</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>obj = pd.DataFrame(data)</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>obj</span><br><span class="line"> key1 key2 data1 data2</span><br><span class="line"><span class="number">0</span> a one <span class="number">-1.088762</span> <span class="number">0.668504</span></span><br><span class="line"><span class="number">1</span> b one <span class="number">0.275500</span> <span class="number">0.787844</span></span><br><span class="line"><span class="number">2</span> a two <span class="number">-0.108417</span> <span class="number">-0.491296</span></span><br><span class="line"><span class="number">3</span> b three <span class="number">0.019524</span> <span class="number">-0.363390</span></span><br><span class="line"><span class="number">4</span> a two <span class="number">0.453612</span> <span class="number">0.796999</span></span><br><span class="line"><span class="number">5</span> b two <span class="number">1.982858</span> <span class="number">1.501877</span></span><br><span class="line"><span class="number">6</span> a one <span class="number">1.101132</span> <span class="number">-1.928362</span></span><br><span class="line"><span class="number">7</span> a three <span 
class="number">0.524775</span> <span class="number">-1.205842</span></span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span></span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span><span class="keyword">for</span> group_name, group_data <span class="keyword">in</span> obj.groupby([<span class="string">'key1'</span>, <span class="string">'key2'</span>]):</span><br><span class="line"> print(group_name)</span><br><span class="line"> print(group_data)</span><br><span class="line"></span><br><span class="line"> </span><br><span class="line">(<span class="string">'a'</span>, <span class="string">'one'</span>)</span><br><span class="line"> key1 key2 data1 data2</span><br><span class="line"><span class="number">0</span> a one <span class="number">-1.088762</span> <span class="number">0.668504</span></span><br><span class="line"><span class="number">6</span> a one <span class="number">1.101132</span> <span class="number">-1.928362</span></span><br><span class="line">(<span class="string">'a'</span>, <span class="string">'three'</span>)</span><br><span class="line"> key1 key2 data1 data2</span><br><span class="line"><span class="number">7</span> a three <span class="number">0.524775</span> <span class="number">-1.205842</span></span><br><span class="line">(<span class="string">'a'</span>, <span class="string">'two'</span>)</span><br><span class="line"> key1 key2 data1 data2</span><br><span class="line"><span class="number">2</span> a two <span class="number">-0.108417</span> <span class="number">-0.491296</span></span><br><span class="line"><span class="number">4</span> a two <span class="number">0.453612</span> <span class="number">0.796999</span></span><br><span class="line">(<span class="string">'b'</span>, <span class="string">'one'</span>)</span><br><span class="line"> key1 key2 data1 data2</span><br><span class="line"><span class="number">1</span> b one <span class="number">0.2755</span> <span class="number">0.787844</span></span><br><span 
class="line">(<span class="string">'b'</span>, <span class="string">'three'</span>)</span><br><span class="line"> key1 key2 data1 data2</span><br><span class="line"><span class="number">3</span> b three <span class="number">0.019524</span> <span class="number">-0.36339</span></span><br><span class="line">(<span class="string">'b'</span>, <span class="string">'two'</span>)</span><br><span class="line"> key1 key2 data1 data2</span><br><span class="line"><span class="number">5</span> b two <span class="number">1.982858</span> <span class="number">1.501877</span></span><br></pre></td></tr></table></figure>
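<p>A minimal sketch of the same iteration pattern, assuming a seeded generator (<code>np.random.default_rng(0)</code>, which the original does not use) so that the frame is reproducible; the name <code>group_sizes</code> is introduced here only for illustration. Unpacking the key tuple directly in the <code>for</code> statement can make the loop easier to read:</p>

```python
import numpy as np
import pandas as pd

# Assumption: a seeded generator replaces np.random.randn for reproducibility.
rng = np.random.default_rng(0)
obj = pd.DataFrame({
    "key1": ["a", "b", "a", "b", "a", "b", "a", "a"],
    "key2": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "data1": rng.standard_normal(8),
    "data2": rng.standard_normal(8),
})

# Each iteration yields ((key1_value, key2_value), sub-DataFrame).
group_sizes = {}  # hypothetical helper: maps each key tuple to its row count
for (k1, k2), chunk in obj.groupby(["key1", "key2"]):
    group_sizes[(k1, k2)] = len(chunk)
    print((k1, k2), len(chunk))
```

<p>Because <code>sort=True</code> is the default, the key tuples arrive in sorted order, which matches the order of the output shown above.</p>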
<h3 id="【03x05】对象转换"><a href="#【03x05】对象转换" class="headerlink" title="【03x05】对象转换"></a><font color="#4876FF">【03x05】Object Conversion</font></h3><p>A GroupBy object can be converted to a list or a dict:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">&gt;&gt;&gt; </span><span class="keyword">import</span> pandas <span class="keyword">as</span> pd</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span><span class="keyword">import</span> numpy <span class="keyword">as</span> np</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>data = &#123;<span class="string">'key1'</span> : [<span class="string">'a'</span>, <span class="string">'b'</span>, <span class="string">'a'</span>, <span class="string">'b'</span>, <span 
class="string">'a'</span>, <span class="string">'b'</span>, <span class="string">'a'</span>, <span class="string">'a'</span>],</span><br><span class="line"> <span class="string">'key2'</span> : [<span class="string">'one'</span>, <span class="string">'one'</span>, <span class="string">'two'</span>, <span class="string">'three'</span>, <span class="string">'two'</span>, <span class="string">'two'</span>, <span class="string">'one'</span>, <span class="string">'three'</span>],</span><br><span class="line"> <span class="string">'data1'</span>: np.random.randn(<span class="number">8</span>),</span><br><span class="line"> <span class="string">'data2'</span>: np.random.randn(<span class="number">8</span>)&#125;</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>obj = pd.DataFrame(data)</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>obj</span><br><span class="line"> key1 key2 data1 data2</span><br><span class="line"><span class="number">0</span> a one <span class="number">-0.607009</span> <span class="number">1.948301</span></span><br><span class="line"><span class="number">1</span> b one <span class="number">0.150818</span> <span class="number">-0.025095</span></span><br><span class="line"><span class="number">2</span> a two <span class="number">-2.086024</span> <span class="number">0.358164</span></span><br><span class="line"><span class="number">3</span> b three <span class="number">0.446061</span> <span class="number">1.708797</span></span><br><span class="line"><span class="number">4</span> a two <span class="number">0.745457</span> <span class="number">-0.980948</span></span><br><span class="line"><span class="number">5</span> b two <span class="number">0.981877</span> <span class="number">2.159327</span></span><br><span class="line"><span class="number">6</span> a one <span class="number">0.804480</span> <span class="number">-0.499661</span></span><br><span class="line"><span class="number">7</span> a three <span 
class="number">0.112884</span> <span class="number">0.004367</span></span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span></span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>grouped = obj.groupby(<span class="string">'key1'</span>)</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>list(grouped1)</span><br><span class="line">[(<span class="string">'a'</span>, key1 key2 data1 data2</span><br><span class="line"><span class="number">0</span> a one <span class="number">-0.607009</span> <span class="number">1.948301</span></span><br><span class="line"><span class="number">2</span> a two <span class="number">-2.086024</span> <span class="number">0.358164</span></span><br><span class="line"><span class="number">4</span> a two <span class="number">0.745457</span> <span class="number">-0.980948</span></span><br><span class="line"><span class="number">6</span> a one <span class="number">0.804480</span> <span class="number">-0.499661</span></span><br><span class="line"><span class="number">7</span> a three <span class="number">0.112884</span> <span class="number">0.004367</span>),</span><br><span class="line">(<span class="string">'b'</span>, key1 key2 data1 data2</span><br><span class="line"><span class="number">1</span> b one <span class="number">0.150818</span> <span class="number">-0.025095</span></span><br><span class="line"><span class="number">3</span> b three <span class="number">0.446061</span> <span class="number">1.708797</span></span><br><span class="line"><span class="number">5</span> b two <span class="number">0.981877</span> <span class="number">2.159327</span>)]</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span></span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>dict(list(grouped1))</span><br><span class="line">&#123;<span class="string">'a'</span>: key1 key2 data1 data2</span><br><span class="line"><span class="number">0</span> a one <span class="number">-0.607009</span> <span 
class="number">1.948301</span></span><br><span class="line"><span class="number">2</span> a two <span class="number">-2.086024</span> <span class="number">0.358164</span></span><br><span class="line"><span class="number">4</span> a two <span class="number">0.745457</span> <span class="number">-0.980948</span></span><br><span class="line"><span class="number">6</span> a one <span class="number">0.804480</span> <span class="number">-0.499661</span></span><br><span class="line"><span class="number">7</span> a three <span class="number">0.112884</span> <span class="number">0.004367</span>,</span><br><span class="line"><span class="string">'b'</span>: key1 key2 data1 data2</span><br><span class="line"><span class="number">1</span> b one <span class="number">0.150818</span> <span class="number">-0.025095</span></span><br><span class="line"><span class="number">3</span> b three <span class="number">0.446061</span> <span class="number">1.708797</span></span><br><span class="line"><span class="number">5</span> b two <span class="number">0.981877</span> <span class="number">2.159327</span>&#125;</span><br></pre></td></tr></table></figure>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">&gt;&gt;&gt; </span><span class="keyword">import</span> pandas <span class="keyword">as</span> pd</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span><span class="keyword">import</span> numpy <span class="keyword">as</span> np</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>data = &#123;<span class="string">'key1'</span> : [<span class="string">'a'</span>, <span class="string">'b'</span>, <span class="string">'a'</span>, <span class="string">'b'</span>, <span 
class="string">'a'</span>, <span class="string">'b'</span>, <span class="string">'a'</span>, <span class="string">'a'</span>],</span><br><span class="line"> <span class="string">'key2'</span> : [<span class="string">'one'</span>, <span class="string">'one'</span>, <span class="string">'two'</span>, <span class="string">'three'</span>, <span class="string">'two'</span>, <span class="string">'two'</span>, <span class="string">'one'</span>, <span class="string">'three'</span>],</span><br><span class="line"> <span class="string">'data1'</span>: np.random.randn(<span class="number">8</span>),</span><br><span class="line"> <span class="string">'data2'</span>: np.random.randn(<span class="number">8</span>)&#125;</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>obj = pd.DataFrame(data)</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>obj</span><br><span class="line"> key1 key2 data1 data2</span><br><span class="line"><span class="number">0</span> a one <span class="number">-0.607009</span> <span class="number">1.948301</span></span><br><span class="line"><span class="number">1</span> b one <span class="number">0.150818</span> <span class="number">-0.025095</span></span><br><span class="line"><span class="number">2</span> a two <span class="number">-2.086024</span> <span class="number">0.358164</span></span><br><span class="line"><span class="number">3</span> b three <span class="number">0.446061</span> <span class="number">1.708797</span></span><br><span class="line"><span class="number">4</span> a two <span class="number">0.745457</span> <span class="number">-0.980948</span></span><br><span class="line"><span class="number">5</span> b two <span class="number">0.981877</span> <span class="number">2.159327</span></span><br><span class="line"><span class="number">6</span> a one <span class="number">0.804480</span> <span class="number">-0.499661</span></span><br><span class="line"><span class="number">7</span> a three <span 
class="number">0.112884</span> <span class="number">0.004367</span></span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span></span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>grouped = obj.groupby(<span class="string">'key1'</span>)</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>list(grouped)</span><br><span class="line">[(<span class="string">'a'</span>, key1 key2 data1 data2</span><br><span class="line"><span class="number">0</span> a one <span class="number">-0.607009</span> <span class="number">1.948301</span></span><br><span class="line"><span class="number">2</span> a two <span class="number">-2.086024</span> <span class="number">0.358164</span></span><br><span class="line"><span class="number">4</span> a two <span class="number">0.745457</span> <span class="number">-0.980948</span></span><br><span class="line"><span class="number">6</span> a one <span class="number">0.804480</span> <span class="number">-0.499661</span></span><br><span class="line"><span class="number">7</span> a three <span class="number">0.112884</span> <span class="number">0.004367</span>),</span><br><span class="line">(<span class="string">'b'</span>, key1 key2 data1 data2</span><br><span class="line"><span class="number">1</span> b one <span class="number">0.150818</span> <span class="number">-0.025095</span></span><br><span class="line"><span class="number">3</span> b three <span class="number">0.446061</span> <span class="number">1.708797</span></span><br><span class="line"><span class="number">5</span> b two <span class="number">0.981877</span> <span class="number">2.159327</span>)]</span><br><span class="line">&gt;&gt;&gt;</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>dict(list(grouped))</span><br><span class="line">&#123;<span class="string">'a'</span>: key1 key2 data1 data2</span><br><span class="line"><span class="number">0</span> a one <span class="number">-0.607009</span> <span 
class="number">1.948301</span></span><br><span class="line"><span class="number">2</span> a two <span class="number">-2.086024</span> <span class="number">0.358164</span></span><br><span class="line"><span class="number">4</span> a two <span class="number">0.745457</span> <span class="number">-0.980948</span></span><br><span class="line"><span class="number">6</span> a one <span class="number">0.804480</span> <span class="number">-0.499661</span></span><br><span class="line"><span class="number">7</span> a three <span class="number">0.112884</span> <span class="number">0.004367</span>,</span><br><span class="line"><span class="string">'b'</span>: key1 key2 data1 data2</span><br><span class="line"><span class="number">1</span> b one <span class="number">0.150818</span> <span class="number">-0.025095</span></span><br><span class="line"><span class="number">3</span> b three <span class="number">0.446061</span> <span class="number">1.708797</span></span><br><span class="line"><span class="number">5</span> b two <span class="number">0.981877</span> <span class="number">2.159327</span>&#125;</span><br></pre></td></tr></table></figure>
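<p>Besides <code>dict(list(grouped))</code>, a dict comprehension over the GroupBy object should give the same mapping, and <code>get_group()</code> retrieves a single chunk without building the whole dict. The sketch below uses a small hand-written frame (an assumption, chosen so that the result is deterministic); the names <code>pieces</code> and <code>same</code> are illustrative only:</p>

```python
import pandas as pd

# Assumption: fixed values instead of random data, so results are predictable.
obj = pd.DataFrame({
    "key1": ["a", "b", "a", "b", "a", "b", "a", "a"],
    "data1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})
grouped = obj.groupby("key1")

# Iterating a GroupBy yields (group_name, sub-DataFrame) pairs,
# so both forms below build {group_name: sub-DataFrame}.
pieces = dict(list(grouped))
same = {name: chunk for name, chunk in grouped}

# get_group() fetches one chunk directly by its key.
b_rows = grouped.get_group("b")
print(pieces["a"])
print(b_rows)
```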
<h2 id="【04x00】GroupBy-Apply-数据应用"><a href="#【04x00】GroupBy-Apply-数据应用" class="headerlink" title="【04x00】GroupBy Apply 数据应用"></a><font color="#FF0000">【04x00】GroupBy Apply: Applying Functions to Data</font></h2><p>Aggregation refers to any data transformation that produces scalar values from arrays; it is commonly used to compute statistics on grouped data.</p>
<h3 id="【04x01】聚合函数"><a href="#【04x01】聚合函数" class="headerlink" title="【04x01】聚合函数"></a><font color="#4876FF">【04x01】Aggregation Functions</font></h3><p>The earlier examples already used several built-in aggregation functions, such as mean, count, min, and sum. Common aggregation operations are listed in the table below:</p>
<p>Official documentation: <a href="https://pandas.pydata.org/docs/reference/groupby.html" target="_blank" rel="noopener">https://pandas.pydata.org/docs/reference/groupby.html</a></p>
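<p>As a hedged illustration of the built-in aggregations named above, the sketch below applies several of them at once through <code>agg()</code> on a small hand-written frame (the data and the name <code>summary</code> are assumptions for illustration only):</p>

```python
import pandas as pd

# Assumption: a tiny fixed frame so each aggregate can be checked by hand.
obj = pd.DataFrame({
    "key1": ["a", "b", "a", "b"],
    "data1": [1.0, 2.0, 3.0, 4.0],
})

# Select the numeric column first, then aggregate per group.
grouped = obj.groupby("key1")["data1"]

# agg() accepts a list of aggregation names and returns one column per name.
summary = grouped.agg(["mean", "count", "min", "sum"])
print(summary)
```

<p>Selecting the numeric column before aggregating avoids applying numeric reductions such as <code>mean</code> to string columns.</p>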
......@@ -807,10 +807,10 @@
<div class='new-meta-box'>
<div class="new-meta-item date" itemprop="dateUpdated" datetime="2020-08-06T11:31:25+08:00">
<div class="new-meta-item date" itemprop="dateUpdated" datetime="2020-08-07T09:13:45+08:00">
<a class='notlink'>
<i class="fas fa-clock" aria-hidden="true"></i>
<p>Last updated on August 6, 2020</p>
<p>Last updated on August 7, 2020</p>
</a>
</div>
......
......@@ -682,10 +682,10 @@
<div class="full-width auto-padding tags">
<a href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Hexo</a>
<a href="/tags/主题个性化/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;主题个性化</a>
<a href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Hexo</a>
<a href="/tags/Material-X/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Material X</a>
<a href="/tags/spfk/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;spfk</a>
......@@ -961,10 +961,10 @@
<div class="full-width auto-padding tags">
<a href="/tags/Github-Pages/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Github Pages</a>
<a href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Hexo</a>
<a href="/tags/Github-Pages/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Github Pages</a>
</div>
</section>
......
......@@ -668,10 +668,10 @@
<div class="full-width auto-padding tags">
<a href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Hexo</a>
<a href="/tags/主题个性化/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;主题个性化</a>
<a href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Hexo</a>
<a href="/tags/Material-X/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Material X</a>
<a href="/tags/spfk/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;spfk</a>
......@@ -947,10 +947,10 @@
<div class="full-width auto-padding tags">
<a href="/tags/Github-Pages/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Github Pages</a>
<a href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Hexo</a>
<a href="/tags/Github-Pages/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Github Pages</a>
</div>
</section>
......
The source diff is too large to display. You can view the blob instead.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://www.itrhx.com/2020/06/17/A84-Pandas-06/</loc>
<lastmod>2020-08-07</lastmod>
</url> <url>
<loc>https://www.itrhx.com/2020/07/13/A90-pyspider-51job/</loc>
<lastmod>2020-08-06</lastmod>
</url> <url>
......@@ -9,9 +12,6 @@
</url> <url>
<loc>https://www.itrhx.com/2020/06/26/A88-Pandas-10/</loc>
<lastmod>2020-08-06</lastmod>
</url> <url>
<loc>https://www.itrhx.com/2020/06/17/A84-Pandas-06/</loc>
<lastmod>2020-08-06</lastmod>
</url> <url>
<loc>https://www.itrhx.com/2020/06/11/A79-Pandas-01/</loc>
<lastmod>2020-08-06</lastmod>
......
......@@ -1473,10 +1473,10 @@
<div class="full-width auto-padding tags">
<a href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Hexo</a>
<a href="/tags/主题个性化/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;主题个性化</a>
<a href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Hexo</a>
<a href="/tags/Material-X/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Material X</a>
<a href="/tags/spfk/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;spfk</a>
......@@ -1623,10 +1623,10 @@
<div class="full-width auto-padding tags">
<a href="/tags/Github-Pages/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Github Pages</a>
<a href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Hexo</a>
<a href="/tags/Github-Pages/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Github Pages</a>
</div>
</section>
......
The source diff is too large to display. You can view the blob instead.
......@@ -1229,10 +1229,10 @@
<div class="full-width auto-padding tags">
<a href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Hexo</a>
<a href="/tags/主题个性化/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;主题个性化</a>
<a href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Hexo</a>
<a href="/tags/Material-X/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Material X</a>
<a href="/tags/spfk/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;spfk</a>
......@@ -1508,10 +1508,10 @@
<div class="full-width auto-padding tags">
<a href="/tags/Github-Pages/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Github Pages</a>
<a href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Hexo</a>
<a href="/tags/Github-Pages/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Github Pages</a>
</div>
</section>
......
......@@ -84,7 +84,7 @@
<entry>
<title><![CDATA[The Python Data Analysis Trio: Pandas (6): GroupBy Split/Apply/Combine]]></title>
<url>%2F2020%2F06%2F17%2FA84-Pandas-06%2F</url>
<content type="text"><![CDATA[Pandas series:
The Python Data Analysis Trio: Pandas (1): Getting to Know Pandas and Its Series and DataFrame Objects
The Python Data Analysis Trio: Pandas (2): The Index Object and Indexing Operations
The Python Data Analysis Trio: Pandas (3): Arithmetic Operations and Handling Missing Values
The Python Data Analysis Trio: Pandas (4): Function Application, Mapping, Sorting, and Hierarchical Indexing
The Python Data Analysis Trio: Pandas (5): Statistical Computation and Descriptive Statistics
The Python Data Analysis Trio: Pandas (6): GroupBy Split, Apply, and Combine
The Python Data Analysis Trio: Pandas (7): Combining Datasets
The Python Data Analysis Trio: Pandas (8): Reshaping, Handling Duplicates, and Replacing Data
The Python Data Analysis Trio: Pandas (9): Time Series
The Python Data Analysis Trio: Pandas (10): Reading and Writing Data

Columns: 【NumPy column】【Pandas column】【Matplotlib column】
Recommended learning resources and sites: 【NumPy Chinese site】【Pandas Chinese site】【Matplotlib Chinese site】【NumPy, Matplotlib, and Pandas cheat sheets】

This is a block of anti-crawler text; readers may ignore it.
This article was first published on CSDN by TRHX.
Blog home page: https://itrhx.blog.csdn.net/
Article link: https://itrhx.blog.csdn.net/article/details/106804881
Reproduction without permission is prohibited.

【01x00】The GroupBy Mechanism

Grouping a dataset and applying a function to each group, whether an aggregation or a transformation, is often a key step in data analysis. Once a dataset has been loaded, merged, and prepared, the next task is usually to compute group statistics or build pivot tables. Pandas provides a flexible and efficient GroupBy facility. Although the name "group by" is borrowed from the SQL command, the idea is perhaps better described in the terms coined by Hadley Wickham, the author of many popular R packages: Split, Apply, and Combine.

The grouping process: Split -> Apply -> Combine
Split: divide the data into groups according to some criteria;
Apply: apply a function to each group independently;
Combine: merge the results computed for each group.

Official introduction: https://pandas.pydata.org/docs/user_guide/groupby.html

【02x00】The GroupBy Object

The common ways to obtain a GroupBy object are Series.groupby and DataFrame.groupby. The basic syntax is:

    Series.groupby(self, by=None, axis=0, level=None, as_index: bool = True, sort: bool = True, group_keys: bool = True, squeeze: bool = False, observed: bool = False) → 'groupby_generic.SeriesGroupBy'

    DataFrame.groupby(self, by=None, axis=0, level=None, as_index: bool = True, sort: bool = True, group_keys: bool = True, squeeze: bool = False, observed: bool = False) → 'groupby_generic.DataFrameGroupBy'

Official documentation:
https://pandas.pydata.org/docs/reference/api/pandas.Series.groupby.html
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html

The commonly used parameters are:

Parameter | Description
by | A mapping, function, label, or list of labels that determines the groups. If by is a function, it is called on each value of the object's index. If a dict or Series is passed, its values are used to determine the groups (the values of a Series are aligned first; see the .align() method). If an ndarray is passed, its values are used as-is to determine the groups. A label or list of labels may be passed to group by columns of the object itself. Note that a tuple is interpreted as a (single) key
axis | The axis to split along, default 0; 0 or 'index', 1 or 'columns'. 1 or 'columns' is only available for a DataFrame
level | If the axis is a MultiIndex (hierarchical), group by the given level; default None
as_index | bool, default True. For aggregated output, return an object with the group labels as the index. Only relevant for DataFrame input. as_index=False is effectively "SQL-style" grouped output
sort | bool, default True. Sort the group keys. Turning this off gives better performance. Note: this does not affect the order of observations within each group; groupby preserves the order of rows within each group
group_keys | bool, default True. When calling apply, whether to add the group keys to the index to identify the pieces
squeeze | bool, default False. Reduce the dimensionality of the return type if possible; otherwise return a consistent type

groupby() performs the grouping. The resulting GroupBy object has not carried out any actual computation yet; it only holds intermediate data about the groups. For example:

    >>> import pandas as pd
    >>> import numpy as np
    >>> data = {'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'],
                'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                'data1': np.random.randn(8),
                'data2': np.random.randn(8)}
    >>>
    >>> obj = pd.DataFrame(data)
    >>> obj
      key1   key2     data1     data2
    0    a    one -0.804160 -0.868905
    1    b    one -0.086990  0.325741
    2    a    two  0.757992  0.541101
    3    b  three -0.281435  0.097841
    4    a    two  0.817757 -0.643699
    5    b    two -0.462760 -0.321196
    6    a    one -0.403699  0.602138
    7    a  three  0.883940 -0.850526
    >>>
    >>> obj.groupby('key1')
    <pandas.core.groupby.generic.DataFrameGroupBy object at 0x03CDB7C0>
    >>>
    >>> obj['data1'].groupby(obj['key1'])
    <pandas.core.groupby.generic.SeriesGroupBy object at 0x03CDB748>

【03x00】GroupBy Split: Splitting the Data

【03x01】Group Operations

The groupby() call above returned a GroupBy object. It has not computed anything yet; it only holds intermediate data about the group key obj['key1']. In other words, the object already has all the information needed to run an operation on each group. For example, we can call the GroupBy mean() method to compute the group means, while size() returns the number of elements in each group:

    >>> import pandas as pd
    >>> import numpy as np
    >>> data = {'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'],
                'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                'data1': np.random.randn(8),
                'data2': np.random.randn(8)}
    >>>
    >>> obj = pd.DataFrame(data)
    >>> obj
      key1   key2     data1     data2
    0    a    one -0.544099 -0.614079
    1    b    one  2.193712  0.101005
    2    a    two -0.004683  0.882770
    3    b  three  0.312858  1.732105
    4    a    two  0.011089  0.089587
    5    b    two  0.292165  1.327638
    6    a    one -1.433291 -0.238971
    7    a  three -0.004724 -2.117326
    >>>
    >>> grouped1 = obj.groupby('key1')
    >>> grouped2 = obj['data1'].groupby(obj['key1'])
    >>>
    >>> grouped1.mean()
             data1     data2
    key1
    a    -0.395142 -0.399604
    b     0.932912  1.053583
    >>>
    >>> grouped2.mean()
    key1
    a   -0.395142
    b    0.932912
    Name: data1, dtype: float64
    >>>
    >>> grouped1.size()
    key1
    a    5
    b    3
    dtype: int64
    >>>
    >>> grouped2.size()
    key1
    a    5
    b    3
    Name: data1, dtype: int64

【03x02】Grouping by dtype and by Column

The axis parameter of groupby() defaults to 0. It can be set to group along any other axis, and grouping by data type (dtype) is also supported:

    >>> import pandas as pd
    >>> import numpy as np
    >>> data = {'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'],
                'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                'data1': np.random.randn(8),
                'data2': np.random.randn(8)}
    >>> obj = pd.DataFrame(data)
    >>> obj
      key1   key2     data1     data2
    0    a    one -0.607009  1.948301
    1    b    one  0.150818 -0.025095
    2    a    two -2.086024  0.358164
    3    b  three  0.446061  1.708797
    4    a    two  0.745457 -0.980948
    5    b    two  0.981877  2.159327
    6    a    one  0.804480 -0.499661
    7    a  three  0.112884  0.004367
    >>>
    >>> obj.dtypes
    key1      object
    key2      object
    data1    float64
    data2    float64
    dtype: object
    >>>
    >>> obj.groupby(obj.dtypes, axis=1).size()
    float64    2
    object     2
    dtype: int64
    >>>
    >>> obj.groupby(obj.dtypes, axis=1).sum()
        float64  object
    0  1.341291    aone
    1  0.125723    bone
    2 -1.727860    atwo
    3  2.154858  bthree
    4 -0.235491    atwo
    5  3.141203    btwo
    6  0.304819    aone
    7  0.117251  athree

【03x03】Custom Grouping

groupby() accepts a list of several arrays at once, or a custom set of group keys. You can also group by a dict, by a function, or by index level.

Passing a list of several arrays:

    >>> import pandas as pd
    >>> import numpy as np
    >>> data = 
&#123;'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'], 'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], 'data1': np.random.randn(8), 'data2': np.random.randn(8)&#125;&gt;&gt;&gt; obj = pd.DataFrame(data)&gt;&gt;&gt; obj key1 key2 data1 data20 a one -0.841652 0.6880551 b one 0.510042 -0.5611712 a two -0.418862 -0.1459833 b three -1.104698 0.5631584 a two 0.329527 -0.8931085 b two 0.753653 -0.3425206 a one -0.882527 -1.1213297 a three 1.726794 0.160244&gt;&gt;&gt; &gt;&gt;&gt; means = obj['data1'].groupby([obj['key1'], obj['key2']]).mean()&gt;&gt;&gt; meanskey1 key2 a one -0.862090 three 1.726794 two -0.044667b one 0.510042 three -1.104698 two 0.753653Name: data1, dtype: float64&gt;&gt;&gt; &gt;&gt;&gt; means.unstack()key2 one three twokey1 a -0.862090 1.726794 -0.044667b 0.510042 -1.104698 0.753653 自定义分组键: 1234567891011121314151617181920212223&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; obj = pd.DataFrame(&#123;'key1' : ['a', 'a', 'b', 'b', 'a'], 'key2' : ['one', 'two', 'one', 'two', 'one'], 'data1' : np.random.randn(5), 'data2' : np.random.randn(5)&#125;)&gt;&gt;&gt; obj key1 key2 data1 data20 a one -0.024003 0.3504801 a two -0.767534 -0.1004262 b one -0.594983 -1.9455803 b two -0.374482 0.8175924 a one 0.755452 -0.137759&gt;&gt;&gt; &gt;&gt;&gt; states = np.array(['Wuhan', 'Beijing', 'Beijing', 'Wuhan', 'Wuhan'])&gt;&gt;&gt; years = np.array([2005, 2005, 2006, 2005, 2006])&gt;&gt;&gt; &gt;&gt;&gt; obj['data1'].groupby([states, years]).mean()Beijing 2005 -0.767534 2006 -0.594983Wuhan 2005 -0.199242 2006 0.755452Name: data1, dtype: float64 【03x03x01】字典分组通过字典进行分组: 1234567891011121314151617181920212223242526272829303132333435&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; obj = pd.DataFrame(np.random.randint(1, 10, (5,5)), columns=['a', 'b', 'c', 'd', 'e'], index=['A', 'B', 'C', 'D', 'E'])&gt;&gt;&gt; obj a b c d eA 1 4 7 1 9B 8 2 4 7 8C 9 8 2 5 1D 2 4 2 8 3E 7 5 7 2 3&gt;&gt;&gt; 
&gt;&gt;&gt; obj_dict = &#123;'a':'Python', 'b':'Python', 'c':'Java', 'd':'C++', 'e':'Java'&#125;&gt;&gt;&gt; obj.groupby(obj_dict, axis=1).size()C++ 1Java 2Python 2dtype: int64&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby(obj_dict, axis=1).count() C++ Java PythonA 1 2 2B 1 2 2C 1 2 2D 1 2 2E 1 2 2&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby(obj_dict, axis=1).sum() C++ Java PythonA 1 16 5B 7 12 10C 5 3 17D 8 5 6E 2 10 12 【03x03x02】函数分组通过函数进行分组: 123456789101112131415161718192021222324&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; obj = pd.DataFrame(np.random.randint(1, 10, (5,5)), columns=['a', 'b', 'c', 'd', 'e'], index=['AA', 'BBB', 'CC', 'D', 'EE'])&gt;&gt;&gt; obj a b c d eAA 3 9 5 8 2BBB 1 4 2 2 6CC 9 2 4 7 6D 2 5 5 7 1EE 8 8 8 2 2&gt;&gt;&gt; &gt;&gt;&gt; def group_key(idx): """ idx 为列索引或行索引 """ return len(idx)&gt;&gt;&gt; obj.groupby(group_key).size() # 等价于 obj.groupby(len).size()1 12 33 1dtype: int64 【03x03x03】索引层级分组通过不同索引层级进行分组: 1234567891011121314151617181920212223242526272829&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; columns = pd.MultiIndex.from_arrays([['Python', 'Java', 'Python', 'Java', 'Python'], ['A', 'A', 'B', 'C', 'B']], names=['language', 'index'])&gt;&gt;&gt; obj = pd.DataFrame(np.random.randint(1, 10, (5, 5)), columns=columns)&gt;&gt;&gt; objlanguage Python Java Python Java Pythonindex A A B C B0 7 1 9 8 51 4 5 4 5 62 4 3 1 9 53 6 6 3 8 14 7 9 2 8 2&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby(level='language', axis=1).sum()language Java Python0 9 211 10 142 12 103 14 104 17 11&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby(level='index', axis=1).sum()index A B C0 8 14 81 9 10 52 7 6 93 12 4 84 16 4 8 【03x04】分组迭代GroupBy 对象支持迭代,对于单层分组,可以产生一组二元元组,由分组名和数据块组成: 1234567891011121314151617181920212223242526272829303132333435&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; data = &#123;'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'], 'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 
'three'], 'data1': np.random.randn(8), 'data2': np.random.randn(8)&#125;&gt;&gt;&gt; obj = pd.DataFrame(data)&gt;&gt;&gt; obj key1 key2 data1 data20 a one -1.088762 0.6685041 b one 0.275500 0.7878442 a two -0.108417 -0.4912963 b three 0.019524 -0.3633904 a two 0.453612 0.7969995 b two 1.982858 1.5018776 a one 1.101132 -1.9283627 a three 0.524775 -1.205842&gt;&gt;&gt; &gt;&gt;&gt; for group_name, group_data in obj.groupby('key1'): print(group_name) print(group_data) a key1 key2 data1 data20 a one -1.088762 0.6685042 a two -0.108417 -0.4912964 a two 0.453612 0.7969996 a one 1.101132 -1.9283627 a three 0.524775 -1.205842b key1 key2 data1 data21 b one 0.275500 0.7878443 b three 0.019524 -0.3633905 b two 1.982858 1.501877 对于多层分组,元组的第一个元素将会是由键值组成的元组,第二个元素为数据块: 12345678910111213141516171819202122232425262728293031323334353637383940414243&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; data = &#123;'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'], 'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], 'data1': np.random.randn(8), 'data2': np.random.randn(8)&#125;&gt;&gt;&gt; obj = pd.DataFrame(data)&gt;&gt;&gt; obj key1 key2 data1 data20 a one -1.088762 0.6685041 b one 0.275500 0.7878442 a two -0.108417 -0.4912963 b three 0.019524 -0.3633904 a two 0.453612 0.7969995 b two 1.982858 1.5018776 a one 1.101132 -1.9283627 a three 0.524775 -1.205842&gt;&gt;&gt; &gt;&gt;&gt; for group_name, group_data in obj.groupby(['key1', 'key2']): print(group_name) print(group_data) ('a', 'one') key1 key2 data1 data20 a one -1.088762 0.6685046 a one 1.101132 -1.928362('a', 'three') key1 key2 data1 data27 a three 0.524775 -1.205842('a', 'two') key1 key2 data1 data22 a two -0.108417 -0.4912964 a two 0.453612 0.796999('b', 'one') key1 key2 data1 data21 b one 0.2755 0.787844('b', 'three') key1 key2 data1 data23 b three 0.019524 -0.36339('b', 'two') key1 key2 data1 data25 b two 1.982858 1.501877 【03x05】对象转换GroupBy 对象支持转换成列表或字典: 
123456789101112131415161718192021222324252627282930313233343536373839404142&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; data = &#123;'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'], 'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], 'data1': np.random.randn(8), 'data2': np.random.randn(8)&#125;&gt;&gt;&gt; obj = pd.DataFrame(data)&gt;&gt;&gt; obj key1 key2 data1 data20 a one -0.607009 1.9483011 b one 0.150818 -0.0250952 a two -2.086024 0.3581643 b three 0.446061 1.7087974 a two 0.745457 -0.9809485 b two 0.981877 2.1593276 a one 0.804480 -0.4996617 a three 0.112884 0.004367&gt;&gt;&gt; &gt;&gt;&gt; grouped = obj.groupby('key1')&gt;&gt;&gt; list(grouped1)[('a', key1 key2 data1 data20 a one -0.607009 1.9483012 a two -2.086024 0.3581644 a two 0.745457 -0.9809486 a one 0.804480 -0.4996617 a three 0.112884 0.004367),('b', key1 key2 data1 data21 b one 0.150818 -0.0250953 b three 0.446061 1.7087975 b two 0.981877 2.159327)]&gt;&gt;&gt; &gt;&gt;&gt; dict(list(grouped1))&#123;'a': key1 key2 data1 data20 a one -0.607009 1.9483012 a two -2.086024 0.3581644 a two 0.745457 -0.9809486 a one 0.804480 -0.4996617 a three 0.112884 0.004367,'b': key1 key2 data1 data21 b one 0.150818 -0.0250953 b three 0.446061 1.7087975 b two 0.981877 2.159327&#125; 【04x00】GroupBy Apply 数据应用聚合指的是任何能够从数组产生标量值的数据转换过程,常用于对分组后的数据进行计算 【04x01】聚合函数之前的例子已经用过一些内置的聚合函数,比如 mean、count、min 以及 sum 等。常见的聚合运算如下表所示: 官方文档:https://pandas.pydata.org/docs/reference/groupby.html 方法 描述 count 非NA值的数量 describe 针对Series或各DataFrame列计算汇总统计 min 计算最小值 max 计算最大值 argmin 计算能够获取到最小值的索引位置(整数) argmax 计算能够获取到最大值的索引位置(整数) idxmin 计算能够获取到最小值的索引值 idxmax 计算能够获取到最大值的索引值 quantile 计算样本的分位数(0到1) sum 值的总和 mean 值的平均数 median 值的算术中位数(50%分位数) mad 根据平均值计算平均绝对离差 var 样本值的方差 std 样本值的标准差 应用示例: 1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; obj = &#123;'key1' : 
['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'], 'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], 'data1': np.random.randint(1,10, 8), 'data2': np.random.randint(1,10, 8)&#125;&gt;&gt;&gt; obj = pd.DataFrame(obj)&gt;&gt;&gt; obj key1 key2 data1 data20 a one 9 71 b one 5 92 a two 2 43 b three 3 44 a two 5 15 b two 5 96 a one 1 87 a three 2 4&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby('key1').sum() data1 data2key1 a 19 24b 13 22&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby('key1').max() key2 data1 data2key1 a two 9 8b two 5 9&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby('key1').min() key2 data1 data2key1 a one 1 1b one 3 4&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby('key1').mean() data1 data2key1 a 3.800000 4.800000b 4.333333 7.333333&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby('key1').size()key1a 5b 3dtype: int64&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby('key1').count() key2 data1 data2key1 a 5 5 5b 3 3 3&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby('key1').describe() data1 ... data2 count mean std min 25% ... min 25% 50% 75% maxkey1 ... a 5.0 3.800000 3.271085 1.0 2.0 ... 1.0 4.0 4.0 7.0 8.0b 3.0 4.333333 1.154701 3.0 4.0 ... 
4.0 6.5 9.0 9.0 9.0[2 rows x 16 columns] 【04x02】自定义函数如果自带的内置函数满足不了我们的要求,则可以自定义一个聚合函数,然后传入 GroupBy.agg(func) 或 GroupBy.aggregate(func) 方法中即可。func 的参数为 groupby 索引对应的记录。 123456789101112131415161718192021222324252627282930313233&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; obj = &#123;'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'], 'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], 'data1': np.random.randint(1,10, 8), 'data2': np.random.randint(1,10, 8)&#125;&gt;&gt;&gt; obj = pd.DataFrame(obj)&gt;&gt;&gt; obj key1 key2 data1 data20 a one 9 71 b one 5 92 a two 2 43 b three 3 44 a two 5 15 b two 5 96 a one 1 87 a three 2 4&gt;&gt;&gt; &gt;&gt;&gt; def peak_range(df): return df.max() - df.min()&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby('key1').agg(peak_range) data1 data2key1 a 8 7b 2 5&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby('key1').agg(lambda df : df.max() - df.min()) data1 data2key1 a 8 7b 2 5 【04x03】对不同列作用不同函数使用字典可以对不同列作用不同的聚合函数: 123456789101112131415161718192021222324252627282930313233&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; obj = &#123;'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'], 'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], 'data1': np.random.randint(1,10, 8), 'data2': np.random.randint(1,10, 8)&#125;&gt;&gt;&gt; obj = pd.DataFrame(obj)&gt;&gt;&gt; obj key1 key2 data1 data20 a one 9 71 b one 5 92 a two 2 43 b three 3 44 a two 5 15 b two 5 96 a one 1 87 a three 2 4&gt;&gt;&gt; &gt;&gt;&gt; dict1 = &#123;'data1':'mean', 'data2':'sum'&#125;&gt;&gt;&gt; dict2 = &#123;'data1':['mean','max'], 'data2':'sum'&#125;&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby('key1').agg(dict1) data1 data2key1 a 3.800000 24b 4.333333 22&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby('key1').agg(dict2) data1 data2 mean max sumkey1 a 3.800000 9 24b 4.333333 5 22 【04x04】GroupBy.apply()apply() 方法会将待处理的对象拆分成多个片段,然后对各片段调用传入的函数,最后尝试将各片段组合到一起。 
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; obj = pd.DataFrame(&#123;'A':['bob','sos','bob','sos','bob','sos','bob','bob'], 'B':['one','one','two','three','two','two','one','three'], 'C':[3,1,4,1,5,9,2,6], 'D':[1,2,3,4,5,6,7,8]&#125;)&gt;&gt;&gt; obj A B C D0 bob one 3 11 sos one 1 22 bob two 4 33 sos three 1 44 bob two 5 55 sos two 9 66 bob one 2 77 bob three 6 8&gt;&gt;&gt; &gt;&gt;&gt; grouped = obj.groupby('A')&gt;&gt;&gt; for name, group in grouped: print(name) print(group) bob A B C D0 bob one 3 12 bob two 4 34 bob two 5 56 bob one 2 77 bob three 6 8sos A B C D1 sos one 1 23 sos three 1 45 sos two 9 6&gt;&gt;&gt; &gt;&gt;&gt; grouped.apply(lambda x:x.describe()) # 对 bob 和 sos 两组数据使用 describe 方法 C DA bob count 5.000000 5.000000 mean 4.000000 4.800000 std 1.581139 2.863564 min 2.000000 1.000000 25% 3.000000 3.000000 50% 4.000000 5.000000 75% 5.000000 7.000000 max 6.000000 8.000000sos count 3.000000 3.000000 mean 3.666667 4.000000 std 4.618802 2.000000 min 1.000000 2.000000 25% 1.000000 3.000000 50% 1.000000 4.000000 75% 5.000000 5.000000 max 9.000000 6.000000&gt;&gt;&gt;&gt;&gt;&gt; grouped.apply(lambda x:x.min()) # # 对 bob 和 sos 两组数据使用 min 方法 A B C DA bob bob one 2 1sos sos one 1 2 12345这里是一段防爬虫文本,请读者忽略。本文原创首发于 CSDN,作者 TRHX。博客首页:https://itrhx.blog.csdn.net/本文链接:https://itrhx.blog.csdn.net/article/details/106804881未经授权,禁止转载!恶意转载,后果自负!尊重原创,远离剽窃!]]></content>
<content type="text"><![CDATA[Pandas 系列文章: Python 数据分析三剑客之 Pandas(一):认识 Pandas 及其 Series、DataFrame 对象 Python 数据分析三剑客之 Pandas(二):Index 索引对象以及各种索引操作 Python 数据分析三剑客之 Pandas(三):算术运算与缺失值的处理 Python 数据分析三剑客之 Pandas(四):函数应用、映射、排序和层级索引 Python 数据分析三剑客之 Pandas(五):统计计算与统计描述 Python 数据分析三剑客之 Pandas(六):GroupBy 数据分裂、应用与合并 Python 数据分析三剑客之 Pandas(七):合并数据集 Python 数据分析三剑客之 Pandas(八):数据重塑、重复数据处理与数据替换 Python 数据分析三剑客之 Pandas(九):时间序列 Python 数据分析三剑客之 Pandas(十):数据读写 专栏: 【NumPy 专栏】【Pandas 专栏】【Matplotlib 专栏】 推荐学习资料与网站: 【NumPy 中文网】【Pandas 中文网】【Matplotlib 中文网】【NumPy、Matplotlib、Pandas 速查表】 12345这里是一段防爬虫文本,请读者忽略。本文原创首发于 CSDN,作者 TRHX。博客首页:https://itrhx.blog.csdn.net/本文链接:https://itrhx.blog.csdn.net/article/details/106804881未经授权,禁止转载!恶意转载,后果自负!尊重原创,远离剽窃! 【01x00】GroupBy 机制对数据集进行分组并对各组应用一个函数(无论是聚合还是转换),通常是数据分析工作中的重要环节。在将数据集加载、融合、准备好之后,通常就是计算分组统计或生成透视表。Pandas 提供了一个灵活高效的 GroupBy 功能,虽然“分组”(group by)这个名字是借用 SQL 数据库语言的命令,但其理念引用发明 R 语言 frame 的 Hadley Wickham 的观点可能更合适:分裂(Split)、应用(Apply)和组合(Combine)。 分组运算过程:Split —&gt; Apply —&gt; Combine 分裂(Split):根据某些标准将数据分组; 应用(Apply):对每个组独立应用一个函数; 合并(Combine):把每个分组的计算结果合并起来。 官方介绍:https://pandas.pydata.org/docs/user_guide/groupby.html 【02x00】GroupBy 对象常见的 GroupBy 对象:Series.groupby、DataFrame.groupby,基本语法如下: 123456789Series.groupby(self, by=None, axis=0, level=None, as_index: bool = True, sort: bool = True, group_keys: bool = True, squeeze: bool = False, observed: bool = False) → ’groupby_generic.SeriesGroupBy’ 123456789DataFrame.groupby(self, by=None, axis=0, level=None, as_index: bool = True, sort: bool = True, group_keys: bool = True, squeeze: bool = False, observed: bool = False) → ’groupby_generic.DataFrameGroupBy’ 官方文档: https://pandas.pydata.org/docs/reference/api/pandas.Series.groupby.html https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html 常用参数解释如下: 参数 描述 by 映射、函数、标签或标签列表,用于确定分组依据的分组。如果 by 是函数,则会在对象索引的每个值上调用它。 如果传递了 dict 或 Series,则将使用 Series 或 dict 的值来确定组(将 Series 的值首先对齐;请参见.align() 方法)。 如果传递了 ndarray,则按原样使用这些值来确定组。标签或标签列表可以按自身中的列传递给分组。 
注意,元组被解释为(单个)键 axis 沿指定轴拆分,默认 0,0 or ‘index’,1 or ‘columns’,只有在 DataFrame 中才有 1 or &#39;columns’ level 如果轴是 MultiIndex(层次结构),则按特定层级进行分组,默认 None as_index bool 类型,默认 True,对于聚合输出,返回以组标签为索引的对象。仅与 DataFrame 输入相关。as_index=False 实际上是“SQL样式”分组输出 sort bool 类型,默认 True,对组键排序。关闭此选项可获得更好的性能。注:这不影响每组的观察顺序。Groupby 保留每个组中行的顺序 group_keys bool 类型,默认 True,调用 apply 方法时,是否将组键(keys)添加到索引( index)以标识块 squeeze bool 类型,默认 False,如果可能,减少返回类型的维度,否则返回一致的类型 groupby() 进行分组,GroupBy 对象没有进行实际运算,只是包含分组的中间数据,示例如下: 123456789101112131415161718192021222324&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; data = &#123;'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'], 'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], 'data1': np.random.randn(8), 'data2': np.random.randn(8)&#125;&gt;&gt;&gt; &gt;&gt;&gt; obj = pd.DataFrame(data)&gt;&gt;&gt; obj key1 key2 data1 data20 a one -0.804160 -0.8689051 b one -0.086990 0.3257412 a two 0.757992 0.5411013 b three -0.281435 0.0978414 a two 0.817757 -0.6436995 b two -0.462760 -0.3211966 a one -0.403699 0.6021387 a three 0.883940 -0.850526&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby('key1')&lt;pandas.core.groupby.generic.DataFrameGroupBy object at 0x03CDB7C0&gt;&gt;&gt;&gt; &gt;&gt;&gt; obj['data1'].groupby(obj['key1'])&lt;pandas.core.groupby.generic.SeriesGroupBy object at 0x03CDB748&gt; 【03x00】GroupBy Split 数据分裂【03x01】分组运算前面通过 groupby() 方法获得了一个 GroupBy 对象,它实际上还没有进行任何计算,只是含有一些有关分组键 obj[&#39;key1&#39;] 的中间数据而已。换句话说,该对象已经有了接下来对各分组执行运算所需的一切信息。例如,我们可以调用 GroupBy 的 mean() 方法来计算分组平均值,size() 方法返回每个分组的元素个数: 123456789101112131415161718192021222324252627282930313233343536373839404142434445&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; data = &#123;'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'], 'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], 'data1': np.random.randn(8), 'data2': np.random.randn(8)&#125;&gt;&gt;&gt; &gt;&gt;&gt; obj = pd.DataFrame(data)&gt;&gt;&gt; obj key1 key2 data1 
data20 a one -0.544099 -0.6140791 b one 2.193712 0.1010052 a two -0.004683 0.8827703 b three 0.312858 1.7321054 a two 0.011089 0.0895875 b two 0.292165 1.3276386 a one -1.433291 -0.2389717 a three -0.004724 -2.117326&gt;&gt;&gt; &gt;&gt;&gt; grouped1 = obj.groupby('key1')&gt;&gt;&gt; grouped2 = obj['data1'].groupby(obj['key1'])&gt;&gt;&gt; &gt;&gt;&gt; grouped1.mean() data1 data2key1 a -0.395142 -0.399604b 0.932912 1.053583&gt;&gt;&gt; &gt;&gt;&gt; grouped2.mean()key1a -0.395142b 0.932912Name: data1, dtype: float64&gt;&gt;&gt;&gt;&gt;&gt; grouped1.size()key1a 5b 3dtype: int64&gt;&gt;&gt; &gt;&gt;&gt; grouped2.size()key1a 5b 3Name: data1, dtype: int64 【03x02】按类型按列分组groupby() 方法 axis 参数默认是 0,通过设置也可以在其他任何轴上进行分组,也支持按照类型(dtype)进行分组: 12345678910111213141516171819202122232425262728293031323334353637383940&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; data = &#123;'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'], 'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], 'data1': np.random.randn(8), 'data2': np.random.randn(8)&#125;&gt;&gt;&gt; obj = pd.DataFrame(data)&gt;&gt;&gt; obj key1 key2 data1 data20 a one -0.607009 1.9483011 b one 0.150818 -0.0250952 a two -2.086024 0.3581643 b three 0.446061 1.7087974 a two 0.745457 -0.9809485 b two 0.981877 2.1593276 a one 0.804480 -0.4996617 a three 0.112884 0.004367&gt;&gt;&gt; &gt;&gt;&gt; obj.dtypeskey1 objectkey2 objectdata1 float64data2 float64dtype: object&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby(obj.dtypes, axis=1).size()float64 2object 2dtype: int64&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby(obj.dtypes, axis=1).sum() float64 object0 1.341291 aone1 0.125723 bone2 -1.727860 atwo3 2.154858 bthree4 -0.235491 atwo5 3.141203 btwo6 0.304819 aone7 0.117251 athree 【03x03】自定义分组groupby() 方法中可以一次传入多个数组的列表,也可以自定义一组分组键。也可以通过一个字典、一个函数,或者按照索引层级进行分组。 传入多个数组的列表: 12345678910111213141516171819202122232425262728293031323334&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; data = 
&#123;'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'], 'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], 'data1': np.random.randn(8), 'data2': np.random.randn(8)&#125;&gt;&gt;&gt; obj = pd.DataFrame(data)&gt;&gt;&gt; obj key1 key2 data1 data20 a one -0.841652 0.6880551 b one 0.510042 -0.5611712 a two -0.418862 -0.1459833 b three -1.104698 0.5631584 a two 0.329527 -0.8931085 b two 0.753653 -0.3425206 a one -0.882527 -1.1213297 a three 1.726794 0.160244&gt;&gt;&gt; &gt;&gt;&gt; means = obj['data1'].groupby([obj['key1'], obj['key2']]).mean()&gt;&gt;&gt; meanskey1 key2 a one -0.862090 three 1.726794 two -0.044667b one 0.510042 three -1.104698 two 0.753653Name: data1, dtype: float64&gt;&gt;&gt; &gt;&gt;&gt; means.unstack()key2 one three twokey1 a -0.862090 1.726794 -0.044667b 0.510042 -1.104698 0.753653 自定义分组键: 1234567891011121314151617181920212223&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; obj = pd.DataFrame(&#123;'key1' : ['a', 'a', 'b', 'b', 'a'], 'key2' : ['one', 'two', 'one', 'two', 'one'], 'data1' : np.random.randn(5), 'data2' : np.random.randn(5)&#125;)&gt;&gt;&gt; obj key1 key2 data1 data20 a one -0.024003 0.3504801 a two -0.767534 -0.1004262 b one -0.594983 -1.9455803 b two -0.374482 0.8175924 a one 0.755452 -0.137759&gt;&gt;&gt; &gt;&gt;&gt; states = np.array(['Wuhan', 'Beijing', 'Beijing', 'Wuhan', 'Wuhan'])&gt;&gt;&gt; years = np.array([2005, 2005, 2006, 2005, 2006])&gt;&gt;&gt; &gt;&gt;&gt; obj['data1'].groupby([states, years]).mean()Beijing 2005 -0.767534 2006 -0.594983Wuhan 2005 -0.199242 2006 0.755452Name: data1, dtype: float64 【03x03x01】字典分组通过字典进行分组: 1234567891011121314151617181920212223242526272829303132333435&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; obj = pd.DataFrame(np.random.randint(1, 10, (5,5)), columns=['a', 'b', 'c', 'd', 'e'], index=['A', 'B', 'C', 'D', 'E'])&gt;&gt;&gt; obj a b c d eA 1 4 7 1 9B 8 2 4 7 8C 9 8 2 5 1D 2 4 2 8 3E 7 5 7 2 3&gt;&gt;&gt; 
&gt;&gt;&gt; obj_dict = &#123;'a':'Python', 'b':'Python', 'c':'Java', 'd':'C++', 'e':'Java'&#125;&gt;&gt;&gt; obj.groupby(obj_dict, axis=1).size()C++ 1Java 2Python 2dtype: int64&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby(obj_dict, axis=1).count() C++ Java PythonA 1 2 2B 1 2 2C 1 2 2D 1 2 2E 1 2 2&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby(obj_dict, axis=1).sum() C++ Java PythonA 1 16 5B 7 12 10C 5 3 17D 8 5 6E 2 10 12 【03x03x02】函数分组通过函数进行分组: 123456789101112131415161718192021222324&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; obj = pd.DataFrame(np.random.randint(1, 10, (5,5)), columns=['a', 'b', 'c', 'd', 'e'], index=['AA', 'BBB', 'CC', 'D', 'EE'])&gt;&gt;&gt; obj a b c d eAA 3 9 5 8 2BBB 1 4 2 2 6CC 9 2 4 7 6D 2 5 5 7 1EE 8 8 8 2 2&gt;&gt;&gt; &gt;&gt;&gt; def group_key(idx): """ idx 为列索引或行索引 """ return len(idx)&gt;&gt;&gt; obj.groupby(group_key).size() # 等价于 obj.groupby(len).size()1 12 33 1dtype: int64 【03x03x03】索引层级分组通过不同索引层级进行分组: 1234567891011121314151617181920212223242526272829&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; columns = pd.MultiIndex.from_arrays([['Python', 'Java', 'Python', 'Java', 'Python'], ['A', 'A', 'B', 'C', 'B']], names=['language', 'index'])&gt;&gt;&gt; obj = pd.DataFrame(np.random.randint(1, 10, (5, 5)), columns=columns)&gt;&gt;&gt; objlanguage Python Java Python Java Pythonindex A A B C B0 7 1 9 8 51 4 5 4 5 62 4 3 1 9 53 6 6 3 8 14 7 9 2 8 2&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby(level='language', axis=1).sum()language Java Python0 9 211 10 142 12 103 14 104 17 11&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby(level='index', axis=1).sum()index A B C0 8 14 81 9 10 52 7 6 93 12 4 84 16 4 8 【03x04】分组迭代GroupBy 对象支持迭代,对于单层分组,可以产生一组二元元组,由分组名和数据块组成: 1234567891011121314151617181920212223242526272829303132333435&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; data = &#123;'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'], 'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 
'three'], 'data1': np.random.randn(8), 'data2': np.random.randn(8)&#125;&gt;&gt;&gt; obj = pd.DataFrame(data)&gt;&gt;&gt; obj key1 key2 data1 data20 a one -1.088762 0.6685041 b one 0.275500 0.7878442 a two -0.108417 -0.4912963 b three 0.019524 -0.3633904 a two 0.453612 0.7969995 b two 1.982858 1.5018776 a one 1.101132 -1.9283627 a three 0.524775 -1.205842&gt;&gt;&gt; &gt;&gt;&gt; for group_name, group_data in obj.groupby('key1'): print(group_name) print(group_data) a key1 key2 data1 data20 a one -1.088762 0.6685042 a two -0.108417 -0.4912964 a two 0.453612 0.7969996 a one 1.101132 -1.9283627 a three 0.524775 -1.205842b key1 key2 data1 data21 b one 0.275500 0.7878443 b three 0.019524 -0.3633905 b two 1.982858 1.501877 对于多层分组,元组的第一个元素将会是由键值组成的元组,第二个元素为数据块: 12345678910111213141516171819202122232425262728293031323334353637383940414243&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; data = &#123;'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'], 'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], 'data1': np.random.randn(8), 'data2': np.random.randn(8)&#125;&gt;&gt;&gt; obj = pd.DataFrame(data)&gt;&gt;&gt; obj key1 key2 data1 data20 a one -1.088762 0.6685041 b one 0.275500 0.7878442 a two -0.108417 -0.4912963 b three 0.019524 -0.3633904 a two 0.453612 0.7969995 b two 1.982858 1.5018776 a one 1.101132 -1.9283627 a three 0.524775 -1.205842&gt;&gt;&gt; &gt;&gt;&gt; for group_name, group_data in obj.groupby(['key1', 'key2']): print(group_name) print(group_data) ('a', 'one') key1 key2 data1 data20 a one -1.088762 0.6685046 a one 1.101132 -1.928362('a', 'three') key1 key2 data1 data27 a three 0.524775 -1.205842('a', 'two') key1 key2 data1 data22 a two -0.108417 -0.4912964 a two 0.453612 0.796999('b', 'one') key1 key2 data1 data21 b one 0.2755 0.787844('b', 'three') key1 key2 data1 data23 b three 0.019524 -0.36339('b', 'two') key1 key2 data1 data25 b two 1.982858 1.501877 【03x05】对象转换GroupBy 对象支持转换成列表或字典: 
123456789101112131415161718192021222324252627282930313233343536373839404142&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; data = &#123;'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'], 'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], 'data1': np.random.randn(8), 'data2': np.random.randn(8)&#125;&gt;&gt;&gt; obj = pd.DataFrame(data)&gt;&gt;&gt; obj key1 key2 data1 data20 a one -0.607009 1.9483011 b one 0.150818 -0.0250952 a two -2.086024 0.3581643 b three 0.446061 1.7087974 a two 0.745457 -0.9809485 b two 0.981877 2.1593276 a one 0.804480 -0.4996617 a three 0.112884 0.004367&gt;&gt;&gt; &gt;&gt;&gt; grouped = obj.groupby('key1')&gt;&gt;&gt; list(grouped)[('a', key1 key2 data1 data20 a one -0.607009 1.9483012 a two -2.086024 0.3581644 a two 0.745457 -0.9809486 a one 0.804480 -0.4996617 a three 0.112884 0.004367),('b', key1 key2 data1 data21 b one 0.150818 -0.0250953 b three 0.446061 1.7087975 b two 0.981877 2.159327)]&gt;&gt;&gt;&gt;&gt;&gt; dict(list(grouped))&#123;'a': key1 key2 data1 data20 a one -0.607009 1.9483012 a two -2.086024 0.3581644 a two 0.745457 -0.9809486 a one 0.804480 -0.4996617 a three 0.112884 0.004367,'b': key1 key2 data1 data21 b one 0.150818 -0.0250953 b three 0.446061 1.7087975 b two 0.981877 2.159327&#125; 【04x00】GroupBy Apply 数据应用聚合指的是任何能够从数组产生标量值的数据转换过程,常用于对分组后的数据进行计算 【04x01】聚合函数之前的例子已经用过一些内置的聚合函数,比如 mean、count、min 以及 sum 等。常见的聚合运算如下表所示: 官方文档:https://pandas.pydata.org/docs/reference/groupby.html 方法 描述 count 非NA值的数量 describe 针对Series或各DataFrame列计算汇总统计 min 计算最小值 max 计算最大值 argmin 计算能够获取到最小值的索引位置(整数) argmax 计算能够获取到最大值的索引位置(整数) idxmin 计算能够获取到最小值的索引值 idxmax 计算能够获取到最大值的索引值 quantile 计算样本的分位数(0到1) sum 值的总和 mean 值的平均数 median 值的算术中位数(50%分位数) mad 根据平均值计算平均绝对离差 var 样本值的方差 std 样本值的标准差 应用示例: 1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; obj = &#123;'key1' : ['a', 
'b', 'a', 'b', 'a', 'b', 'a', 'a'], 'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], 'data1': np.random.randint(1,10, 8), 'data2': np.random.randint(1,10, 8)&#125;&gt;&gt;&gt; obj = pd.DataFrame(obj)&gt;&gt;&gt; obj key1 key2 data1 data20 a one 9 71 b one 5 92 a two 2 43 b three 3 44 a two 5 15 b two 5 96 a one 1 87 a three 2 4&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby('key1').sum() data1 data2key1 a 19 24b 13 22&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby('key1').max() key2 data1 data2key1 a two 9 8b two 5 9&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby('key1').min() key2 data1 data2key1 a one 1 1b one 3 4&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby('key1').mean() data1 data2key1 a 3.800000 4.800000b 4.333333 7.333333&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby('key1').size()key1a 5b 3dtype: int64&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby('key1').count() key2 data1 data2key1 a 5 5 5b 3 3 3&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby('key1').describe() data1 ... data2 count mean std min 25% ... min 25% 50% 75% maxkey1 ... a 5.0 3.800000 3.271085 1.0 2.0 ... 1.0 4.0 4.0 7.0 8.0b 3.0 4.333333 1.154701 3.0 4.0 ... 
4.0 6.5 9.0 9.0 9.0[2 rows x 16 columns] 【04x02】自定义函数如果自带的内置函数满足不了我们的要求,则可以自定义一个聚合函数,然后传入 GroupBy.agg(func) 或 GroupBy.aggregate(func) 方法中即可。func 的参数为 groupby 索引对应的记录。 123456789101112131415161718192021222324252627282930313233&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; obj = &#123;'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'], 'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], 'data1': np.random.randint(1,10, 8), 'data2': np.random.randint(1,10, 8)&#125;&gt;&gt;&gt; obj = pd.DataFrame(obj)&gt;&gt;&gt; obj key1 key2 data1 data20 a one 9 71 b one 5 92 a two 2 43 b three 3 44 a two 5 15 b two 5 96 a one 1 87 a three 2 4&gt;&gt;&gt; &gt;&gt;&gt; def peak_range(df): return df.max() - df.min()&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby('key1').agg(peak_range) data1 data2key1 a 8 7b 2 5&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby('key1').agg(lambda df : df.max() - df.min()) data1 data2key1 a 8 7b 2 5 【04x03】对不同列作用不同函数使用字典可以对不同列作用不同的聚合函数: 123456789101112131415161718192021222324252627282930313233&gt;&gt;&gt; import pandas as pd&gt;&gt;&gt; import numpy as np&gt;&gt;&gt; obj = &#123;'key1' : ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'a'], 'key2' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], 'data1': np.random.randint(1,10, 8), 'data2': np.random.randint(1,10, 8)&#125;&gt;&gt;&gt; obj = pd.DataFrame(obj)&gt;&gt;&gt; obj key1 key2 data1 data20 a one 9 71 b one 5 92 a two 2 43 b three 3 44 a two 5 15 b two 5 96 a one 1 87 a three 2 4&gt;&gt;&gt; &gt;&gt;&gt; dict1 = &#123;'data1':'mean', 'data2':'sum'&#125;&gt;&gt;&gt; dict2 = &#123;'data1':['mean','max'], 'data2':'sum'&#125;&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby('key1').agg(dict1) data1 data2key1 a 3.800000 24b 4.333333 22&gt;&gt;&gt; &gt;&gt;&gt; obj.groupby('key1').agg(dict2) data1 data2 mean max sumkey1 a 3.800000 9 24b 4.333333 5 22 【04x04】GroupBy.apply()apply() 方法会将待处理的对象拆分成多个片段,然后对各片段调用传入的函数,最后尝试将各片段组合到一起。 
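The three stages that apply() goes through (split the object into pieces, call the function on each piece, combine the results) can be sketched in plain Python. This is only a conceptual illustration, not how pandas implements GroupBy: the helper name `split_apply_combine` is hypothetical and is not a pandas API, and the rows reuse the column A and column C values of the example in this section.

```python
from collections import defaultdict


def split_apply_combine(rows, key, func):
    """Hypothetical helper (not part of pandas) showing the three stages."""
    # Split: bucket the rows by the value found under `key`.
    groups = defaultdict(list)
    for row in rows:
        label = row[key]
        groups[label].append(row)
    # Apply + Combine: call `func` once per group and collect the results
    # under the group label, with labels sorted (similar in spirit to sort=True).
    return {label: func(groups[label]) for label in sorted(groups)}


# Column A and column C of the example DataFrame, written as plain dicts.
names = ['bob', 'sos', 'bob', 'sos', 'bob', 'sos', 'bob', 'bob']
values = [3, 1, 4, 1, 5, 9, 2, 6]
rows = [{'A': a, 'C': c} for a, c in zip(names, values)]

# Smallest C within each A group.
min_c = split_apply_combine(rows, 'A', lambda group: min(r['C'] for r in group))
print(min_c)
```

With this data the per-group minimum of C is 2 for bob and 1 for sos, the same values that appear in the C column of the grouped.apply(lambda x:x.min()) output in this section.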
>>> import pandas as pd
>>> obj = pd.DataFrame({'A':['bob','sos','bob','sos','bob','sos','bob','bob'],
...                     'B':['one','one','two','three','two','two','one','three'],
...                     'C':[3,1,4,1,5,9,2,6],
...                     'D':[1,2,3,4,5,6,7,8]})
>>> obj
     A      B  C  D
0  bob    one  3  1
1  sos    one  1  2
2  bob    two  4  3
3  sos  three  1  4
4  bob    two  5  5
5  sos    two  9  6
6  bob    one  2  7
7  bob  three  6  8
>>>
>>> grouped = obj.groupby('A')
>>> for name, group in grouped:
...     print(name)
...     print(group)
...
bob
     A      B  C  D
0  bob    one  3  1
2  bob    two  4  3
4  bob    two  5  5
6  bob    one  2  7
7  bob  three  6  8
sos
     A      B  C  D
1  sos    one  1  2
3  sos  three  1  4
5  sos    two  9  6
>>>
>>> grouped.apply(lambda x:x.describe())  # call describe on each of the two groups, bob and sos
                  C         D
A
bob count  5.000000  5.000000
    mean   4.000000  4.800000
    std    1.581139  2.863564
    min    2.000000  1.000000
    25%    3.000000  3.000000
    50%    4.000000  5.000000
    75%    5.000000  7.000000
    max    6.000000  8.000000
sos count  3.000000  3.000000
    mean   3.666667  4.000000
    std    4.618802  2.000000
    min    1.000000  2.000000
    25%    1.000000  3.000000
    50%    1.000000  4.000000
    75%    5.000000  5.000000
    max    9.000000  6.000000
>>>
>>> grouped.apply(lambda x:x.min())  # call min on each of the two groups, bob and sos
       A    B  C  D
A
bob  bob  one  2  1
sos  sos  one  1  2

This is an anti-scraping notice; readers may ignore it.
This article was originally published on CSDN by TRHX.
Blog home: https://itrhx.blog.csdn.net/
Article link: https://itrhx.blog.csdn.net/article/details/106804881
Reproduction without authorization is prohibited! Malicious reposting is at your own risk! Respect original work and stay away from plagiarism!]]></content>
<categories>
<category>Python 数据分析</category>
<category>Pandas</category>
......@@ -1140,8 +1140,8 @@
<category>Hexo</category>
</categories>
<tags>
<tag>Hexo</tag>
<tag>主题个性化</tag>
<tag>Hexo</tag>
<tag>Material X</tag>
<tag>spfk</tag>
</tags>
......@@ -1166,8 +1166,8 @@
<category>Hexo</category>
</categories>
<tags>
<tag>Github Pages</tag>
<tag>Hexo</tag>
<tag>Github Pages</tag>
</tags>
</entry>
<entry>
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://www.itrhx.com/2020/06/17/A84-Pandas-06/</loc>
<lastmod>2020-08-07T01:13:45.027Z</lastmod>
</url>
<url>
<loc>https://www.itrhx.com/2020/07/13/A90-pyspider-51job/</loc>
......@@ -22,13 +29,6 @@
</url>
<url>
<loc>https://www.itrhx.com/2020/06/17/A84-Pandas-06/</loc>
<lastmod>2020-08-06T03:31:25.195Z</lastmod>
</url>
<url>
<loc>https://www.itrhx.com/2020/06/11/A79-Pandas-01/</loc>
......@@ -422,28 +422,28 @@
</url>
<url>
<loc>https://www.itrhx.com/games/2048/index.html</loc>
<loc>https://www.itrhx.com/games/element/index.html</loc>
<lastmod>2019-12-29T06:55:50.751Z</lastmod>
</url>
<url>
<loc>https://www.itrhx.com/box/about/index.html</loc>
<loc>https://www.itrhx.com/games/cat/index.html</loc>
<lastmod>2019-12-29T06:55:50.751Z</lastmod>
</url>
<url>
<loc>https://www.itrhx.com/games/cat/index.html</loc>
<loc>https://www.itrhx.com/box/about/index.html</loc>
<lastmod>2019-12-29T06:55:50.751Z</lastmod>
</url>
<url>
<loc>https://www.itrhx.com/games/element/index.html</loc>
<loc>https://www.itrhx.com/games/2048/index.html</loc>
<lastmod>2019-12-29T06:55:50.751Z</lastmod>
......
......@@ -1342,10 +1342,10 @@
<div class="full-width auto-padding tags">
<a href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Hexo</a>
<a href="/tags/主题个性化/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;主题个性化</a>
<a href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Hexo</a>
<a href="/tags/Material-X/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Material X</a>
<a href="/tags/spfk/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;spfk</a>
......@@ -1492,10 +1492,10 @@
<div class="full-width auto-padding tags">
<a href="/tags/Github-Pages/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Github Pages</a>
<a href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Hexo</a>
<a href="/tags/Github-Pages/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Github Pages</a>
</div>
</section>
......
......@@ -547,10 +547,10 @@
<div class="full-width auto-padding tags">
<a href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Hexo</a>
<a href="/tags/主题个性化/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;主题个性化</a>
<a href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Hexo</a>
<a href="/tags/Material-X/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Material X</a>
<a href="/tags/spfk/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;spfk</a>
......
......@@ -547,10 +547,10 @@
<div class="full-width auto-padding tags">
<a href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Hexo</a>
<a href="/tags/主题个性化/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;主题个性化</a>
<a href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Hexo</a>
<a href="/tags/Material-X/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Material X</a>
<a href="/tags/spfk/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;spfk</a>
......
......@@ -547,10 +547,10 @@
<div class="full-width auto-padding tags">
<a href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Hexo</a>
<a href="/tags/主题个性化/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;主题个性化</a>
<a href="/tags/Hexo/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Hexo</a>
<a href="/tags/Material-X/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;Material X</a>
<a href="/tags/spfk/" rel="nofollow"><i class="fas fa-tags fa-fw"></i>&nbsp;spfk</a>
......