<figure class="highlight python"><table><tr><td class="code"><pre><span class="line"><span class="meta">>>> </span><span class="keyword">import</span> pandas <span class="keyword">as</span> pd</span><br><span class="line"><span class="meta">>>> </span><span class="keyword">import</span> numpy <span class="keyword">as</span> np</span><br><span class="line"><span class="meta">>>> </span>data = {<span class="string">'key1'</span> : [<span class="string">'a'</span>, <span class="string">'b'</span>, <span class="string">'a'</span>, <span class="string">'b'</span>, <span class="string">'a'</span>, <span class="string">'b'</span>, <span class="string">'a'</span>, <span class="string">'a'</span>],</span><br><span class="line"><span class="meta">... </span>        <span class="string">'key2'</span> : [<span class="string">'one'</span>, <span class="string">'one'</span>, <span class="string">'two'</span>, <span class="string">'three'</span>, <span class="string">'two'</span>, <span class="string">'two'</span>, <span class="string">'one'</span>, <span class="string">'three'</span>],</span><br><span class="line"><span class="meta">... </span>        <span class="string">'data1'</span>: np.random.randn(<span class="number">8</span>),</span><br><span class="line"><span class="meta">... </span>        <span class="string">'data2'</span>: np.random.randn(<span class="number">8</span>)}</span><br><span class="line"><span class="meta">>>> </span>obj = pd.DataFrame(data)</span><br><span class="line"><span class="meta">>>> </span>obj</span><br><span class="line">  key1   key2     data1     data2</span><br><span class="line"><span class="number">0</span>    a    one <span class="number">-1.088762</span>  <span class="number">0.668504</span></span><br><span class="line"><span class="number">1</span>    b    one  <span class="number">0.275500</span>  <span class="number">0.787844</span></span><br><span class="line"><span class="number">2</span>    a    two <span class="number">-0.108417</span> <span class="number">-0.491296</span></span><br><span class="line"><span class="number">3</span>    b  three  <span class="number">0.019524</span> <span class="number">-0.363390</span></span><br><span class="line"><span class="number">4</span>    a    two  <span class="number">0.453612</span>  <span class="number">0.796999</span></span><br><span class="line"><span class="number">5</span>    b    two  <span class="number">1.982858</span>  <span class="number">1.501877</span></span><br><span class="line"><span class="number">6</span>    a    one  <span class="number">1.101132</span> <span class="number">-1.928362</span></span><br><span class="line"><span class="number">7</span>    a  three  <span class="number">0.524775</span> <span class="number">-1.205842</span></span><br><span class="line"><span class="meta">>>></span></span><br><span class="line"><span class="meta">>>> </span><span class="keyword">for</span> group_name, group_data <span class="keyword">in</span> obj.groupby([<span class="string">'key1'</span>, <span class="string">'key2'</span>]):</span><br><span class="line"><span class="meta">... </span>    print(group_name)</span><br><span class="line"><span class="meta">... </span>    print(group_data)</span><br><span class="line"><span class="meta">...</span></span><br><span class="line">(<span class="string">'a'</span>, <span class="string">'one'</span>)</span><br><span class="line">  key1 key2     data1     data2</span><br><span class="line"><span class="number">0</span>    a  one <span class="number">-1.088762</span>  <span class="number">0.668504</span></span><br><span class="line"><span class="number">6</span>    a  one  <span class="number">1.101132</span> <span class="number">-1.928362</span></span><br><span class="line">(<span class="string">'a'</span>, <span class="string">'three'</span>)</span><br><span class="line">  key1   key2     data1     data2</span><br><span class="line"><span class="number">7</span>    a  three  <span class="number">0.524775</span> <span class="number">-1.205842</span></span><br><span class="line">(<span class="string">'a'</span>, <span class="string">'two'</span>)</span><br><span class="line">  key1 key2     data1     data2</span><br><span class="line"><span class="number">2</span>    a  two <span class="number">-0.108417</span> <span class="number">-0.491296</span></span><br><span class="line"><span class="number">4</span>    a  two  <span class="number">0.453612</span>  <span class="number">0.796999</span></span><br><span class="line">(<span class="string">'b'</span>, <span class="string">'one'</span>)</span><br><span class="line">  key1 key2   data1     data2</span><br><span class="line"><span class="number">1</span>    b  one  <span class="number">0.2755</span>  <span class="number">0.787844</span></span><br><span class="line">(<span class="string">'b'</span>, <span class="string">'three'</span>)</span><br><span class="line">  key1   key2     data1    data2</span><br><span class="line"><span class="number">3</span>    b  three  <span class="number">0.019524</span> <span class="number">-0.36339</span></span><br><span class="line">(<span class="string">'b'</span>, <span class="string">'two'</span>)</span><br><span class="line">  key1 key2     data1     data2</span><br><span class="line"><span class="number">5</span>    b  two  <span class="number">1.982858</span>  <span class="number">1.501877</span></span><br></pre></td></tr></table></figure>
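<p>As a small follow-up to the loop above: when grouping by more than one key, each group name is a tuple, so it can be unpacked directly in the loop header. The sketch below uses a made-up frame with fixed values so the result is reproducible; the names df and group_sizes are illustrative and do not come from the original example.</p>

```python
import pandas as pd

# Small frame with fixed, made-up values (illustration only).
df = pd.DataFrame({
    'key1': ['a', 'b', 'a', 'b', 'a'],
    'key2': ['one', 'one', 'two', 'two', 'one'],
    'data1': [1.0, 2.0, 3.0, 4.0, 5.0],
})

# With several grouping keys, the group name is a tuple such as ('a', 'one'),
# which can be unpacked into one variable per key.
group_sizes = {}
for (k1, k2), group_data in df.groupby(['key1', 'key2']):
    group_sizes[(k1, k2)] = len(group_data)
    print(k1, k2, group_data['data1'].tolist())
```

<p>Groups are visited in sorted key order by default, which is why ('a', 'three') appears before ('a', 'two') in the output shown earlier.</p>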
<figure class="highlight python"><table><tr><td class="code"><pre><span class="line"><span class="meta">>>> </span><span class="keyword">import</span> pandas <span class="keyword">as</span> pd</span><br><span class="line"><span class="meta">>>> </span><span class="keyword">import</span> numpy <span class="keyword">as</span> np</span><br><span class="line"><span class="meta">>>> </span>data = {<span class="string">'key1'</span> : [<span class="string">'a'</span>, <span class="string">'b'</span>, <span class="string">'a'</span>, <span class="string">'b'</span>, <span class="string">'a'</span>, <span class="string">'b'</span>, <span class="string">'a'</span>, <span class="string">'a'</span>],</span><br><span class="line"><span class="meta">... </span>        <span class="string">'key2'</span> : [<span class="string">'one'</span>, <span class="string">'one'</span>, <span class="string">'two'</span>, <span class="string">'three'</span>, <span class="string">'two'</span>, <span class="string">'two'</span>, <span class="string">'one'</span>, <span class="string">'three'</span>],</span><br><span class="line"><span class="meta">... </span>        <span class="string">'data1'</span>: np.random.randn(<span class="number">8</span>),</span><br><span class="line"><span class="meta">... </span>        <span class="string">'data2'</span>: np.random.randn(<span class="number">8</span>)}</span><br><span class="line"><span class="meta">>>> </span>obj = pd.DataFrame(data)</span><br><span class="line"><span class="meta">>>> </span>obj</span><br><span class="line">  key1   key2     data1     data2</span><br><span class="line"><span class="number">0</span>    a    one <span class="number">-0.607009</span>  <span class="number">1.948301</span></span><br><span class="line"><span class="number">1</span>    b    one  <span class="number">0.150818</span> <span class="number">-0.025095</span></span><br><span class="line"><span class="number">2</span>    a    two <span class="number">-2.086024</span>  <span class="number">0.358164</span></span><br><span class="line"><span class="number">3</span>    b  three  <span class="number">0.446061</span>  <span class="number">1.708797</span></span><br><span class="line"><span class="number">4</span>    a    two  <span class="number">0.745457</span> <span class="number">-0.980948</span></span><br><span class="line"><span class="number">5</span>    b    two  <span class="number">0.981877</span>  <span class="number">2.159327</span></span><br><span class="line"><span class="number">6</span>    a    one  <span class="number">0.804480</span> <span class="number">-0.499661</span></span><br><span class="line"><span class="number">7</span>    a  three  <span class="number">0.112884</span>  <span class="number">0.004367</span></span><br><span class="line"><span class="meta">>>></span></span><br><span class="line"><span class="meta">>>> </span>grouped = obj.groupby(<span class="string">'key1'</span>)</span><br><span class="line"><span class="meta">>>> </span>list(grouped)</span><br><span class="line">[(<span class="string">'a'</span>,   key1   key2     data1     data2</span><br><span class="line"><span class="number">0</span>    a    one <span class="number">-0.607009</span>  <span class="number">1.948301</span></span><br><span class="line"><span class="number">2</span>    a    two <span class="number">-2.086024</span>  <span class="number">0.358164</span></span><br><span class="line"><span class="number">4</span>    a    two  <span class="number">0.745457</span> <span class="number">-0.980948</span></span><br><span class="line"><span class="number">6</span>    a    one  <span class="number">0.804480</span> <span class="number">-0.499661</span></span><br><span class="line"><span class="number">7</span>    a  three  <span class="number">0.112884</span>  <span class="number">0.004367</span>),</span><br><span class="line">(<span class="string">'b'</span>,   key1   key2     data1     data2</span><br><span class="line"><span class="number">1</span>    b    one  <span class="number">0.150818</span> <span class="number">-0.025095</span></span><br><span class="line"><span class="number">3</span>    b  three  <span class="number">0.446061</span>  <span class="number">1.708797</span></span><br><span class="line"><span class="number">5</span>    b    two  <span class="number">0.981877</span>  <span class="number">2.159327</span>)]</span><br><span class="line"><span class="meta">>>></span></span><br><span class="line"><span class="meta">>>> </span>dict(list(grouped))</span><br><span class="line">{<span class="string">'a'</span>:   key1   key2     data1     data2</span><br><span class="line"><span class="number">0</span>    a    one <span class="number">-0.607009</span>  <span class="number">1.948301</span></span><br><span class="line"><span class="number">2</span>    a    two <span class="number">-2.086024</span>  <span class="number">0.358164</span></span><br><span class="line"><span class="number">4</span>    a    two  <span class="number">0.745457</span> <span class="number">-0.980948</span></span><br><span class="line"><span class="number">6</span>    a    one  <span class="number">0.804480</span> <span class="number">-0.499661</span></span><br><span class="line"><span class="number">7</span>    a  three  <span class="number">0.112884</span>  <span class="number">0.004367</span>,</span><br><span class="line"><span class="string">'b'</span>:   key1   key2     data1     data2</span><br><span class="line"><span class="number">1</span>    b    one  <span class="number">0.150818</span> <span class="number">-0.025095</span></span><br><span class="line"><span class="number">3</span>    b  three  <span class="number">0.446061</span>  <span class="number">1.708797</span></span><br><span class="line"><span class="number">5</span>    b    two  <span class="number">0.981877</span>  <span class="number">2.159327</span>}</span><br></pre></td></tr></table></figure>
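<p>The dictionary of groups built above can also be written as a dict comprehension, and a single group can be fetched with get_group instead of materialising every group. A minimal sketch on a made-up frame with fixed values (df and pieces are illustrative names, not part of the original example):</p>

```python
import pandas as pd

# Small frame with fixed, made-up values (illustration only).
df = pd.DataFrame({
    'key1': ['a', 'b', 'a', 'b', 'a'],
    'data1': [1.0, 2.0, 3.0, 4.0, 5.0],
})

grouped = df.groupby('key1')

# Equivalent to dict(list(grouped)): map each group name to its sub-frame.
pieces = {name: group for name, group in grouped}
print(pieces['b'])

# get_group returns one group's rows without building the whole dictionary.
only_a = grouped.get_group('a')
print(only_a)
```

<p>Each value in pieces is an ordinary DataFrame that keeps the original row index, so it can be indexed or aggregated like any other frame.</p>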
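<h3 id="【04x01】聚合函数"><a href="#【04x01】聚合函数" class="headerlink" title="【04x01】Aggregate functions"></a><font color="#4876FF">【04x01】Aggregate functions</font></h3><p>The earlier examples already used some of the built-in aggregate functions, such as mean, count, min, and sum. Common aggregation operations are listed in the table below:</p>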