Accessing rows via `sales[0]` doesn't work because Pandas reserves array indexing notation for selecting columns. Instead, we have to use a slightly more awkward notation:
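As a minimal sketch of positional row access, assume a small stand-in for the `sales` data frame (the column names and values below are made up for illustration). Selecting by integer position with `iloc` is one common way to get a row:

```python
import pandas as pd

# Hypothetical stand-in for the sales data; values are illustrative only
sales = pd.DataFrame({
    "Date": ["2017-06-22", "2017-06-23"],
    "Quantity": [6, 1],
    "Price": [2.50, 9.99],
})

# sales[0] raises KeyError because Pandas looks for a column labeled 0
try:
    sales[0]
    row_lookup_failed = False
except KeyError:
    row_lookup_failed = True

first_row = sales.iloc[0]  # select by integer position; returns a Series
print(first_row)
```

The result is a Series whose index holds the column names, which is why the names show up when the row is printed.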
During construction and debugging of software, I often like the explicit printing of the column names, which is the default shown above. On the other hand, if we need the elements as a plain old Python list, we can get one with `list()`:
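A short sketch of that conversion, again assuming a hypothetical two-row `sales` data frame with made-up values:

```python
import pandas as pd

# Hypothetical two-row data frame; values are illustrative only
sales = pd.DataFrame({"Quantity": [6, 1], "Price": [2.50, 9.99]})

# list() drops the column labels and keeps only the row's values
row_values = list(sales.iloc[0])
print(row_values)
```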
For machine learning, we often want to separate out one of the columns as the dependent variable, keeping the others as a group of independent variables. The notation we typically use is X -> Y, meaning the set of observations in X predicts or classifies results in Y.
For example, let's say we wanted to predict engine size given the efficiency, number of cylinders, and overall car weight. We need to separate out the engine size as Y and combine the other columns into X. Using Pandas, we can easily separate the variables and keep the variable names:
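One way to sketch this split, assuming a hypothetical `cars` data frame whose column names (`MPG`, `CYL`, `ENG`, `WGT`) and values are invented here for illustration:

```python
import pandas as pd

# Hypothetical car data; column names and values are assumptions
cars = pd.DataFrame({
    "MPG": [18.0, 15.0, 36.0],
    "CYL": [8, 8, 4],
    "ENG": [307.0, 350.0, 79.0],
    "WGT": [3504, 3693, 1825],
})

Y = cars["ENG"]               # dependent variable (a Series)
X = cars.drop("ENG", axis=1)  # independent variables (a DataFrame)
print(X.columns.tolist())
```

`drop()` returns a new data frame, so `cars` itself is left unchanged and both `X` and `Y` keep their labels.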
Converting to a NumPy array strips away the column names but lets us treat it as a matrix, which is handy in a lot of cases (e.g., matrix addition). Separating columns from NumPy arrays is a bit more cumbersome, however:
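A rough sketch of the same split on a NumPy array, assuming a hypothetical matrix whose columns are ordered efficiency, cylinders, engine size, weight (the ordering and values are assumptions):

```python
import numpy as np

# Hypothetical matrix; assumed column order is MPG, CYL, ENG, WGT
m = np.array([
    [18.0, 8, 307.0, 3504],
    [15.0, 8, 350.0, 3693],
    [36.0, 4,  79.0, 1825],
])

Y = m[:, 2]                  # engine size is column index 2
X = np.delete(m, 2, axis=1)  # every column except index 2
print(X.shape, Y.shape)
```

Without column names, we have to track by hand which index holds which variable, which is where the extra awkwardness comes from.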
| 5 | 2017-06-24 | get off my lawn | 99.0 | 8.90 | 4 | False |
### Missing data
Our data has a missing description, which we can ignore, but also has a missing size (numeric) and topic (categorical) entry.
* If the element is numeric, we replace the missing value with the column median and add a column to indicate 0 or 1 as to whether the value is missing.
* If the element is categorical, Pandas can handle the missing value automatically when we use parameter `dummy_na=True` on `get_dummies()` (see next section).
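The two rules above can be sketched as follows, assuming a hypothetical data frame with a numeric `Size` column and a categorical `Topic` column (the names, values, and the `Size_missing` indicator column name are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing numeric and one missing categorical value
df = pd.DataFrame({
    "Size": [30.0, np.nan, 99.0],
    "Topic": ["food", "sports", np.nan],
})

# Record which rows were missing before filling them with the median
df["Size_missing"] = df["Size"].isnull().astype(int)
df["Size"] = df["Size"].fillna(df["Size"].median())

# dummy_na=True adds an extra indicator column for missing categories
df = pd.get_dummies(df, columns=["Topic"], dummy_na=True)
print(df)
```

Computing the indicator column before calling `fillna()` matters, since afterward the missing entries can no longer be told apart from real values.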
| 5 | 2017-06-24 | get off my lawn | 99.0 | 8.90 | 3 | False |
### Dummy variables
Instead, we can convert the categorical variable to dummy variables, also called "*one hot encoding*."
If we're lazy, we can just convert everything to dummies, but the `Description` field is not something that we need to convert, as it is mostly just information we are carrying along.
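As a sketch of converting only selected columns, the `columns` parameter of `get_dummies()` restricts the conversion. The data frame below is a hypothetical stand-in with made-up values:

```python
import pandas as pd

# Hypothetical data; only Topic should become dummy variables
df = pd.DataFrame({
    "Description": ["free coffee", "get off my lawn"],
    "Topic": ["food", "politics"],
})

# Only the listed columns are converted; Description is carried along as-is
encoded = pd.get_dummies(df, columns=["Topic"])
print(encoded.columns.tolist())
```

Without the `columns` argument, `get_dummies()` would also expand `Description` into one dummy column per distinct string, which is rarely what we want for free-form text.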