Commit 651358eb, authored by wizardforcel

2.1.

Parent e9a769b5
......@@ -5,50 +5,49 @@
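I remember facing my first programming assignment (BASIC in 1980!) and failing at it completely. I didn't even know how to begin solving the problem. I was upset, even though coding would soon come very naturally to me. The reason for my initial difficulty is now obvious: the instructor provided no techniques or strategies for converting a problem into a running program. I had to figure that out on my own.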
The approach of focusing on the syntax of a programming language in introductory courses is understandable. Problem-solving is not a precise, well-defined skill. It's more of an overall ability that gets honed with practice. Teaching and grading it is therefore challenging. It's much easier to jump immediately into the syntax of some simple programming language statements. Such an approach is concrete and, in principle, easy to understand, but it totally skips the part about why and when we need those statements. Professors who survived in that environment as students usually go on to perpetuate the sink-or-swim approach when teaching other programmers.
In this course, I'd like to rectify that by focusing on both solving problems and learning to write complex Python code. To do that, we're going to follow an overall problem-solving strategy that involves designing a "work plan" or "algorithm," either on paper or in your head (when you get more experience). The plan helps us think about the problem long before we get to the coding phase. Part of the plan is to identify a suitable sequence of operations that solves our problem. This is the tricky bit, so we'll reduce the scope of the solution space by: (1) restricting ourselves to a common set of operations and data structures, (2) applying well-established methods we can call "working backwards" and "reducing to a known solution", and finally, (3) taking advantage of the topic-specific nature of this introductory course to adopt a program outline that'll work for most data science problems. When we do finally get to Python programming, we'll restrict ourselves to a useful subset of the language. The goal is to teach you to program, not teach you the complete Python language.
在介绍性课程中关注编程语言语法的方法是可以理解的。 解决问题不是一种精确,定义好的技能。 它更像是一种通过练习磨练的整体能力。 因此,教学和评分是具有挑战性的。 立即跳转到一些简单的编程语言语句的语法,要容易得多。 这种方法具体,原则上易于理解,但完全忽略了我们什么时候以及为什么需要这些语句。 可以在这种环境中作为学生而生存的教授,在教其他程序员时通常会继续使用孤注一掷的方法。
Allow me to begin by making a distinction between <b>programming</b> (problem solving) and <b>coding</b> (expressing our solution in a particular programming language).
在本课程中,我想通过专注于解决问题和学习编写复杂的 Python 代码来纠正这一问题。要做到这一点,我们将遵循一个整体的问题解决策略,包括设计“工作计划”或“算法”,无论是纸上还是头脑中(当你获得更多经验时)。在我们进入编码阶段很久之前,该计划就帮助我们思考问题。计划的一部分是确定一个解决我们问题的合适操作顺序。这是一个棘手的问题,因此我们将通过以下方式缩小解决方案空间的范围:(1)将自己限制在一组通用的操作和数据结构中,(2)应用成熟的方法,我们称之为“向后工作”和“简化为一个已知的解决方案“,最后,(3)利用这个入门课程的主题特性,采用一个适用于大多数数据科学问题的程序大纲。当我们最终进入 Python 编程时,我们将自己局限于该语言的有用子集。目标是教你编程,而不是教你完整的 Python 语言。
## What is programming?
请允许我首先区分**编程**(问题解决)和**编码**(用特定编程语言表达我们的解决方案)。
When we think about programming, we immediately think about programming languages because we express ourselves using specific language syntax. But, that is like asking a physicist in which language they discuss physics. **Programming** is mostly about converting "word problems" (project descriptions) to an execution plan. The final act of **coding** (entering code) is required, of course, but learning to solve programming problems mentally is the most difficult process and is the most important.
## 编程是什么?
The same is true for natural languages. Learning to prove mathematical theorems is harder than learning to write up proofs in some natural language. In fact, much of the mathematical syntax is the same across natural languages just as it is for programming languages. Expressing your thoughts in Python or R, as you will do in the data science program, is the simplest part of the programming process. That said, writing correct code is often the most frustrating and time-consuming part of the process even for experienced programmers.
当我们考虑编程时,我们会立即考虑编程语言,因为我们使用特定的语言语法表达自己。 但是,这就像向物理学家询问他们讨论物理学的语言。 **编程**主要是将“单词问题”(项目描述)转换为执行计划。 当然,**编码**(输入代码)的最终行为是必需的,但学习在精神上解决编程问题是最困难的过程,也是最重要的过程。
Programming is more about *what* to say than *how* to say it. Solving a problem with a computer means identifying a sequence of operations, each of which solves a piece of the overall problem. Each operation might itself be a sequence of suboperations. Expressing those operations in Python or R is not the hard part. Identifying which operations are necessary and their relative order is the hard part.
自然语言也是如此。 学习证明数学定理比学习用某种自然语言编写证明更难。 实际上,大多数数学语法在自然语言中都是相同的,就像编程语言一样。 像在数据科学计划中一样,用 Python 或 R 表达您的想法是编程过程中最简单的部分。 也就是说,编写正确的代码通常是该过程中最令人沮丧和耗时的部分,即使对于有经验的程序员也是如此。
Let's start with an overall strategy for attacking programming problems.
编程更多是要表达*什么*而不是*如何*表达。 用计算机解决问题意味着识别一系列操作,每个操作都解决了整个问题的一部分。 每个操作本身可能是一系列子操作。 用 Python 或 R 表达这些操作并不困难。 确定哪些操作及其相对顺序是困难的部分。
## Problem-solving strategy
让我们从解决编程问题的整体策略入手。
Regardless of the software we're trying to write, there is an overall problem-solving strategy that we can follow.
## 解决问题的策略
**Step one** in any problem-solving situation is to fully understand the problem and clearly identify the goal. It might sound obvious, but any fuzziness in our understanding of the problem could send us off in the wrong direction. In a data science setting, the goal is usually a question we're trying to answer, such as "*which sales regions show the fastest year-on-year growth?*" (summary statistics), "*which transactions are fraudulent?*" (classifier) or "*what will a stock price be at a future date?*" (predictor). We should be able to precisely articulate the goal and the expected output using English words. If we can't do that, then no amount of coding expertise in Python or R will solve the problem. We'll see some examples shortly.
无论我们尝试编写什么软件,我们都可以遵循解决问题的整体策略。
**Step two** (or possibly part of step one) of the problem-solving process is to write out some input-output pairs by hand. Doing so helps us understand what the program will need to do and how it might do it. As we will see, this technique works not only for the overall input and output, but also works great for designing [functions](functions.ipynb) (reusable bits of code). **We can't automate operations with code if we can't identify and perform the operations manually.** Moreover, listing a bunch of cases usually highlights special cases, such as "when the input is negative, the output should be empty". In other words, the program should not crash with a negative number as input. Programmers call this *test-driven design*.
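As a minimal sketch of writing input-output pairs by hand, consider a made-up task: "given `n`, produce the squares of 1 through `n`". The task, the function name `first_n_squares`, and the chosen pairs are all assumptions for illustration, not part of the course material. The pairs are written down first, including the special case of negative input, and only then checked against the code.

```python
# Hand-written input-output pairs for a hypothetical task:
# "given n, produce the squares of 1 through n".
expected = {
    3: [1, 4, 9],
    1: [1],
    0: [],
    -2: [],  # special case: negative input gives empty output, not a crash
}

def first_n_squares(n):
    """Return [1*1, 2*2, ..., n*n]; an empty list when n < 1."""
    return [i * i for i in range(1, n + 1)]

# Check the code against every pair we worked out by hand.
for n, want in expected.items():
    got = first_n_squares(n)
    assert got == want, f"n={n}: expected {want}, got {got}"
```

Listing the pairs before writing the function is what surfaces the negative-input case; the function body is almost an afterthought.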
**在任何问题解决的情况下,第一步**是充分理解问题并清楚地确定目标。 这可能听起来很明显,但是我们对这个问题的理解中的任何模糊性都可能使我们走错方向。 在数据科学环境中,目标通常是我们试图回答的问题,例如“*哪个销售区域的同比增长最快?*”(摘要统计量),“*哪些交易是欺诈性的?*”(分类器)或“*未来某个日期股票价格是多少?*”(预测器)。 我们应该能够使用英语单词精确地表达目标和预期输出。 如果我们不能这样做,那么 Python 或 R 中没有任何编码的专业知识可以解决问题。 我们很快就会看到一些例子。
In a job interview setting, this step means immediately trying to draw a few instances of the problem. For example, if asked to process a list of numbers in some way, begin by putting three or four numbers up on the board or on a piece of paper. This naturally brings up a number of important questions that the interviewer is expecting you to ask, such as where the data comes from and whether it can all fit in memory.
**问题解决过程的第二步**(或可能是第一步的一部分)是手动写出一些输入 - 输出对。 这样做有助于我们了解程序需要做什么以及如何执行。 我们将要看到,这种技术不仅适用于整体输入和输出,而且适用于设计[函数](functions.ipynb)(可重用的代码段)。 **如果我们无法手动识别和执行操作,我们无法使用代码自动执行操作。**此外,列出一堆案例通常会突出特殊情况,例如“当输入为负时,输出应为空”。 换句话说,程序不应该以负数作为输入而崩溃。 程序员称之为*测试驱动设计*
**Step three** is to figure out what data or input, our raw materials, we need to achieve the goal. Without the right data, we can't solve the problem. For example, I once mentored a student practicum team whose goal was to identify which customers of a website would upgrade to a professional account. The students only had data on users that had upgraded and no data on users who declined to upgrade. Whoops! You can't build an apples versus oranges classifier if you only have data on apples. If you don't have all the data you need, it's important to identify this requirement as part of the problem-solving process. Data acquisition often requires programming and we'll revisit the topic below as part of our generic program outline.
在求职面试设置中,此步骤意味着立即尝试绘制问题的几个实例。 例如,如果要求以某种方式处理数字列表,首先将三个或四个数字放在板上或纸上。 这自然会带来一些面试官期待你提出的重要问题,比如数据的来源以及它是否适合内存等......
**第三步**是弄清楚我们实现目标所需的数据或输入,即我们的原材料。 没有正确的数据,我们无法解决问题。 例如,我曾指导过一个学生实习团队,其目标是确定某个网站的哪些客户会升级到专业帐户。 学生只有已升级的用户数据,没有拒绝升级的用户数据。哎呀! 如果您只有苹果的数据,则无法构建苹果与橙子分类器。 如果您没有所需的所有数据,那么将此要求确定为问题解决过程的一部分非常重要。 数据采集通常需要编程,我们将回顾下面的主题,作为我们通用计划大纲的一部分。
At this point, we've actually set the stage necessary to solve problems and we haven't thought about code at all. We started with the end result and then identified the data we need. The input-output pairs neatly bracket the computation we need to perform. At the beginning, we have the known data and, at the end, we have the expected output or work product. Ok, onto the programming steps.
在这一点上,我们实际上已经设定了解决问题所需的阶段,我们根本没有考虑过代码。 我们从最终结果开始,然后确定了我们需要的数据。 输入 - 输出对巧妙地包含了我们需要执行的计算。 一开始,我们有已知的数据,最后,我们有预期的输出或作品。 好的,进入编程步骤。
**Step four** is to identify the sequence of operations that will compute the expected result. Sometimes this is called an *algorithm* and involves planning out the specific operations and suboperations that chew on the input data, gradually transforming it into the expected output.
**第四步**是确定计算预期结果的操作顺序。 有时这被称为*算法*并且涉及规划输入数据上的特定操作和子操作,逐渐将其转换为预期输出。
These first four steps are a key part of the so-called [Feynman technique](https://www.google.com/search?q=feynman+technique), which includes writing down a complete explanation of an assigned task or problem as you would explain it to a nonexpert. Until you can write it down simply, without confusing language or terms, you yourself don't understand the problem. There is no point in continuing until you get past this phase. (Faculty often joke that the best way to learn a new topic is to teach a class on that topic!)
前四个步骤是所谓的[费曼技巧](https://www.google.com/search?q=feynman+technique)的关键部分,其中包括写下已分配任务或问题的完整说明,就像你对非专家解释它那样。直到你可以简单地写下来,而不会混淆语言或术语,你自己不明白这个问题。在你完成这个阶段之前,没有必要继续下去。(教师经常开玩笑,学习新主题的最佳方法是教授关于该主题的课程!)
In **step five**, we translate the operations in our plan to actual executable code. This step deserves an entire book but here's a summary of my advice. Start with the simplest suboperations and make sure they work first. Then code the larger operations that use those suboperations. If there's a problem, you know that it is likely in the new code, not in the already-tested suboperations. In this phase, we'll normally find problems in our design from step four, so we'll typically repeat steps four and five. Testing functionality and fixing errors is called *debugging*.
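The advice to code and test the smallest suboperations first can be sketched as follows. The example borrows the "fastest year-on-year growth" question from step one; the function names `growth` and `fastest_growing` and the sales figures are hypothetical, chosen only to illustrate the order of work.

```python
def growth(previous, current):
    """Suboperation: fractional year-on-year growth."""
    return (current - previous) / previous

# Make sure the small piece works before building on it.
assert growth(100, 150) == 0.5
assert growth(200, 100) == -0.5

def fastest_growing(sales):
    """Larger operation: the region with the highest growth.

    `sales` maps region name -> (previous_year, current_year).
    """
    return max(sales, key=lambda region: growth(*sales[region]))

# Made-up figures for illustration only.
sales = {"north": (100, 150), "south": (200, 220), "west": (50, 60)}
assert fastest_growing(sales) == "north"
```

If the last assertion failed, the earlier checks on `growth` would point suspicion at `fastest_growing`, the newly written code.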
**第五步**中,我们将计划中的操作转换为实际的可执行代码。 这一步需要整本书,但这里总结了我的建议。 从最简单的子操作开始,确保它们先工作。 然后编写使用这些子操作的较大操作。 如果出现问题,您就会知道新代码中的子操作可能没有经过测试。 在这个阶段,我们通常会发现第四步中的设计问题,因此我们通常会重复四五次。 测试功能和修复错误称为*调试*
Finally, **step six** is to check our overall results for correctness. The most obvious check is to compare the output of our program with the known input-output pairs from step two. Then, most importantly, test the program with input that was not considered in steps two through five. This is an important test of the program's generality. If the program gives incorrect output, it's back to step four to see what's wrong.
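One way to sketch this final check is shown below, using a hand-written `median` function as a stand-in for "our program" (the function and the test data are assumptions for illustration). The first loop replays the known input-output pairs; the second feeds in inputs nobody looked at during design and compares against Python's standard `statistics.median` as a trusted reference.

```python
import random
import statistics

def median(values):
    """Middle value of a non-empty list (mean of the two middle values if even)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Check 1: the input-output pairs we wrote out by hand.
known_pairs = [([1, 3, 2], 2), ([4, 1, 3, 2], 2.5), ([7], 7)]
for data, want in known_pairs:
    assert median(data) == want

# Check 2: inputs not considered during design, compared to a reference.
random.seed(0)  # fixed seed so the check is repeatable
for _ in range(100):
    data = [random.randint(-50, 50) for _ in range(random.randint(1, 20))]
    assert median(data) == statistics.median(data)
```

A trusted reference is not always available; when it isn't, fresh hand-computed cases serve the same purpose.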
最后,**第六步**是检查我们的整体结果的正确性。 最明显的检查是比较程序的输出与步骤 3 中的已知输入-输出对。 然后,最重要的是,在第 3 步到第 5 步中使用未考虑的输入来测试程序。 这是对程序通用性的重要测试。 如果程序输出错误,则返回第 4 步来查看错误。
And now for a dose of reality. The world is a big messy place and, since we know the least about a problem at the start, we typically need to repeat or bounce around through some or all of these steps. For example, let's say we're building an apples vs oranges classifier and the above process leads to a program that doesn't distinguish between the two fruit very well. Perhaps we only have data on size and shape. We might decide that the classifier needs data on color so it's back to step two (and possibly step three) then step six to check the results again.
而现在,出于现实因素。世界是一个非常混乱的地方,因为我们开始知道最少的问题,所以我们通常需要通过一些或所有这些步骤重复或反弹。 例如,假设我们正在构建一个苹果与橙子分类器,上面的过程使程序不能很好地区分这两个水果。 也许我们只有大小和形状的数据。 我们可能会认为分类器需要颜色数据,所以它回到第二步(可能是第三步),然后是第六步再次检查结果。
### Conjuring up plans and programs
### 制定计划和程序
A program is a sequence of operations that transforms data or performs computations that ultimately lead to the expected output. *Programming* is the act of designing programs: identifying the operations and their appropriate sequence. In other words, programming is about coming up with a work plan intended for a computer, which we often describe in semi-precise English called *pseudocode*. This is **step four** from the previous section.
程序是一系列操作,用于转换数据或执行计算,它最终产生预期输出。*编程*是设计程序的行为:识别操作及其适当的顺序。 换句话说,编程就是为计算机提出一个工作计划,我们经常用半精确的英文描述,叫做*伪代码*。这是上一节中的**第四步**
*Coding*, on the other hand, is the act of translating such high-level pseudocode to programming language syntax. As you gain more experience, it'll become easier and easier to go from a work plan in your head straight to code, without the pseudocode step.
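To make the distinction concrete, here is a small sketch in which the work plan appears as pseudocode in the comments and the Python beneath it is a nearly line-by-line translation. The task (finding the largest number in a list) and the name `largest` are assumptions picked for brevity.

```python
# Pseudocode (the work plan):
#   set best to the first number
#   for each remaining number:
#       if the number is bigger than best, remember it as best
#   report best

def largest(numbers):
    """Return the largest value in a non-empty list."""
    best = numbers[0]            # set best to the first number
    for x in numbers[1:]:        # for each remaining number
        if x > best:             # if the number is bigger than best
            best = x             #     remember it as best
    return best                  # report best

assert largest([3, 9, 2]) == 9
assert largest([-4, -1, -7]) == -1
```

The thinking happened in the pseudocode; the coding step only changed the notation.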
......