{"font_size":0.4,"font_color":"#FFFFFF","background_alpha":0.5,"background_color":"#9C27B0","Stroke":"none","body":[{"from":5.48,"to":11.12,"location":2,"content":"Okay. Hi, everyone. Um, so let's get started again today."},{"from":11.12,"to":14.61,"location":2,"content":"So today's lecture what I'm going to do,"},{"from":14.61,"to":16.71,"location":2,"content":"is be talking about, um,"},{"from":16.71,"to":19.06,"location":2,"content":"question answering over text."},{"from":19.06,"to":22.02,"location":2,"content":"Um, this is another of the big successes"},{"from":22.02,"to":25.66,"location":2,"content":"in using deep learning inside natural language processing,"},{"from":25.66,"to":30.14,"location":2,"content":"and it's also a technology that has some really obvious commercial uses."},{"from":30.14,"to":32.66,"location":2,"content":"So it's an, it's an area that has attracted"},{"from":32.66,"to":36.27,"location":2,"content":"a lot of attention in the last couple of years."},{"from":36.27,"to":38.79,"location":2,"content":"So this is the overall plan."},{"from":38.79,"to":43.97,"location":2,"content":"Um, just a couple of reminders and things at the beginning about final project stuff,"},{"from":43.97,"to":48.88,"location":2,"content":"and then we'll, basically all of it is talking about question-answering starting with, um,"},{"from":48.88,"to":53,"location":2,"content":"motivation history, um, talking about the SQuAD data,"},{"from":53,"to":56.39,"location":2,"content":"uh, a particular simple model, our Stanford Attentive Reader."},{"from":56.39,"to":58.94,"location":2,"content":"Then talking about some other more complex,"},{"from":58.94,"to":62.46,"location":2,"content":"um, stuff into the most modern stuff."},{"from":62.46,"to":65.81,"location":2,"content":"Um, yeah, so in a census, um,"},{"from":65.81,"to":69.36,"location":2,"content":"lecture serves a double purpose because if you're going to do the,"},{"from":69.36,"to":71.39,"location":2,"content":"the default final project, well,"},{"from":71.39,"to":73.41,"location":2,"content":"it's about textual question-answering,"},{"from":73.41,"to":77.86,"location":2,"content":"and this is your chance to learn something about the area of textual question-answering,"},{"from":77.86,"to":81.41,"location":2,"content":"and the kinds of models you might want to be thinking about and building."},{"from":81.41,"to":84.89,"location":2,"content":"Um but the content of this lecture pretty much is in"},{"from":84.89,"to":88.92,"location":2,"content":"no way specifically tied to the default final project,"},{"from":88.92,"to":92.72,"location":2,"content":"apart from by subject matter that really it's telling you about"},{"from":92.72,"to":97.58,"location":2,"content":"how people use neural nets to build question-answering systems."},{"from":97.58,"to":101.2,"location":2,"content":"Okay. 
So first, just quickly on the reminders: the mid-quarter survey. A huge number of people have actually filled this in already; we already had over a 60 percent fill-in rate, which, by the standards of people who do surveys, counts as a huge success already. But if you're not in that percent, we'd still love to have your feedback, and now's the perfect time to do it.

I also just wanted a note on custom final projects. In general, it's great to get feedback on custom final projects. There's a formal mechanism for that, which is the project proposal that I mentioned last time. It's also great to chat to people informally about final projects. I'm one of those people, and I have been talking to lots of people about final projects, and am very happy to do so; but there's a problem, which is that there's only one of me. So I do also encourage you to realize that many of the various TAs have had experience with different deep learning projects. In particular, on the office hours page there's a table like this one (you can read it if you look at it on your own laptop) which describes the experience of the different TAs. Many of them have experience in different areas, and many of them are also good people to talk to about final projects. Okay.
So, for the default final project, the textual question answering: draft materials for that are up today, right now, on the website. We're calling them draft because we think there are still probably a few things that are going to get changed over the next week, so don't regard the code as completely final; it's sort of 90 percent final. But in terms of deciding whether you're going to do a custom final project or the default final project, and working out what you're putting into your project proposal, it should be well more than what you need at this point.

Okay. The one other final bit I just wanted to cover, which I didn't get to last time: for the final projects, regardless of which kind you're doing, part of it is doing some experiments, doing stuff with data and code, and getting some numbers and things like that. But I do really encourage people to also remember that an important part of the final project is writing a final project report. This is no different from any research project of the kind that students do for conferences or journals, right? You commonly spend months working over your code and experiments, but in most cases the main evaluation of your work comes from people reading a written paper version of it. So it's really important that the paper version reflects the work that you did and the interesting ideas that you came up with, explains them well, presents your experiments, and all of those things. And so we encourage you to do a good job at writing up your projects.
projects."},{"from":296.67,"to":299.68,"location":2,"content":"Um, here is just sort of a vague outline of, you know,"},{"from":299.68,"to":303.32,"location":2,"content":"what a typical project write-up is likely to look like."},{"from":303.32,"to":306.62,"location":2,"content":"Now, there isn't really one size completely fits all"},{"from":306.62,"to":309.95,"location":2,"content":"because depending on what you've done different things might be appropriate."},{"from":309.95,"to":311.99,"location":2,"content":"But, you know, typically the first page,"},{"from":311.99,"to":315.9,"location":2,"content":"you'll have an abstract for the paper and the introduction to the paper."},{"from":315.9,"to":319.22,"location":2,"content":"You'll spend some time talking about related prior work."},{"from":319.22,"to":323.62,"location":2,"content":"Um, you'll talk about what kind of models you built for a while."},{"from":323.62,"to":328.56,"location":2,"content":"Um, there's probably some discussion of what data you are using for your projects."},{"from":328.56,"to":334.92,"location":2,"content":"Um, experiments commonly with some tables and figures about the things that you're doing."},{"from":334.92,"to":339.74,"location":2,"content":"Um, more tables and figures talking about the results as to how well your systems work."},{"from":339.74,"to":343.01,"location":2,"content":"Um, it's great to have some error analysis to see"},{"from":343.01,"to":346.29,"location":2,"content":"what kind of things that you got right and wrong,"},{"from":346.29,"to":348.5,"location":2,"content":"and then maybe at the end there's sort of"},{"from":348.5,"to":351.96,"location":2,"content":"plans for the future, conclusions, or something like that."},{"from":351.96,"to":359.48,"location":2,"content":"Okay. Um, that's sort of it for my extra administrative reminders."},{"from":359.48,"to":363.47,"location":2,"content":"Um, are there any questions on final projects that people are dying to know?"},{"from":363.47,"to":369.8,"location":2,"content":"[NOISE] Okay. Good luck."},{"from":369.8,"to":370.93,"location":2,"content":"I just meant to say good luck."},{"from":370.93,"to":373.47,"location":2,"content":"Yeah. Good luck with your final projects. [LAUGHTER] Okay."},{"from":373.47,"to":375.38,"location":2,"content":"So now moving into,"},{"from":375.38,"to":378.55,"location":2,"content":"um, yeah, the question answering."},{"from":378.55,"to":383.17,"location":2,"content":"Okay. So, I mean- so question answering is"},{"from":383.17,"to":388.61,"location":2,"content":"a very direct application for something that human beings,"},{"from":388.61,"to":390.1,"location":2,"content":"um, want to do."},{"from":390.1,"to":393.62,"location":2,"content":"Um, well, maybe human beings don't in general want to know this."},{"from":393.62,"to":397.36,"location":2,"content":"Um, here's my query of \"Who was Australia's third prime minister?\"."},{"from":397.36,"to":399.5,"location":2,"content":"Um, maybe, yeah, that's not really the kind of"},{"from":399.5,"to":401.64,"location":2,"content":"thing you're gonna put into your queries but,"},{"from":401.64,"to":403.14,"location":2,"content":"you know, maybe you query,"},{"from":403.14,"to":405.11,"location":2,"content":"\"Who was the lead singer of Big Thief?\""},{"from":405.11,"to":406.75,"location":2,"content":"or something like that. 
I don't know."},{"from":406.75,"to":408.05,"location":2,"content":"Um, you're, uh, but you know,"},{"from":408.05,"to":411.77,"location":2,"content":"lots- a large percentage of stuff [NOISE] on the web"},{"from":411.77,"to":416.09,"location":2,"content":"is that people actually are asking for answers to questions."},{"from":416.09,"to":419.12,"location":2,"content":"And so, if I put in this query into Google,"},{"from":419.12,"to":420.53,"location":2,"content":"it actually just works."},{"from":420.53,"to":423.92,"location":2,"content":"It tells me the answer is John Christian Watson."},{"from":423.92,"to":428.92,"location":2,"content":"And, um, so that's sort of question answering working in the real world."},{"from":428.92,"to":431.54,"location":2,"content":"Um, if you try different kinds of questions in Google,"},{"from":431.54,"to":434.58,"location":2,"content":"you'll find that some of them work and lots of them don't work."},{"from":434.58,"to":435.77,"location":2,"content":"And when they don't work,"},{"from":435.77,"to":440.09,"location":2,"content":"you're just sort of getting whatever kind of information retrieval, web search results."},{"from":440.09,"to":443.31,"location":2,"content":"Um, there is one fine point that I just wanted,"},{"from":443.31,"to":445.13,"location":2,"content":"um, to mention down here."},{"from":445.13,"to":448.79,"location":2,"content":"So another thing that Google has is the Google Knowledge Graph,"},{"from":448.79,"to":452.23,"location":2,"content":"which is a structured graph representation of knowledge."},{"from":452.23,"to":455.4,"location":2,"content":"And some kinds of questions,"},{"from":455.4,"to":459.08,"location":2,"content":"um, being answered from that structured knowledge representation."},{"from":459.08,"to":460.43,"location":2,"content":"And so, I mean,"},{"from":460.43,"to":463.02,"location":2,"content":"quite a lot of the time for things like movies,"},{"from":463.02,"to":464.87,"location":2,"content":"it's coming from that structured graph."},{"from":464.87,"to":467.69,"location":2,"content":"If you're sort of saying, \"Who's the director of a movie?\""},{"from":467.69,"to":468.89,"location":2,"content":"or something like that."},{"from":468.89,"to":471.05,"location":2,"content":"But this answer isn't coming from that."},{"from":471.05,"to":473,"location":2,"content":"This answer is a genuine,"},{"from":473,"to":475.4,"location":2,"content":"the kind of stuff we're gonna talk about today."},{"from":475.4,"to":479.36,"location":2,"content":"It's textual question answering from a web page where"},{"from":479.36,"to":481.58,"location":2,"content":"Google's question and answering system has"},{"from":481.58,"to":484.5,"location":2,"content":"extracted the answer and is sticking it up there."},{"from":484.5,"to":486.37,"location":2,"content":"Um, if you're, um,"},{"from":486.37,"to":489.49,"location":2,"content":"wanting to explore these things, um,"},{"from":489.49,"to":494.74,"location":2,"content":"if you get one of these boxes sort of down here where I've cut it off,"},{"from":494.74,"to":496.34,"location":2,"content":"there's a little bit of gray that says,"},{"from":496.34,"to":497.99,"location":2,"content":"\"How did I get this result?\"."},{"from":497.99,"to":499.42,"location":2,"content":"And if you click on that,"},{"from":499.42,"to":503.3,"location":2,"content":"it actually tells you what source it's getting it from and you can see if it's doing it"},{"from":503.3,"to":508.13,"location":2,"content":"from the textual question 
Okay. So in general, the motivation for question answering is that these days there are massive collections of full-text documents, i.e., there's the web: billions of documents of information. Traditionally, when people first started thinking about search and information retrieval as a field, nothing of that kind of quantity and size existed. When people first started building search systems, it was unthinkable to index whole documents, because no one had hard disks big enough in those days; really, they were indexing titles, or titles and abstracts, or something like that. And so it seemed perfectly adequate in those days to say, "Okay, we're just going to give you your results as a list of documents," because the documents were only a hundred words long. But that's clearly not the case now, when we have the ten-minute-read Medium post which might have the answer to a question. So there's this need to say, "Well, can we just have systems that will give us answers to questions?" And a lot of the recent changes in technology have hugely underlined that need. Returning documents works okay if you're sitting at your laptop, but it works really terribly if you're on your phone, and even more terribly if you're trying to work with speech on a digital assistant device, something like an Alexa system. So we really want to be able to produce systems that give the answers to people's questions.

Typically, doing that is factored into two parts. The first part is that we still do information retrieval: we use normally quite standard information retrieval techniques to find documents that are quite likely to contain an answer. The reason this is normally done by quite traditional techniques is that the traditional techniques are extremely scalable over billions of documents, whereas current neural systems aren't really scalable over billions of documents, though that's an area in which research is ongoing. But then, once we have some candidate likely documents, we want to find out: do they contain an answer, and if so, what is the answer? At that point we have a document or a paragraph, and we're asking, "Can we answer this question from here?" That problem is often referred to as the Reading Comprehension problem, and it's really what I'm going to focus on today.
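As a rough sketch of that two-stage factoring, here is a minimal retrieve-then-read pipeline. This is only an illustration under stated assumptions, not anyone's actual system: it uses scikit-learn's TF-IDF vectorizer as the traditional retrieval stage, and the `answer_span` reader is a hypothetical placeholder for whatever neural reading comprehension model you build.

```python
# Minimal sketch of the retrieve-then-read factoring described above.
# Assumptions: scikit-learn for a standard TF-IDF retriever; answer_span()
# is a hypothetical stand-in for a neural reading comprehension model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "John Christian Watson was the third Prime Minister of Australia.",
    "Edmund Barton was the first Prime Minister of Australia.",
    "The Super Bowl is the annual championship game of the NFL.",
]

def retrieve(question, docs, k=1):
    """Stage 1: standard (non-neural) IR to find likely documents."""
    vectorizer = TfidfVectorizer().fit(docs + [question])
    doc_vecs = vectorizer.transform(docs)
    q_vec = vectorizer.transform([question])
    scores = cosine_similarity(q_vec, doc_vecs)[0]
    ranked = sorted(range(len(docs)), key=lambda i: -scores[i])
    return [docs[i] for i in ranked[:k]]

def answer_span(question, passage):
    """Stage 2: reading comprehension -- placeholder for a neural reader."""
    raise NotImplementedError("train a neural reader for this part")

question = "Who was Australia's third prime minister?"
candidates = retrieve(question, documents)
# answer = answer_span(question, candidates[0])
```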
answer."},{"from":632.15,"to":636.2,"location":2,"content":"And the reason that this is normally done by quite traditional techniques is because"},{"from":636.2,"to":641.39,"location":2,"content":"the traditional techniques are extremely scalable over billions of documents,"},{"from":641.39,"to":643.79,"location":2,"content":"whereas current neural systems actually"},{"from":643.79,"to":646.23,"location":2,"content":"aren't really scalable over billions of documents."},{"from":646.23,"to":650.38,"location":2,"content":"But that's an area in sort of which research is ongoing."},{"from":650.38,"to":653.92,"location":2,"content":"But then once we have sort of some candidate likely documents,"},{"from":653.92,"to":655.64,"location":2,"content":"we want to find, uh,"},{"from":655.64,"to":657.37,"location":2,"content":"do they contain an answer,"},{"from":657.37,"to":659.3,"location":2,"content":"and if so, what is the answer?"},{"from":659.3,"to":660.52,"location":2,"content":"And so at that point,"},{"from":660.52,"to":663.27,"location":2,"content":"we have a document or a paragraph,"},{"from":663.27,"to":667.45,"location":2,"content":"and we're saying, \"Can we answer this question from there?\""},{"from":667.45,"to":671.35,"location":2,"content":"And then that problem is often referred to as the Reading Comprehension problem."},{"from":671.35,"to":674.71,"location":2,"content":"And so that's really what I'm gonna focus on today."},{"from":674.71,"to":679.53,"location":2,"content":"Um, Reading Comprehension isn't a new problem."},{"from":679.53,"to":686.35,"location":2,"content":"I mean it- you can trace it back into the early days of artificial intelligence and NLP."},{"from":686.35,"to":688.29,"location":2,"content":"So, back in the 70's,"},{"from":688.29,"to":691.52,"location":2,"content":"a lot of NLP work was trying to do Reading Comprehension."},{"from":691.52,"to":695.42,"location":2,"content":"I mean one of the famous strands of that, um, was, um,"},{"from":695.42,"to":698.43,"location":2,"content":"Sir Roger Shank was a famous,"},{"from":698.43,"to":701.03,"location":2,"content":"um, early NLP person."},{"from":701.03,"to":702.65,"location":2,"content":"Though not a terribly nice man."},{"from":702.65,"to":703.99,"location":2,"content":"I don't think, actually."},{"from":703.99,"to":708.44,"location":2,"content":"Um, but the Yale School of AI was a very well-known,"},{"from":708.44,"to":711.83,"location":2,"content":"um, NLP approach and really,"},{"from":711.83,"to":715.38,"location":2,"content":"it was very focused on Reading Comprehension."},{"from":715.38,"to":718.21,"location":2,"content":"Um, but it's sort of,"},{"from":718.21,"to":721.07,"location":2,"content":"you know, I think it was sort of the time, it was too early in any way."},{"from":721.07,"to":723.73,"location":2,"content":"It sort of died out. 
Nothing much came out of that. But then, right before the turn of the millennium, Lynette Hirschman revived this idea and said, "Well, maybe a good challenge would be to take the kind of reading comprehension questions that elementary school kids do, and let's see if we can get computers to do that." Some people tried that with fairly simple methods, which only worked mediocrely. Then, somewhat after that, Chris Burges, a guy at Microsoft Research who wasn't really an NLP person at all (he was a machine learning person), got it into his head that a big problem that should really be worked on is machine comprehension, and he suggested that you could codify it like this. This is a particularly clean codification that has lived on, and that we'll look at more today: "A machine comprehends a passage of text if, for any question regarding that text that can be answered correctly by a majority of native speakers, that machine can provide a string which those speakers would agree both answers that question and does not contain information irrelevant to that question." He proposed this as a challenge problem for artificial intelligence and set about collecting a corpus, the MCTest corpus, which was meant to be a simple reading comprehension challenge. So they collected stories, which were meant to be kids' stories, you know: "Alyssa got to the beach after a long trip. She's from Charlotte.
She traveled from Atlanta. She's now in Miami." Pretty easy stuff. And then there were questions: "Why did Alyssa go to Miami?", with the answer "To visit some friends." So you've got there a string coming from the passage that's the answer to the question. MCTest is a corpus of about 600 such stories, and that challenge existed, and a few people worked on it, but it never really went very far either for the next couple of years.

What really changed things was that in 2015, and then with more stuff in 2016, deep learning people got interested in the idea: could we perhaps build neural question answering systems? It seemed that if you wanted to do that, something like MCTest could only be a test set, and the way to make progress would be to do what had been done in other domains: to hand-build a large training set of passages, questions, and answers, in such a way that you would be able to train neural networks using the kind of supervised learning techniques that we've concentrated on so far in this class. And indeed, supervised neural network learning is actually the successful stuff that powers nearly all the applications of deep learning, not only in NLP but also in other fields like vision. The first such dataset was built by people at DeepMind over CNN and Daily Mail news stories. But then the next year, Pranav Rajpurkar, a Stanford PhD student working with Percy Liang and a couple of other students, produced the SQuAD dataset, which was a much better designed dataset and proved to be much more successful at driving this forward.
Following along from that, other people started to produce lots of other question answering datasets, many of which have interesting advantages and disadvantages of their own, including MS MARCO, TriviaQA, RACE, and many more. But for today's class I'm going to concentrate on SQuAD, because SQuAD is the one that has been by far the most widely used, and because it is just a well-constructed, clean dataset, it proved a profitable one for people to work with.

Okay, so that was reading comprehension. I'll also just quickly tell you the history of open-domain question answering. The difference here, for the field of open-domain question answering, is that we're saying: okay, there's an encyclopedia, or there's a web crawl; I'm just going to ask a question, can you answer it? So it's this bigger task of question answering. And that was something that, again, was thought about very early on. There's an early CACM paper by Simmons which explores how you could answer questions as textual question answering, and he has the idea that what's going to happen is you're going to dependency parse the question, dependency parse sentences of the text, and then do tree matching over the dependency parses to get out the answers. In some sense, that actually prefigured work that people were attempting to do 35 years later. Getting a bit more modern:
time,"},{"from":1048,"to":1051.24,"location":2,"content":"um, came up with this system called MURAX,"},{"from":1051.24,"to":1055.89,"location":2,"content":"and so at this stage in the 90s there started to be the first, um,"},{"from":1055.89,"to":1058.77,"location":2,"content":"digitally available encyclopedias available,"},{"from":1058.77,"to":1061.28,"location":2,"content":"so he was using the Grolier's Encyclopedia,"},{"from":1061.28,"to":1064.56,"location":2,"content":"and so he said about trying to build a system that could answer"},{"from":1064.56,"to":1067.98,"location":2,"content":"questions over that encyclopedia using,"},{"from":1067.98,"to":1070.59,"location":2,"content":"in general, fairly sort of shallow, um,"},{"from":1070.59,"to":1075.43,"location":2,"content":"linguistic processing methods, i.e, regular expressions."},{"from":1075.43,"to":1078.21,"location":2,"content":"Um, for, after [LAUGHTER] having, um,"},{"from":1078.21,"to":1081.56,"location":2,"content":"done information retrieval search over that."},{"from":1081.56,"to":1085.52,"location":2,"content":"But that started to evoke more interest from other people,"},{"from":1085.52,"to":1093.13,"location":2,"content":"and so in 1999 the US National Institutes of Standards and Technology, um,"},{"from":1093.13,"to":1097.17,"location":2,"content":"instituted a TREC question-answering track where the idea was,"},{"from":1097.17,"to":1101.14,"location":2,"content":"there was a large collection of News-wire documents,"},{"from":1101.14,"to":1105.09,"location":2,"content":"and you could be asked to provide the question of them,"},{"from":1105.09,"to":1108.39,"location":2,"content":"and lots of people started to build question answering systems."},{"from":1108.39,"to":1110.85,"location":2,"content":"Indeed, if in some sense that was"},{"from":1110.85,"to":1115.56,"location":2,"content":"this competition which was where people at IBM started,"},{"from":1115.56,"to":1118.32,"location":2,"content":"um, working on textual question-answering,"},{"from":1118.32,"to":1122.01,"location":2,"content":"and then, um, sort of a decade later, um,"},{"from":1122.01,"to":1127.31,"location":2,"content":"IBM rejigged things into the sexier format of,"},{"from":1127.31,"to":1132.97,"location":2,"content":"um, let's build a Jeopardy contestant rather than let's answer questions from the news,"},{"from":1132.97,"to":1136.62,"location":2,"content":"and that then led to their DeepQA system in 2011."},{"from":1136.62,"to":1139.15,"location":2,"content":"Which I presume quite a few of you saw,"},{"from":1139.15,"to":1142.55,"location":2,"content":"these people saw Jeopardy IBM?"},{"from":1142.55,"to":1144.12,"location":2,"content":"Yeah, some of you."},{"from":1144.12,"to":1147.19,"location":2,"content":"Okay. 
So they were able to successfully build a question answering system that could compete at Jeopardy and win. And, you know, like a lot of these demonstrations of technological success, there are things you can quibble about in the way it was set up: really, the computer just had a speed advantage over the human beings, who had to buzz in to answer the question. But nevertheless, fundamentally, the textual question answering had to work: this was a system that was answering questions mainly based on textual passages, and it had to be able to find the answers to those questions correctly for the system to work.

Then, more recently again (and really the first piece of work that did this with a neural system was work done by a Stanford PhD student that I'll get to later), there was the idea: could we replace traditional complex question answering systems with a neural reading comprehension system? That has proved to be very successful.

To explain that a little more: if you look at the kind of systems that were built for TREC question answering, they were very complex, multi-part systems. And if you then look at something like IBM's DeepQA system, it was sort of like this times 10, because it both had very complex pipelines like this and it ensembled together six different components in every place, and then did a classifier combination on top of them. What I'm showing here is roughly a 2003 question answering system, and the kind of steps it went through are these. When there was a question, it parsed the question with a parser,
parser"},{"from":1259.47,"to":1262.38,"location":2,"content":"kind of like the ones we saw with our dependency parsers."},{"from":1262.38,"to":1263.88,"location":2,"content":"It did some sort of"},{"from":1263.88,"to":1269.43,"location":2,"content":"handwritten semantic normalization rules to try and get them into a better semantic form."},{"from":1269.43,"to":1273.14,"location":2,"content":"It then had a question type classifier which tried to"},{"from":1273.14,"to":1276.89,"location":2,"content":"work out what kind of semantic type is this question looking for,"},{"from":1276.89,"to":1278.78,"location":2,"content":"is it looking for a person name,"},{"from":1278.78,"to":1279.89,"location":2,"content":"or a country name,"},{"from":1279.89,"to":1282.86,"location":2,"content":"or a temperature, or something like that."},{"from":1282.86,"to":1287.83,"location":2,"content":"Um, it would, um, then, um,"},{"from":1287.83,"to":1292.28,"location":2,"content":"have an information retrieval system out of the document collection,"},{"from":1292.28,"to":1297.57,"location":2,"content":"um, which would find paragraphs that were likely to contain the answers."},{"from":1297.57,"to":1300.51,"location":2,"content":"Um, and then it would have a method of ranking"},{"from":1300.51,"to":1305.17,"location":2,"content":"those paragraph choices to see which ones are likely to have the answers."},{"from":1305.17,"to":1307.74,"location":2,"content":"Um, it would then,"},{"from":1307.74,"to":1310.37,"location":2,"content":"um, over there somewhere, um,"},{"from":1310.37,"to":1316.32,"location":2,"content":"run Named Entity Recognition on those passages to find entities that were in them."},{"from":1316.32,"to":1319.52,"location":2,"content":"These systems depended strongly on the use of"},{"from":1319.52,"to":1322.35,"location":2,"content":"fine matching entities because then it could look for"},{"from":1322.35,"to":1325.76,"location":2,"content":"an entity which corresponded to the question type."},{"from":1325.76,"to":1329.97,"location":2,"content":"Um, then once it had candidate entities,"},{"from":1329.97,"to":1331.98,"location":2,"content":"it had to actually try and determine whether"},{"from":1331.98,"to":1334.98,"location":2,"content":"these entities did or didn't answer the question."},{"from":1334.98,"to":1338.74,"location":2,"content":"So, these people, this is the system from LCC by,"},{"from":1338.74,"to":1341.1,"location":2,"content":"um, Sanda Harabagiu and Dan Moldovan."},{"from":1341.1,"to":1343.61,"location":2,"content":"They actually had some quite interesting stuff here,"},{"from":1343.61,"to":1348.9,"location":2,"content":"where they had a kind of a loose theorem prover that would try and prove that, um,"},{"from":1348.9,"to":1351.51,"location":2,"content":"the semantic form of a piece of text,"},{"from":1351.51,"to":1354.12,"location":2,"content":"um, gave an answer to what the question was."},{"from":1354.12,"to":1358.41,"location":2,"content":"So, you know, that was kind of cool stuff with an Axiomatic Knowledge Base,"},{"from":1358.41,"to":1361.28,"location":2,"content":"um, and eventually out would come an answer."},{"from":1361.28,"to":1364.31,"location":2,"content":"Um, so, you know, something that is,"},{"from":1364.31,"to":1366.3,"location":2,"content":"I do just want to emphasize, you know,"},{"from":1366.3,"to":1370.05,"location":2,"content":"sometimes with these deep learning courses you get these days,"},{"from":1370.05,"to":1375.33,"location":2,"content":"the impression you have 
Something I do just want to emphasize: sometimes with these deep learning courses you get these days, the impression you have is that absolutely nothing worked before 2014, when we got back to deep learning, and that's not actually true. These kinds of question answering systems, within a certain domain, actually worked rather well. I started saying the words "factoid question answering," so let me explain that, because it's a piece of jargon. People, at least in NLP, use the term "factoid question answering" for the case where your answer is a named entity: something like "What year was Elvis Presley born?", or "What is the name of Beyonce's husband?", or "Which state has the most pork?", or something, I don't know. Anything where the answer is an entity of some clear semantic type. Within the space of those kinds of questions, which are actually a significant part of the questions you get in web search (lots of web search is just "Who was the star of this movie?" or "What year was somebody born?"; there are zillions of those all the time), these systems really did work quite well: they could get about 70 percent of those questions right, which wasn't bad at all, though they didn't really extend out to other kinds of stuff beyond that. But whatever virtues they had, they were extremely complex systems that people spent years putting together, which had many components and a huge amount of hand-built stuff. And most of the stuff was built quite separately and tied together, and you just sort of hoped that it
worked,"},{"from":1481.12,"to":1484.05,"location":2,"content":"um, well, when put together in composite."},{"from":1484.05,"to":1487.69,"location":2,"content":"And so we can contrast that to what we then see later,"},{"from":1487.69,"to":1491.28,"location":2,"content":"um, for neural network-style systems."},{"from":1491.28,"to":1497.35,"location":2,"content":"Okay. Um, so let me now say some more stuff about, um,"},{"from":1497.35,"to":1502.87,"location":2,"content":"the Stanford Question Answering Dataset or SQuAD that I just mentioned a little bit ago,"},{"from":1502.87,"to":1507.06,"location":2,"content":"and as this is the data for the default final project as well."},{"from":1507.06,"to":1510.04,"location":2,"content":"Um, so what SQuAD has is,"},{"from":1510.04,"to":1513.49,"location":2,"content":"questions in SQuAD have a passage,"},{"from":1513.49,"to":1516.07,"location":2,"content":"which is a paragraph from Wikipedia."},{"from":1516.07,"to":1518.42,"location":2,"content":"And then there is a question,"},{"from":1518.42,"to":1521.76,"location":2,"content":"here it's, \"Which team won Super Bowl 50?\""},{"from":1521.76,"to":1527.27,"location":2,"content":"And the goal of the system is to come up with the answer to this question."},{"from":1527.27,"to":1530.43,"location":2,"content":"Um, human reading comprehension."},{"from":1530.43,"to":1532.35,"location":2,"content":"What is the answer to the question?"},{"from":1532.35,"to":1536.64,"location":2,"content":"[NOISE]"},{"from":1536.64,"to":1537.51,"location":2,"content":"Broncos."},{"from":1537.51,"to":1539.13,"location":2,"content":"Broncos. [LAUGHTER] Okay."},{"from":1539.13,"to":1542.73,"location":2,"content":"Yeah. Um, so that's the answer to the question."},{"from":1542.73,"to":1547.06,"location":2,"content":"Um, and so by construction for SQuAD,"},{"from":1547.06,"to":1553.57,"location":2,"content":"the answer to a question is always a sub-sequence of words from the passage which is,"},{"from":1553.57,"to":1556.35,"location":2,"content":"normally, it ends up being referred to as a span,"},{"from":1556.35,"to":1558.58,"location":2,"content":"a sub-sequence of words from the passage."},{"from":1558.58,"to":1561.67,"location":2,"content":"So that's the only kind of questions you can have."},{"from":1561.67,"to":1564.64,"location":2,"content":"You can't have questions that are counting questions,"},{"from":1564.64,"to":1567.13,"location":2,"content":"or yes, no questions, or anything like that."},{"from":1567.13,"to":1570.47,"location":2,"content":"You can just pick out a sub-sequence."},{"from":1570.47,"to":1572.26,"location":2,"content":"Um, okay."},{"from":1572.26,"to":1578.65,"location":2,"content":"But, um, so they created in the first version about 100,000 examples."},{"from":1578.65,"to":1582.04,"location":2,"content":"So there are a bunch of questions about each passage."},{"from":1582.04,"to":1584.2,"location":2,"content":"So it's sort of something like, um,"},{"from":1584.2,"to":1588.58,"location":2,"content":"I think it's maybe sort of about five questions per passage,"},{"from":1588.58,"to":1592.32,"location":2,"content":"and there are 20,000 different bits that Wikipedia uses, used."},{"from":1592.32,"to":1594.91,"location":2,"content":"Um, and this sort of must be a span form,"},{"from":1594.91,"to":1599.26,"location":2,"content":"as often referred to as extractive question answering."},{"from":1599.26,"to":1603.52,"location":2,"content":"Okay. 
Here's just one more example that can give you some more sense of what's in the data, and it illustrates a couple of other points. Even the previous example, I guess, wasn't completely obvious as to what the answer should be, because maybe you could say the answer should just have been "Broncos," or you could have said it was "Denver Broncos." In general, even if you're answering with a span, there's going to be variation in how long a span you choose. So what they did (the data gathering, building questions and getting answers, was done on Mechanical Turk) is that they got answers from three different people. Here's this question: "Along with non-governmental and non-state schools, what is another name for private schools?" Three human beings were asked to answer based on this passage, and one said "independent" and two said "independent schools." For this one, all three people gave the same answer. For this one, again, you get two different answers. So they sample three answers, and basically you are correct if you match any of the answers. That at least gives you a bit of robustness to variation in human answers.

Okay, and that starts me into the topic of evaluation. These slides are titled SQuAD version 1.1, which means that in five minutes' time I'm going to tell you about SQuAD version 2, which adds a bit more stuff, but we'll get 1.1 straight first. All right.
So there were three answers collected, and they suggested two evaluation metrics. The first one is exact match: you're going to return a span, and if the span is one of these three, you get one point; if the span is not one of these three, you get zero for that question. Your accuracy is then just the percent correct, so that's extremely simple. The second metric, and actually the one that was favored as the primary metric, is an F1 metric. What you do for the F1 metric is match at the word level against the different answers. You treat the system span and each gold answer as a bag of words, and then you work out a precision, which is the percent of words in the system's answer that are actually in the gold span, and a recall, which is the percent of words in the gold span that are in the system's span. Then you calculate the harmonic mean of those two numbers; the harmonic mean is a very conservative average, close to the minimum of the two numbers, and that gives you a score. Then, for each question, you say its score is the maximum F1 over the three different answers that were collected from human beings, and for the whole dataset you average those F1 scores across questions; that's your final F1 result. That's a more complicated thing to say, and we provide eval code for you that does it. But it seems that F1 is actually the more reliable and better measure: if you use exact match, even though there's a bit of robustness that comes from having three people's answers, three is not a very large sample, so there's a bit of luck in whether you pick exactly the same span some human being did, whereas with F1 you're going to get a reasonable score even if your boundaries are off by a little. So the F1 metric is more reliable, and avoids various kinds of artifacts around how big or small an answer human beings tend to choose in some circumstances. And it's the primary metric that people are scored on in the leaderboards. One final detail: both metrics ignore punctuation and the English articles a, an, the.
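Here is a condensed sketch of those two metrics, assuming the normalization just mentioned (lowercasing, then dropping punctuation and the articles a/an/the). It's meant to illustrate the definitions, not replace the provided eval code, whose exact normalization details you should defer to.

```python
# Condensed sketch of SQuAD exact match and F1 as just described.
# For real scoring, use the provided eval code; this only illustrates
# the definitions under the stated normalization assumptions.
import re
import string
from collections import Counter

def normalize(s):
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return s.split()

def exact_match(prediction, gold_answers):
    # 1 point if the normalized span equals any of the (three) gold answers
    return float(any(normalize(prediction) == normalize(g) for g in gold_answers))

def f1(prediction, gold_answers):
    # bag-of-words precision/recall against each gold answer; take the max
    scores = []
    for gold in gold_answers:
        pred_bag, gold_bag = Counter(normalize(prediction)), Counter(normalize(gold))
        overlap = sum((pred_bag & gold_bag).values())
        if overlap == 0:
            scores.append(0.0)
            continue
        precision = overlap / sum(pred_bag.values())
        recall = overlap / sum(gold_bag.values())
        scores.append(2 * precision * recall / (precision + recall))  # harmonic mean
    return max(scores)

# e.g. f1("the Denver Broncos", ["Denver Broncos", "Broncos", "Broncos"]) == 1.0
# The dataset-level F1 is the average of these per-question scores.
```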
Okay, so how did things work out? For SQuAD version 1.1, a long time ago, at the end of 2016, this is how the leaderboard looked. This is the bottom of the leaderboard at that point in time, because that lets me show you a couple of things. Down at the bottom of the leaderboard, they tested how well human beings did at answering these questions, because human beings aren't perfect at answering questions either, and the human performance they measured had an F1 score of 91.2. I'll come back to that again in a minute. When they built the dataset, they also built a logistic regression baseline, which was a conventional NLP system: they dependency parsed the question and the sentences of the answer, and they looked for dependency link matches, a word at both ends with the dependency relation
in"},{"from":1938.35,"to":1943.62,"location":2,"content":"between and count and matches of those and sort of pointing to a likely answer."},{"from":1943.62,"to":1949.8,"location":2,"content":"Um, so as sort of a fairly competently built traditional NLP system of it's"},{"from":1949.8,"to":1952.15,"location":2,"content":"not as complex as but it's sort of in"},{"from":1952.15,"to":1956.11,"location":2,"content":"the same vein of that early question answering system I mentioned."},{"from":1956.11,"to":1959.41,"location":2,"content":"And it got an F1 of about 51."},{"from":1959.41,"to":1961.22,"location":2,"content":"So not hopeless, um,"},{"from":1961.22,"to":1963.98,"location":2,"content":"but not that great compared to human beings."},{"from":1963.98,"to":1966.52,"location":2,"content":"And so, very shortly after that, um,"},{"from":1966.52,"to":1968.63,"location":2,"content":"people then started building"},{"from":1968.63,"to":1973.75,"location":2,"content":"neural network systems to try and do better at this task on this dataset."},{"from":1973.75,"to":1978.04,"location":2,"content":"And so, one of the first people to do this quite successfully,"},{"from":1978.04,"to":1981.58,"location":2,"content":"um, were these people from Singapore Management University,"},{"from":1981.58,"to":1985.15,"location":2,"content":"maybe not the first place you would have thought of but, um,"},{"from":1985.15,"to":1988.87,"location":2,"content":"they were really sort of the first people who showed that, yes,"},{"from":1988.87,"to":1992.32,"location":2,"content":"you could build an end-to-end trained neural network"},{"from":1992.32,"to":1995.32,"location":2,"content":"for this task and do rather better."},{"from":1995.32,"to":1998.93,"location":2,"content":"And so, they got up to 67 F1."},{"from":1998.93,"to":2002.1,"location":2,"content":"Um, and well, then they had a second system."},{"from":2002.1,"to":2004.99,"location":2,"content":"They got 70 and then things started,"},{"from":2004.99,"to":2008.14,"location":2,"content":"um, to, um, go on."},{"from":2008.14,"to":2009.67,"location":2,"content":"So that even by,"},{"from":2009.67,"to":2012.57,"location":2,"content":"um, the end of 2016,"},{"from":2012.57,"to":2018.18,"location":2,"content":"um, there started to be systems that really worked rather well on this task."},{"from":2018.18,"to":2020.98,"location":2,"content":"Um, so here, this time was the,"},{"from":2020.98,"to":2022.82,"location":2,"content":"um, top of the leaderboard."},{"from":2022.82,"to":2026.45,"location":2,"content":"So I'll talk later about this BiDAF system from, uh,"},{"from":2026.45,"to":2028.38,"location":2,"content":"the AI to,"},{"from":2028.38,"to":2031.8,"location":2,"content":"Allen Institute for Artificial Intelligence and the University of Washington."},{"from":2031.8,"to":2033.81,"location":2,"content":"So, it was getting to 77 as"},{"from":2033.81,"to":2037.77,"location":2,"content":"a single system that like in just about all machine learning,"},{"from":2037.77,"to":2040.26,"location":2,"content":"people pretty soon noticed that if you made"},{"from":2040.26,"to":2043.44,"location":2,"content":"an ensemble of identically structured systems,"},{"from":2043.44,"to":2046.83,"location":2,"content":"you could push the number higher and so if you ensemble those,"},{"from":2046.83,"to":2051.09,"location":2,"content":"you could then get another sort of whatever it is about four points"},{"from":2051.09,"to":2055.8,"location":2,"content":"and get up to 81, um, 
F1."},{"from":2055.8,"to":2062.45,"location":2,"content":"And so this was sort of around the situation when in the, uh, 2017, um,"},{"from":2062.45,"to":2070.44,"location":2,"content":"224N class, we first used SQuAD version one as jus- as a default final project."},{"from":2070.44,"to":2072.24,"location":2,"content":"And at that point, you know,"},{"from":2072.24,"to":2076.47,"location":2,"content":"actually the best students got almost to the top of this leaderboard."},{"from":2076.47,"to":2078.18,"location":2,"content":"So our best, um,"},{"from":2078.18,"to":2084.24,"location":2,"content":"CS224N Final Project in winter 2017 made it into,"},{"from":2084.24,"to":2087.69,"location":2,"content":"um, the equivalent of fourth place on this leaderboard,"},{"from":2087.69,"to":2091.08,"location":2,"content":"um, with 77.5 as their score."},{"from":2091.08,"to":2092.79,"location":2,"content":"So that was really rather cool."},{"from":2092.79,"to":2096.11,"location":2,"content":"Um, but that's a couple of years ago and since then,"},{"from":2096.11,"to":2098.1,"location":2,"content":"people have started building, um,"},{"from":2098.1,"to":2102.78,"location":2,"content":"bigger and bigger and more and more complex, um, systems."},{"from":2102.78,"to":2106.14,"location":2,"content":"And, um, so essentially,"},{"from":2106.14,"to":2110.79,"location":2,"content":"you could sort of say that SQuAD version one is basically solved."},{"from":2110.79,"to":2113.97,"location":2,"content":"So the very best systems are now getting"},{"from":2113.97,"to":2118.47,"location":2,"content":"F1 scores that are in the low 90s and in particular,"},{"from":2118.47,"to":2122.91,"location":2,"content":"you can see that the best couple of, um,"},{"from":2122.91,"to":2125.89,"location":2,"content":"systems have higher F1s and"},{"from":2125.89,"to":2131.25,"location":2,"content":"well higher exact matches than what was measured for human beings."},{"from":2131.25,"to":2134.14,"location":2,"content":"Uh, but like a lot of the claims of"},{"from":2134.14,"to":2137.31,"location":2,"content":"deep learning being better and performing from human being,"},{"from":2137.31,"to":2141,"location":2,"content":"than human beings, there's sort of some asterisks you can put after that."},{"from":2141,"to":2143.52,"location":2,"content":"I mean, in particular for this dataset,"},{"from":2143.52,"to":2148.13,"location":2,"content":"the way they measured human performance was a little bit"},{"from":2148.13,"to":2153.87,"location":2,"content":"unfair because they only actually collected three human beings' answers."},{"from":2153.87,"to":2158.34,"location":2,"content":"So, to judge, um, the human performance,"},{"from":2158.34,"to":2165.78,"location":2,"content":"the hu- those hu- each of those humans was being scored versus only two other humans."},{"from":2165.78,"to":2168.78,"location":2,"content":"And so, that means you only had two chances to match instead of three."},{"from":2168.78,"to":2173.82,"location":2,"content":"So, there's actually sort of a systematic underscoring of the human performance."},{"from":2173.82,"to":2177.74,"location":2,"content":"But whatever, systems got very good at doing this."},{"from":2177.74,"to":2180.96,"location":2,"content":"Um, so the next step, um,"},{"from":2180.96,"to":2182.52,"location":2,"content":"was then to introduce, uh,"},{"from":2182.52,"to":2185.45,"location":2,"content":"the SQuAD vers- version 2 task."},{"from":2185.45,"to":2189.99,"location":2,"content":"And so many people felt that a 
defect of SQuAD version"},{"from":2189.99,"to":2194.99,"location":2,"content":"1 was that in all cases, questions had answers."},{"from":2194.99,"to":2200.45,"location":2,"content":"So, that you just had to find the answer in the paragraph,"},{"from":2200.45,"to":2204.12,"location":2,"content":"um, and so that's sort of turned into a kind of a ranking task."},{"from":2204.12,"to":2208.36,"location":2,"content":"You just had to work out what seems the most likely answer."},{"from":2208.36,"to":2210.5,"location":2,"content":"I'll return that without really having"},{"from":2210.5,"to":2213.91,"location":2,"content":"any idea whether it was an answer to the question or not."},{"from":2213.91,"to":2216.53,"location":2,"content":"And so, for SQuAD version two,"},{"from":2216.53,"to":2218.79,"location":2,"content":"for the dev and test sets,"},{"from":2218.79,"to":2221.76,"location":2,"content":"half of the questions have answers and half of"},{"from":2221.76,"to":2224.95,"location":2,"content":"the questions just don't have an answer in the passage,"},{"from":2224.95,"to":2228.01,"location":2,"content":"um, it's slightly different distribution, the training data."},{"from":2228.01,"to":2232.78,"location":2,"content":"Um, and the way it works for scoring is the sort of, like,"},{"from":2232.78,"to":2238.92,"location":2,"content":"the no answer kind of counts as like one word as a sort of a special token."},{"from":2238.92,"to":2243.69,"location":2,"content":"So, if it's, if it should be a no answer and you say no answer,"},{"from":2243.69,"to":2248.58,"location":2,"content":"you get a score of one on the either exact match or the F-measure."},{"from":2248.58,"to":2250.56,"location":2,"content":"And if you don't do that,"},{"from":2250.56,"to":2252.21,"location":2,"content":"you get a score of zero."},{"from":2252.21,"to":2258.69,"location":2,"content":"Um, and so, the simplest way of approaching SQuAD 2.0 would be to say, well,"},{"from":2258.69,"to":2262.27,"location":2,"content":"rather than just always returning the best match in my system,"},{"from":2262.27,"to":2267.07,"location":2,"content":"I'll use some kind of threshold and only if the score is above a threshold,"},{"from":2267.07,"to":2268.78,"location":2,"content":"our counters and answer."},{"from":2268.78,"to":2271.05,"location":2,"content":"You could do more sophisticated things."},{"from":2271.05,"to":2274.08,"location":2,"content":"So another area that we've worked on quite a bit at Stanford is"},{"from":2274.08,"to":2278.52,"location":2,"content":"this natural language inference task that I'll talk about later in the course."},{"from":2278.52,"to":2282.84,"location":2,"content":"Um, but that's really about saying whether one piece of,"},{"from":2282.84,"to":2285.63,"location":2,"content":"um, text is the conclusion of another,"},{"from":2285.63,"to":2286.89,"location":2,"content":"um, piece of text."},{"from":2286.89,"to":2290.67,"location":2,"content":"And so that's sort of a way that you can try and see whether, uh,"},{"from":2290.67,"to":2297.12,"location":2,"content":"a piece of text actually gives you a justification and answer to what the question was."},{"from":2297.12,"to":2301.53,"location":2,"content":"But at any rate, this trying to decide whether"},{"from":2301.53,"to":2307.01,"location":2,"content":"you've actually got an answer or not is a quite difficult problem in many cases."},{"from":2307.01,"to":2311.88,"location":2,"content":"So here's an example from SQuAD, um, 
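As code, that simplest thresholding idea is just a couple of lines. This is a hypothetical sketch: the model interface here (a best_span method returning a span and its score) is invented for illustration, and the threshold itself would be tuned on the dev set.

```python
# Hypothetical sketch of the simple SQuAD 2.0 strategy: answer only when the
# best span's score clears a tuned threshold, otherwise predict no-answer.
NO_ANSWER = ""  # no-answer is scored like a special one-token answer

def answer_or_abstain(model, question, passage, threshold):
    span, score = model.best_span(question, passage)  # assumed model interface
    return span if score >= threshold else NO_ANSWER
```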
2.0."},{"from":2311.88,"to":2315.12,"location":2,"content":"So Genghis Khan united the Mongol and Turkic tribes of"},{"from":2315.12,"to":2318.86,"location":2,"content":"the steppes and became Great Khan in 1206."},{"from":2318.86,"to":2322.29,"location":2,"content":"He and his successors expanded the Mongol Empire across Asia,"},{"from":2322.29,"to":2323.94,"location":2,"content":"blah, blah, blah, blah."},{"from":2323.94,"to":2325.64,"location":2,"content":"And the question is,"},{"from":2325.64,"to":2328.26,"location":2,"content":"when did Genghis Khan kill Great Khan?"},{"from":2328.26,"to":2330.48,"location":2,"content":"And the answer to that is,"},{"from":2330.48,"to":2333.53,"location":2,"content":"you know, uh, there isn't an answer because actually,"},{"from":2333.53,"to":2339.15,"location":2,"content":"Genghis Khan was a person named Great Khan and he didn't kill a Great Khan."},{"from":2339.15,"to":2341.84,"location":2,"content":"It's just not a question with an answer."},{"from":2341.84,"to":2347.99,"location":2,"content":"Um, but it's precisely what happens with systems is, you know,"},{"from":2347.99,"to":2351.64,"location":2,"content":"even though these systems get high scores in terms of points,"},{"from":2351.64,"to":2355.98,"location":2,"content":"they don't actually understand human language that well."},{"from":2355.98,"to":2357.61,"location":2,"content":"So they look at something that says,"},{"from":2357.61,"to":2360.86,"location":2,"content":"when did Genghis Khan kill Great Khan?"},{"from":2360.86,"to":2363.93,"location":2,"content":"Well, this is something that's looking for a date and there are"},{"from":2363.93,"to":2367.74,"location":2,"content":"some obvious dates in this passage there's 1206, 1234,"},{"from":2367.74,"to":2371.84,"location":2,"content":"1251 and well, there's kill,"},{"from":2371.84,"to":2376.56,"location":2,"content":"and kill looks a little bit similar to destroyed."},{"from":2376.56,"to":2378.64,"location":2,"content":"I can see the word destroyed."},{"from":2378.64,"to":2381.34,"location":2,"content":"So that probably kind of matches."},{"from":2381.34,"to":2383.4,"location":2,"content":"And then we're talking about, um,"},{"from":2383.4,"to":2385.56,"location":2,"content":"Genghis Khan and there,"},{"from":2385.56,"to":2388.39,"location":2,"content":"I can see Genghis and Khan in this passage."},{"from":2388.39,"to":2390.96,"location":2,"content":"And so it sort of puts that together and says"},{"from":2390.96,"to":2395.18,"location":2,"content":"1234 is the answer when that isn't the answer at all."},{"from":2395.18,"to":2399.87,"location":2,"content":"And that's actually kind of pretty typical of the behavior of these systems."},{"from":2399.87,"to":2403.56,"location":2,"content":"And so that, on the one hand, they work great."},{"from":2403.56,"to":2406.16,"location":2,"content":"On the other hand, they don't actually understand that much,"},{"from":2406.16,"to":2410.03,"location":2,"content":"and effectively asking whether there's,"},{"from":2410.03,"to":2414.93,"location":2,"content":"this question is actually answered in the passage is a way of"},{"from":2414.93,"to":2417.36,"location":2,"content":"revealing the extent to which these models"},{"from":2417.36,"to":2420.95,"location":2,"content":"do or don't understand what's actually going on."},{"from":2420.95,"to":2423.91,"location":2,"content":"Okay. 
So, at the time, um,"},{"from":2423.91,"to":2427.09,"location":2,"content":"they built SQuAD version 2.0."},{"from":2427.09,"to":2428.84,"location":2,"content":"They took some of, um,"},{"from":2428.84,"to":2432.09,"location":2,"content":"the existing SQuAD version one's systems,"},{"from":2432.09,"to":2436.72,"location":2,"content":"and, um, modified them in a very simple way."},{"from":2436.72,"to":2439.28,"location":2,"content":"I put in a threshold, um,"},{"from":2439.28,"to":2443.18,"location":2,"content":"score as to how good the final match was deemed to be,"},{"from":2443.18,"to":2447.64,"location":2,"content":"and said, Well, how well do you do on SQuAD 2.0?"},{"from":2447.64,"to":2450.82,"location":2,"content":"And the kind of systems that we saw doing well before,"},{"from":2450.82,"to":2452.37,"location":2,"content":"now didn't do that well,"},{"from":2452.37,"to":2458.82,"location":2,"content":"so something like the BiDAF system that we mentioned before was now scoring about 62 F1,"},{"from":2458.82,"to":2461.37,"location":2,"content":"so that that was sort of hugely lowering"},{"from":2461.37,"to":2465.21,"location":2,"content":"its performance and reflecting the limits of understanding."},{"from":2465.21,"to":2469.65,"location":2,"content":"Um, but it turned out actually that this problem didn't prove to"},{"from":2469.65,"to":2474.24,"location":2,"content":"be q- quite as difficult as the dataset authors,"},{"from":2474.24,"to":2476.82,"location":2,"content":"um, maybe thought either."},{"from":2476.82,"to":2479.78,"location":2,"content":"Um, because it turns out that um,"},{"from":2479.78,"to":2483.38,"location":2,"content":"here we are now in February 2019,"},{"from":2483.38,"to":2486.28,"location":2,"content":"and if you look at the top of the leaderboard,"},{"from":2486.28,"to":2489.47,"location":2,"content":"we're kind of getting close again to the point"},{"from":2489.47,"to":2492.78,"location":2,"content":"where the best systems are almost as good as human beings."},{"from":2492.78,"to":2499.08,"location":2,"content":"So, um, the current top rate system there you can see is getting 87.6 F1,"},{"from":2499.08,"to":2503.22,"location":2,"content":"which is less than two points behind where the human beings are."},{"from":2503.22,"to":2507.51,"location":2,"content":"Um, the SQuAD version 2 they also co- corrected the,"},{"from":2507.51,"to":2509.4,"location":2,"content":"um, scoring of human beings,"},{"from":2509.4,"to":2512.8,"location":2,"content":"so it's more of a fair evaluation this time, um,"},{"from":2512.8,"to":2514.92,"location":2,"content":"so there's still a bit of a gap but, you know,"},{"from":2514.92,"to":2518.01,"location":2,"content":"the systems are actually doing, um, really well."},{"from":2518.01,"to":2521.04,"location":2,"content":"And the interesting thing there is,"},{"from":2521.04,"to":2524.63,"location":2,"content":"you know, on the one hand these systems are impressively good."},{"from":2524.63,"to":2526.89,"location":2,"content":"Um, you can go on the SQuAD website and look"},{"from":2526.89,"to":2529.28,"location":2,"content":"at the output of several of the good systems,"},{"from":2529.28,"to":2532.34,"location":2,"content":"and you can see that there are just a ton of things that they get right."},{"from":2532.34,"to":2534.33,"location":2,"content":"They're absolutely not bad systems."},{"from":2534.33,"to":2538.98,"location":2,"content":"You have to be a good system to be getting five out of six of the questions 
right."},{"from":2538.98,"to":2541.86,"location":2,"content":"Um, but, you know, on the other hand they still"},{"from":2541.86,"to":2545.13,"location":2,"content":"make quite elementary Natural Language Understanding Errors."},{"from":2545.13,"to":2548.3,"location":2,"content":"And so here's an example of one of those."},{"from":2548.3,"to":2549.72,"location":2,"content":"Okay, so this one,"},{"from":2549.72,"to":2552.54,"location":2,"content":"the Yuan dynasty is considered both a successor to"},{"from":2552.54,"to":2556.16,"location":2,"content":"the Mongol Empire and an imperial Chinese dynasty."},{"from":2556.16,"to":2558.84,"location":2,"content":"It was the khanate ruled by the successors of"},{"from":2558.84,"to":2562.66,"location":2,"content":"Mongke Khan after the division of the Mongol Empire."},{"from":2562.66,"to":2566.73,"location":2,"content":"In official Chinese histories the Yuan dynasty bore the Mandate of Heaven,"},{"from":2566.73,"to":2570.48,"location":2,"content":"following the Song dynasty and preceding the Ming dynasty."},{"from":2570.48,"to":2572.66,"location":2,"content":"Okay. And then the question is,"},{"from":2572.66,"to":2575.76,"location":2,"content":"what dynasty came before the Yuan?"},{"from":2575.76,"to":2578.49,"location":2,"content":"And that's a pretty easy question,"},{"from":2578.49,"to":2579.99,"location":2,"content":"I'd hope, for a human being."},{"from":2579.99,"to":2582.83,"location":2,"content":"Everyone can answer that question?"},{"from":2582.83,"to":2588.48,"location":2,"content":"Okay, um, yeah, so it says in official Chinese histories Yuan Dynast- uh,"},{"from":2588.48,"to":2589.92,"location":2,"content":"sorry the next sentence."},{"from":2589.92,"to":2592.56,"location":2,"content":"Um, yeah followed- right the Yuan Dynasty following"},{"from":2592.56,"to":2595.24,"location":2,"content":"the Song dynasty and preceding the Ming dynasty."},{"from":2595.24,"to":2597.55,"location":2,"content":"But, you know actually um,"},{"from":2597.55,"to":2600.96,"location":2,"content":"this sort of the leading um,"},{"from":2600.96,"to":2605.31,"location":2,"content":"Google BERT model says that it was the Ming dynasty that came before"},{"from":2605.31,"to":2609.45,"location":2,"content":"the Yuan Dynasty which you know is sort of elementarily"},{"from":2609.45,"to":2613.32,"location":2,"content":"wrong that reveals some of the same kind of it's"},{"from":2613.32,"to":2618.24,"location":2,"content":"not really understanding everything but it's doing a sort of a matching problem still."},{"from":2618.24,"to":2625.62,"location":2,"content":"Okay. 
So, this SQuAD dataset has been useful and good."},{"from":2625.62,"to":2628.86,"location":2,"content":"It still has some major limitations and I just thought I'd"},{"from":2628.86,"to":2632.37,"location":2,"content":"mentioned what a few of those are so you're aware of some of the issues."},{"from":2632.37,"to":2634.95,"location":2,"content":"So one of them I've already mentioned, right,"},{"from":2634.95,"to":2640.74,"location":2,"content":"that you're in this space where all answers are a span from the passage."},{"from":2640.74,"to":2643.89,"location":2,"content":"And that just limits the kind of questions you can"},{"from":2643.89,"to":2647.03,"location":2,"content":"ask and the kind of difficult situations there can be."},{"from":2647.03,"to":2650.37,"location":2,"content":"So, there can't be yes-no questions counting"},{"from":2650.37,"to":2655.78,"location":2,"content":"questions or even any of the sort of more difficult implicit questions."},{"from":2655.78,"to":2661.18,"location":2,"content":"So, if you think back to when you were in middle school and did reading comprehension,"},{"from":2661.18,"to":2663.82,"location":2,"content":"I mean, it wasn't typically um,"},{"from":2663.82,"to":2667.44,"location":2,"content":"the case um, that you're being asked"},{"from":2667.44,"to":2671.4,"location":2,"content":"questions that were just stated explicitly in the text of,"},{"from":2671.4,"to":2674.88,"location":2,"content":"you know, Sue is visiting her mother in Miami."},{"from":2674.88,"to":2676.34,"location":2,"content":"And the question was,"},{"from":2676.34,"to":2678.32,"location":2,"content":"who was visiting in Miami?"},{"from":2678.32,"to":2683.73,"location":2,"content":"That wasn't the kind of questions you were asked you were normally asked questions um,"},{"from":2683.73,"to":2686.31,"location":2,"content":"like um, you know,"},{"from":2686.31,"to":2692.51,"location":2,"content":"um, Sue is going to a job interview this morning,"},{"from":2692.51,"to":2696.36,"location":2,"content":"um, it's a really important job interview for her future."},{"from":2696.36,"to":2699.43,"location":2,"content":"At breakfast she um,"},{"from":2699.43,"to":2703.39,"location":2,"content":"starts buttering both sides of her piece of toast um,"},{"from":2703.39,"to":2706.41,"location":2,"content":"and you are asked a question like, um,"},{"from":2706.41,"to":2711.32,"location":2,"content":"why um, is Sue buttering both sides of her piece of toast?"},{"from":2711.32,"to":2713.42,"location":2,"content":"And you're meant to be able to answer,"},{"from":2713.42,"to":2717.68,"location":2,"content":"\"She's distracted by her important job interview coming up later in the day.\""},{"from":2717.68,"to":2720.99,"location":2,"content":"Which isn't the- something that you can answer um,"},{"from":2720.99,"to":2723.51,"location":2,"content":"by just picking out a sub span."},{"from":2723.51,"to":2731.05,"location":2,"content":"Um, a second problem which is sort of actually a bigger problem is um,"},{"from":2731.05,"to":2735.64,"location":2,"content":"the way SQuAD was constructed for ease"},{"from":2735.64,"to":2741.97,"location":2,"content":"and not to be too expensive and various other reasons was um,"},{"from":2741.97,"to":2746.24,"location":2,"content":"paragraphs of Wikipedia were selected and then,"},{"from":2746.24,"to":2748.68,"location":2,"content":"Mechanical Turkers were hired to say,"},{"from":2748.68,"to":2751.22,"location":2,"content":"\"Come up with some questions 
um,"},{"from":2751.22,"to":2756.21,"location":2,"content":"that can be answered by this this passage in version 1.1.\""},{"from":2756.21,"to":2759.32,"location":2,"content":"And then in version two they were said- told,"},{"from":2759.32,"to":2763.17,"location":2,"content":"\"Also come up with some questions that"},{"from":2763.17,"to":2767.39,"location":2,"content":"look like they're related to this passage but aren't actually answered in the passage.\""},{"from":2767.39,"to":2770.07,"location":2,"content":"But, in all cases people were coming up with"},{"from":2770.07,"to":2774.87,"location":2,"content":"the questions staring at the passage and if you do that,"},{"from":2774.87,"to":2778.26,"location":2,"content":"it means that your questions are strongly"},{"from":2778.26,"to":2781.91,"location":2,"content":"overlapping with the passage both in terms of the,"},{"from":2781.91,"to":2786.63,"location":2,"content":"the words that are used and even the syntactic structures that are"},{"from":2786.63,"to":2791.52,"location":2,"content":"used for your questions tending to match the syntactic structures of the passage."},{"from":2791.52,"to":2797.09,"location":2,"content":"And so that makes question answering um, naturally easy."},{"from":2797.09,"to":2799.13,"location":2,"content":"What happens in the real world,"},{"from":2799.13,"to":2802.26,"location":2,"content":"is this human beings think up questions and"},{"from":2802.26,"to":2806.01,"location":2,"content":"type something into a search engine and the way"},{"from":2806.01,"to":2809.36,"location":2,"content":"that they type it in is completely distinct"},{"from":2809.36,"to":2813.07,"location":2,"content":"from the way something might be worded on a website."},{"from":2813.07,"to":2816.6,"location":2,"content":"So that they might be saying something like,"},{"from":2816.6,"to":2822.72,"location":2,"content":"you know, \"In what year did the price of hard disks drop below a dollar a megabyte?\""},{"from":2822.72,"to":2827.22,"location":2,"content":"Um, and the webpage will say something like"},{"from":2827.22,"to":2832.05,"location":2,"content":"the cost of hard disks has being dropping for many years um,"},{"from":2832.05,"to":2838.47,"location":2,"content":"in I know whenever it was 2004 prices eventually crossed um,"},{"from":2838.47,"to":2840.87,"location":2,"content":"the dollar megabyte barrier or something like that."},{"from":2840.87,"to":2844.78,"location":2,"content":"But there's a quite different discussion of the ideas."},{"from":2844.78,"to":2848.22,"location":2,"content":"And that kinda matching is much harder and that's one of"},{"from":2848.22,"to":2852.27,"location":2,"content":"the things that people have done other datasets have tried to do differently."},{"from":2852.27,"to":2855.96,"location":2,"content":"Um, another limitation is that these questions and"},{"from":2855.96,"to":2860.36,"location":2,"content":"answers are very much, find the sentence that's addressing the fact,"},{"from":2860.36,"to":2862.55,"location":2,"content":"match your question to the sentence,"},{"from":2862.55,"to":2865.08,"location":2,"content":"return the right thing,"},{"from":2865.08,"to":2869.4,"location":2,"content":"that there's nothing sort of more difficult than involves multi sentence,"},{"from":2869.4,"to":2873.21,"location":2,"content":"combine facts together styles of inferencing,"},{"from":2873.21,"to":2877.05,"location":2,"content":"that the limits of cross sentence stuff there is pretty much limited 
to"},{"from":2877.05,"to":2881.3,"location":2,"content":"resolving co-reference which is something we'll talk about later in the class,"},{"from":2881.3,"to":2884.31,"location":2,"content":"that means that you see a he or she or an it,"},{"from":2884.31,"to":2889.13,"location":2,"content":"and you can work out who that refers to earlier in the, this course."},{"from":2889.13,"to":2892.59,"location":2,"content":"Um, nevertheless, despite all those disadvantages,"},{"from":2892.59,"to":2895.23,"location":2,"content":"it sort of proved that SQuAD was, you know,"},{"from":2895.23,"to":2900.18,"location":2,"content":"well-targeted in terms of its level of difficulty, well-structured,"},{"from":2900.18,"to":2902.91,"location":2,"content":"clean dataset, and it's just been"},{"from":2902.91,"to":2907.14,"location":2,"content":"sort of everybody's favorite for a question answering dataset."},{"from":2907.14,"to":2910.08,"location":2,"content":"It also seems to have proved that actually for"},{"from":2910.08,"to":2913.53,"location":2,"content":"people who work in industry and want to build a question answering system,"},{"from":2913.53,"to":2916.01,"location":2,"content":"starting off by training a model in SQuAD,"},{"from":2916.01,"to":2919.23,"location":2,"content":"actually turns out to work pretty well it turns out."},{"from":2919.23,"to":2921.42,"location":2,"content":"I mean, it's not everything you want to do."},{"from":2921.42,"to":2926.25,"location":2,"content":"You definitely wanna have relevant in domain data and be using that as well,"},{"from":2926.25,"to":2930.45,"location":2,"content":"but you know, it turns out that it seems to actually be a quite useful starting point."},{"from":2930.45,"to":2935.86,"location":2,"content":"Okay. So, what I wanted to show you now was a- is a concrete,"},{"from":2935.86,"to":2940.71,"location":2,"content":"simple, neural question answering system."},{"from":2940.71,"to":2948.3,"location":2,"content":"Um, and this is the model that was built by here and I guess she was"},{"from":2948.3,"to":2955.86,"location":2,"content":"sort of an Abby predecessor since she was the preceding head TA for CS 224N."},{"from":2955.86,"to":2958.65,"location":2,"content":"Um, so this system,"},{"from":2958.65,"to":2961.83,"location":2,"content":"um, Stanford Attentive Reader it kind of gets called now."},{"from":2961.83,"to":2964.57,"location":2,"content":"I mean, this is sort of essentially"},{"from":2964.57,"to":2969.99,"location":2,"content":"the simplest neural question answering system that works pretty well."},{"from":2969.99,"to":2972.78,"location":2,"content":"So, it's not a bad thing to have in mind as"},{"from":2972.78,"to":2976.32,"location":2,"content":"a baseline and it's not the current state of the art by any means."},{"from":2976.32,"to":2980.79,"location":2,"content":"But you know, if you're sort of wondering what's the simplest thing that I can build"},{"from":2980.79,"to":2985.22,"location":2,"content":"that basically works as a question answering system decently,"},{"from":2985.22,"to":2987.32,"location":2,"content":"this is basically it."},{"from":2987.32,"to":2990.39,"location":2,"content":"Um, okay. 
So how does this work?"},{"from":2990.39,"to":2992.59,"location":2,"content":"So the way it works is like this."},{"from":2992.59,"to":2993.93,"location":2,"content":"So, first of all,"},{"from":2993.93,"to":2998.2,"location":2,"content":"we have a question which team won Super Bowl 50?"},{"from":2998.2,"to":3004.18,"location":2,"content":"And what we're gonna wanna do is build a representation of a question as a vector."},{"from":3004.18,"to":3006.92,"location":2,"content":"And the way we can do that is like this,"},{"from":3006.92,"to":3009.03,"location":2,"content":"for each word in the question,"},{"from":3009.03,"to":3010.84,"location":2,"content":"we look up a word embedding."},{"from":3010.84,"to":3015.44,"location":2,"content":"So, in particular it used GloVe- GloVe 300 dimensional word embeddings."},{"from":3015.44,"to":3019.24,"location":2,"content":"Um, we then run an LSTM"},{"from":3019.24,"to":3023.33,"location":2,"content":"forward through the question and then kind of like Abby talked about,"},{"from":3023.33,"to":3025.3,"location":2,"content":"we actually make it a bi-LSTM."},{"from":3025.3,"to":3029.03,"location":2,"content":"So, we run a second LSTM backwards through the question."},{"from":3029.03,"to":3034.88,"location":2,"content":"And so then, we grab the end state of both LSTMs"},{"from":3034.88,"to":3040.76,"location":2,"content":"and we simply concatenate them together into a vector of dimension 2D if,"},{"from":3040.76,"to":3043.73,"location":2,"content":"if our hidden states of the LSTM are dimension"},{"from":3043.73,"to":3048.43,"location":2,"content":"d and we say that is the representation of the question."},{"from":3048.43,"to":3051.24,"location":2,"content":"Okay. So, once we have that,"},{"from":3051.24,"to":3054.23,"location":2,"content":"we then start looking at the passage."},{"from":3054.23,"to":3057.64,"location":2,"content":"And so, for the start of dealing with the passage,"},{"from":3057.64,"to":3059.18,"location":2,"content":"we do the same thing."},{"from":3059.18,"to":3063.11,"location":2,"content":"We, um, look up a word vector for every word in"},{"from":3063.11,"to":3067.34,"location":2,"content":"the passage and we run a bidirectional LSTM,"},{"from":3067.34,"to":3072.2,"location":2,"content":"now being represented a bit more compactly um, across the passage."},{"from":3072.2,"to":3075.71,"location":2,"content":"But then we have to do a little bit more work because we actually"},{"from":3075.71,"to":3079.04,"location":2,"content":"have to find the answer in the passage."},{"from":3079.04,"to":3081.68,"location":2,"content":"And so what we're gonna do is use"},{"from":3081.68,"to":3088.18,"location":2,"content":"the question representation to sort of work out where the answer is using attention."},{"from":3088.18,"to":3091.8,"location":2,"content":"So this is a different use of attention to machine translation."},{"from":3091.8,"to":3095.11,"location":2,"content":"That kind of attention equations are still exactly the same."},{"from":3095.11,"to":3099.17,"location":2,"content":"But we've now got this sort of one question vector that we gonna be trying to"},{"from":3099.17,"to":3103.39,"location":2,"content":"match against to return the answer."},{"from":3103.39,"to":3107.15,"location":2,"content":"So, what we do is we, um,"},{"from":3107.15,"to":3111.13,"location":2,"content":"work out an attention score between"},{"from":3111.13,"to":3117.57,"location":2,"content":"each word's bi-LSTM representation and the 
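Before getting to the attention part, here is a minimal PyTorch sketch of the encoding described so far. The class and its defaults are my own illustrative choices, and the GloVe initialization of the embedding matrix is omitted.

```python
# Minimal sketch of the shared encoding: embed the words, run a bi-LSTM, and,
# for the question, concatenate the two final hidden states into one vector.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)  # GloVe-initialized in practice
        self.lstm = nn.LSTM(emb_dim, hidden_dim, bidirectional=True, batch_first=True)

    def forward(self, token_ids):                 # (batch, seq_len)
        states, (h_n, _) = self.lstm(self.embed(token_ids))
        q = torch.cat([h_n[0], h_n[1]], dim=-1)   # fwd + bwd end states -> (batch, 2d)
        return q, states                          # states: (batch, seq_len, 2d)
```

For the question you keep the single concatenated vector; for the passage you keep all the per-position states.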
question."},{"from":3117.57,"to":3122.93,"location":2,"content":"And so the way that's being done is we're using this bi-linear attention,"},{"from":3122.93,"to":3127.37,"location":2,"content":"um, that um, Abby briefly discussed and we'll see more of today."},{"from":3127.37,"to":3129.14,"location":2,"content":"We've got the question vector,"},{"from":3129.14,"to":3132.53,"location":2,"content":"the vector for a particular position in the passage"},{"from":3132.53,"to":3135.77,"location":2,"content":"to the two concatenated LSTM hidden states."},{"from":3135.77,"to":3137.93,"location":2,"content":"So they're the same dimensionality."},{"from":3137.93,"to":3141.02,"location":2,"content":"We have this intervening learn W matrix."},{"from":3141.02,"to":3143.36,"location":2,"content":"So, we work out that quantity,"},{"from":3143.36,"to":3145.11,"location":2,"content":"um, for each position,"},{"from":3145.11,"to":3147.89,"location":2,"content":"and then we put that through a softmax which will give us"},{"from":3147.89,"to":3152.18,"location":2,"content":"probabilities over the different words in the passage."},{"from":3152.18,"to":3154.22,"location":2,"content":"Um, and those give us,"},{"from":3154.22,"to":3156.66,"location":2,"content":"um, our attention weights."},{"from":3156.66,"to":3159.35,"location":2,"content":"And so at that point we have attention weights,"},{"from":3159.35,"to":3162.14,"location":2,"content":"um, for different positions, um,"},{"from":3162.14,"to":3165.41,"location":2,"content":"in the passage and we just declare that,"},{"from":3165.41,"to":3167.03,"location":2,"content":"um, that is where,"},{"from":3167.03,"to":3169.61,"location":2,"content":"um, the answer starts."},{"from":3169.61,"to":3173.27,"location":2,"content":"Um, and then to get the end of the answer,"},{"from":3173.27,"to":3181.31,"location":2,"content":"we simply do exactly the same thing again apart from we train a different W matrix here,"},{"from":3181.31,"to":3182.84,"location":2,"content":"and we have that,"},{"from":3182.84,"to":3184.94,"location":2,"content":"um, predict the end token."},{"from":3184.94,"to":3187.49,"location":2,"content":"And there's something a little bit subtle here."},{"from":3187.49,"to":3190.61,"location":2,"content":"Um, because, you know, really we're asking it to sort"},{"from":3190.61,"to":3193.68,"location":2,"content":"of predict the starts and the ends of the answer,"},{"from":3193.68,"to":3195.83,"location":2,"content":"and you might think, but wait a minute."},{"from":3195.83,"to":3199.59,"location":2,"content":"Surely, we need to look at the middle of the answer as well because maybe the,"},{"from":3199.59,"to":3203.41,"location":2,"content":"the most indicative words are actually going to be in the middle of the answer."},{"from":3203.41,"to":3207.71,"location":2,"content":"Um, but, you know, really really what we're,"},{"from":3207.71,"to":3212.96,"location":2,"content":"we're sort of implicitly telling the model of well,"},{"from":3212.96,"to":3217.05,"location":2,"content":"when you're training, if there's stuff in the middle that's useful,"},{"from":3217.05,"to":3222.44,"location":2,"content":"it's the bi-LSTM's job to push it to the extremes of the span,"},{"from":3222.44,"to":3227.07,"location":2,"content":"so that this simple bi-linear attention"},{"from":3227.07,"to":3231.95,"location":2,"content":"will be able to get a big score at the start of the span."},{"from":3231.95,"to":3235.04,"location":2,"content":"And you might also think there's 
something"},{"from":3235.04,"to":3238.37,"location":2,"content":"funny that this equation and that equation are exactly the same."},{"from":3238.37,"to":3242.27,"location":2,"content":"So, how come one of them is meant to know it's picking up beginning, um,"},{"from":3242.27,"to":3244.4,"location":2,"content":"and the other at the end?"},{"from":3244.4,"to":3247.47,"location":2,"content":"And again, you know, we're not doing anything to impose that."},{"from":3247.47,"to":3249.89,"location":2,"content":"We're just saying, neural network."},{"from":3249.89,"to":3251.91,"location":2,"content":"It is your job to learn."},{"from":3251.91,"to":3256.11,"location":2,"content":"Um, you have to learn a matrix here and a different one over there,"},{"from":3256.11,"to":3260.24,"location":2,"content":"so that one of them will pick out parts of the representation that"},{"from":3260.24,"to":3265.18,"location":2,"content":"indicate starts of answer spans and the other one ends of answer spans."},{"from":3265.18,"to":3268.16,"location":2,"content":"And so, that will then again pressure"},{"from":3268.16,"to":3271.55,"location":2,"content":"the neural network to sort of self organize itself in"},{"from":3271.55,"to":3274.1,"location":2,"content":"such a way that there'll be some parts of"},{"from":3274.1,"to":3278.27,"location":2,"content":"this hidden representation that will be good at learning starts of spans."},{"from":3278.27,"to":3280.01,"location":2,"content":"You know, maybe there'll be carried backwards by"},{"from":3280.01,"to":3283.52,"location":2,"content":"the backwards LSTM and and some parts of it will be good at"},{"from":3283.52,"to":3285.98,"location":2,"content":"learning where the spans end and then"},{"from":3285.98,"to":3290.61,"location":2,"content":"the W matrix will be able to pick out those parts of the representation."},{"from":3290.61,"to":3294.13,"location":2,"content":"Um, but yeah, uh,"},{"from":3294.13,"to":3298.36,"location":2,"content":"that's the system. 
{"from":3298.36,"to":3300.64,"location":2,"content":"So, um, so this is"},{"from":3300.64,"to":3305.98,"location":2,"content":"the basic Stanford Attentive Reader model, and it's just no more complex than that."},{"from":3305.98,"to":3308.77,"location":2,"content":"Um, and the interesting thing is, you know,"},{"from":3308.77,"to":3314.24,"location":2,"content":"that very simple model actually works really nicely."},{"from":3314.24,"to":3316.36,"location":2,"content":"Um, so this is going back in time."},{"from":3316.36,"to":3323.23,"location":2,"content":"Again, this was the February 2017 SQuAD version 1 leaderboard."},{"from":3323.23,"to":3328.69,"location":2,"content":"Um, but at that time, like"},{"from":3328.69,"to":3332.68,"location":2,"content":"always in neural networks, quite a bit of your success"},{"from":3332.68,"to":3339.28,"location":2,"content":"is tuning your hyperparameters and optimizing your model really well."},{"from":3339.28,"to":3341.26,"location":2,"content":"And, you know,"},{"from":3341.26,"to":3347.02,"location":2,"content":"it's been repeatedly proven in neural network land that often you can get"},{"from":3347.02,"to":3350.17,"location":2,"content":"much better scores than you would think from"},{"from":3350.17,"to":3353.84,"location":2,"content":"very simple models if you optimize them really well."},{"from":3353.84,"to":3357.28,"location":2,"content":"So there have been multiple cycles in sort of"},{"from":3357.28,"to":3359.83,"location":2,"content":"deep learning research where there"},{"from":3359.83,"to":3362.95,"location":2,"content":"was a paper that did something, and then the next person says,"},{"from":3362.95,"to":3364.96,"location":2,"content":"\"Here's a more complex model that"},{"from":3364.96,"to":3367.54,"location":2,"content":"works better,\" and then someone else publishes a paper saying,"},{"from":3367.54,"to":3369.64,"location":2,"content":"\"Here's an even more complex model that works"},{"from":3369.64,"to":3372.49,"location":2,"content":"better,\" and then someone points out, \"No."},{"from":3372.49,"to":3377.14,"location":2,"content":"If you go back to the first model and just really tune its hyperparameters well,"},{"from":3377.14,"to":3379.38,"location":2,"content":"you can beat both of those two models.\""},{"from":3379.38,"to":3381.88,"location":2,"content":"And that was effectively what"},{"from":3381.88,"to":3384.61,"location":2,"content":"was happening with the Stanford Attentive Reader."},{"from":3384.61,"to":3389.24,"location":2,"content":"That, you know, back in February 2017,"},{"from":3389.24,"to":3392.92,"location":2,"content":"if you just trained this model really well,"},{"from":3392.92,"to":3397.99,"location":2,"content":"it could actually outperform most of the early SQuAD systems."},{"from":3397.99,"to":3399.24,"location":2,"content":"I mean, in particular,"},{"from":3399.24,"to":3401.88,"location":2,"content":"it could outperform, um, BiDAF,"},{"from":3401.88,"to":3406.39,"location":2,"content":"the version of BiDAF that was around in early 2017, and,"},{"from":3406.39,"to":3409.32,"location":2,"content":"you know, various of these other systems from other people."},{"from":3409.32,"to":3411.34,"location":2,"content":"But actually, at that time,"},{"from":3411.34,"to":3415.41,"location":2,"content":"it was pretty close to the best system that anyone had built."},{"from":3415.41,"to":3417.97,"location":2,"content":"Um, as I've already pointed out to you,"},{"from":3417.97,"to":3420.28,"location":2,"content":"um, the numbers have gone up a lot since then."},{"from":3420.28,"to":3422.5,"location":2,"content":"So I'm not claiming that, um,"},{"from":3422.5,"to":3428.78,"location":2,"content":"this system is still as good as the best systems that you can build. But there you go."},{"from":3428.78,"to":3433,"location":2,"content":"Um, so that's the simple system that already works pretty well,"},{"from":3433,"to":3435.07,"location":2,"content":"but of course you want the system to work better."},{"from":3435.07,"to":3439.69,"location":2,"content":"Um, and so Danqi did quite a bit of work on that."},{"from":3439.69,"to":3443.3,"location":2,"content":"And so here I'll just mention a few things for, um,"},{"from":3443.3,"to":3446.13,"location":2,"content":"Stanford Attentive Reader++, as to"},{"from":3446.13,"to":3449.64,"location":2,"content":"what kinds of things you can do to make the model better."},{"from":3449.64,"to":3454.7,"location":2,"content":"And so here's a sort of picture of, um,"},{"from":3454.7,"to":3457.96,"location":2,"content":"the improved system, and we'll go through"},{"from":3457.96,"to":3461.29,"location":2,"content":"some of the differences and what makes it better."},{"from":3461.29,"to":3465.19,"location":2,"content":"Um, there's something I didn't mention before that I should just mention, right?"},{"from":3465.19,"to":3470.22,"location":2,"content":"All the parameters of this whole model are just trained end to end,"},{"from":3470.22,"to":3473.98,"location":2,"content":"where your training objective is simply, um,"},{"from":3473.98,"to":3476.38,"location":2,"content":"working out how accurately you're predicting"},{"from":3476.38,"to":3479.05,"location":2,"content":"the start position and how accurately you're predicting"},{"from":3479.05,"to":3482.68,"location":2,"content":"the end position, so that the attention gives"},{"from":3482.68,"to":3486.51,"location":2,"content":"you a probability distribution over start positions and end positions."},{"from":3486.51,"to":3489.82,"location":2,"content":"So you're just being asked what probability estimate"},{"from":3489.82,"to":3493.33,"location":2,"content":"you're giving to the true start position and the true end position."},{"from":3493.33,"to":3495.25,"location":2,"content":"And to the extent that,"},{"from":3495.25,"to":3497.29,"location":2,"content":"you know, those probabilities aren't one,"},{"from":3497.29,"to":3502.38,"location":2,"content":"you've got loss that is then sort of summed in terms of log probability."},
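So the loss is just the negative log-probability assigned to the true start plus the true end. As a one-liner sketch (PyTorch; cross_entropy applies the softmax over positions internally):

```python
# Sketch of the end-to-end loss: negative log-probability assigned to the
# true start position plus that assigned to the true end position.
import torch.nn.functional as F

def span_loss(start_scores, end_scores, true_start, true_end):
    # *_scores: (batch, p_len) unnormalized scores; true_*: (batch,) gold indices
    return F.cross_entropy(start_scores, true_start) + \
           F.cross_entropy(end_scores, true_end)
```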
{"from":3502.38,"to":3505.57,"location":2,"content":"Okay. So how is this model, um,"},{"from":3505.57,"to":3508.86,"location":2,"content":"more complex now than what I showed before?"},{"from":3508.86,"to":3511.95,"location":2,"content":"Essentially in two main ways."},{"from":3511.95,"to":3516.37,"location":2,"content":"So the first one is: looking at the question,"},{"from":3516.37,"to":3520.07,"location":2,"content":"we still run the BiLSTM as before."},{"from":3520.07,"to":3524.53,"location":2,"content":"Um, but now, what we were doing is a little bit crude,"},{"from":3524.53,"to":3528.85,"location":2,"content":"just to take the end states of the LSTM and concatenate them together."},{"from":3528.85,"to":3534.28,"location":2,"content":"It turns out that you can do better by making use of all the states in the LSTM."},{"from":3534.28,"to":3537.88,"location":2,"content":"And this is true for most tasks where you"},{"from":3537.88,"to":3541.97,"location":2,"content":"want some kind of sentence representation from a sequence model."},{"from":3541.97,"to":3544.59,"location":2,"content":"It turns out you can generally gain by using"},{"from":3544.59,"to":3547.51,"location":2,"content":"all of them rather than just the endpoints."},{"from":3547.51,"to":3552.68,"location":2,"content":"Um, and this is just an interesting general thing to know, because, you know,"},{"from":3552.68,"to":3558.41,"location":2,"content":"this is actually another variant of how you can use attention."},{"from":3558.41,"to":3565.53,"location":2,"content":"You know, a lot of the last two years of neural NLP can be summed"},{"from":3565.53,"to":3569.23,"location":2,"content":"up as: people have found a lot of clever ways to use"},{"from":3569.23,"to":3573.22,"location":2,"content":"attention, and that's been powering just about all the advances."},{"from":3573.22,"to":3581.89,"location":2,"content":"Um, so what we wanna do is we want to have attention over the positions in this LSTM."},{"from":3581.89,"to":3586.26,"location":2,"content":"But, you know, we're processing the query first."},{"from":3586.26,"to":3591.36,"location":2,"content":"So it sort of seems like we've got nothing to calculate attention with respect to."},{"from":3591.36,"to":3595.15,"location":2,"content":"So what we do is we just invent something."},{"from":3595.15,"to":3596.86,"location":2,"content":"So we just sort of invent:"},{"from":3596.86,"to":3601.66,"location":2,"content":"here is a vector, and it's sometimes called a sentinel or some word like that,"},{"from":3601.66,"to":3603.85,"location":2,"content":"but, you know, we just in our PyTorch say,"},{"from":3603.85,"to":3605.18,"location":2,"content":"\"Here is a vector.\""},{"from":3605.18,"to":3607.69,"location":2,"content":"Um, we initialize it randomly,"},{"from":3607.69,"to":3609.46,"location":2,"content":"um,"},{"from":3609.46,"to":3613.49,"location":2,"content":"and we're gonna calculate attention with respect to that vector,"},{"from":3613.49,"to":3620.95,"location":2,"content":"and we're going to use those attention scores, um, to, um,"},{"from":3620.95,"to":3624.25,"location":2,"content":"work out where to pay attention, um,"},{"from":3624.25,"to":3630.63,"location":2,"content":"in this BiLSTM, and then we just sort of train that vector so it gets values."},{"from":3630.63,"to":3634.27,"location":2,"content":"And so then we end up with a weighted sum of the time"},{"from":3634.27,"to":3639.43,"location":2,"content":"steps of that LSTM that, uh, then forms the question representation."},
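A minimal sketch of that trick, under my own naming: the sentinel is just a randomly initialized, trained parameter vector that we compute attention against.

```python
# Sketch of the weighted question representation: a learned "sentinel" vector
# attends over all the bi-LSTM states instead of just the two end states.
import torch
import torch.nn as nn

class WeightedQuestionRep(nn.Module):
    def __init__(self, dim):                            # dim = 2d from the bi-LSTM
        super().__init__()
        self.sentinel = nn.Parameter(torch.randn(dim))  # "here is a vector"

    def forward(self, q_states):                        # (batch, q_len, dim)
        scores = q_states @ self.sentinel               # (batch, q_len)
        alphas = scores.softmax(dim=-1)                 # attention over time steps
        return (alphas.unsqueeze(-1) * q_states).sum(dim=1)  # weighted sum: (batch, dim)
```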
representation."},{"from":3639.43,"to":3642.37,"location":2,"content":"Um, second change, uh,"},{"from":3642.37,"to":3645.4,"location":2,"content":"the pictures only show a shallow BiLSTM but, you know,"},{"from":3645.4,"to":3648.94,"location":2,"content":"it turns out you can do better if you have a deep BiLSTM and say"},{"from":3648.94,"to":3653.01,"location":2,"content":"use a three-layer deep BiLSTM rather than a single layer."},{"from":3653.01,"to":3656.2,"location":2,"content":"Okay. Then the other changes in"},{"from":3656.2,"to":3662.35,"location":2,"content":"the passage representations and this part arguably gets a little bit more hacky,"},{"from":3662.35,"to":3666.52,"location":2,"content":"um, but there are things that you can do that make the numbers go up, I guess."},{"from":3666.52,"to":3667.81,"location":2,"content":"Um, okay."},{"from":3667.81,"to":3673.84,"location":2,"content":"So- so firstly for the representation of words rather than only using"},{"from":3673.84,"to":3678.07,"location":2,"content":"the GloVe representation that the input vectors are"},{"from":3678.07,"to":3684.05,"location":2,"content":"expanded so that- so a named entity recognizer and a part of speech tagger is run."},{"from":3684.05,"to":3688.61,"location":2,"content":"And since those are sort of small sets of values,"},{"from":3688.61,"to":3693.91,"location":2,"content":"that the output of those is just one-hot encoded and concatenated onto"},{"from":3693.91,"to":3696.49,"location":2,"content":"the word vector, so it represents if it's"},{"from":3696.49,"to":3700.2,"location":2,"content":"a location or a person name and whether it's a noun or a verb."},{"from":3700.2,"to":3704.08,"location":2,"content":"Um, word frequency proves to be a bit useful."},{"from":3704.08,"to":3712.16,"location":2,"content":"So there's your concatenating on sort of a representation of the word frequency as,"},{"from":3712.16,"to":3717.37,"location":2,"content":"um, just sort of a float of the unigram probability."},{"from":3717.37,"to":3725.34,"location":2,"content":"Um, and then this part is kind of key to getting some further advances which is, well,"},{"from":3725.34,"to":3731.14,"location":2,"content":"it turns out that we can do a better job by doing some sort"},{"from":3731.14,"to":3736.95,"location":2,"content":"of better understanding of the matching between the question and the passage."},{"from":3736.95,"to":3740.17,"location":2,"content":"And, um, this feature seems like it's"},{"from":3740.17,"to":3743.82,"location":2,"content":"very simple but turns out to actually give you quite a lot of value."},{"from":3743.82,"to":3748.42,"location":2,"content":"So you're simply saying for each word in the question,"},{"from":3748.42,"to":3752.22,"location":2,"content":"uh, so for each word- well, I said that wrong."},{"from":3752.22,"to":3755.92,"location":2,"content":"For each word in the passage,"},{"from":3755.92,"to":3759.04,"location":2,"content":"you were just saying, \"Does this word appear in the question?\""},{"from":3759.04,"to":3762.16,"location":2,"content":"And if so you're setting a one bit into"},{"from":3762.16,"to":3766.11,"location":2,"content":"the input and that's done in three different ways: exact match,"},{"from":3766.11,"to":3768.58,"location":2,"content":"uncased match, and lemma match."},{"from":3768.58,"to":3771.66,"location":2,"content":"So that means something like drive and driving, um,"},{"from":3771.66,"to":3773.59,"location":2,"content":"will match, and just that sort 
of"},{"from":3773.59,"to":3776.76,"location":2,"content":"indicator of here's where in the passage that's in the question."},{"from":3776.76,"to":3779.23,"location":2,"content":"In theory, the system should be able to work that out"},{"from":3779.23,"to":3783.11,"location":2,"content":"anyway that explicitly indicate and it gives quite a bit of value."},{"from":3783.11,"to":3789.31,"location":2,"content":"And then this last one does a sort of a softer version of that where it's using word"},{"from":3789.31,"to":3792.55,"location":2,"content":"embedding similarities to sort of calculate"},{"from":3792.55,"to":3796.21,"location":2,"content":"a kind of similarity between questions and answers,"},{"from":3796.21,"to":3799.34,"location":2,"content":"and that's a slightly complex equation that you can look up."},{"from":3799.34,"to":3806.03,"location":2,"content":"But effectively, um, that you're getting the embedding of words and the question answers."},{"from":3806.03,"to":3810.09,"location":2,"content":"Each of those, you're running through a single hidden layer,"},{"from":3810.09,"to":3811.59,"location":2,"content":"neural network, you know,"},{"from":3811.59,"to":3815.24,"location":2,"content":"dot producting it, and then putting all that through a Softmax,"},{"from":3815.24,"to":3821.04,"location":2,"content":"and that kind of gives you a sort of word similarity score and that helps as well."},{"from":3821.04,"to":3826.51,"location":2,"content":"Okay. So here's the kind of just overall picture this gives you."},{"from":3826.51,"to":3829.43,"location":2,"content":"So if you remember, um, um,"},{"from":3829.43,"to":3832.54,"location":2,"content":"there was the sort of the classical NLP"},{"from":3832.54,"to":3835.82,"location":2,"content":"with logistic regression baseline, there's around 51."},{"from":3835.82,"to":3838.81,"location":2,"content":"So for sort of a fairly simple model,"},{"from":3838.81,"to":3840.97,"location":2,"content":"like the Stanford Attentive Reader,"},{"from":3840.97,"to":3843.76,"location":2,"content":"it gives you an enormous boost in performance, right?"},{"from":3843.76,"to":3847.76,"location":2,"content":"That's giving you close to 30 percent performance gain."},{"from":3847.76,"to":3850.18,"location":2,"content":"And then, you know, from there,"},{"from":3850.18,"to":3853.42,"location":2,"content":"people have kept on pushing up neural systems."},{"from":3853.42,"to":3857.41,"location":2,"content":"But, you know, so this gives you kind of in some sense three quarters of"},{"from":3857.41,"to":3862.53,"location":2,"content":"the value over the traditional NLP system and in the much more,"},{"from":3862.53,"to":3866.08,"location":2,"content":"um, complex, um, neural systems that come after it."},{"from":3866.08,"to":3867.14,"location":2,"content":"Um, yeah."},{"from":3867.14,"to":3868.55,"location":2,"content":"In terms of error reduction,"},{"from":3868.55,"to":3871.78,"location":2,"content":"they're huge but it's sort of more like they're giving you the sort of,"},{"from":3871.78,"to":3875.31,"location":2,"content":"um, 12 percent after that."},{"from":3875.31,"to":3883.03,"location":2,"content":"Why did these systems work such a ton better um, than traditional systems?"},{"from":3883.03,"to":3886.75,"location":2,"content":"And so we actually did some error analysis of this and, you know,"},{"from":3886.75,"to":3892.18,"location":2,"content":"it turns out that most of their gains is because they can just 
do"},{"from":3892.18,"to":3896.89,"location":2,"content":"better semantic matching of word similarities"},{"from":3896.89,"to":3902.08,"location":2,"content":"or rephrasings that are semantically related but don't use the same words."},{"from":3902.08,"to":3910.68,"location":2,"content":"So, to- to the extent that the question is where was Christopher Manning born?"},{"from":3910.68,"to":3915.59,"location":2,"content":"And the sentence says Christopher Manning was born in Australia,"},{"from":3915.59,"to":3918.79,"location":2,"content":"a traditional NLP system would get that right too."},{"from":3918.79,"to":3921.57,"location":2,"content":"But that to the extent that you being able to get it right,"},{"from":3921.57,"to":3923.98,"location":2,"content":"depends on being able to match,"},{"from":3923.98,"to":3929.57,"location":2,"content":"sort of looser semantic matches so that we understand the sort of um,"},{"from":3929.57,"to":3933.61,"location":2,"content":"you know, the place of birth has to be matching was born or something."},{"from":3933.61,"to":3937.75,"location":2,"content":"That's where the neural systems actually do work much much better."},{"from":3937.75,"to":3944.95,"location":2,"content":"Okay. So, that's not the end of the story on question-answering systems."},{"from":3944.95,"to":3948.4,"location":2,"content":"And I wanted to say just a little bit about um,"},{"from":3948.4,"to":3951.67,"location":2,"content":"more complex systems to give you some idea um,"},{"from":3951.67,"to":3953.72,"location":2,"content":"of what goes on after that."},{"from":3953.72,"to":3956.26,"location":2,"content":"Um, but before I go further into that,"},{"from":3956.26,"to":3959.98,"location":2,"content":"are there any questions on uh,"},{"from":3959.98,"to":3963.13,"location":2,"content":"up until now, Stanford Attentive Reader?"},{"from":3963.13,"to":3969.76,"location":2,"content":"[NOISE] Yeah."},{"from":3969.76,"to":3972.93,"location":2,"content":"I have a question about attention in general."},{"from":3972.93,"to":3978.55,"location":2,"content":"Every example we've seen has been just linear mapping with a weight matrix."},{"from":3978.55,"to":3983.7,"location":2,"content":"Has anybody tried to convert that to a deep neural network and see what happens?"},{"from":3983.7,"to":3986.34,"location":2,"content":"Um, so yes they have."},{"from":3986.34,"to":3990.04,"location":2,"content":"Well, at least a shallow neural network."},{"from":3990.04,"to":3993.01,"location":2,"content":"Um, I'll actually show an example of that in just a minute."},{"from":3993.01,"to":3995.8,"location":2,"content":"So maybe I will um, save it till then."},{"from":3995.8,"to":3998.3,"location":2,"content":"But yeah absolutely, um,"},{"from":3998.3,"to":4005.03,"location":2,"content":"yeah people have done that and that can be a good thing to um, play with."},{"from":4005.03,"to":4012.06,"location":2,"content":"Anything else? Okay. 
Um, okay."},{"from":4012.06,"to":4017.97,"location":2,"content":"So, this is a picture of the BiDAF system,"},{"from":4017.97,"to":4020.73,"location":2,"content":"so this is the one from AI2 UDub."},{"from":4020.73,"to":4023.49,"location":2,"content":"And the BiDAF system is very well known."},{"from":4023.49,"to":4026.88,"location":2,"content":"Um, it's another sort of classic version of"},{"from":4026.88,"to":4031.14,"location":2,"content":"question-answering system that lots of people have used and built off."},{"from":4031.14,"to":4034.26,"location":2,"content":"Um, and, you know,"},{"from":4034.26,"to":4040.26,"location":2,"content":"some of it isn't completely different to what we saw before but it has various additions."},{"from":4040.26,"to":4043.98,"location":2,"content":"So, there are word embeddings just like we had before,"},{"from":4043.98,"to":4048.22,"location":2,"content":"there's a biLSTM running just like what we had before,"},{"from":4048.22,"to":4051.43,"location":2,"content":"and that's being done for both the um,"},{"from":4051.43,"to":4053.86,"location":2,"content":"passage and the question."},{"from":4053.86,"to":4057.21,"location":2,"content":"Um, but there are some different things that are happening as well."},{"from":4057.21,"to":4060.51,"location":2,"content":"So one of them is rather than just having word embeddings,"},{"from":4060.51,"to":4065.09,"location":2,"content":"it also processes the questions and passages at the character level."},{"from":4065.09,"to":4068.73,"location":2,"content":"And that's something that we're going to talk about coming up ahead in the class."},{"from":4068.73,"to":4074.2,"location":2,"content":"There's been a lot of work at doing character level processing in recent neural NLP,"},{"from":4074.2,"to":4076.36,"location":2,"content":"but I don't want to talk about that now."},{"from":4076.36,"to":4080.46,"location":2,"content":"Um, the main technical innovation of the BiDAF model"},{"from":4080.46,"to":4086.18,"location":2,"content":"is this attention flow layout because that's in its name bidirectional attention flow."},{"from":4086.18,"to":4090.3,"location":2,"content":"And so, there was a model of attention flow where you have attention"},{"from":4090.3,"to":4094.74,"location":2,"content":"flowing in both directions between the query and the passage."},{"from":4094.74,"to":4098.98,"location":2,"content":"And that was their main innovation and it was quite useful in their model."},{"from":4098.98,"to":4100.57,"location":2,"content":"Um, but beyond that,"},{"from":4100.57,"to":4103.5,"location":2,"content":"there's you know, sort of more stuff to this model."},{"from":4103.5,"to":4107.32,"location":2,"content":"So after the attention flow layer there's again"},{"from":4107.32,"to":4111.68,"location":2,"content":"multiple layers of bidirectional LSTMs running."},{"from":4111.68,"to":4115.77,"location":2,"content":"And then on top of that their output layer is more"},{"from":4115.77,"to":4121.23,"location":2,"content":"complex than the sort of simple attention version that I showed previously."},{"from":4121.23,"to":4125.15,"location":2,"content":"So let's just look at that in a bit more detail."},{"from":4125.15,"to":4127.94,"location":2,"content":"Um so, for the attention flow layer."},{"from":4127.94,"to":4133.9,"location":2,"content":"So, the motivation here was in the Stanford Attentive Reader,"},{"from":4133.9,"to":4137.46,"location":2,"content":"we used attention to map 
from"},{"from":4137.46,"to":4143.18,"location":2,"content":"the representation of the question onto the words of the passage."},{"from":4143.18,"to":4149.32,"location":2,"content":"But, you know so as questions are whole mapping onto the words of the passage."},{"from":4149.32,"to":4151.95,"location":2,"content":"Where their idea was well,"},{"from":4151.95,"to":4158.76,"location":2,"content":"presumably you could do better by mapping in both directions at the word level."},{"from":4158.76,"to":4163.89,"location":2,"content":"So you should be sort of finding passage words that you can map onto question words,"},{"from":4163.89,"to":4166.6,"location":2,"content":"and question words that you can map onto passage words."},{"from":4166.6,"to":4169.97,"location":2,"content":"And if you do that in both directions with attention flowing,"},{"from":4169.97,"to":4174.31,"location":2,"content":"and then run another round of sequence models on top of that,"},{"from":4174.31,"to":4178.53,"location":2,"content":"that you'll just be able to do much better matching between the two of them."},{"from":4178.53,"to":4182.94,"location":2,"content":"And so the way they do that is, um,"},{"from":4182.94,"to":4186.6,"location":2,"content":"that they- they've got the bottom- so at"},{"from":4186.6,"to":4190.8,"location":2,"content":"the bottom layers they've sort of run these two LSTMs."},{"from":4190.8,"to":4197.48,"location":2,"content":"So they have representations in the LSTM for each word and um,"},{"from":4197.48,"to":4200.48,"location":2,"content":"word and passage position."},{"from":4200.48,"to":4204.44,"location":2,"content":"And at this point I have to put it in a slight apology because I just"},{"from":4204.44,"to":4208.76,"location":2,"content":"stole the equations and so the letters that are used change."},{"from":4208.76,"to":4212.85,"location":2,"content":"Sorry. 
{"from":4212.85,"to":4218.51,"location":2,"content":"question words, and these are the individual passage words."},
{"from":4218.51,"to":4223.48,"location":2,"content":"And what they're then wanting to do is say: for each passage word"},
{"from":4223.48,"to":4228.1,"location":2,"content":"and each question word, I want to work out a similarity score."},
{"from":4228.1,"to":4234.57,"location":2,"content":"And the way they work out that similarity score is they build a big concatenated vector."},
{"from":4234.57,"to":4240.36,"location":2,"content":"So there's the LSTM representation of the passage word, the question word,"},
{"from":4240.36,"to":4245.07,"location":2,"content":"and then they throw in a third thing where they do a Hadamard product,"},
{"from":4245.07,"to":4249.85,"location":2,"content":"so an element-wise product of the question word and the context word."},
{"from":4249.85,"to":4253.59,"location":2,"content":"For a neural net purist, throwing in"},
{"from":4253.59,"to":4257.58,"location":2,"content":"these kinds of Hadamard products is a little bit of a cheat, because"},
{"from":4257.58,"to":4261.18,"location":2,"content":"you would hope that a neural net might just learn that"},
{"from":4261.18,"to":4265.64,"location":2,"content":"this relation between the passage and the question was useful to look at."},
{"from":4265.64,"to":4268.38,"location":2,"content":"But you can find a lot of models that put in"},
{"from":4268.38,"to":4271.92,"location":2,"content":"these kinds of Hadamard products, because it's"},
{"from":4271.92,"to":4278.41,"location":2,"content":"a very easy way of having a model that knows that matching is a good idea."},
{"from":4278.41,"to":4284.79,"location":2,"content":"Because essentially this is asking, for each question and passage word pair:"},
{"from":4284.79,"to":4288.81,"location":2,"content":"do the vectors look similar in various dimensions?"},
{"from":4288.81,"to":4292.97,"location":2,"content":"You can read that off very easily from that Hadamard product."},
{"from":4292.97,"to":4295.81,"location":2,"content":"So you take that big concatenated vector,"},
{"from":4295.81,"to":4300.77,"location":2,"content":"and you then dot-product it with a learned weight vector,"},
{"from":4300.77,"to":4303.39,"location":2,"content":"and that gives you a similarity score"},
{"from":4303.39,"to":4307.05,"location":2,"content":"between each position in the question and the context."},
{"from":4307.05,"to":4310.4,"location":2,"content":"And so then what you're gonna do is use that to"},
{"from":4310.4,"to":4315.32,"location":2,"content":"define attentions that go in both directions."},
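In code, that similarity computation might look like the following sketch, implementing the BiDAF-style score S[i, j] = w . [c_i; q_j; c_i * q_j]. Variable names and dimensions here are my own assumptions, not the lecture's notation.

```python
import torch
import torch.nn as nn

def bidaf_similarity(c, q, w):
    """Sketch of the BiDAF similarity matrix.
    c: (n, 2h) biLSTM outputs for passage positions
    q: (m, 2h) biLSTM outputs for question positions
    w: (6h,)  learned weight vector
    returns S: (n, m) with S[i, j] = w . [c_i; q_j; c_i * q_j]."""
    n, m = c.size(0), q.size(0)
    c_exp = c.unsqueeze(1).expand(n, m, -1)                 # (n, m, 2h)
    q_exp = q.unsqueeze(0).expand(n, m, -1)                 # (n, m, 2h)
    big = torch.cat([c_exp, q_exp, c_exp * q_exp], dim=-1)  # (n, m, 6h) concatenated vector
    return big @ w                                          # dot with w -> (n, m)

# toy usage with made-up sizes
h2 = 200                                         # 2h, i.e., the biLSTM output size
c, q = torch.randn(30, h2), torch.randn(10, h2)  # 30 passage, 10 question positions
w = nn.Parameter(torch.randn(3 * h2))
S = bidaf_similarity(c, q, w)
print(S.shape)                                   # torch.Size([30, 10])
```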
Um-"},{"from":4315.32,"to":4318.99,"location":2,"content":"So for the, um, context,"},{"from":4318.99,"to":4322.41,"location":2,"content":"the question attention, this one's completely straightforward."},{"from":4322.41,"to":4328.55,"location":2,"content":"So, you put these similarity scores through a soft-max."},{"from":4328.55,"to":4333.52,"location":2,"content":"So for each of the i positions in the passage or sort of,"},{"from":4333.52,"to":4337.3,"location":2,"content":"having a softmax which is giving you a probability distribution,"},{"from":4337.3,"to":4340.38,"location":2,"content":"over question words and then you're coming up with"},{"from":4340.38,"to":4346.75,"location":2,"content":"a new representation of the i-th position which is then the attention weighted,"},{"from":4346.75,"to":4351.35,"location":2,"content":"um, version, the attention weighted average of those question words."},{"from":4351.35,"to":4352.76,"location":2,"content":"Um, so you're sort of,"},{"from":4352.76,"to":4358.77,"location":2,"content":"having attention weighted view of the question mapped onto each position in the passage."},{"from":4358.77,"to":4363.86,"location":2,"content":"Um, you then want to do something in the reverse direction."},{"from":4363.86,"to":4369.81,"location":2,"content":"Um, but the one in the reverse direction is done subtly differently."},{"from":4369.81,"to":4373.32,"location":2,"content":"So you're again starting off, um,"},{"from":4373.32,"to":4380.69,"location":2,"content":"with the- the same similarity scores but this time they're sort of wanting to, sort of,"},{"from":4380.69,"to":4384.88,"location":2,"content":"really assign which position,"},{"from":4384.88,"to":4392.12,"location":2,"content":"in which position in the question is the one that's, sort of,"},{"from":4392.12,"to":4396.98,"location":2,"content":"aligning the most so that they're finding a max and so that they're finding"},{"from":4396.98,"to":4402.55,"location":2,"content":"which is the most aligned one and so then for each of,"},{"from":4402.55,"to":4404.93,"location":2,"content":"for each of the i's,"},{"from":4404.93,"to":4407.89,"location":2,"content":"they're finding the most aligned question word."},{"from":4407.89,"to":4413.67,"location":2,"content":"And so then they're doing a softmax over these m scores and then those are being"},{"from":4413.67,"to":4419.9,"location":2,"content":"used to form a new representation of the passage by,"},{"from":4419.9,"to":4423.11,"location":2,"content":"sort of, summing over these attention weights."},{"from":4423.11,"to":4427.31,"location":2,"content":"Okay. 
{"from":4427.31,"to":4431.33,"location":2,"content":"So you build these things up, and this then gives you a new representation where you have"},
{"from":4431.33,"to":4437.09,"location":2,"content":"your original representations of the passage words."},
{"from":4437.09,"to":4440.12,"location":2,"content":"You have a new representation that you've built from"},
{"from":4440.12,"to":4442.59,"location":2,"content":"this bidirectional attention flow, and you"},
{"from":4442.59,"to":4445.31,"location":2,"content":"take Hadamard products of them, and"},
{"from":4445.31,"to":4450.11,"location":2,"content":"that then gives you the output of the BiDAF layer, and that output of"},
{"from":4450.11,"to":4452.99,"location":2,"content":"the BiDAF layer is then what's being fed as"},
{"from":4452.99,"to":4458.35,"location":2,"content":"the input into the next sequence of LSTM layers."},
{"from":4458.35,"to":4462.24,"location":2,"content":"Okay. And so yeah,"},
{"from":4462.24,"to":4464.34,"location":2,"content":"that's the modeling layer."},
{"from":4464.34,"to":4469.09,"location":2,"content":"You have another two biLSTM layers, and the way they do the"},
{"from":4469.09,"to":4472.4,"location":2,"content":"answer span selection is a bit more complex as well."},
{"from":4472.4,"to":4475.62,"location":2,"content":"So they're then"},
{"from":4475.62,"to":4480.02,"location":2,"content":"taking the output of the modeling layer and putting it through"},
{"from":4480.02,"to":4485.91,"location":2,"content":"a dense feed-forward neural network layer and then softmaxing over the positions,"},
{"from":4485.91,"to":4489.02,"location":2,"content":"and that's then getting you a distribution over"},
{"from":4489.02,"to":4493.43,"location":2,"content":"start positions, and you're running yet another biLSTM to get a distribution over end positions the same way."},
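As a rough sketch of that kind of start/end span prediction: the real BiDAF output layer also concatenates the attention-flow output into these classifiers, so treat the wiring and sizes below as simplifying assumptions rather than the paper's exact layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpanPredictor(nn.Module):
    """Simplified sketch of a BiDAF-style output layer: a dense layer plus a
    softmax over positions for the start, then yet another biLSTM for the end."""

    def __init__(self, dim: int):
        super().__init__()
        self.start = nn.Linear(dim, 1)
        self.end_lstm = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        self.end = nn.Linear(dim, 1)

    def forward(self, M):
        # M: (batch, n, dim) output of the modeling layer
        p_start = F.softmax(self.start(M).squeeze(-1), dim=-1)  # (batch, n)
        M2, _ = self.end_lstm(M)                                # run yet another biLSTM
        p_end = F.softmax(self.end(M2).squeeze(-1), dim=-1)     # (batch, n)
        return p_start, p_end

# toy usage: a batch of 2 passages, 30 positions, 200-dim modeling states
M = torch.randn(2, 30, 200)
p_start, p_end = SpanPredictor(200)(M)
```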
{"from":4493.43,"to":4498.02,"location":2,"content":"So, that gives you some idea of a more complex model."},
{"from":4498.02,"to":4501.73,"location":2,"content":"You know, in some sense,"},
{"from":4501.73,"to":4505.9,"location":2,"content":"the summary, if you go further forward from here, is that"},
{"from":4505.9,"to":4508.84,"location":2,"content":"in most of the work in the last couple of years,"},
{"from":4508.84,"to":4514.22,"location":2,"content":"people have been producing progressively more complex architectures with"},
{"from":4514.22,"to":4519.71,"location":2,"content":"lots of variants of attention, and effectively that has been giving good gains."},
{"from":4519.71,"to":4523.01,"location":2,"content":"Since time is running"},
{"from":4523.01,"to":4525.23,"location":2,"content":"out, I think I'll skip showing you that one."},
{"from":4525.23,"to":4528.98,"location":2,"content":"But let me just mention this FusionNet model,"},
{"from":4528.98,"to":4532.5,"location":2,"content":"which was done by people at Microsoft, because this relates to"},
{"from":4532.5,"to":4535.15,"location":2,"content":"the attention question from before, right?"},
{"from":4535.15,"to":4540.74,"location":2,"content":"So people have definitely used different versions of attention, right?"},
{"from":4540.74,"to":4544.88,"location":2,"content":"In some of the stuff that we've shown, we tend to emphasize"},
{"from":4544.88,"to":4549.34,"location":2,"content":"this bilinear attention, where you've got two vectors mediated by a matrix."},
{"from":4549.34,"to":4551.82,"location":2,"content":"And I guess traditionally at Stanford NLP,"},
{"from":4551.82,"to":4553.46,"location":2,"content":"we've liked this"},
{"from":4553.46,"to":4556.46,"location":2,"content":"version of attention, since it seems to very directly learn"},
{"from":4556.46,"to":4560.69,"location":2,"content":"a similarity, but other people have used a little neural net."},
{"from":4560.69,"to":4563,"location":2,"content":"So this is a shallow neural net to"},
{"from":4563,"to":4565.34,"location":2,"content":"work out attention scores, and there's"},
{"from":4565.34,"to":4567.74,"location":2,"content":"no reason why you couldn't say, maybe it would be even better if I"},
{"from":4567.74,"to":4570.71,"location":2,"content":"make that a deep neural net and add another layer."},
{"from":4570.71,"to":4572.47,"location":2,"content":"And,"},
{"from":4572.47,"to":4574.92,"location":2,"content":"to be perfectly honest,"},
{"from":4574.92,"to":4578.43,"location":2,"content":"some of the results that have been done by people including Google"},
{"from":4578.43,"to":4582.52,"location":2,"content":"argue that actually that MLP version of attention is better."},
{"from":4582.52,"to":4585.7,"location":2,"content":"So there's something to explore in that direction."},
{"from":4585.7,"to":4591.64,"location":2,"content":"But actually, the FusionNet people didn't head in that direction, because they said,"},
{"from":4591.64,"to":4594.71,"location":2,"content":"\"Look, we want to use tons and tons of attention."},
{"from":4594.71,"to":4597.74,"location":2,"content":"So we want an attention computation that's pretty"},
{"from":4597.74,"to":4601.16,"location":2,"content":"efficient, and so it's bad news if you have to"},
{"from":4601.16,"to":4604.11,"location":2,"content":"be evaluating a little dense neural net at"},
{"from":4604.11,"to":4607.88,"location":2,"content":"every position every time that you do attention.\""},
attention.\""},{"from":4607.88,"to":4611.63,"location":2,"content":"So this bi-linear form is fairly appealing"},{"from":4611.63,"to":4615.66,"location":2,"content":"but they then did some playing with it so rather than having a W matrix"},{"from":4615.66,"to":4619.7,"location":2,"content":"you can reduce the rank and complexity of"},{"from":4619.7,"to":4626.14,"location":2,"content":"your W matrix by dividing it into the product of two lower rank matrices."},{"from":4626.14,"to":4628.98,"location":2,"content":"So you can have a U and a V matrix."},{"from":4628.98,"to":4632.69,"location":2,"content":"And if you make these rectangular matrices that are kind of skinny,"},{"from":4632.69,"to":4636.45,"location":2,"content":"you can then have a sort of a lower rank factorization and,"},{"from":4636.45,"to":4638.42,"location":2,"content":"that seems a good idea."},{"from":4638.42,"to":4639.68,"location":2,"content":"And then they thought well,"},{"from":4639.68,"to":4643.27,"location":2,"content":"maybe really you want your attention distribution to be symmetric."},{"from":4643.27,"to":4646.46,"location":2,"content":"So we can actually put in the middle here,"},{"from":4646.46,"to":4649.1,"location":2,"content":"we can have the U and the V, so to speak,"},{"from":4649.1,"to":4652.16,"location":2,"content":"be the same and just have a diagonal matrix in"},{"from":4652.16,"to":4655.56,"location":2,"content":"the middle and that might be a useful way to think of it."},{"from":4655.56,"to":4659.56,"location":2,"content":"And that all makes sense from linear algebra terms but then they thought,"},{"from":4659.56,"to":4663.06,"location":2,"content":"\"Oh, non-linearity is really good in deep learning."},{"from":4663.06,"to":4664.64,"location":2,"content":"So why don't we, sort of,"},{"from":4664.64,"to":4668.79,"location":2,"content":"stick the left and right half through a ReLU and maybe that will help."},{"from":4668.79,"to":4672.38,"location":2,"content":"[LAUGHTER] Which doesn't so much make sense in your linear algebra terms, um,"},{"from":4672.38,"to":4676.85,"location":2,"content":"but that's actually what they ended up using as their, um, attention forms."},{"from":4676.85,"to":4680.15,"location":2,"content":"There are lots of things you can play with when doing your final project."},{"from":4680.15,"to":4682.09,"location":2,"content":"Um, yeah."},{"from":4682.09,"to":4684.74,"location":2,"content":"And, but, you know, their argument is still, you know,"},{"from":4684.74,"to":4687.92,"location":2,"content":"that doing attention this way is actually much much"},{"from":4687.92,"to":4691.07,"location":2,"content":"cheaper and so they can use a lot of attention."},{"from":4691.07,"to":4696.64,"location":2,"content":"And so they build this very complex tons of attention model, um,"},{"from":4696.64,"to":4699.15,"location":2,"content":"which I'm not going to try and explain, um,"},{"from":4699.15,"to":4701.56,"location":2,"content":"all of now, um,"},{"from":4701.56,"to":4704.75,"location":2,"content":"but I will show you this picture."},{"from":4704.75,"to":4708.3,"location":2,"content":"Um, so a point that they make is that a lot of"},{"from":4708.3,"to":4712.34,"location":2,"content":"the different models that people have explored in different years you,"},{"from":4712.34,"to":4713.91,"location":2,"content":"that, you know, they're sort of,"},{"from":4713.91,"to":4716.31,"location":2,"content":"doing different kinds of attention."},{"from":4716.31,"to":4719.18,"location":2,"content":"That you 
{"from":4680.15,"to":4682.09,"location":2,"content":"Yeah."},
{"from":4682.09,"to":4684.74,"location":2,"content":"But their argument is still"},
{"from":4684.74,"to":4687.92,"location":2,"content":"that doing attention this way is actually much, much"},
{"from":4687.92,"to":4691.07,"location":2,"content":"cheaper, and so they can use a lot of attention."},
{"from":4691.07,"to":4696.64,"location":2,"content":"And so they build this very complex, tons-of-attention model,"},
{"from":4696.64,"to":4699.15,"location":2,"content":"which I'm not going to try and explain"},
{"from":4699.15,"to":4701.56,"location":2,"content":"all of now,"},
{"from":4701.56,"to":4704.75,"location":2,"content":"but I will show you this picture."},
{"from":4704.75,"to":4708.3,"location":2,"content":"A point that they make is that a lot of"},
{"from":4708.3,"to":4712.34,"location":2,"content":"the different models that people have explored in different years"},
{"from":4712.34,"to":4713.91,"location":2,"content":"are, in a sense,"},
{"from":4713.91,"to":4716.31,"location":2,"content":"doing different kinds of attention."},
{"from":4716.31,"to":4719.18,"location":2,"content":"You could be doing attention"},
{"from":4719.18,"to":4722.24,"location":2,"content":"aligned with the original LSTM,"},
{"from":4722.24,"to":4726.34,"location":2,"content":"you could run both sides through some stuff and do attention,"},
{"from":4726.34,"to":4729.74,"location":2,"content":"you can do self-attention inside your layer; there are a lot of"},
{"from":4729.74,"to":4733.3,"location":2,"content":"different attentions that different models have explored."},
{"from":4733.3,"to":4735.71,"location":2,"content":"And essentially what they are wanting to say is,"},
{"from":4735.71,"to":4739.98,"location":2,"content":"let's do all of those, and let's make it deep and do it all"},
{"from":4739.98,"to":4744.21,"location":2,"content":"five times, and the numbers will go up. And to some extent the answer is,"},
{"from":4744.21,"to":4749.4,"location":2,"content":"yeah, they do, and the model ends up scoring very well."},
{"from":4749.4,"to":4755.59,"location":2,"content":"Okay, so the one last thing I just wanted to mention but not explain is,"},
{"from":4755.59,"to":4758.45,"location":2,"content":"in the last year there's been"},
{"from":4758.45,"to":4762.95,"location":2,"content":"a further revolution in how well people can do these tasks."},
{"from":4762.95,"to":4769.8,"location":2,"content":"And so people have developed algorithms which produce contextual word representations."},
{"from":4769.8,"to":4772.79,"location":2,"content":"So that means that rather than a traditional word vector,"},
{"from":4772.79,"to":4776.66,"location":2,"content":"you have a representation for each word in a particular context."},
{"from":4776.66,"to":4781.7,"location":2,"content":"So here's the word frog in this particular context, and the way people build"},
{"from":4781.7,"to":4784.49,"location":2,"content":"those representations is using something"},
{"from":4784.49,"to":4787.58,"location":2,"content":"like a language modeling task, like Abby talked about:"},
{"from":4787.58,"to":4790.73,"location":2,"content":"putting probabilities on words in"},
{"from":4790.73,"to":4794.8,"location":2,"content":"context to learn a context-specific word representation."},
{"from":4794.8,"to":4797.87,"location":2,"content":"And ELMo was the first well-known such model."},
{"from":4797.87,"to":4800.41,"location":2,"content":"And then people from Google came up with BERT,"},
{"from":4800.41,"to":4801.83,"location":2,"content":"which worked even better."},
{"from":4801.83,"to":4806.49,"location":2,"content":"And so BERT is really, in some sense, a"},
{"from":4806.49,"to":4811.23,"location":2,"content":"super-complex attention architecture trained with a language-modeling-like objective."},
{"from":4811.23,"to":4813.68,"location":2,"content":"We're going to talk about these later;"},
{"from":4813.68,"to":4816.58,"location":2,"content":"I'm not going to talk about them now,"},
{"from":4816.58,"to":4822.26,"location":2,"content":"but if you look at the current SQuAD 2.0 leaderboard,"},
{"from":4822.26,"to":4824.09,"location":2,"content":"you will quickly -"},
{"from":4824.09,"to":4828.48,"location":2,"content":"sorry, oh, I put up the wrong slide; that was the bottom of the leaderboard."},
{"from":4828.48,"to":4830.27,"location":2,"content":"Oops, slipped at the last minute."},
{"from":4830.27,"to":4834.78,"location":2,"content":"If you go back to my slide which had the top of the leaderboard,"},
{"from":4834.78,"to":4838.81,"location":2,"content":"you will have noticed that at the top of the leaderboard,"},
leaderboard,"},{"from":4838.81,"to":4842.82,"location":2,"content":"every single one of the top systems uses BERT."},{"from":4842.82,"to":4845.24,"location":2,"content":"So that's something that you may want to"},{"from":4845.24,"to":4847.82,"location":2,"content":"consider but you may want to consider how you could"},{"from":4847.82,"to":4852.8,"location":2,"content":"use it as a sub-module which you could add other stuff too as many of these systems do."},{"from":4852.8,"to":4856.14,"location":2,"content":"Okay. Done for today."}]}