{"font_size":0.4,"font_color":"#FFFFFF","background_alpha":0.5,"background_color":"#9C27B0","Stroke":"none","body":[{"from":4.43,"to":7.41,"location":2,"content":"Okay. Hello everyone."},{"from":7.41,"to":11.27,"location":2,"content":"[LAUGHTER] Okay we should get started."},{"from":11.27,"to":14.64,"location":2,"content":"Um, there actually are still quite a few seats left."},{"from":14.64,"to":15.96,"location":2,"content":"If you wanna be really bold,"},{"from":15.96,"to":18.52,"location":2,"content":"there are a couple of seats right in front of me in the front row."},{"from":18.52,"to":20.45,"location":2,"content":"If you're less bold, there are a few over there."},{"from":20.45,"to":23.94,"location":2,"content":"Um, but also on some of the rows there are quite a few middle seats."},{"from":23.94,"to":28.08,"location":2,"content":"So if people wanted to be really civic-minded some people could sort of"},{"from":28.08,"to":32.28,"location":2,"content":"squeeze towards the edges and make more accessible um,"},{"from":32.28,"to":35.69,"location":2,"content":"some of the seats that still exist in the classroom."},{"from":35.69,"to":39.44,"location":2,"content":"Okay. 
Um, so, um,"},{"from":39.44,"to":42.89,"location":2,"content":"it's really exciting and great to see so many people here."},{"from":42.89,"to":47.39,"location":2,"content":"So, a hearty welcome to CS224N, occasionally also"},{"from":47.39,"to":52.63,"location":2,"content":"known as Ling 284, which is Natural Language Processing with Deep Learning."},{"from":52.63,"to":55.42,"location":2,"content":"Um, as just a sort of a personal anecdote,"},{"from":55.42,"to":59.72,"location":2,"content":"it still sort of blows my mind that so many people turn up to this class these days."},{"from":59.72,"to":63.98,"location":2,"content":"So, for about the first decade that I taught NLP here,"},{"from":63.98,"to":68.18,"location":2,"content":"you know the number of people I got each year was approximately 45."},{"from":68.18,"to":71.24,"location":2,"content":"[LAUGHTER] So it's an order of [LAUGHTER] magnitude smaller than"},{"from":71.24,"to":74.36,"location":2,"content":"it is now, but I guess it says quite a lot"},{"from":74.36,"to":77.45,"location":2,"content":"about what a revolutionary impact"},{"from":77.45,"to":80.87,"location":2,"content":"artificial intelligence in general and machine learning,"},{"from":80.87,"to":85.6,"location":2,"content":"deep learning, NLP are starting to have in modern society."},{"from":85.6,"to":88.86,"location":2,"content":"Okay. 
So this is our plan for today."},{"from":88.86,"to":92.75,"location":2,"content":"So, um, um, we're really gonna get straight down to business today."},{"from":92.75,"to":97.97,"location":2,"content":"So there'll be a brief, very brief introduction to some of the sort of course logistics,"},{"from":97.97,"to":102.14,"location":2,"content":"very brief discussion and talk about human language and"},{"from":102.14,"to":106.37,"location":2,"content":"word meaning and then we wanna get right into talking about um,"},{"from":106.37,"to":110.54,"location":2,"content":"the first thing that we're doing which is coming up with word vectors and looking"},{"from":110.54,"to":115.01,"location":2,"content":"at the word2vec algorithm and that will then sort of fill up the rest of the class."},{"from":115.01,"to":116.84,"location":2,"content":"There are still two seats right in"},{"from":116.84,"to":119.48,"location":2,"content":"the front row for someone who wants to sit right in front of me,"},{"from":119.48,"to":122.76,"location":2,"content":"just letting you know [LAUGHTER]."},{"from":122.76,"to":126.36,"location":2,"content":"Okay. Okay. 
So here are the course logistics in brief."},{"from":126.36,"to":128.34,"location":2,"content":"So I'm Christopher Manning,"},{"from":128.34,"to":135.29,"location":2,"content":"the person who bravely became the head TA, Abigail See, is right there."},{"from":135.29,"to":138.92,"location":2,"content":"And then we have quite a lot of wonderful TAs."},{"from":138.92,"to":142.7,"location":2,"content":"Could the people who are wonderful TAs just sort of stand up for one moment."},{"from":142.7,"to":146.81,"location":2,"content":"So, um, [LAUGHTER] we have some sense of our wonderful TAs."},{"from":146.81,"to":148.9,"location":2,"content":"[LAUGHTER] Okay great."},{"from":148.9,"to":151.32,"location":2,"content":"Um, okay."},{"from":151.32,"to":153.26,"location":2,"content":"So you know when the lecture is because you made it"},{"from":153.26,"to":157.1,"location":2,"content":"here and so welcome also to SCPD people."},{"from":157.1,"to":161.3,"location":2,"content":"This is also an SCPD class and you can watch it on video."},{"from":161.3,"to":164.3,"location":2,"content":"But we'd love for Stanford students to turn"},{"from":164.3,"to":167.3,"location":2,"content":"up and show their beautiful faces in the classroom."},{"from":167.3,"to":172.81,"location":2,"content":"Okay. So, um, the webpage has all the info about the syllabus et cetera et cetera."},{"from":172.81,"to":176.18,"location":2,"content":"Okay. 
So this class, what do we hope to teach?"},{"from":176.18,"to":179.24,"location":2,"content":"So, one thing that we wanna teach is, uh, you know,"},{"from":179.24,"to":182.5,"location":2,"content":"an understanding of effective modern methods for deep learning."},{"from":182.5,"to":185.09,"location":2,"content":"Starting off by reviewing some of the basics and then"},{"from":185.09,"to":188.78,"location":2,"content":"particularly talking about the kinds of techniques including um,"},{"from":188.78,"to":191.45,"location":2,"content":"recurrent networks and attention that are widely"},{"from":191.45,"to":194.57,"location":2,"content":"used for natural language processing models."},{"from":194.57,"to":198.77,"location":2,"content":"A second thing we wanna teach is a big picture understanding of"},{"from":198.77,"to":203.07,"location":2,"content":"human languages and some of the difficulties in understanding and producing them."},{"from":203.07,"to":205.49,"location":2,"content":"Of course if you wanna know a lot about human languages,"},{"from":205.49,"to":209.06,"location":2,"content":"there's a whole linguistics department and you can do a lot of courses on that."},{"from":209.06,"to":213.59,"location":2,"content":"Um, but so I wanna give at least some appreciation so you have some clue of what are"},{"from":213.59,"to":218.24,"location":2,"content":"the challenges and difficulties and varieties of human languages."},{"from":218.24,"to":221.31,"location":2,"content":"And then this is also kind of a practical class."},{"from":221.31,"to":224.96,"location":2,"content":"Like we actually wanna teach you how you can"},{"from":224.96,"to":229.67,"location":2,"content":"build practical systems that work for some of the major parts of NLP."},{"from":229.67,"to":233.75,"location":2,"content":"So if you go and get a job at one of those tech firms and they say \"Hey,"},{"from":233.75,"to":235.79,"location":2,"content":"could you build us a named entity 
recognizer?\""},{"from":235.79,"to":238.13,"location":2,"content":"You can say \"Sure, I can do that.\""},{"from":238.13,"to":240.53,"location":2,"content":"And so for a bunch of problems,"},{"from":240.53,"to":242.09,"location":2,"content":"obviously we can't do everything,"},{"from":242.09,"to":243.23,"location":2,"content":"we're gonna do word meaning,"},{"from":243.23,"to":247.58,"location":2,"content":"dependency parsing, machine translation and you have an option to do question answering,"},{"from":247.58,"to":250.34,"location":2,"content":"um, actually building systems for those."},{"from":250.34,"to":255.08,"location":2,"content":"If you've been talking to friends who did the class in the last couple of years,"},{"from":255.08,"to":258.86,"location":2,"content":"um, here are the differences for this year just to get things straight."},{"from":258.86,"to":261.83,"location":2,"content":"Um, so we've updated some of the content of the course."},{"from":261.83,"to":266.29,"location":2,"content":"So, uh, between me and guest lectures there's new content."},{"from":266.29,"to":269.03,"location":2,"content":"Well, that looked bad."},{"from":269.03,"to":272.5,"location":2,"content":"Wonder if that will keep happening, we'll find out."},{"from":272.5,"to":278.17,"location":2,"content":"There's new content on various topics that are sort of developing areas."},{"from":278.17,"to":281.3,"location":2,"content":"One of the problems with this course is the really big area of deep learning at"},{"from":281.3,"to":284.75,"location":2,"content":"the moment is still just developing really, really quickly."},{"from":284.75,"to":287.48,"location":2,"content":"So, it sort of seems like one-year-old content is already"},{"from":287.48,"to":291.29,"location":2,"content":"kind of dated and we're trying to update things."},{"from":291.29,"to":294.14,"location":2,"content":"A big change that we're making this year is 
we're"},{"from":294.14,"to":296.93,"location":2,"content":"having five one-week assignments instead of"},{"from":296.93,"to":299.45,"location":2,"content":"three two-week assignments at the beginning of"},{"from":299.45,"to":302.8,"location":2,"content":"the course and I'll say a bit more about that in a minute."},{"from":302.8,"to":306.21,"location":2,"content":"Um, this year we're gonna use PyTorch instead of TensorFlow,"},{"from":306.21,"to":308.86,"location":2,"content":"and we'll talk about that more later too."},{"from":308.86,"to":313.88,"location":2,"content":"Um, we're having the assignments due before class on either Tuesday or Thursday."},{"from":313.88,"to":316.58,"location":2,"content":"So you're not distracted and can come to class."},{"from":316.58,"to":320.36,"location":2,"content":"So starting off, um, yeah."},{"from":320.36,"to":322.68,"location":2,"content":"So we're trying to give an easier,"},{"from":322.68,"to":326.51,"location":2,"content":"gentler ramp-up but on the other hand a fast ramp-up."},{"from":326.51,"to":329.56,"location":2,"content":"So we've got this first assignment which is sort of easy, uh,"},{"from":329.56,"to":334.04,"location":2,"content":"but it's available right now and is due next Tuesday."},{"from":334.04,"to":337.46,"location":2,"content":"And the final thing is we're not having a midterm this year."},{"from":337.46,"to":339.39,"location":2,"content":"Um, okay."},{"from":339.39,"to":340.79,"location":2,"content":"So this is what we're doing."},{"from":340.79,"to":344.3,"location":2,"content":"So there are five of these assignments that I just mentioned."},{"from":344.3,"to":346.34,"location":2,"content":"Um, so six percent for the first one,"},{"from":346.34,"to":349.09,"location":2,"content":"12 percent for each of the other ones,"},{"from":349.09,"to":352.19,"location":2,"content":"um, and, I already said that."},{"from":352.19,"to":354.23,"location":2,"content":"We're gonna use Gradescope for 
grading."},{"from":354.23,"to":356.78,"location":2,"content":"It'll really help out the TAs if you could use"},{"from":356.78,"to":361.01,"location":2,"content":"your SUNet ID as your Gradescope account ID."},{"from":361.01,"to":364.2,"location":2,"content":"Um, so then for the second part of the course,"},{"from":364.2,"to":368.8,"location":2,"content":"people do a final project and there are two choices for the final project."},{"from":368.8,"to":372.08,"location":2,"content":"You can either do our default final project,"},{"from":372.08,"to":374.03,"location":2,"content":"which is a good option for many people,"},{"from":374.03,"to":375.89,"location":2,"content":"or you can do a custom final project and I'll"},{"from":375.89,"to":379.01,"location":2,"content":"talk about that a bit more in a minute."},{"from":379.01,"to":381.2,"location":2,"content":"This is not working right."},{"from":381.2,"to":385.13,"location":2,"content":"Um, and so then at the end we have"},{"from":385.13,"to":390.43,"location":2,"content":"a final poster presentation session at which your attendance is expected,"},{"from":390.43,"to":394.58,"location":2,"content":"and we're gonna be having that Wednesday in the evening."},{"from":394.58,"to":397.46,"location":2,"content":"Probably not quite five hours but it'll be within that window,"},{"from":397.46,"to":399.49,"location":2,"content":"we'll work out the details in a bit."},{"from":399.49,"to":401.51,"location":2,"content":"Three percent for participation,"},{"from":401.51,"to":403.39,"location":2,"content":"see the website for details."},{"from":403.39,"to":405.88,"location":2,"content":"Six late days, um,"},{"from":405.88,"to":410.33,"location":2,"content":"collaboration, like always in computer science classes,"},{"from":410.33,"to":415.34,"location":2,"content":"we want you to do your own work and not borrow stuff from other people's GitHubs and"},{"from":415.34,"to":417.65,"location":2,"content":"so we really do emphasize 
that you should"},{"from":417.65,"to":421.15,"location":2,"content":"read and pay attention to collaboration policies."},{"from":421.15,"to":424.7,"location":2,"content":"Okay. So here's the high-level plan for the problem sets."},{"from":424.7,"to":427.79,"location":2,"content":"So, homework one, available right now,"},{"from":427.79,"to":430.13,"location":2,"content":"is a hopefully easy on-ramp."},{"from":430.13,"to":431.72,"location":2,"content":"That's an iPython notebook,"},{"from":431.72,"to":433.56,"location":2,"content":"just to help get everyone up to speed."},{"from":433.56,"to":437.75,"location":2,"content":"Homework two is pure Python plus numpy but that"},{"from":437.75,"to":442.19,"location":2,"content":"will start to kind of teach you more about the sort of underlying,"},{"from":442.19,"to":444.26,"location":2,"content":"how do we do deep learning."},{"from":444.26,"to":449.43,"location":2,"content":"If you're not so good or a bit rusty or never seen um,"},{"from":449.43,"to":451.15,"location":2,"content":"Python or numpy, um,"},{"from":451.15,"to":454.73,"location":2,"content":"we're gonna have an extra section on Friday."},{"from":454.73,"to":458.21,"location":2,"content":"So Friday from 1:30 to 2:50 um,"},{"from":458.21,"to":462.71,"location":2,"content":"in Skilling Auditorium, we'll have a section that's a Python review."},{"from":462.71,"to":464.61,"location":2,"content":"That's our only planned section at the moment,"},{"from":464.61,"to":466.6,"location":2,"content":"we're not gonna have a regular section."},{"from":466.6,"to":469.55,"location":2,"content":"Um, so you're encouraged to go to that and that will also be"},{"from":469.55,"to":473.51,"location":2,"content":"recorded for SCPD and available on video as well."},{"from":473.51,"to":476.79,"location":2,"content":"Um, then Homework three um,"},{"from":476.79,"to":480.85,"location":2,"content":"will start us on using PyTorch."},{"from":480.85,"to":484.76,"location":2,"content":"And then homeworks 
four and five we're then gonna be using"},{"from":484.76,"to":488.72,"location":2,"content":"py- PyTorch on GPU and we're actually gonna be using"},{"from":488.72,"to":493.52,"location":2,"content":"Microsoft Azure with big thank yous to the kind Microsoft Azure people who have"},{"from":493.52,"to":499.17,"location":2,"content":"sponsored our GPU computing for the last um, three years."},{"from":499.17,"to":505,"location":2,"content":"Um, yes. So basically I mean all of modern deep learning has moved to the use"},{"from":505,"to":510.59,"location":2,"content":"of one or other of the large deep learning libraries like PyTorch, TensorFlow,"},{"from":510.59,"to":512.21,"location":2,"content":"Chainer or MXNet um,"},{"from":512.21,"to":516.44,"location":2,"content":"et cetera and then doing the computing on GPU."},{"from":516.44,"to":518.6,"location":2,"content":"So of course since we're in the NVIDIA building,"},{"from":518.6,"to":520.45,"location":2,"content":"we should of course be using, um,"},{"from":520.45,"to":522.62,"location":2,"content":"GPUs [LAUGHTER] but I mean in general"},{"from":522.62,"to":528.83,"location":2,"content":"the parallelism and scalability of GPUs is what's powered most of modern deep learning."},{"from":528.83,"to":530.72,"location":2,"content":"Okay. 
The final project."},{"from":530.72,"to":535.46,"location":2,"content":"So for the final project there are two things that you can do."},{"from":535.46,"to":540.66,"location":2,"content":"So we have a default final project which is essentially our final project in a box."},{"from":540.66,"to":546.22,"location":2,"content":"And so this is building a question answering system and we do it over the SQuAD dataset."},{"from":546.22,"to":551.45,"location":2,"content":"So what you build and how you can improve your performance is completely up to you."},{"from":551.45,"to":554.48,"location":2,"content":"It is open-ended but it has an easier start,"},{"from":554.48,"to":556.91,"location":2,"content":"a clearly defined objective and we can"},{"from":556.91,"to":559.77,"location":2,"content":"have a leaderboard for how well things are working."},{"from":559.77,"to":564.68,"location":2,"content":"Um, so if you don't have a clear research objective that can be a good choice for you"},{"from":564.68,"to":569.6,"location":2,"content":"or you can propose a custom final project and assuming it's sensible,"},{"from":569.6,"to":572.54,"location":2,"content":"we will approve your custom final project,"},{"from":572.54,"to":574.19,"location":2,"content":"we will give you feedback, um,"},{"from":574.19,"to":576.75,"location":2,"content":"from someone as a mentor, um,"},{"from":576.75,"to":582.41,"location":2,"content":"and either way for only the final project we allow teams of one, two or three."},{"from":582.41,"to":585.2,"location":2,"content":"For the homeworks, you're expected to do them yourself."},{"from":585.2,"to":590.02,"location":2,"content":"Of course you can chat to people in a general way about the problems."},{"from":590.02,"to":593.01,"location":2,"content":"Okay. So that is the course."},{"from":593.01,"to":595.7,"location":2,"content":"All good, and not even behind schedule yet."},{"from":595.7,"to":601.73,"location":2,"content":"Okay. 
So the next section is human language and word meaning. Um."},{"from":601.73,"to":604.75,"location":2,"content":"You know, if I was um,"},{"from":604.75,"to":610.26,"location":2,"content":"really going to tell you a lot about human language that would take a lot of time um,"},{"from":610.26,"to":612.11,"location":2,"content":"which I don't really have here."},{"from":612.11,"to":614.01,"location":2,"content":"So I'm just going to tell you um,"},{"from":614.01,"to":616.65,"location":2,"content":"two anecdotes about human language."},{"from":616.65,"to":619.97,"location":2,"content":"And the first is this XKCD cartoon."},{"from":619.97,"to":622.52,"location":2,"content":"Um, and I mean this isn't,"},{"from":622.52,"to":626.05,"location":2,"content":"and I don't know why that's happening."},{"from":626.05,"to":628.25,"location":2,"content":"I'm not sure what to make of that."},{"from":628.25,"to":634.07,"location":2,"content":"Um, so, I actually really liked this XKCD cartoon."},{"from":634.07,"to":637.31,"location":2,"content":"It's not one of the classic ones that you see most often around the place,"},{"from":637.31,"to":642.14,"location":2,"content":"but I actually think it says a lot about language and is worth thinking about."},{"from":642.14,"to":645.65,"location":2,"content":"Like I think a lot of the time for the kind of people who come"},{"from":645.65,"to":649.38,"location":2,"content":"to this class who are mainly people like CS people,"},{"from":649.38,"to":651.95,"location":2,"content":"and EE people and random others."},{"from":651.95,"to":655.25,"location":2,"content":"There are some other people too, I know there are linguists and so on around."},{"from":655.25,"to":657.05,"location":2,"content":"But for a lot of those people like,"},{"from":657.05,"to":661.61,"location":2,"content":"you've sort of spent your life looking at formal languages and the impression"},{"from":661.61,"to":666.18,"location":2,"content":"is that sort of human language as a 
sort of somehow a little bit broken formal language,"},{"from":666.18,"to":668.57,"location":2,"content":"but there's really a lot more to it than that, right?"},{"from":668.57,"to":671.16,"location":2,"content":"That language is this amazing um,"},{"from":671.16,"to":675.11,"location":2,"content":"human-created system that is used for"},{"from":675.11,"to":679.52,"location":2,"content":"all sorts of purposes and is adaptable to all sorts of purposes."},{"from":679.52,"to":683.75,"location":2,"content":"So you can do everything from describing mathematics in human language"},{"from":683.75,"to":688.52,"location":2,"content":"um to sort of nuzzling up to your best friend and getting them to understand you better."},{"from":688.52,"to":691.91,"location":2,"content":"So that's actually an amazing thing about human language. Anyway, I'll just read it."},{"from":691.91,"to":694.65,"location":2,"content":"Um, so it's the first person,"},{"from":694.65,"to":696.18,"location":2,"content":"the dark-haired person says,"},{"from":696.18,"to":698.11,"location":2,"content":"\"Anyway, I could care less.\""},{"from":698.11,"to":700.01,"location":2,"content":"And her friend says,"},{"from":700.01,"to":702.44,"location":2,"content":"\"I think you mean you couldn't care less.\""},{"from":702.44,"to":706.49,"location":2,"content":"Saying you could care less implies you care at least some amount."},{"from":706.49,"to":709.77,"location":2,"content":"And the dark-haired person says, \"I don't know,"},{"from":709.77,"to":714.59,"location":2,"content":"we're these unbelievably complicated brains drifting through a void trying"},{"from":714.59,"to":719.63,"location":2,"content":"in vain to connect with one another by blindly flinging words out into the darkness.\""},{"from":719.63,"to":722.72,"location":2,"content":"Every choice of phrasing and spelling, and tone,"},{"from":722.72,"to":727.77,"location":2,"content":"and timing carries countless signals and contexts and subtexts and 
more."},{"from":727.77,"to":731.43,"location":2,"content":"And every listener interprets those signals in their own way."},{"from":731.43,"to":733.57,"location":2,"content":"Language isn't a formal system,"},{"from":733.57,"to":736.24,"location":2,"content":"language is glorious chaos."},{"from":736.24,"to":740.75,"location":2,"content":"You can never know for sure what any words will mean to anyone."},{"from":740.75,"to":746.15,"location":2,"content":"All you can do is try to get better at guessing how your words affect people so"},{"from":746.15,"to":748.79,"location":2,"content":"you can have a chance of finding the ones that will make"},{"from":748.79,"to":751.79,"location":2,"content":"them feel something like what you want them to feel."},{"from":751.79,"to":754.24,"location":2,"content":"Everything else is pointless."},{"from":754.24,"to":757.39,"location":2,"content":"I assume you're giving me tips on how you interpret"},{"from":757.39,"to":761.07,"location":2,"content":"words because you want me to feel less alone."},{"from":761.07,"to":763.51,"location":2,"content":"If so, thank you."},{"from":763.51,"to":765.59,"location":2,"content":"That means a lot."},{"from":765.59,"to":768.44,"location":2,"content":"But if you're just running my sentences past"},{"from":768.44,"to":771.78,"location":2,"content":"some mental checklist so you can show off how well you know it,"},{"from":771.78,"to":773.18,"location":2,"content":"then I could care less."},{"from":773.18,"to":782.83,"location":2,"content":"[NOISE] Um, and so I think um,"},{"from":782.83,"to":787.79,"location":2,"content":"I think actually this has some nice messages about how language is this uncertain"},{"from":787.79,"to":793.34,"location":2,"content":"evolved system of communication but somehow we have enough agreed meaning that you know,"},{"from":793.34,"to":795.5,"location":2,"content":"we can kind of pretty much communicate."},{"from":795.5,"to":796.87,"location":2,"content":"But we're doing some 
kind of you know"},{"from":796.87,"to":800.54,"location":2,"content":"probabilistic inference of guessing what people mean and we're"},{"from":800.54,"to":802.07,"location":2,"content":"using language not just for"},{"from":802.07,"to":806.2,"location":2,"content":"the information functions but for the social functions et cetera et cetera."},{"from":806.2,"to":813.49,"location":2,"content":"Okay. And then here's the one other thought I had about language."},{"from":813.49,"to":820.57,"location":2,"content":"So, essentially if we want to have artificial intelligence that's intelligent,"},{"from":820.57,"to":823.94,"location":2,"content":"we need to somehow get to the point of having"},{"from":823.94,"to":828.56,"location":2,"content":"compu- computers that have the knowledge of human beings, right?"},{"from":828.56,"to":832.43,"location":2,"content":"Because human beings have knowledge that gives them intelligence."},{"from":832.43,"to":835.46,"location":2,"content":"And if you think about how we sort of"},{"from":835.46,"to":839.27,"location":2,"content":"convey knowledge around the place in our human world,"},{"from":839.27,"to":844.02,"location":2,"content":"mainly the way we do it is through human language."},{"from":844.02,"to":846.41,"location":2,"content":"You know, some kinds of knowledge you can sort of"},{"from":846.41,"to":849.26,"location":2,"content":"work out for yourself by doing physical stuff right,"},{"from":849.26,"to":851.9,"location":2,"content":"I can hold this and drop that and I've learnt something."},{"from":851.9,"to":853.76,"location":2,"content":"So I was able to learn a bit of knowledge there."},{"from":853.76,"to":857.18,"location":2,"content":"But sort of most of the knowledge in your heads and why you're sitting in"},{"from":857.18,"to":861.98,"location":2,"content":"this classroom has come from people communicating in human language to you."},{"from":861.98,"to":864.26,"location":2,"content":"Um, so one of the 
famous,"},{"from":864.26,"to":866.99,"location":2,"content":"most famous deep learning people, Yann LeCun,"},{"from":866.99,"to":869.16,"location":2,"content":"he likes to say this line about,"},{"from":869.16,"to":873.38,"location":2,"content":"oh, you know really I think that you know there's not much difference"},{"from":873.38,"to":877.97,"location":2,"content":"between the intelligence of a human being and an orangutan."},{"from":877.97,"to":880.51,"location":2,"content":"And I actually think he's really wrong on that."},{"from":880.51,"to":882.79,"location":2,"content":"Like the sense in which he means that is,"},{"from":882.79,"to":885.84,"location":2,"content":"an orangutan has a really good vision system."},{"from":885.84,"to":888.61,"location":2,"content":"Orangutans have very good you know control of"},{"from":888.61,"to":892.06,"location":2,"content":"their arms just like human beings for picking things up."},{"from":892.06,"to":898.97,"location":2,"content":"Orangutans um can use tools um and orangutans can make plans so"},{"from":898.97,"to":902.27,"location":2,"content":"that if you sort of put the food somewhere where they have to sort of move"},{"from":902.27,"to":905.96,"location":2,"content":"the plank to get to the island with the food they can do a plan like that."},{"from":905.96,"to":909.89,"location":2,"content":"So yeah, in a sense they've got a fair bit of intelligence but you know,"},{"from":909.89,"to":913.38,"location":2,"content":"sort of orangutans just aren't like human beings."},{"from":913.38,"to":916.1,"location":2,"content":"And why aren't they like human beings?"},{"from":916.1,"to":921.61,"location":2,"content":"And I'd like to suggest to you the reason for that is what human beings have achieved is,"},{"from":921.61,"to":925.07,"location":2,"content":"we don't just have sort of one computer like"},{"from":925.07,"to":929.83,"location":2,"content":"a you know dusty old IBM PC in your mother's 
garage."},{"from":929.83,"to":933.74,"location":2,"content":"What we have is a human computer network."},{"from":933.74,"to":937.52,"location":2,"content":"And the way that we've achieved that human computer network is that,"},{"from":937.52,"to":941.28,"location":2,"content":"we use human languages as our networking language."},{"from":941.28,"to":944.69,"location":2,"content":"Um, and so, when you think about it um,"},{"from":944.69,"to":951.82,"location":2,"content":"so on any kind of evolutionary scale language is super super super super recent, right?"},{"from":951.82,"to":957.47,"location":2,"content":"That um, creatures have had vision for, people don't quite know, but you know,"},{"from":957.47,"to":960.98,"location":2,"content":"maybe it's 75 million years or maybe it's longer, right?"},{"from":960.98,"to":963.85,"location":2,"content":"A huge length of time."},{"from":963.85,"to":967.29,"location":2,"content":"How long have human beings had language?"},{"from":967.29,"to":969.86,"location":2,"content":"You know people don't know that either because it turns out you know,"},{"from":969.86,"to":971.01,"location":2,"content":"when you have fossils,"},{"from":971.01,"to":973.49,"location":2,"content":"you can't knock the skull on the side and say,"},{"from":973.49,"to":975.05,"location":2,"content":"did you have language or not."},{"from":975.05,"to":979.1,"location":2,"content":"Um, but you know, most people estimate that sort of language is"},{"from":979.1,"to":985.99,"location":2,"content":"a very recent invention before current human beings moved out of um, out of Africa."},{"from":985.99,"to":988.55,"location":2,"content":"So that many people think that we've only had language for"},{"from":988.55,"to":991.46,"location":2,"content":"something like 100,000 years or something like that."},{"from":991.46,"to":995.45,"location":2,"content":"So that's sort of you know blink of an eye on the evolutionary 
timescale."},{"from":995.45,"to":999.74,"location":2,"content":"But you know, it was the development of language [inaudible]"},{"from":999.74,"to":1003.97,"location":2,"content":"that sort of made human beings invisible- [NOISE] invincible, right?"},{"from":1003.97,"to":1006.48,"location":2,"content":"It wasn't that human beings, um,"},{"from":1006.48,"to":1011.41,"location":2,"content":"developed poison fangs or developed the ability to run"},{"from":1011.41,"to":1013.66,"location":2,"content":"faster than any other creature or"},{"from":1013.66,"to":1016.21,"location":2,"content":"put a big horn on their heads or something like that, right?"},{"from":1016.21,"to":1019.06,"location":2,"content":"You know, humans are basically pretty puny um,"},{"from":1019.06,"to":1021.19,"location":2,"content":"but they had this um,"},{"from":1021.19,"to":1024.31,"location":2,"content":"unbeatable advantage that they could communicate with"},{"from":1024.31,"to":1027.88,"location":2,"content":"each other and therefore work much more effectively in teams."},{"from":1027.88,"to":1031.49,"location":2,"content":"And that sort of basically made human beings invincible."},{"from":1031.49,"to":1035.58,"location":2,"content":"But you know, even then humans were kind of limited, right?"},{"from":1035.58,"to":1038.14,"location":2,"content":"That kind of got you to about the Stone Age right,"},{"from":1038.14,"to":1040.39,"location":2,"content":"where you could bang on your stones and with"},{"from":1040.39,"to":1043.24,"location":2,"content":"the right kind of stone make something sharp to cut with."},{"from":1043.24,"to":1045.68,"location":2,"content":"Um, what got humans beyond that,"},{"from":1045.68,"to":1048.1,"location":2,"content":"was that they invented writing."},{"from":1048.1,"to":1052.91,"location":2,"content":"So writing was then an ability where you could take knowledge"},{"from":1052.91,"to":1057.73,"location":2,"content":"not only communicated um mouth to mouth to people 
that you saw."},{"from":1057.73,"to":1061.66,"location":2,"content":"You could put it down on your piece of papyrus or your clay tablet or whatever"},{"from":1061.66,"to":1065.62,"location":2,"content":"it was at first and that knowledge could then be sent places."},{"from":1065.62,"to":1070.27,"location":2,"content":"It could be sent spatially around the world and it could then"},{"from":1070.27,"to":1075.43,"location":2,"content":"be sent temporally through time."},{"from":1075.43,"to":1077.29,"location":2,"content":"And well, how old is writing?"},{"from":1077.29,"to":1080.89,"location":2,"content":"I mean, we sort of basically know about how old writing is, right?"},{"from":1080.89,"to":1084.12,"location":2,"content":"That writing is about 5,000 years old."},{"from":1084.12,"to":1089.74,"location":2,"content":"It's incredibly incredibly recent on this scale of evolution but you know,"},{"from":1089.74,"to":1096.73,"location":2,"content":"essentially writing was so powerful as a way of having knowledge that then in those 5,000"},{"from":1096.73,"to":1104.04,"location":2,"content":"years that enabled human beings to go from a Stone Age sharp piece of flint to you know,"},{"from":1104.04,"to":1106.24,"location":2,"content":"having iPhones and all of these things,"},{"from":1106.24,"to":1108.79,"location":2,"content":"all these incredibly sophisticated devices."},{"from":1108.79,"to":1112.96,"location":2,"content":"So, language is a pretty special thing, I'd like to suggest."},{"from":1112.96,"to":1117.91,"location":2,"content":"Um, but you know, if I go back to my analogy that sort of it's allowed humans to"},{"from":1117.91,"to":1123.28,"location":2,"content":"construct a networked computer that is way way more powerful than um,"},{"from":1123.28,"to":1127.6,"location":2,"content":"just having individual creatures that are sort of intelligent like an orangutan."},{"from":1127.6,"to":1130.53,"location":2,"content":"Um, and you compare it to our computer 
networks,"},{"from":1130.53,"to":1133.05,"location":2,"content":"it's a really funny kind of network, right?"},{"from":1133.05,"to":1135.74,"location":2,"content":"You know that these days um,"},{"from":1135.74,"to":1141.81,"location":2,"content":"we have networks that run around where we have sort of large network bandwidth, right?"},{"from":1141.81,"to":1143.77,"location":2,"content":"You know, we might be frustrated sometimes with"},{"from":1143.77,"to":1146.53,"location":2,"content":"our Netflix downloads but by and large you know,"},{"from":1146.53,"to":1149.76,"location":2,"content":"we can download hundreds of megabytes really easily and quickly."},{"from":1149.76,"to":1151.57,"location":2,"content":"And we don't think that's fast enough,"},{"from":1151.57,"to":1153.67,"location":2,"content":"so we're going to be rolling out 5G networks."},{"from":1153.67,"to":1156.4,"location":2,"content":"So it's an order of magnitude faster again."},{"from":1156.4,"to":1158.8,"location":2,"content":"I mean, by comparison to that, I mean,"},{"from":1158.8,"to":1163.54,"location":2,"content":"human language is a pathetically slow network, right?"},{"from":1163.54,"to":1169.46,"location":2,"content":"That the amount of information you can convey by human language is very slow."},{"from":1169.46,"to":1173.95,"location":2,"content":"I mean you know, whatever it is I sort of speak at about 15 words a second right,"},{"from":1173.95,"to":1175.42,"location":2,"content":"you can start doing um,"},{"from":1175.42,"to":1177.55,"location":2,"content":"your information theory if you know some right?"},{"from":1177.55,"to":1181.06,"location":2,"content":"But um, you don't actually get much bandwidth at all."},{"from":1181.06,"to":1184.4,"location":2,"content":"And that then leads- so you can think of,"},{"from":1184.4,"to":1185.98,"location":2,"content":"how does it work then?"},{"from":1185.98,"to":1187.57,"location":2,"content":"So, humans have come up 
with"},{"from":1187.57,"to":1193.39,"location":2,"content":"this incredibly impressive system which is essentially a form of compression."},{"from":1193.39,"to":1196.12,"location":2,"content":"Sort of a very adaptive form of compression,"},{"from":1196.12,"to":1198.07,"location":2,"content":"so that when we're talking to people,"},{"from":1198.07,"to":1202.87,"location":2,"content":"we assume that they have an enormous amount of knowledge in their heads which"},{"from":1202.87,"to":1207.64,"location":2,"content":"isn't the same as but it's broadly similar to mine when I'm talking to you right?"},{"from":1207.64,"to":1210.57,"location":2,"content":"That you know what English words mean,"},{"from":1210.57,"to":1213.85,"location":2,"content":"and you know a lot about how the wor- world works."},{"from":1213.85,"to":1217.15,"location":2,"content":"And therefore, I can say a short message and communicate"},{"from":1217.15,"to":1222.82,"location":2,"content":"only a relatively short bit string and you can actually understand a lot. 
All right?"},{"from":1222.82,"to":1226.03,"location":2,"content":"So, I can say sort of whatever you know,"},{"from":1226.03,"to":1228.85,"location":2,"content":"imagine a busy shopping mall and that"},{"from":1228.85,"to":1231.63,"location":2,"content":"there are two guys standing in front of a makeup counter,"},{"from":1231.63,"to":1236.29,"location":2,"content":"and you know I've only said whatever that was sort of about 200 bits of"},{"from":1236.29,"to":1238.96,"location":2,"content":"information but that's enabled you to construct"},{"from":1238.96,"to":1242.34,"location":2,"content":"a whole visual scene that would take megabytes to um,"},{"from":1242.34,"to":1244.38,"location":2,"content":"represent as an image."},{"from":1244.38,"to":1246.63,"location":2,"content":"So, that's why language is good."},{"from":1246.63,"to":1249.1,"location":2,"content":"Um, so from that more authorial level,"},{"from":1249.1,"to":1251.42,"location":2,"content":"I'll now move back to the concrete stuff."},{"from":1251.42,"to":1255.92,"location":2,"content":"What we wanna do in this class is not solve the whole of language,"},{"from":1255.92,"to":1257.95,"location":2,"content":"but we want to represent, um,"},{"from":1257.95,"to":1260.38,"location":2,"content":"the meaning of words, right?"},{"from":1260.38,"to":1263.23,"location":2,"content":"So, a lot of language is bound up in words and their meanings"},{"from":1263.23,"to":1266.2,"location":2,"content":"and words can have really rich meanings, right?"},{"from":1266.2,"to":1267.97,"location":2,"content":"As soon as you say a word like teacher,"},{"from":1267.97,"to":1272.53,"location":2,"content":"that's kinda quite a lot of rich meaning or you can have actions that have rich meaning."},{"from":1272.53,"to":1277.22,"location":2,"content":"So, if I say a word like prognosticate or,"},{"from":1277.22,"to":1279.07,"location":2,"content":"um, total or something you know,"},{"from":1279.07,"to":1282.38,"location":2,"content":"these 
words that have rich meanings and a lot of nuance on them."},{"from":1282.38,"to":1284.39,"location":2,"content":"And so we wanna represent meaning."},{"from":1284.39,"to":1286.51,"location":2,"content":"And so, the question is what is meaning?"},{"from":1286.51,"to":1289.36,"location":2,"content":"So, you can of course you can- dictionaries are meant to tell you about meanings."},{"from":1289.36,"to":1291.49,"location":2,"content":"So, you can look up dictionaries um,"},{"from":1291.49,"to":1295.72,"location":2,"content":"and Webster sort of tries to relate meaning to idea."},{"from":1295.72,"to":1299.52,"location":2,"content":"The idea that is represented by a word or a phrase."},{"from":1299.52,"to":1304.24,"location":2,"content":"The idea that a person wants to express by words, signs, et cetera."},{"from":1304.24,"to":1306.19,"location":2,"content":"I mean, you know,"},{"from":1306.19,"to":1309.73,"location":2,"content":"you could think that these definitions are kind of a cop-out because it seems"},{"from":1309.73,"to":1313.02,"location":2,"content":"like they're rewriting meaning in terms of the word idea,"},{"from":1313.02,"to":1315.04,"location":2,"content":"and has that really gotten you anywhere."},{"from":1315.04,"to":1318.37,"location":2,"content":"Um, how do linguists think about meaning?"},{"from":1318.37,"to":1323.11,"location":2,"content":"I mean, the most common way that linguists have thought about"},{"from":1323.11,"to":1325.66,"location":2,"content":"meaning is an idea that's called denotational"},{"from":1325.66,"to":1328.42,"location":2,"content":"semantics which is also used in programming languages."},{"from":1328.42,"to":1334.81,"location":2,"content":"So, the idea of that is we think of meaning as what things represent."},{"from":1334.81,"to":1336.95,"location":2,"content":"So, if I say the word chair,"},{"from":1336.95,"to":1341.14,"location":2,"content":"the denotation of the word chair includes this one here and that 
one,"},{"from":1341.14,"to":1342.33,"location":2,"content":"that one, that one, that one."},{"from":1342.33,"to":1344.92,"location":2,"content":"And so, the word chair is sort of representing"},{"from":1344.92,"to":1348.58,"location":2,"content":"all the things that are chairs and you can sort of, um,"},{"from":1348.58,"to":1353.41,"location":2,"content":"you can then think of something like running as well that you know there's sort of sets"},{"from":1353.41,"to":1357.98,"location":2,"content":"of actions that people can partake in- that's their denotation."},{"from":1357.98,"to":1362.2,"location":2,"content":"And that's sort of what you most commonly see in philosophy or linguistics as denotation."},{"from":1362.2,"to":1367.13,"location":2,"content":"It's kind of a hard thing to get your hands on, um, computationally."},{"from":1367.13,"to":1370.48,"location":2,"content":"So, um, what people most commonly"},{"from":1370.48,"to":1374.02,"location":2,"content":"do, or used to most commonly do I guess I should say now,"},{"from":1374.02,"to":1377.53,"location":2,"content":"for working out the meaning of words on the computer, is that"},{"from":1377.53,"to":1381.12,"location":2,"content":"commonly they turned to something that was a bit like a dictionary."},{"from":1381.12,"to":1386.2,"location":2,"content":"In particular the favorite online thing was this online thesaurus called WordNet which"},{"from":1386.2,"to":1391.51,"location":2,"content":"sort of tells you about word meanings and relationships between word meanings."},{"from":1391.51,"to":1396.44,"location":2,"content":"Um, so this is just giving you the very slightest sense of,"},{"from":1396.44,"to":1399.82,"location":2,"content":"um, of what's in WordNet."},{"from":1399.82,"to":1404.48,"location":2,"content":"Um, so this is an actual bit of Python code up there which you can,"},{"from":1404.48,"to":1408.37,"location":2,"content":"um, type into your computer and run and do this for 
yourself."},{"from":1408.37,"to":1411.04,"location":2,"content":"Um, so this uses a thing called NLTK."},{"from":1411.04,"to":1413.72,"location":2,"content":"Um, so NLTK is sort of like"},{"from":1413.72,"to":1419.36,"location":2,"content":"the \"Swiss Army Knife of NLP\" meaning that it's not terribly good for anything,"},{"from":1419.36,"to":1421.57,"location":2,"content":"but it has a lot of basic tools."},{"from":1421.57,"to":1426.46,"location":2,"content":"So, if you wanted to do something like just get some stuff out of WordNet and show it,"},{"from":1426.46,"to":1429.63,"location":2,"content":"it's the perfect thing to use. Um, okay."},{"from":1429.63,"to":1434.83,"location":2,"content":"So, um, from NLTK I'm importing WordNet and so then I can say,"},{"from":1434.83,"to":1441.36,"location":2,"content":"\"Okay, um, for the word good tell me about the synonym sets that good participates in.\""},{"from":1441.36,"to":1443.44,"location":2,"content":"And there's good, goodness as a noun."},{"from":1443.44,"to":1444.76,"location":2,"content":"There is an adjective good."},{"from":1444.76,"to":1448.33,"location":2,"content":"There's one estimable good, honorable, respectable."},{"from":1448.33,"to":1451.15,"location":2,"content":"Um, this looks really complex and hard to understand."},{"from":1451.15,"to":1453.7,"location":2,"content":"But the idea is word- WordNet makes"},{"from":1453.7,"to":1458.08,"location":2,"content":"these very fine-grained distinctions between senses of a word."},{"from":1458.08,"to":1460.67,"location":2,"content":"So, what it's sort of saying for good, um,"},{"from":1460.67,"to":1463.57,"location":2,"content":"there are some senses where it's a noun, right?"},{"from":1463.57,"to":1464.76,"location":2,"content":"That's where you sort of,"},{"from":1464.76,"to":1467.2,"location":2,"content":"I bought some goods for my trip, right?"},{"from":1467.2,"to":1468.88,"location":2,"content":"So, that's sort of, 
um,"},{"from":1468.88,"to":1472.78,"location":2,"content":"one of these noun senses like this one I guess."},{"from":1472.78,"to":1475.48,"location":2,"content":"Um, then there are adjective senses and it's trying to"},{"from":1475.48,"to":1478.84,"location":2,"content":"distinguish- there's a basic adjective sense of good being good,"},{"from":1478.84,"to":1481.27,"location":2,"content":"and then in certain, um, senses,"},{"from":1481.27,"to":1484.75,"location":2,"content":"there are these extended senses of good in different directions."},{"from":1484.75,"to":1488.52,"location":2,"content":"So, I guess this is good in the sense of beneficial, um,"},{"from":1488.52,"to":1492.92,"location":2,"content":"and this one is sort of a person who is respectable or something."},{"from":1492.92,"to":1495.58,"location":2,"content":"He's a good man or something like that, right?"},{"from":1495.58,"to":1496.86,"location":2,"content":"So, um, but you know,"},{"from":1496.86,"to":1499.66,"location":2,"content":"part of what kind of makes"},{"from":1499.66,"to":1502.63,"location":2,"content":"WordNet very problematic in practice to use is it tries to make"},{"from":1502.63,"to":1506.85,"location":2,"content":"all these very fine-grained differences between senses that a human being can"},{"from":1506.85,"to":1511.41,"location":2,"content":"barely understand the difference between them um, and relate to."},{"from":1511.41,"to":1513.69,"location":2,"content":"Um, so you can then do other things with WordNet."},{"from":1513.69,"to":1518.46,"location":2,"content":"So, with this bit of code you can sort of walk up what is a kind of hierarchy."},{"from":1518.46,"to":1521.63,"location":2,"content":"So, it's kinda like a traditional, um, database."},{"from":1521.63,"to":1529.03,"location":2,"content":"So, if I start with a panda and say- [NOISE] if I start with a panda."},{"from":1529.03,"to":1532.18,"location":2,"content":"Um, and walk up, 
um,"},{"from":1532.18,"to":1535.33,"location":2,"content":"the pandas are [inaudible]."},{"from":1535.33,"to":1537.64,"location":2,"content":"Maybe you guys did bio- which are carnivores,"},{"from":1537.64,"to":1539.55,"location":2,"content":"placentals, mammals, blah, blah, blah."},{"from":1539.55,"to":1544.13,"location":2,"content":"Okay, so, um, that's the kind of stuff you can get out to- out of WordNet."},{"from":1544.13,"to":1547.11,"location":2,"content":"Um, you know, in practice WordNet has been-"},{"from":1547.11,"to":1549.58,"location":2,"content":"Everyone sort of used to use it because it gave"},{"from":1549.58,"to":1551.99,"location":2,"content":"you some sort of sense of the meaning of the word."},{"from":1551.99,"to":1554.13,"location":2,"content":"But you know it's also sort of well-known."},{"from":1554.13,"to":1556.54,"location":2,"content":"It never worked that well."},{"from":1556.54,"to":1562.72,"location":2,"content":"Um, so you know that sort of the synonym sets miss a lot of nuance."},{"from":1562.72,"to":1565.27,"location":2,"content":"So, you know one of the synonym sets for good has"},{"from":1565.27,"to":1568.24,"location":2,"content":"proficient in it and good is sort of like proficient"},{"from":1568.24,"to":1571.49,"location":2,"content":"but doesn't proficient have some more connotations and nuance?"},{"from":1571.49,"to":1573.25,"location":2,"content":"I think it does."},{"from":1573.25,"to":1578.08,"location":2,"content":"Um, WordNet like most hand-built resources is sort of very incomplete."},{"from":1578.08,"to":1581.29,"location":2,"content":"So, as soon as you're coming to new meanings of words,"},{"from":1581.29,"to":1583.7,"location":2,"content":"or new words and slang words,"},{"from":1583.7,"to":1585.31,"location":2,"content":"well then, that gives you nothing."},{"from":1585.31,"to":1588.98,"location":2,"content":"Um, it's sort of built with human labor,"},{"from":1588.98,"to":1595.03,"location":2,"content":"um, in ways that 
you know it's hard to sort of create and adapt."},{"from":1595.03,"to":1597.67,"location":2,"content":"And in particular, what we want to focus on is what"},{"from":1597.67,"to":1601.87,"location":2,"content":"seems like a basic thing you'd like to do with words, which is to at least"},{"from":1601.87,"to":1605.92,"location":2,"content":"understand similarities and relations between the meaning of words."},{"from":1605.92,"to":1609.52,"location":2,"content":"And it turns out that you know WordNet doesn't actually do that that well"},{"from":1609.52,"to":1613.6,"location":2,"content":"because it just has these sort of fixed discrete synonym sets."},{"from":1613.6,"to":1616.09,"location":2,"content":"So, if you have words that are"},{"from":1616.09,"to":1619.08,"location":2,"content":"sort of synonyms but maybe not exactly the same meaning,"},{"from":1619.08,"to":1620.8,"location":2,"content":"so they're not in the same synonym set,"},{"from":1620.8,"to":1624.58,"location":2,"content":"you kind of can't really measure the partial resemblance of meaning for them."},{"from":1624.58,"to":1628.43,"location":2,"content":"So, if something like good and marvelous aren't in the same synonym set,"},{"from":1628.43,"to":1631.96,"location":2,"content":"but there's something that they share in common that you'd like to represent."},{"from":1631.96,"to":1636.88,"location":2,"content":"Okay. 
So, um, that kind of starts to lead into"},{"from":1636.88,"to":1641.93,"location":2,"content":"us wanting to do something different and better for word meaning."},{"from":1641.93,"to":1645.73,"location":2,"content":"And, um, before getting there I just sort of wanna again sort"},{"from":1645.73,"to":1649.49,"location":2,"content":"of build a little from traditional NLP."},{"from":1649.49,"to":1653.28,"location":2,"content":"So, traditional NLP in the context of this course sort of means"},{"from":1653.28,"to":1659.28,"location":2,"content":"Natural Language Processing up until approximately 2012."},{"from":1659.28,"to":1663.64,"location":2,"content":"There were some earlier antecedents but it was basically, um,"},{"from":1663.64,"to":1667.6,"location":2,"content":"in 2013 that things really began to change with"},{"from":1667.6,"to":1673.06,"location":2,"content":"people starting to use neural net style representations for natural language processing."},{"from":1673.06,"to":1675.43,"location":2,"content":"So, up until 2012,"},{"from":1675.43,"to":1678.06,"location":2,"content":"um, standardly you know we had words."},{"from":1678.06,"to":1682.21,"location":2,"content":"They were just words. 
So, we had hotel, conference, motel."},{"from":1682.21,"to":1686.65,"location":2,"content":"They were words, and we'd have you know lexicons and put words into our model."},{"from":1686.65,"to":1692.29,"location":2,"content":"Um, and in neural networks land this is referred to as a localist representation."},{"from":1692.29,"to":1694.96,"location":2,"content":"I'll come back to those terms again next time."},{"from":1694.96,"to":1700.02,"location":2,"content":"But that's sort of meaning that for any concept there's sort of one particular,"},{"from":1700.02,"to":1704.08,"location":2,"content":"um, place which is the word hotel or the word motel."},{"from":1704.08,"to":1706.46,"location":2,"content":"A way of thinking about that is to think"},{"from":1706.46,"to":1709.62,"location":2,"content":"about what happens when you build a machine learning model."},{"from":1709.62,"to":1714.76,"location":2,"content":"So, if you have a categorical variable like you have words with the choice of word"},{"from":1714.76,"to":1720.13,"location":2,"content":"and you want to stick that into some kind of classifier in a machine learning model,"},{"from":1720.13,"to":1722.9,"location":2,"content":"somehow you have to code that categorical variable,"},{"from":1722.9,"to":1726.55,"location":2,"content":"and the standard way of doing it is that you code it by having"},{"from":1726.55,"to":1731.28,"location":2,"content":"different levels of the variable which means that you have a vector,"},{"from":1731.28,"to":1733.84,"location":2,"content":"and you have, this is the word house."},{"from":1733.84,"to":1735.67,"location":2,"content":"This is the word cat. 
This is the word dog."},{"from":1735.67,"to":1737.02,"location":2,"content":"This is the word some chairs."},{"from":1737.02,"to":1738.19,"location":2,"content":"This is the word agreeable."},{"from":1738.19,"to":1739.46,"location":2,"content":"This is the word something else."},{"from":1739.46,"to":1741.41,"location":2,"content":"This is the word, um,"},{"from":1741.41,"to":1745.75,"location":2,"content":"hotel, um, and this is another word for something different, right?"},{"from":1745.75,"to":1748.08,"location":2,"content":"So that you have put a one at the position"},{"from":1748.08,"to":1751.12,"location":2,"content":"and in neural net land we call these one-hot vectors,"},{"from":1751.12,"to":1752.47,"location":2,"content":"and so these might be, ah,"},{"from":1752.47,"to":1756.25,"location":2,"content":"one-hot vectors for hotel and motel."},{"from":1756.25,"to":1759.04,"location":2,"content":"So, there are a couple of things that are bad here."},{"from":1759.04,"to":1761.01,"location":2,"content":"Um, the one that's sort of, ah,"},{"from":1761.01,"to":1767.14,"location":2,"content":"a practical nuisance is you know languages have a lot of words."},{"from":1767.14,"to":1770.59,"location":2,"content":"Ah, so, it's sort of like one of those dictionaries that you might have still had in"},{"from":1770.59,"to":1775.45,"location":2,"content":"school that probably have about 250,000 words in them."},{"from":1775.45,"to":1777.4,"location":2,"content":"But you know, if you start getting into"},{"from":1777.4,"to":1781.86,"location":2,"content":"more technical and scientific English it's easy to get to a million words."},{"from":1781.86,"to":1785.69,"location":2,"content":"I mean, actually the number of words that you have in a language, um,"},{"from":1785.69,"to":1788.62,"location":2,"content":"like English is actually infinite because we have"},{"from":1788.62,"to":1792.22,"location":2,"content":"these processes which are called derivational 
morphology,"},{"from":1792.22,"to":1796.93,"location":2,"content":"um, where you can make more words by adding endings onto existing words."},{"from":1796.93,"to":1799.66,"location":2,"content":"So, you know you can start with something like paternal,"},{"from":1799.66,"to":1803.47,"location":2,"content":"fatherly, and then you can sort of say from paternal,"},{"from":1803.47,"to":1806.28,"location":2,"content":"you can say paternalist, or paternalistic,"},{"from":1806.28,"to":1810.07,"location":2,"content":"paternalism and pa- I did it paternalistically."},{"from":1810.07,"to":1814.26,"location":2,"content":"Right? Now there are all of these ways that you can make bigger words by adding more stuff onto them."},{"from":1814.26,"to":1818.9,"location":2,"content":"Um, and so really you end up with an infinite space of words."},{"from":1818.9,"to":1822.88,"location":2,"content":"Um, yeah. So that's a minor problem, right?"},{"from":1822.88,"to":1828.28,"location":2,"content":"We have very big vectors if we want to represent a sensible size vocabulary."},{"from":1828.28,"to":1831.99,"location":2,"content":"Um, but there's a much bigger problem than that, which is, well,"},{"from":1831.99,"to":1835.2,"location":2,"content":"precisely what we want to do all the time, is we want to,"},{"from":1835.2,"to":1838.59,"location":2,"content":"sort of, understand relationships and the meaning of words."},{"from":1838.59,"to":1842.38,"location":2,"content":"So, you know, an obvious example of this is web search."},{"from":1842.38,"to":1845.35,"location":2,"content":"So, if I do a search for Seattle motel,"},{"from":1845.35,"to":1848.71,"location":2,"content":"it'd be useful if it also showed me results that had"},{"from":1848.71,"to":1852.65,"location":2,"content":"Seattle hotel on the page and vice versa because,"},{"from":1852.65,"to":1855.41,"location":2,"content":"you know, hotels and motels are pretty much the same thing."},{"from":1855.41,"to":1859.9,"location":2,"content":"Um, but, you 
know, if we have these one-hot vectors like we had before they have"},{"from":1859.9,"to":1864.25,"location":2,"content":"no s- similarity relationship between them, right?"},{"from":1864.25,"to":1865.67,"location":2,"content":"So, in math terms,"},{"from":1865.67,"to":1867.78,"location":2,"content":"these two vectors are orthogonal."},{"from":1867.78,"to":1870.87,"location":2,"content":"No similarity relationship between them."},{"from":1870.87,"to":1872.65,"location":2,"content":"Um, and so you,"},{"from":1872.65,"to":1874.7,"location":2,"content":"kind of, get nowhere."},{"from":1874.7,"to":1876.88,"location":2,"content":"Now, you know, there are things that you could do,"},{"from":1876.88,"to":1878.71,"location":2,"content":"I- I just showed you WordNet."},{"from":1878.71,"to":1880.84,"location":2,"content":"WordNet shows you some synonyms and stuff."},{"from":1880.84,"to":1882.61,"location":2,"content":"So that might help a bit."},{"from":1882.61,"to":1884.04,"location":2,"content":"There are other things you could do."},{"from":1884.04,"to":1885.41,"location":2,"content":"You could sort of say, well wait,"},{"from":1885.41,"to":1889.64,"location":2,"content":"why don't we just build up a big table where we have a big table of,"},{"from":1889.64,"to":1892.67,"location":2,"content":"um, word similarities, and we could work with that."},{"from":1892.67,"to":1894.91,"location":2,"content":"And, you know, people used to try and do that, right?"},{"from":1894.91,"to":1899.77,"location":2,"content":"You know, that's sort of what Google did in 2005 or something."},{"from":1899.77,"to":1902.08,"location":2,"content":"You know, it had word similarity tables."},{"from":1902.08,"to":1904.51,"location":2,"content":"The problem with doing that is you know,"},{"from":1904.51,"to":1908.29,"location":2,"content":"we were talking about how maybe we want 500,000 words."},{"from":1908.29,"to":1912.04,"location":2,"content":"And if you want to build up then a word similarity 
table out"},{"from":1912.04,"to":1916.06,"location":2,"content":"of our pairs of words from one-hot representations,"},{"from":1916.06,"to":1918.64,"location":2,"content":"um, you- that means that the size of that table,"},{"from":1918.64,"to":1920.38,"location":2,"content":"as my math is pretty bad,"},{"from":1920.38,"to":1922.32,"location":2,"content":"is it 2.5 trillion?"},{"from":1922.32,"to":1927.13,"location":2,"content":"It's some very big number of cells in your similarity, um, matrix."},{"from":1927.13,"to":1929.23,"location":2,"content":"So that's almost impossible to do."},{"from":1929.23,"to":1933.71,"location":2,"content":"So, what we're gonna instead do is explore a method in which,"},{"from":1933.71,"to":1936.67,"location":2,"content":"um, we are going to represent words as vectors,"},{"from":1936.67,"to":1938.14,"location":2,"content":"in a way I'll show you in just, um,"},{"from":1938.14,"to":1941.77,"location":2,"content":"a minute in such a way that just the representation of"},{"from":1941.77,"to":1946.48,"location":2,"content":"a word gives you their similarity with no further work."},{"from":1946.48,"to":1950.63,"location":2,"content":"Okay. 
And so that's gonna lead into these different ideas."},{"from":1950.63,"to":1954.17,"location":2,"content":"So, I mentioned before denotational semantics."},{"from":1954.17,"to":1959.12,"location":2,"content":"Here's another idea for representing the meaning of words,"},{"from":1959.12,"to":1961.98,"location":2,"content":"um, which is called distributional semantics."},{"from":1961.98,"to":1965.14,"location":2,"content":"And so the idea of distributional semantics is, well,"},{"from":1965.14,"to":1970.9,"location":2,"content":"how we are going to represent the meaning of a word is by looking at the contexts,"},{"from":1970.9,"to":1972.92,"location":2,"content":"um, in which it appears."},{"from":1972.92,"to":1976.51,"location":2,"content":"So, this is a picture of JR Firth who was a British linguist."},{"from":1976.51,"to":1978.4,"location":2,"content":"Um, he's famous for this saying,"},{"from":1978.4,"to":1981.54,"location":2,"content":"\"You shall know a word by the company it keeps.\""},{"from":1981.54,"to":1986.95,"location":2,"content":"Um, but another person who's very famous for developing this notion of meaning is, um,"},{"from":1986.95,"to":1990.67,"location":2,"content":"the philosopher Ludwig- Ludwig Wittgenstein in his later writings,"},{"from":1990.67,"to":1993.44,"location":2,"content":"which he referred to as a use theory of mea- meaning."},{"from":1993.44,"to":1996.07,"location":2,"content":"Well, actually he's- he used some big German word that I don't know,"},{"from":1996.07,"to":1998.53,"location":2,"content":"but, um, we'll call it a use theory of meaning."},{"from":1998.53,"to":2002.54,"location":2,"content":"And, you know, essentially the point was, well, you know,"},{"from":2002.54,"to":2006.78,"location":2,"content":"if you can explain every- if- if you can"},{"from":2006.78,"to":2011.16,"location":2,"content":"explain what contexts it's correct to use a certain word,"},{"from":2011.16,"to":2014.6,"location":2,"content":"versus in what 
contexts would be the wrong word to use,"},{"from":2014.6,"to":2018.13,"location":2,"content":"this maybe gives you bad memories of doing English in high school,"},{"from":2018.13,"to":2020.49,"location":2,"content":"when people said, ah, that's the wrong word to use there,"},{"from":2020.49,"to":2023.2,"location":2,"content":"um, well, then you understand the meaning of the word, right?"},{"from":2023.2,"to":2027.05,"location":2,"content":"Um, and so that's the idea of distributional semantics."},{"from":2027.05,"to":2029.79,"location":2,"content":"And it's been- so one of the most successful ideas in"},{"from":2029.79,"to":2034.01,"location":2,"content":"modern statistical NLP because it gives you a great way to learn about word meaning."},{"from":2034.01,"to":2036.62,"location":2,"content":"And so what we're gonna do is we're going to say,"},{"from":2036.62,"to":2038.92,"location":2,"content":"haha, I want to know what the word banking means."},{"from":2038.92,"to":2041.73,"location":2,"content":"So, I'm gonna grab a lot of texts,"},{"from":2041.73,"to":2044.52,"location":2,"content":"which is easy to do now when we have the World Wide Web,"},{"from":2044.52,"to":2047.95,"location":2,"content":"I'll find lots of sentences where the word banking is used,"},{"from":2047.95,"to":2052.77,"location":2,"content":"Government debt problems turning into banking crises as happened in 2009."},{"from":2052.77,"to":2055.84,"location":2,"content":"And both these- I'm just going to say all of"},{"from":2055.84,"to":2059.11,"location":2,"content":"this stuff is the meaning of the word banking."},{"from":2059.11,"to":2063.75,"location":2,"content":"Um, that those are the contexts in which the word banking is used."},{"from":2063.75,"to":2069.49,"location":2,"content":"And that seems like very simple and perhaps even not quite right idea,"},{"from":2069.49,"to":2074.88,"location":2,"content":"but it turns out to be a very usable idea that does a great job at capturing 
meaning."},{"from":2074.88,"to":2078.3,"location":2,"content":"And so what we're gonna do is say, rather than"},{"from":2078.3,"to":2082.95,"location":2,"content":"our old localist representation, we're now gonna"},{"from":2082.95,"to":2088.22,"location":2,"content":"represent words in what we call a distributed representation."},{"from":2088.22,"to":2091.83,"location":2,"content":"And so, for the distributed representation we're still going"},{"from":2091.83,"to":2095.66,"location":2,"content":"to [NOISE] represent the meaning of a word as a numeric vector."},{"from":2095.66,"to":2099.48,"location":2,"content":"But now we're going to say that the meaning of each word is,"},{"from":2099.48,"to":2101.52,"location":2,"content":"ah, a smallish vector, um,"},{"from":2101.52,"to":2107.76,"location":2,"content":"but it's going to be a dense vector, whereby all of the numbers are non-zero."},{"from":2107.76,"to":2110.01,"location":2,"content":"So the meaning of banking is going to be"},{"from":2110.01,"to":2113.34,"location":2,"content":"distributed over the dimensions of this vector."},{"from":2113.34,"to":2119.19,"location":2,"content":"Um, now, my vector here is of dimension nine because I want to keep the slide, um, nice."},{"from":2119.19,"to":2123.2,"location":2,"content":"Um, life isn't quite that good in practice."},{"from":2123.2,"to":2125.97,"location":2,"content":"When we do this we use a larger dimensionality;"},{"from":2125.97,"to":2129.07,"location":2,"content":"kind of the solid minimum that people use is 50."},{"from":2129.07,"to":2132.33,"location":2,"content":"Um, a typical number that you might use on your laptop is"},{"from":2132.33,"to":2135.95,"location":2,"content":"300; if you want to really max out performance,"},{"from":2135.95,"to":2138.89,"location":2,"content":"um, maybe 1,000, 2,000, 4,000."},{"from":2138.89,"to":2142.02,"location":2,"content":"But, you know, nevertheless [NOISE] orders of magnitude 
"},{"from":2142.02,"to":2146.81,"location":2,"content":"smaller than a length-500,000 vector."},{"from":2146.81,"to":2151.89,"location":2,"content":"Okay. So we have words with their vector representations."},{"from":2151.89,"to":2155.79,"location":2,"content":"And so since each word is going to have a vector, um,"},{"from":2155.79,"to":2161.16,"location":2,"content":"representation, we then have a vector space in which we can place all of the words."},{"from":2161.16,"to":2163.98,"location":2,"content":"Um, and that's completely unreadable, um,"},{"from":2163.98,"to":2168.14,"location":2,"content":"but if you zoom into the vector space it's still completely unreadable."},{"from":2168.14,"to":2170.11,"location":2,"content":"But if you zoom in a bit further,"},{"from":2170.11,"to":2173.1,"location":2,"content":"um, you can find different parts of this space."},{"from":2173.1,"to":2176.82,"location":2,"content":"So here's the part where the countries tend to,"},{"from":2176.82,"to":2178.95,"location":2,"content":"um, exist: Japanese, German,"},{"from":2178.95,"to":2181.95,"location":2,"content":"French, Russian, British, Australian, American,"},{"from":2181.95,"to":2185.13,"location":2,"content":"um, France, Britain, Germany, et cetera."},{"from":2185.13,"to":2187.77,"location":2,"content":"And you can shift over to a different part of the space."},{"from":2187.77,"to":2191.04,"location":2,"content":"So here's a part of the space where various verbs are,"},{"from":2191.04,"to":2193.49,"location":2,"content":"so has, have, had, been, be."},{"from":2193.49,"to":2200.88,"location":2,"content":"Oops. 
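To make the dense-vector idea concrete, here is a minimal sketch in plain Python; the nine numbers per word are invented for illustration (a real model would learn them), and cosine similarity on the dot product is the usual way to compare such vectors:

```python
import math

def cosine(u, v):
    # dot(u, v) / (|u| * |v|): close to 1 for vectors pointing the same way
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda w: math.sqrt(sum(x * x for x in w))
    return dot / (norm(u) * norm(v))

# Toy 9-dimensional dense vectors (made-up values, not from a trained model).
banking  = [0.29, 0.79, -0.18, -0.11, 0.11, -0.54, 0.35, 0.27, 0.49]
monetary = [0.31, 0.77, -0.20, -0.09, 0.15, -0.50, 0.33, 0.25, 0.45]
tennis   = [-0.40, 0.10, 0.60, 0.30, -0.70, 0.20, -0.10, 0.50, -0.30]

# Words used in similar contexts should end up with similar vectors.
assert cosine(banking, monetary) > cosine(banking, tennis)
```

Every word having all-non-zero entries in a 50-to-4,000-dimensional vector is what makes these representations dense, versus a length-500,000 one-hot vector with a single 1.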
Um, [inaudible] be, always, was, were."},{"from":2200.88,"to":2203.97,"location":2,"content":"You can even see that some morphological forms are grouping together,"},{"from":2203.97,"to":2206.1,"location":2,"content":"and things that sort of go together, like say,"},{"from":2206.1,"to":2208.77,"location":2,"content":"think, expect, things that take those kinds of complements."},{"from":2208.77,"to":2210.8,"location":2,"content":"He said or thought something."},{"from":2210.8,"to":2212.41,"location":2,"content":"Um, they group together."},{"from":2212.41,"to":2215.01,"location":2,"content":"Now, what am I actually showing you here?"},{"from":2215.01,"to":2217.76,"location":2,"content":"Um, you know, really this was built from,"},{"from":2217.76,"to":2220.57,"location":2,"content":"ah, 100-dimensional word vectors."},{"from":2220.57,"to":2225.63,"location":2,"content":"And there is this problem: it's really hard to visualize 100-dimensional word vectors."},{"from":2225.63,"to":2229.86,"location":2,"content":"So, what is actually happening here is these, um,"},{"from":2229.86,"to":2235.11,"location":2,"content":"100-dimensional word vectors are being projected down into two dimensions,"},{"from":2235.11,"to":2237.99,"location":2,"content":"and you're seeing the two-dimensional view,"},{"from":2237.99,"to":2239.79,"location":2,"content":"which I'll get back to later."},{"from":2239.79,"to":2242.4,"location":2,"content":"Um, so, on the one hand, um,"},{"from":2242.4,"to":2244.41,"location":2,"content":"whenever you see these pictures you should hold on to"},{"from":2244.41,"to":2246.84,"location":2,"content":"your wallet because there's a huge amount of"},{"from":2246.84,"to":2251.53,"location":2,"content":"detail in the original vector space that got completely killed and went away, um,"},{"from":2251.53,"to":2252.84,"location":2,"content":"in the 2D projection,"},{"from":2252.84,"to":2257.07,"location":2,"content":"and indeed some of what pushed things together 
in the 2D,"},{"from":2257.07,"to":2259.88,"location":2,"content":"um, projection may really, really,"},{"from":2259.88,"to":2262.59,"location":2,"content":"really misrepresent what's in the original space."},{"from":2262.59,"to":2265.74,"location":2,"content":"Um, but even looking at these 2D representations,"},{"from":2265.74,"to":2266.85,"location":2,"content":"the overall feeling is,"},{"from":2266.85,"to":2268.92,"location":2,"content":"my gosh, this actually sort of works, doesn't it?"},{"from":2268.92,"to":2274.36,"location":2,"content":"Um, we can sort of see similarities, um, between words."},{"from":2274.36,"to":2282.38,"location":2,"content":"Okay. So, um, so that was the idea of what we want to do."},{"from":2282.38,"to":2284.31,"location":2,"content":"Um, the next part, um,"},{"from":2284.31,"to":2287.94,"location":2,"content":"is then how do we actually go about doing it?"},{"from":2287.94,"to":2290.45,"location":2,"content":"I'll pause for breath for half a minute."},{"from":2290.45,"to":2292.71,"location":2,"content":"Has anyone got a question they're dying to ask?"},{"from":2292.71,"to":2300.3,"location":2,"content":"[NOISE] Yeah."},{"from":2300.3,"to":2306.72,"location":2,"content":"For the- the vectors, does each, um,"},{"from":2306.72,"to":2308.46,"location":2,"content":"have a different order in each context,"},{"from":2308.46,"to":2310.53,"location":2,"content":"like, say, the first dimension of the vector,"},{"from":2310.53,"to":2312.84,"location":2,"content":"the second dimension of the vector- are those standard"},{"from":2312.84,"to":2315.47,"location":2,"content":"across all of NLP, or do people choose them themselves?"},{"from":2315.47,"to":2322.34,"location":2,"content":"Um, they're not standard across NLP, um, and they're not chosen at all."},{"from":2322.34,"to":2325.05,"location":2,"content":"So what we're gonna present is a learning algorithm."},{"from":2325.05,"to":2328.43,"location":2,"content":"So where we just sort of shuffle in lots of 
text"},{"from":2328.43,"to":2331.97,"location":2,"content":"and miraculously these word vectors come out."},{"from":2331.97,"to":2337.76,"location":2,"content":"And so the learning algorithm itself decides the dimensions."},{"from":2337.76,"to":2343.09,"location":2,"content":"But um, that actually reminds me of something I sort of meant to say, which was yeah,"},{"from":2343.09,"to":2345.43,"location":2,"content":"I mean, since this is a vector space,"},{"from":2345.43,"to":2349.58,"location":2,"content":"in some sense the dimensions are arbitrary, right,"},{"from":2349.58,"to":2352.57,"location":2,"content":"because you can, you know, just have your basis vectors in"},{"from":2352.57,"to":2355.95,"location":2,"content":"any different direction and you could sort of re-represent,"},{"from":2355.95,"to":2359.72,"location":2,"content":"um, the words in the vector space with a different set of"},{"from":2359.72,"to":2362.93,"location":2,"content":"basis vectors and it'd be exactly the same vector space,"},{"from":2362.93,"to":2366.38,"location":2,"content":"just sort of rotated around to your new, um, vectors."},{"from":2366.38,"to":2370.58,"location":2,"content":"So, you know, you shouldn't read too much into the individual elements."},{"from":2370.58,"to":2372.86,"location":2,"content":"So, it actually turns out that because of the way a lot of"},{"from":2372.86,"to":2376.07,"location":2,"content":"deep learning, um, operations work,"},{"from":2376.07,"to":2378.17,"location":2,"content":"some things they do element-wise,"},{"from":2378.17,"to":2382.78,"location":2,"content":"so the dimensions do actually tend to get some meaning to them, it turns out."},{"from":2382.78,"to":2386.9,"location":2,"content":"But um, what I think I really wanted to say was,"},{"from":2386.9,"to":2392.24,"location":2,"content":"that you know, one thing we can just think of is how close things"},{"from":2392.24,"to":2394.25,"location":2,"content":"are in the vector space and 
that's"},{"from":2394.25,"to":2397.8,"location":2,"content":"a notion of meaning similarity that we are going to exploit."},{"from":2397.8,"to":2400.64,"location":2,"content":"But you might hope that you get more than that,"},{"from":2400.64,"to":2403.01,"location":2,"content":"and you might actually think that there's meaning in"},{"from":2403.01,"to":2406.93,"location":2,"content":"different dimensions and directions in the word vector space."},{"from":2406.93,"to":2411.34,"location":2,"content":"And the answer to that is there is, and I'll come back to that a bit later."},{"from":2411.34,"to":2417.77,"location":2,"content":"Okay. Um, so in some sense the thing that had"},{"from":2417.77,"to":2422.24,"location":2,"content":"the biggest impact, um, in sort of turning the world of"},{"from":2422.24,"to":2427.63,"location":2,"content":"NLP in a neural networks direction was that picture."},{"from":2427.63,"to":2432.26,"location":2,"content":"It was this, um, algorithm that, um,"},{"from":2432.26,"to":2437.34,"location":2,"content":"Tomas Mikolov came up with in 2013 called the word2vec algorithm."},{"from":2437.34,"to":2443.21,"location":2,"content":"So it wasn't the first work on having distributed representations of words."},{"from":2443.21,"to":2445.73,"location":2,"content":"So there was older work from Yoshua Bengio that went"},{"from":2445.73,"to":2448.37,"location":2,"content":"back to about the sort of turn of the millennium,"},{"from":2448.37,"to":2452.78,"location":2,"content":"that somehow sort of hadn't really hit the world over the head and had"},{"from":2452.78,"to":2457.73,"location":2,"content":"a huge impact, and it was really that Tomas Mikolov showed this very simple,"},{"from":2457.73,"to":2460.07,"location":2,"content":"very scalable way of learning"},{"from":2460.07,"to":2465.01,"location":2,"content":"vector representations of, um, words and that sort of really opened the flood 
gates."},{"from":2465.01,"to":2468.65,"location":2,"content":"And so that's the algorithm that I'm going to, um, show now."},{"from":2468.65,"to":2475.78,"location":2,"content":"Okay. So the idea of this algorithm is you start with a big pile of text."},{"from":2475.78,"to":2480.65,"location":2,"content":"Um, so wherever you find, you know, web pages or newspaper articles or something,"},{"from":2480.65,"to":2482.48,"location":2,"content":"a lot of continuous text, right?"},{"from":2482.48,"to":2486.35,"location":2,"content":"Actual sentences, because we want to learn word meaning in context."},{"from":2486.35,"to":2492.47,"location":2,"content":"Um, NLP people call a large pile of text a corpus."},{"from":2492.47,"to":2495.89,"location":2,"content":"And I mean that's just the Latin word for body, right?"},{"from":2495.89,"to":2497.91,"location":2,"content":"It's a body of text."},{"from":2497.91,"to":2503.22,"location":2,"content":"An important thing to note, if you want to seem really educated, is that in Latin,"},{"from":2503.22,"to":2506.69,"location":2,"content":"this is a fourth declension noun."},{"from":2506.69,"to":2509.9,"location":2,"content":"So the plural of corpus is corpora."},{"from":2509.9,"to":2511.19,"location":2,"content":"Whereas if you say"},{"from":2511.19,"to":2515.39,"location":2,"content":"corpi, everyone will know that you didn't study Latin in high school."},{"from":2515.39,"to":2520.49,"location":2,"content":"[LAUGHTER] Um, okay."},{"from":2520.49,"to":2526.46,"location":2,"content":"Um, so right- so we then want to say that every word, um,"},{"from":2526.46,"to":2528.89,"location":2,"content":"in a fixed vocabulary, which would just be"},{"from":2528.89,"to":2532.45,"location":2,"content":"the vocabulary of the corpus, is, um, represented by a vector."},{"from":2532.45,"to":2536.61,"location":2,"content":"And we just start those vectors off as random vectors."},{"from":2536.61,"to":2538.34,"location":2,"content":"And so then what we're going to 
do is do"},{"from":2538.34,"to":2542.59,"location":2,"content":"this big iterative algorithm where we go through each position in the text."},{"from":2542.59,"to":2544.72,"location":2,"content":"We say, here's a word in the text."},{"from":2544.72,"to":2550.52,"location":2,"content":"Let's look at the words around it, and what we're going to want to do is say, well,"},{"from":2550.52,"to":2552.89,"location":2,"content":"the meaning of a word is its contexts of use."},{"from":2552.89,"to":2555.29,"location":2,"content":"So we want the representation of the word"},{"from":2555.29,"to":2557.87,"location":2,"content":"in the middle to be able to predict the words that are"},{"from":2557.87,"to":2563.72,"location":2,"content":"around it, and so we're gonna achieve that by moving the position of the word vector."},{"from":2563.72,"to":2567.5,"location":2,"content":"And we just repeat that a billion times and"},{"from":2567.5,"to":2571.19,"location":2,"content":"somehow a miracle occurs and out comes at the end"},{"from":2571.19,"to":2574.79,"location":2,"content":"a word vector space that looks like the picture I showed, where it has"},{"from":2574.79,"to":2579.53,"location":2,"content":"a good representation of word meaning."},{"from":2579.53,"to":2583.09,"location":2,"content":"So, slightly more, um,"},{"from":2583.09,"to":2587.24,"location":2,"content":"um, slightly more, um, graphically, right."},{"from":2587.24,"to":2588.44,"location":2,"content":"So here's the situation."},{"from":2588.44,"to":2592.84,"location":2,"content":"So we've got part of our corpus: problems turning into banking crises,"},{"from":2592.84,"to":2594.29,"location":2,"content":"and so what we want to say is, well,"},{"from":2594.29,"to":2597.72,"location":2,"content":"we want to know the meaning of the word into, and so we're going to hope that"},{"from":2597.72,"to":2601.4,"location":2,"content":"its representation can be used in a way 
that we'll"},{"from":2601.4,"to":2604.82,"location":2,"content":"make precise to predict what words appear in"},{"from":2604.82,"to":2608.6,"location":2,"content":"the context of into, because that's the meaning of into."},{"from":2608.6,"to":2611.53,"location":2,"content":"And so we're going to try and make those predictions,"},{"from":2611.53,"to":2614.86,"location":2,"content":"see how well we can predict and then change"},{"from":2614.86,"to":2619.47,"location":2,"content":"the vector representations of words in a way that lets us do that prediction better."},{"from":2619.47,"to":2621.32,"location":2,"content":"And then once we've dealt with into,"},{"from":2621.32,"to":2623.76,"location":2,"content":"we just go on to the next word and we say,"},{"from":2623.76,"to":2626.06,"location":2,"content":"okay, let's take banking as the word."},{"from":2626.06,"to":2629.8,"location":2,"content":"The meaning of banking is predicting the contexts in which banking occurs."},{"from":2629.8,"to":2631.26,"location":2,"content":"Here's one context."},{"from":2631.26,"to":2634.55,"location":2,"content":"Let's try and predict these words that occur around banking and"},{"from":2634.55,"to":2638.74,"location":2,"content":"see how we do, and then we'll move on again from there."},{"from":2638.74,"to":2642.47,"location":2,"content":"Okay. Um, sounds easy so far."},{"from":2642.47,"to":2646.1,"location":2,"content":"Um, [NOISE] now we go on and sort of do a bit more stuff."},{"from":2646.1,"to":2652.46,"location":2,"content":"Okay. 
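The loop just described, where each position in the corpus supplies a center word and its neighbors supply the words to predict, can be sketched as follows; the toy corpus and the window size of 2 are arbitrary choices for illustration:

```python
def training_pairs(tokens, window=2):
    """Yield (center, context) pairs: the events whose probability
    the model will try to make high."""
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                yield (center, tokens[j])

text = "problems turning into banking crises".split()
pairs = list(training_pairs(text))
# ('into', 'banking') and ('banking', 'into') both appear,
# since every word takes a turn as the center word.
```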
So overall, we have a big long corpus of capital T words."},{"from":2652.46,"to":2657.13,"location":2,"content":"So if we have a whole lot of documents, we just concatenate them all together and we say,"},{"from":2657.13,"to":2659.01,"location":2,"content":"okay, here's a billion words,"},{"from":2659.01,"to":2661.74,"location":2,"content":"a big long list of words."},{"from":2661.74,"to":2663.3,"location":2,"content":"And so what we're gonna do,"},{"from":2663.3,"to":2666.88,"location":2,"content":"is for the first, um, product, we're going to sort of"},{"from":2666.88,"to":2670.95,"location":2,"content":"go through all the words, and then for the second product,"},{"from":2670.95,"to":2674.63,"location":2,"content":"we're gonna say- we're gonna choose some fixed size window, you know,"},{"from":2674.63,"to":2677.99,"location":2,"content":"it might be five words on each side or something, and we're going to try and"},{"from":2677.99,"to":2682.01,"location":2,"content":"predict the 10 words that are around that center word."},{"from":2682.01,"to":2684.2,"location":2,"content":"And we're going to predict in the sense of trying to"},{"from":2684.2,"to":2686.78,"location":2,"content":"predict that word given the center word."},{"from":2686.78,"to":2688.46,"location":2,"content":"That's our probability model."},{"from":2688.46,"to":2691.18,"location":2,"content":"And so if we multiply all those things together,"},{"from":2691.18,"to":2694.61,"location":2,"content":"that's our model likelihood: how good a job it"},{"from":2694.61,"to":2698.38,"location":2,"content":"does at predicting the words around every word."},{"from":2698.38,"to":2701.6,"location":2,"content":"And that model likelihood is going to depend"},{"from":2701.6,"to":2705.18,"location":2,"content":"on the parameters of our model, which we write as theta."},{"from":2705.18,"to":2707.86,"location":2,"content":"And in this particular model,"},{"from":2707.86,"to":2710.69,"location":2,"content":"the only 
parameters in it are actually"},{"from":2710.69,"to":2713.81,"location":2,"content":"going to be the vector representations we give the words."},{"from":2713.81,"to":2716.95,"location":2,"content":"The model has absolutely no other parameters to it."},{"from":2716.95,"to":2720.05,"location":2,"content":"So, we're just going to say we're representing"},{"from":2720.05,"to":2723.7,"location":2,"content":"a word with a vector in a vector space and that"},{"from":2723.7,"to":2727.88,"location":2,"content":"representation of it is its meaning, and we're then going to be able to"},{"from":2727.88,"to":2732.34,"location":2,"content":"use that to predict what other words occur, in a way I'm about to show you."},{"from":2732.34,"to":2737.24,"location":2,"content":"Okay. So, um, that's our likelihood, and so what we do in all of"},{"from":2737.24,"to":2742.28,"location":2,"content":"these models is we sort of define an objective function, and then we're going to"},{"from":2742.28,"to":2745.88,"location":2,"content":"want to come up with vector representations of words in"},{"from":2745.88,"to":2750.74,"location":2,"content":"such a way as to minimize our objective function."},{"from":2750.74,"to":2756.38,"location":2,"content":"Um, so the objective function is basically the same as what's on the top half of the slide,"},{"from":2756.38,"to":2758.05,"location":2,"content":"but we change a couple of things."},{"from":2758.05,"to":2763.04,"location":2,"content":"We stick a minus sign in front of it so we can do minimization rather than maximization."},{"from":2763.04,"to":2765.51,"location":2,"content":"Completely arbitrary; makes no difference."},{"from":2765.51,"to":2768.13,"location":2,"content":"Um, we stick a one-over-T in front of it,"},{"from":2768.13,"to":2771.8,"location":2,"content":"so that we're working out the sort of average"},{"from":2771.8,"to":2776.15,"location":2,"content":"goodness of predicting for each choice of center 
word."},{"from":2776.15,"to":2779.36,"location":2,"content":"Again, that sort of makes no difference, but it kinda keeps the scale of"},{"from":2779.36,"to":2783.09,"location":2,"content":"things, ah, not dependent on the size of the corpus."},{"from":2783.09,"to":2787.24,"location":2,"content":"Um, the bit that's actually important is we stick a log in front of"},{"from":2787.24,"to":2791.69,"location":2,"content":"the function that was up there, um, because it turns out that everything always gets nicer"},{"from":2791.69,"to":2793.8,"location":2,"content":"when you stick logs in front of products,"},{"from":2793.8,"to":2796.37,"location":2,"content":"um, when you're doing things like optimization."},{"from":2796.37,"to":2798.86,"location":2,"content":"So, when we do that we've then got a log of"},{"from":2798.86,"to":2802.43,"location":2,"content":"all these products, which will allow us to turn things, you know,"},{"from":2802.43,"to":2806.3,"location":2,"content":"into sums of the log of this probability,"},{"from":2806.3,"to":2810.76,"location":2,"content":"and we'll go through that again, um, in just a minute."},{"from":2810.76,"to":2815.42,"location":2,"content":"Okay. 
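Putting those pieces together, the objective is J(theta) = -(1/T) * sum over positions t, and offsets j != 0 within the window, of log P(w_{t+j} | w_t; theta). A small sketch of the bookkeeping, using a deliberately dumb uniform model as a stand-in for the real probability (purely illustrative, not the word2vec softmax):

```python
import math

def objective(tokens, prob, window=2):
    """J = -(1/T) * sum_t sum_{c != t, |c - t| <= window} log prob(w_c | w_t)."""
    T = len(tokens)
    total = 0.0
    for t in range(T):
        for c in range(max(0, t - window), min(T, t + window + 1)):
            if c != t:
                total += math.log(prob(tokens[c], tokens[t]))
    return -total / T

# Illustrative stand-in model: P(o | c) = 1/V for a 7-word vocabulary.
V = 7
uniform = lambda o, c: 1.0 / V
tokens = "problems turning into banking crises".split()
J = objective(tokens, uniform)  # 14 window pairs over T=5 positions
```

The minus sign makes this a quantity to minimize, and dividing by T keeps its scale independent of corpus size, exactly as described above.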
Um, and so if we can change"},{"from":2815.42,"to":2820.86,"location":2,"content":"our vector representations of these words so as to minimize this J of theta,"},{"from":2820.86,"to":2826.54,"location":2,"content":"that means we'll be good at predicting words in the context of another word."},{"from":2826.54,"to":2830.45,"location":2,"content":"So then, that all sounded good, but it was all"},{"from":2830.45,"to":2833.96,"location":2,"content":"dependent on having this probability function where you wanna"},{"from":2833.96,"to":2837.02,"location":2,"content":"predict the probability of a word in"},{"from":2837.02,"to":2840.64,"location":2,"content":"the context given the center word, and the question is,"},{"from":2840.64,"to":2843.62,"location":2,"content":"how can you possibly do that?"},{"from":2843.62,"to":2848.39,"location":2,"content":"Um, well, um, remember what I said is actually our model is just gonna"},{"from":2848.39,"to":2853.66,"location":2,"content":"have vector representations of words, and those were the only parameters of the model."},{"from":2853.66,"to":2855.65,"location":2,"content":"Now, that's almost true."},{"from":2855.65,"to":2857.11,"location":2,"content":"It's not quite true."},{"from":2857.11,"to":2859.22,"location":2,"content":"Um, we actually cheat slightly,"},{"from":2859.22,"to":2862.4,"location":2,"content":"since we actually propose two vector representations for"},{"from":2862.4,"to":2866.6,"location":2,"content":"each word, and this makes it simpler to do this."},{"from":2866.6,"to":2868.07,"location":2,"content":"Um, you don't have to do this;"},{"from":2868.07,"to":2870.62,"location":2,"content":"there are ways to get around it, but this is the simplest way to do it."},{"from":2870.62,"to":2874.61,"location":2,"content":"So we have one vector for a word when it's the center word that's predicting"},{"from":2874.61,"to":2879.5,"location":2,"content":"other words, but we have a second vector for each word when it's a context 
word,"},{"from":2879.5,"to":2881.22,"location":2,"content":"so that's one of the words in the context."},{"from":2881.22,"to":2882.68,"location":2,"content":"So for each word type,"},{"from":2882.68,"to":2886.85,"location":2,"content":"we have these two vectors: as center word, as context word."},{"from":2886.85,"to":2892.7,"location":2,"content":"Um, so then we're gonna work out this probability of a word in the context,"},{"from":2892.7,"to":2894.57,"location":2,"content":"given the center word,"},{"from":2894.57,"to":2902.11,"location":2,"content":"purely in terms of these vectors, and the way we do it is with this equation right here,"},{"from":2902.11,"to":2905.16,"location":2,"content":"which I'll explain more in just a moment."},{"from":2905.16,"to":2909.65,"location":2,"content":"So we're still in exactly the same situation, right?"},{"from":2909.65,"to":2912.05,"location":2,"content":"That we're wanting to work out probabilities of"},{"from":2912.05,"to":2915.66,"location":2,"content":"words occurring in the context of our center word."},{"from":2915.66,"to":2918.78,"location":2,"content":"So the center word is c and the context word is represented with"},{"from":2918.78,"to":2922.37,"location":2,"content":"o, and these [inaudible] slide notation, but sort of,"},{"from":2922.37,"to":2924.89,"location":2,"content":"we're basically saying there's one kind of"},{"from":2924.89,"to":2927.59,"location":2,"content":"vector for center words and a different kind of vector"},{"from":2927.59,"to":2933.66,"location":2,"content":"for context words, and we're gonna work out this probabilistic prediction, um,"},{"from":2933.66,"to":2936.47,"location":2,"content":"in terms of these word vectors."},{"from":2936.47,"to":2939.26,"location":2,"content":"Okay. 
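The equation on the slide, P(o | c) = exp(u_o . v_c) / sum over vocabulary words w of exp(u_w . v_c), can be sketched directly with the two-vectors-per-word setup; the tiny 2-dimensional vectors below are invented for illustration:

```python
import math

def p_context_given_center(u, v_c, o):
    """word2vec softmax: P(o | c) = exp(u_o . v_c) / sum_w exp(u_w . v_c).
    u maps each vocabulary word to its context ('outside') vector;
    v_c is the center word's vector."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    scores = {w: math.exp(dot(vec, v_c)) for w, vec in u.items()}
    return scores[o] / sum(scores.values())

# Toy 2-dimensional context vectors (made up, not trained).
u = {"crises": [0.9, 0.1], "debt": [0.5, 0.2], "tennis": [-0.8, 0.3]}
v_banking = [1.0, 0.5]  # toy center vector for "banking"

probs = {w: p_context_given_center(u, v_banking, w) for w in u}
# A context word whose u-vector points the same way as v_banking
# (big dot product) gets most of the probability mass.
```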
So how can we do that?"},{"from":2939.26,"to":2942.95,"location":2,"content":"Well, the way we do it is with this, um,"},{"from":2942.95,"to":2947.87,"location":2,"content":"formula here, which is the sort of shape that you see over and over again, um,"},{"from":2947.87,"to":2950.3,"location":2,"content":"in deep learning with categorical stuff."},{"from":2950.3,"to":2952.67,"location":2,"content":"So for the very center bit of it,"},{"from":2952.67,"to":2957.82,"location":2,"content":"the bit in orange, and the same thing occurs in the, um, denominator,"},{"from":2957.82,"to":2961.13,"location":2,"content":"what we're doing there is calculating a dot product."},{"from":2961.13,"to":2964.46,"location":2,"content":"So, we're gonna go through the components of our vector and we're gonna"},{"from":2964.46,"to":2968.75,"location":2,"content":"multiply them together, and that means if, um,"},{"from":2968.75,"to":2972.8,"location":2,"content":"different words have big components of the same sign,"},{"from":2972.8,"to":2975.62,"location":2,"content":"plus or minus, in the same positions,"},{"from":2975.62,"to":2978.92,"location":2,"content":"the dot product will be big, and if"},{"from":2978.92,"to":2982.46,"location":2,"content":"they have different signs or one is big and one is small,"},{"from":2982.46,"to":2984.41,"location":2,"content":"the dot product will be a lot smaller."},{"from":2984.41,"to":2988.1,"location":2,"content":"So that orange part directly calculates, uh,"},{"from":2988.1,"to":2991.67,"location":2,"content":"sort of a similarity between words, where"},{"from":2991.67,"to":2995.34,"location":2,"content":"the similarity is the sort of vectors looking the same, right?"},{"from":2995.34,"to":2997.61,"location":2,"content":"Um, and so that's the heart of it, right?"},{"from":2997.61,"to":3000.13,"location":2,"content":"So we're gonna have words that have similar vectors,"},{"from":3000.13,"to":3004.24,"location":2,"content":"i.e., close together in the vector space, 
have similar meaning."},{"from":3004.24,"to":3006.58,"location":2,"content":"Um, so for the rest of it- um,"},{"from":3006.58,"to":3010.33,"location":2,"content":"so the next thing we do is take that number and put an exp around it."},{"from":3010.33,"to":3012.1,"location":2,"content":"So, um, the exponential has"},{"from":3012.1,"to":3015.3,"location":2,"content":"this nice property that no matter what number you stick into it,"},{"from":3015.3,"to":3017.84,"location":2,"content":"because the dot product might be positive or negative,"},{"from":3017.84,"to":3020.89,"location":2,"content":"it's gonna come out as a positive number, and if"},{"from":3020.89,"to":3024.16,"location":2,"content":"we eventually wanna get a probability, um, that's really good,"},{"from":3024.16,"to":3028.45,"location":2,"content":"if we have positive numbers and not negative numbers, um, so that's good."},{"from":3028.45,"to":3033.37,"location":2,"content":"Um, then the third part, which is the bit in blue, is we wanted to have"},{"from":3033.37,"to":3036.07,"location":2,"content":"probabilities, and probabilities are meant to add up to"},{"from":3036.07,"to":3039.97,"location":2,"content":"one, and so we do that in the standard, dumbest possible way."},{"from":3039.97,"to":3042.2,"location":2,"content":"We sum up what this quantity is"},{"from":3042.2,"to":3047.08,"location":2,"content":"for every different word in our vocabulary and we divide through by"},{"from":3047.08,"to":3052.32,"location":2,"content":"it, and so that normalizes things and turns them into a probability distribution."},{"from":3052.32,"to":3054.68,"location":2,"content":"Yeah, so in practice,"},{"from":3054.68,"to":3055.99,"location":2,"content":"there are two parts."},{"from":3055.99,"to":3059.11,"location":2,"content":"There's the orange part, which is this idea of using"},{"from":3059.11,"to":3063.58,"location":2,"content":"dot product in a vector space as our similarity measure between 
words"},{"from":3063.58,"to":3067.48,"location":2,"content":"and then the second part is all the rest of it, where we feed it"},{"from":3067.48,"to":3071.66,"location":2,"content":"through what we refer to, and use, all the time as a softmax distribution."},{"from":3071.66,"to":3077.53,"location":2,"content":"So the two parts, the exp and normalizing, give you a softmax distribution."},{"from":3077.53,"to":3082.12,"location":2,"content":"Um, and softmax functions will sort of map any numbers into"},{"from":3082.12,"to":3086.95,"location":2,"content":"a probability distribution, always, for the two reasons that I gave, and so,"},{"from":3086.95,"to":3090,"location":2,"content":"it's referred to as a softmax, um,"},{"from":3090,"to":3093.53,"location":2,"content":"because it works like a soft max, right?"},{"from":3093.53,"to":3095.04,"location":2,"content":"So if you have numbers,"},{"from":3095.04,"to":3099.74,"location":2,"content":"you could just say what's the max of these numbers, um,"},{"from":3099.74,"to":3106.81,"location":2,"content":"and, you know, if you sort of map your original numbers so that"},{"from":3106.81,"to":3109.39,"location":2,"content":"the max is the max and everything else is zero,"},{"from":3109.39,"to":3111.16,"location":2,"content":"that's sort of a hard max."},{"from":3111.16,"to":3116.93,"location":2,"content":"Um, this is a soft max because, you know,"},{"from":3116.93,"to":3120.31,"location":2,"content":"if you sort of imagine this, but- if we just ignore the problem of"},{"from":3120.31,"to":3124.32,"location":2,"content":"negative numbers for a moment and you got rid of the exp, um,"},{"from":3124.32,"to":3126.22,"location":2,"content":"then you'd sort of come out with"},{"from":3126.22,"to":3129.64,"location":2,"content":"a probability distribution, but by and large it'd sort of be fairly"},{"from":3129.64,"to":3132.07,"location":2,"content":"flat and wouldn't particularly pick out the max 
of"},{"from":3132.07,"to":3135.31,"location":2,"content":"the different x_i numbers, whereas when you exponentiate them,"},{"from":3135.31,"to":3138.67,"location":2,"content":"that sort of makes big numbers way bigger, and so this,"},{"from":3138.67,"to":3145.99,"location":2,"content":"this softmax sort of mainly puts mass where the max, or the couple of maxes, are."},{"from":3145.99,"to":3149.92,"location":2,"content":"Um, so that's the max part, and the soft part is that this isn't"},{"from":3149.92,"to":3154.9,"location":2,"content":"a hard decision; it still spreads a little bit of probability mass everywhere else."},{"from":3154.9,"to":3160.54,"location":2,"content":"Okay, so now we have, uh, a loss function."},{"from":3160.54,"to":3165.16,"location":2,"content":"We have a loss function with a probability model on the inside that we can"},{"from":3165.16,"to":3170.23,"location":2,"content":"build, and so what we want to be able to do is then, um,"},{"from":3170.23,"to":3175.69,"location":2,"content":"move our vector representations of words around"},{"from":3175.69,"to":3181.07,"location":2,"content":"so that they are good at predicting what words occur in the context of other words."},{"from":3181.07,"to":3186.4,"location":2,"content":"Um, and so, at this point what we're gonna do is optimization."},{"from":3186.4,"to":3190.47,"location":2,"content":"So, we have vector components of different words."},{"from":3190.47,"to":3193.18,"location":2,"content":"We have a very high-dimensional space again, but here"},{"from":3193.18,"to":3196.27,"location":2,"content":"I've just got two for the picture, and we're gonna wanna"},{"from":3196.27,"to":3199.51,"location":2,"content":"say, how can we minimize this function, and we're going to"},{"from":3199.51,"to":3203.92,"location":2,"content":"want to jiggle the numbers that are used in the word representations in"},{"from":3203.92,"to":3208.99,"location":2,"content":"such a way that we're walking down the slope of this 
space."},{"from":3208.99,"to":3212.09,"location":2,"content":"I.e., walking down the gradient, and um,"},{"from":3212.09,"to":3217.33,"location":2,"content":"when we've minimized the function, we've found good representations for words."},{"from":3217.33,"to":3219.78,"location":2,"content":"So doing this for this case,"},{"from":3219.78,"to":3222.07,"location":2,"content":"we want to make a very big vector in"},{"from":3222.07,"to":3225.4,"location":2,"content":"a very high-dimensional vector space of all the parameters of"},{"from":3225.4,"to":3228.73,"location":2,"content":"our model and the only parameters that this model"},{"from":3228.73,"to":3233.09,"location":2,"content":"has is literally the vector space representations of words."},{"from":3233.09,"to":3236.17,"location":2,"content":"So if there are a 100 dimensional word representations,"},{"from":3236.17,"to":3239.32,"location":2,"content":"there are sort of 100 parameters for aardvark in context,"},{"from":3239.32,"to":3243.4,"location":2,"content":"100 parameters for the word a- in context et cetera going through,"},{"from":3243.4,"to":3248.02,"location":2,"content":"100 parameters for the word aardvark [NOISE] as a center word et cetera,"},{"from":3248.02,"to":3252.52,"location":2,"content":"et cetera, and that gives us a big vector of parameters to"},{"from":3252.52,"to":3258.26,"location":2,"content":"optimize and we're gonna run this optimization and then um, move them down."},{"from":3258.26,"to":3263.74,"location":2,"content":"Um, [NOISE] yeah so that's essentially what you do."},{"from":3263.74,"to":3266.36,"location":2,"content":"Um, I sort of wanted to go through um,"},{"from":3266.36,"to":3268.99,"location":2,"content":"the details of this um,"},{"from":3268.99,"to":3272.44,"location":2,"content":"just so we've kind of gone through things concretely to"},{"from":3272.44,"to":3276.07,"location":2,"content":"make sure everyone is on the same 
page."},{"from":3276.07,"to":3279.47,"location":2,"content":"Um, so I suspect that, you know,"},{"from":3279.47,"to":3283.51,"location":2,"content":"if I try and do this concretely,"},{"from":3283.51,"to":3285.86,"location":2,"content":"um, there are a lot of people um,"},{"from":3285.86,"to":3290.83,"location":2,"content":"that this will bore and some people that it will bore very badly,"},{"from":3290.83,"to":3294.41,"location":2,"content":"um, so I apologize to you,"},{"from":3294.41,"to":3295.81,"location":2,"content":"um, but you know,"},{"from":3295.81,"to":3299.14,"location":2,"content":"I'm hoping and thinking that there's probably"},{"from":3299.14,"to":3302.65,"location":2,"content":"some people who haven't done as much of this stuff recently"},{"from":3302.65,"to":3305.74,"location":2,"content":"and it might just actually be good to do it concretely"},{"from":3305.74,"to":3309.76,"location":2,"content":"and get everyone up to speed right at the beginning. Yeah?"},{"from":3309.76,"to":3314.68,"location":2,"content":"[inaudible] how do we calculate [inaudible] specifically?"},{"from":3314.68,"to":3320.28,"location":2,"content":"Well, so, we- so the way we calculate the,"},{"from":3320.28,"to":3326.05,"location":2,"content":"the U and V vectors is we're literally going to start with a random vector for"},{"from":3326.05,"to":3333.01,"location":2,"content":"each word and then we're going to iteratively change those vectors a little bit as we learn."},{"from":3333.01,"to":3337.14,"location":2,"content":"And the way we're going to work out how to change them is we're gonna say,"},{"from":3337.14,"to":3342.4,"location":2,"content":"\"I want to do optimization,\" and that is going to be implemented as okay."},{"from":3342.4,"to":3344.83,"location":2,"content":"We have the current vectors for each word."},{"from":3344.83,"to":3351.55,"location":2,"content":"Let me do some calculus to work out how I could change the word vectors, um, to 
mean,"},{"from":3351.55,"to":3355.78,"location":2,"content":"that the word vectors would calculate a higher probability for"},{"from":3355.78,"to":3360.16,"location":2,"content":"the words that actually occur in contexts of this center word."},{"from":3360.16,"to":3361.86,"location":2,"content":"And we will do that,"},{"from":3361.86,"to":3363.93,"location":2,"content":"and we'll do it again and again and again,"},{"from":3363.93,"to":3366.76,"location":2,"content":"and then we'll eventually end up with good word vectors."},{"from":3366.76,"to":3368.26,"location":2,"content":"Thank you for that question,"},{"from":3368.26,"to":3370.78,"location":2,"content":"cause that's a concept that you're meant to have understood."},{"from":3370.78,"to":3373.33,"location":2,"content":"That is how this works and maybe I didn't"},{"from":3373.33,"to":3376.64,"location":2,"content":"explain that high-level recipe well enough, yeah."},{"from":3376.64,"to":3380.41,"location":2,"content":"Okay, so yeah, so let's just go through it. So, we've seen it, right?"},{"from":3380.41,"to":3384.07,"location":2,"content":"So, we had this formula that we wanted to maximize, you know,"},{"from":3384.07,"to":3392.41,"location":2,"content":"our original function which was the product of T equals one to big T,"},{"from":3392.41,"to":3395.99,"location":2,"content":"and then the product of the words, uh,"},{"from":3395.99,"to":3400.72,"location":2,"content":"position minus M less than or equal to J,"},{"from":3400.72,"to":3402.46,"location":2,"content":"less than or equal to M,"},{"from":3402.46,"to":3406,"location":2,"content":"J not equal to zero of, um,"},{"from":3406,"to":3411.64,"location":2,"content":"the probability of W, 
T plus J, given W, T,"},{"from":3411.64,"to":3417.7,"location":2,"content":"according to the parameters of our model."},{"from":3417.7,"to":3421.33,"location":2,"content":"Okay, and then we'd already seen that we were gonna convert that"},{"from":3421.33,"to":3425.51,"location":2,"content":"into the function that we're going to use where we have J of Theta,"},{"from":3425.51,"to":3435.49,"location":2,"content":"where we had the minus one over T, of the sum of T equals one to big T of the sum of minus M,"},{"from":3435.49,"to":3437.77,"location":2,"content":"less than or equal to J less than or equal to M,"},{"from":3437.77,"to":3447.4,"location":2,"content":"J not equal to zero of the log of the probability of W, T plus J, given W,"},{"from":3447.4,"to":3451.84,"location":2,"content":"T. Okay, so we had that and then we'd had"},{"from":3451.84,"to":3456.49,"location":2,"content":"this formula that the probability of the outside word given"},{"from":3456.49,"to":3466.36,"location":2,"content":"the context word is this formula we just went through of exp of u_o T v_c over"},{"from":3466.36,"to":3476.77,"location":2,"content":"the sum of W equals one to the vocabulary size of exp of u_w T v_c."},{"from":3476.77,"to":3479.53,"location":2,"content":"Okay, so that's sort of our model."},{"from":3479.53,"to":3483.84,"location":2,"content":"We want to min- minimize this."},{"from":3483.84,"to":3491.23,"location":2,"content":"So, we wanna minimize this and we want to minimize that by changing these parameters."},{"from":3491.23,"to":3495.41,"location":2,"content":"And these parameters are the contents of these vectors."},{"from":3495.41,"to":3497.64,"location":2,"content":"And so, what we want to do now,"},{"from":3497.64,"to":3503.56,"location":2,"content":"is do calculus and we wanna say let's work out in terms of these parameters which are,"},{"from":3503.56,"to":3505.96,"location":2,"content":"u and v vectors, um,"},{"from":3505.96,"to":3510.11,"location":2,"content":"for the current 
values of the parameters which we initialized randomly."},{"from":3510.11,"to":3512.05,"location":2,"content":"Like what's the slope of the space?"},{"from":3512.05,"to":3513.49,"location":2,"content":"Where is downhill?"},{"from":3513.49,"to":3515.77,"location":2,"content":"Because if we can work out where downhill is,"},{"from":3515.77,"to":3519.11,"location":2,"content":"we just gotta walk downhill and our model gets better."},{"from":3519.11,"to":3522.01,"location":2,"content":"So, we're gonna take derivatives and work out what"},{"from":3522.01,"to":3525.61,"location":2,"content":"direction downhill is and then we wanna walk that way, yeah."},{"from":3525.61,"to":3530.23,"location":2,"content":"So, why do we wanna maximize that probability and like,"},{"from":3530.23,"to":3531.8,"location":2,"content":"like going through every word,"},{"from":3531.8,"to":3537.64,"location":2,"content":"it's like [inaudible] given the [inaudible]"},{"from":3537.64,"to":3539.66,"location":2,"content":"So, well, so, so,"},{"from":3539.66,"to":3542.91,"location":2,"content":"I'm wanting to achieve this, um,"},{"from":3542.91,"to":3548.39,"location":2,"content":"what I want to achieve for my distributional notion of meaning is,"},{"from":3548.39,"to":3551.5,"location":2,"content":"I have a meaning for a word, a vector."},{"from":3551.5,"to":3556.81,"location":2,"content":"And that vector knows what words occur in the context of,"},{"from":3556.81,"to":3559.53,"location":2,"content":"um, a word- of itself."},{"from":3559.53,"to":3563.02,"location":2,"content":"And knowing what words occur in its context means,"},{"from":3563.02,"to":3564.79,"location":2,"content":"it can accurately give"},{"from":3564.79,"to":3568.95,"location":2,"content":"a high probability estimate to those words that occur in the context,"},{"from":3568.95,"to":3572.32,"location":2,"content":"and it will give low probability estimates"},{"from":3572.32,"to":3575.05,"location":2,"content":"to words that don't 
typically occur in the context."},{"from":3575.05,"to":3577.24,"location":2,"content":"So, you know, if the word is bank,"},{"from":3577.24,"to":3579.55,"location":2,"content":"I'm hoping that words like branch,"},{"from":3579.55,"to":3581.57,"location":2,"content":"and open, and withdrawal,"},{"from":3581.57,"to":3583.36,"location":2,"content":"will be given high probability,"},{"from":3583.36,"to":3585.45,"location":2,"content":"cause they tend to occur with the word bank."},{"from":3585.45,"to":3589.95,"location":2,"content":"And I'm hoping that some other words, um,"},{"from":3589.95,"to":3592.74,"location":2,"content":"like neural network or something have"},{"from":3592.74,"to":3598.29,"location":2,"content":"a lower probability because they don't tend to occur with the word bank."},{"from":3598.29,"to":3601.53,"location":2,"content":"Okay, um, does that make sense?"},{"from":3601.53,"to":3601.78,"location":2,"content":"Yeah."},{"from":3601.78,"to":3603.73,"location":2,"content":"Yeah. 
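The softmax mapping described earlier can be sketched in a few lines of NumPy (a minimal illustration; the function name and the example scores are mine, not from the lecture):

```python
import numpy as np

def softmax(x):
    # Exponentiate: big scores get way bigger (the "max" part),
    # then normalize so everything sums to one.
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([1.0, 2.0, 5.0])
p = softmax(scores)
# p is a valid distribution: all positive, summing to 1, with most of the
# mass on the largest score, but a little mass spread everywhere (the "soft" part).
```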
And the other thing I was,"},{"from":3603.73,"to":3606.86,"location":2,"content":"I'd forgotten meant to comment was, you know, obviously,"},{"from":3606.86,"to":3610.48,"location":2,"content":"we're not gonna be able to do this super well; it's just not gonna be the case,"},{"from":3610.48,"to":3613.18,"location":2,"content":"that we can say a word in the context is going to"},{"from":3613.18,"to":3615.88,"location":2,"content":"be this word with probability 0.97, right?"},{"from":3615.88,"to":3619.75,"location":2,"content":"Because we're using this one simple probability distribution"},{"from":3619.75,"to":3623.23,"location":2,"content":"to predict all words in our context."},{"from":3623.23,"to":3627.88,"location":2,"content":"So, in particular, we're using it to predict 10 different words generally, right?"},{"from":3627.88,"to":3632.43,"location":2,"content":"So, at best, we can kind of be giving sort of five percent chance to one of them, right?"},{"from":3632.43,"to":3633.82,"location":2,"content":"We can't possibly be,"},{"from":3633.82,"to":3635.95,"location":2,"content":"guessing right every time."},{"from":3635.95,"to":3637.39,"location":2,"content":"Um, and well, you know,"},{"from":3637.39,"to":3640.26,"location":2,"content":"they're gonna be different contexts with different words in them."},{"from":3640.26,"to":3644.61,"location":2,"content":"So, you know, it's gonna be a very loose model,"},{"from":3644.61,"to":3648.66,"location":2,"content":"but nevertheless, we wanna capture the fact that, you know,"},{"from":3648.66,"to":3651.33,"location":2,"content":"withdrawal is much more likely, um,"},{"from":3651.33,"to":3657.58,"location":2,"content":"to occur near the word bank than something like football."},{"from":3657.58,"to":3661.03,"location":2,"content":"That's, you know, basically what our goal is."},{"from":3661.03,"to":3667.36,"location":2,"content":"Okay, um, yes, so we want to maximize 
this,"},{"from":3667.36,"to":3672.61,"location":2,"content":"by minimizing this, which means we then want to do some calculus to work this out."},{"from":3672.61,"to":3674.74,"location":2,"content":"So, what we're then gonna do is,"},{"from":3674.74,"to":3676.72,"location":2,"content":"that we're going to say, well,"},{"from":3676.72,"to":3679.49,"location":2,"content":"these parameters are our word vectors"},{"from":3679.49,"to":3682.63,"location":2,"content":"and we're gonna sort of want to move these word vectors,"},{"from":3682.63,"to":3688.18,"location":2,"content":"um, to, um, work things out as to how to, um, walk downhill."},{"from":3688.18,"to":3692.44,"location":2,"content":"So, the case that I'm going to do now is gonna look at the parameters of"},{"from":3692.44,"to":3698.28,"location":2,"content":"this center word vc and work out how to do things with respect to it."},{"from":3698.28,"to":3700.75,"location":2,"content":"Um, now, that's not the only thing that you wanna do,"},{"from":3700.75,"to":3704.91,"location":2,"content":"you also want to work out the slope with respect to the uo vector."},{"from":3704.91,"to":3707.97,"location":2,"content":"Um, but I'm not gonna do that because time in class is going to run out."},{"from":3707.97,"to":3709.75,"location":2,"content":"So, it'd be really good if you did that one at"},{"from":3709.75,"to":3711.72,"location":2,"content":"home and then you'd feel much more competent."},{"from":3711.72,"to":3717.13,"location":2,"content":"Right, so then, um, so what I'm wanting you to do is work out the partial derivative with"},{"from":3717.13,"to":3723.2,"location":2,"content":"respect to my vc vector representation of this quantity,"},{"from":3723.2,"to":3724.81,"location":2,"content":"that we were just looking at."},{"from":3724.81,"to":3728.29,"location":2,"content":"Which is, um, the quantity in here,"},{"from":3728.29,"to":3731.98,"location":2,"content":"um, where we're taking the log of that 
quantity."},{"from":3731.98,"to":3737.56,"location":2,"content":"Right, the log of the exp of u,"},{"from":3737.56,"to":3740.14,"location":2,"content":"o, T, v, c,"},{"from":3740.14,"to":3746.83,"location":2,"content":"over the sum of W equals one to V of the exp of u,"},{"from":3746.83,"to":3750.22,"location":2,"content":"w, T, v, c. Okay,"},{"from":3750.22,"to":3753.22,"location":2,"content":"so this, um, so now we have a log of the division,"},{"from":3753.22,"to":3755.7,"location":2,"content":"so that's easy to rewrite, um,"},{"from":3755.7,"to":3759.59,"location":2,"content":"that we have a partial derivative of the log of"},{"from":3759.59,"to":3767.56,"location":2,"content":"the numerator minus and"},{"from":3767.56,"to":3769.69,"location":2,"content":"I can distribute the partial derivative."},{"from":3769.69,"to":3773.39,"location":2,"content":"So, I can have minus the partial derivative,"},{"from":3773.39,"to":3776.68,"location":2,"content":"um, of the denominator,"},{"from":3776.68,"to":3779.71,"location":2,"content":"um, which is log of this thing."},{"from":3779.71,"to":3788.86,"location":2,"content":"[NOISE]"},{"from":3788.86,"to":3799.19,"location":2,"content":"Okay. Um, so this is sort of what was the numerator and this is what was the denominator."},{"from":3799.19,"to":3807.06,"location":2,"content":"Okay. 
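In cleaner notation, the objective and the log-quotient split being written on the board here are (as I read the lecture's symbols):

```latex
J(\theta) = -\frac{1}{T}\sum_{t=1}^{T}\;\sum_{\substack{-m \le j \le m \\ j \ne 0}} \log P(w_{t+j}\mid w_t),
\qquad
P(o\mid c) = \frac{\exp(u_o^\top v_c)}{\sum_{w=1}^{V}\exp(u_w^\top v_c)}

\frac{\partial}{\partial v_c}\log P(o\mid c)
\;=\; \underbrace{\frac{\partial}{\partial v_c}\, u_o^\top v_c}_{\text{numerator part}}
\;-\; \underbrace{\frac{\partial}{\partial v_c}\log \sum_{w=1}^{V}\exp(u_w^\top v_c)}_{\text{denominator part}}
```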
So, um, the part that was the numerator is really easy."},{"from":3807.06,"to":3809.13,"location":2,"content":"In fact maybe I can fit it in here."},{"from":3809.13,"to":3813.45,"location":2,"content":"Um, so log and exp are just inverses of each other,"},{"from":3813.45,"to":3814.8,"location":2,"content":"so they cancel out."},{"from":3814.8,"to":3823.65,"location":2,"content":"So, we've got the partial derivative of U_o T V_c."},{"from":3823.65,"to":3827.46,"location":2,"content":"Okay, so at this point I should, um, just, um,"},{"from":3827.46,"to":3831.63,"location":2,"content":"remind people right that this V_c here's a vector of- um,"},{"from":3831.63,"to":3836.13,"location":2,"content":"it's still a vector right because we had a 100 dimensional representation of a word."},{"from":3836.13,"to":3840.33,"location":2,"content":"Um, so this is doing multivariate calculus."},{"from":3840.33,"to":3842.79,"location":2,"content":"Um, so you know, if you're,"},{"from":3842.79,"to":3844.53,"location":2,"content":"if you at all, um,"},{"from":3844.53,"to":3846.11,"location":2,"content":"remember any of this stuff,"},{"from":3846.11,"to":3848.18,"location":2,"content":"you can say, \"Ha this is trivial\"."},{"from":3848.18,"to":3852.39,"location":2,"content":"The answer to that is U_o, you're done, um, and that's great."},{"from":3852.39,"to":3854.95,"location":2,"content":"But you know, if you're, um, feeling, um,"},{"from":3854.95,"to":3857.55,"location":2,"content":"not so good on all of this stuff, um,"},{"from":3857.55,"to":3859.13,"location":2,"content":"and you wanna sort of, um,"},{"from":3859.13,"to":3862.44,"location":2,"content":"cheat a little on the side and try and work out what it is,"},{"from":3862.44,"to":3864.18,"location":2,"content":"um, you can sort of say,"},{"from":3864.18,"to":3865.98,"location":2,"content":"\"Well, let me um,"},{"from":3865.98,"to":3868.38,"location":2,"content":"work out the partial 
derivative,"},{"from":3868.38,"to":3874.2,"location":2,"content":"um with respect to one element of this vector like the first element of this vector\"."},{"from":3874.2,"to":3882.87,"location":2,"content":"Well, what I actually got here for this dot product is I have U_o one times V_c one,"},{"from":3882.87,"to":3889.56,"location":2,"content":"plus U_o two times V_c two plus dot, dot,"},{"from":3889.56,"to":3896.91,"location":2,"content":"dot plus U_o 100 times V_c 100, right,"},{"from":3896.91,"to":3902.53,"location":2,"content":"and I'm finding the partial derivative of this with respect to V_c one,"},{"from":3902.53,"to":3905.49,"location":2,"content":"and hopefully you remember that much calculus from high school:"},{"from":3905.49,"to":3909.14,"location":2,"content":"none of these other terms involve V_c one."},{"from":3909.14,"to":3912.66,"location":2,"content":"So, the only thing that's left is this U_o one,"},{"from":3912.66,"to":3915.96,"location":2,"content":"and that's what I've got there for this dimension."},{"from":3915.96,"to":3917.85,"location":2,"content":"So, this particular parameter."},{"from":3917.85,"to":3923.26,"location":2,"content":"But I don't only want to do the first component of the V_c vector,"},{"from":3923.26,"to":3926.74,"location":2,"content":"I also want to do the second component of the V_c vector et cetera,"},{"from":3926.74,"to":3930.63,"location":2,"content":"which means I'm going to end up with all of them"},{"from":3930.63,"to":3935.68,"location":2,"content":"turning up in precisely one of these things."},{"from":3935.68,"to":3941.19,"location":2,"content":"Um, and so the end result is I get the vector U_o."},{"from":3941.19,"to":3943.62,"location":2,"content":"Okay. 
Um, but you know,"},{"from":3943.62,"to":3947.22,"location":2,"content":"if you're sort of getting confused and your brain is falling apart,"},{"from":3947.22,"to":3952.05,"location":2,"content":"I think it can be sort of kind of useful to re- reduce things to sort of um,"},{"from":3952.05,"to":3958.28,"location":2,"content":"single dimensional calculus and actually sort of play out what's actually happening."},{"from":3958.28,"to":3960.84,"location":2,"content":"Um, anyway, this part was easy."},{"from":3960.84,"to":3963.54,"location":2,"content":"The numerator, we get um, U_o."},{"from":3963.54,"to":3968.09,"location":2,"content":"Um, so things aren't quite so nice when we do the denominator."},{"from":3968.09,"to":3971.64,"location":2,"content":"So we now want to have this, um, d by d,"},{"from":3971.64,"to":3977.01,"location":2,"content":"V_c of the log of the sum of W equals"},{"from":3977.01,"to":3982.84,"location":2,"content":"one to V of the exp of U_w T V_c."},{"from":3982.84,"to":3985.8,"location":2,"content":"Okay. So, now at this point,"},{"from":3985.8,"to":3987.45,"location":2,"content":"it's not quite so pretty."},{"from":3987.45,"to":3991.03,"location":2,"content":"We've got this log sum exp combination that you see a lot,"},{"from":3991.03,"to":3995.64,"location":2,"content":"and so at this point you have to remember that there is, uh, the chain rule."},{"from":3995.64,"to":3998.52,"location":2,"content":"Okay. 
So, what we can say is here's you know,"},{"from":3998.52,"to":4002.54,"location":2,"content":"our function F and here is the body of the function,"},{"from":4002.54,"to":4006.24,"location":2,"content":"and so what we want to do is um,"},{"from":4006.24,"to":4008.63,"location":2,"content":"do it in two stages."},{"from":4008.63,"to":4011.57,"location":2,"content":"Um, so that at the end of the day,"},{"from":4011.57,"to":4013.43,"location":2,"content":"we've got this V_c at the end."},{"from":4013.43,"to":4017.11,"location":2,"content":"So, we have sort of some function here."},{"from":4017.11,"to":4019.91,"location":2,"content":"There's ultimately a function of V_c,"},{"from":4019.91,"to":4022.22,"location":2,"content":"and so we're gonna use the chain rule."},{"from":4022.22,"to":4025.04,"location":2,"content":"We'll say the chain rule is we first take"},{"from":4025.04,"to":4029.14,"location":2,"content":"the derivative of this outside thing putting in this body,"},{"from":4029.14,"to":4033.68,"location":2,"content":"and then we remember that the derivative of log is one on X."},{"from":4033.68,"to":4042.92,"location":2,"content":"So, we have one over the sum of W equals one to V of the exp of U_w T V_c"},{"from":4042.92,"to":4046.64,"location":2,"content":"and then we need to multiply that by then taking"},{"from":4046.64,"to":4052.61,"location":2,"content":"the derivative of the inside part which is um,"},{"from":4052.61,"to":4060.49,"location":2,"content":"what we have here."},{"from":4060.49,"to":4064.85,"location":2,"content":"Okay. Times the derivative of the inside part with"},{"from":4064.85,"to":4068.6,"location":2,"content":"the important reminder that you need to do a change of variables,"},{"from":4068.6,"to":4073.46,"location":2,"content":"and for the inside part use a different variable that you're summing over."},{"from":4073.46,"to":4080.81,"location":2,"content":"Okay. 
So, now we're trying to find the derivative of a sum of exps."},{"from":4080.81,"to":4085.05,"location":2,"content":"The first thing that we can do is v-very easy."},{"from":4085.05,"to":4088.86,"location":2,"content":"We can move the derivative inside a sum."},{"from":4088.86,"to":4094.43,"location":2,"content":"So, we can rewrite that and have the sum first, of X equals one to"},{"from":4094.43,"to":4100.43,"location":2,"content":"V of the partial derivatives with respect to V_c of the [inaudible]."},{"from":4100.43,"to":4102.57,"location":2,"content":"Um, so that's a little bit of progress."},{"from":4102.57,"to":4106.73,"location":2,"content":"Um and at that point we have to sort of do the chain rule again, right."},{"from":4106.73,"to":4113.21,"location":2,"content":"So, here is our function and here's the thing in it again which is some function of V_c."},{"from":4113.21,"to":4117.6,"location":2,"content":"So, we again want to do um, the chain rule."},{"from":4117.6,"to":4121.34,"location":2,"content":"So, [NOISE] we then have well,"},{"from":4121.34,"to":4125.72,"location":2,"content":"the derivative of exp um, is exp."},{"from":4125.72,"to":4134.63,"location":2,"content":"So, we're gonna have the sum of X equals one to V of exp of U_x T V_c,"},{"from":4134.63,"to":4140.15,"location":2,"content":"and then we're going to multiply that by the partial derivative with"},{"from":4140.15,"to":4145.7,"location":2,"content":"respect to V_c of the inside, U_x T V_c."},{"from":4145.7,"to":4148.16,"location":2,"content":"Well, we saw that one before, so,"},{"from":4148.16,"to":4153.2,"location":2,"content":"the derivative of that is U- well,"},{"from":4153.2,"to":4156.32,"location":2,"content":"yeah, U_x because we're doing it with a different X, right."},{"from":4156.32,"to":4158.78,"location":2,"content":"This then comes out as U_x,"},{"from":4158.78,"to":4163.85,"location":2,"content":"and so we have the sum of X equals one 
to"},{"from":4163.85,"to":4170.03,"location":2,"content":"V of this exp of U_x T V_c times U_x."},{"from":4170.03,"to":4174.99,"location":2,"content":"Okay. So, by doing the chain rule twice, we've got that."},{"from":4174.99,"to":4178.19,"location":2,"content":"So, now if we put it together, you know,"},{"from":4178.19,"to":4183.05,"location":2,"content":"the derivative with respect to V_c of the whole thing,"},{"from":4183.05,"to":4186.5,"location":2,"content":"this log of the probability of O given C, right."},{"from":4186.5,"to":4191.21,"location":2,"content":"For the numerator, it was just U_o,"},{"from":4191.21,"to":4194.03,"location":2,"content":"and then we're subtracting,"},{"from":4194.03,"to":4197.65,"location":2,"content":"we had this term here, um,"},{"from":4197.65,"to":4199.73,"location":2,"content":"which is sort of a denominator,"},{"from":4199.73,"to":4203.87,"location":2,"content":"and then we have this term here which is the numerator."},{"from":4203.87,"to":4207.73,"location":2,"content":"So, we're subtracting: in the numerator,"},{"from":4207.73,"to":4212.27,"location":2,"content":"we have the sum of X equals one to V of"},{"from":4212.27,"to":4218.77,"location":2,"content":"the exp of U_x T V_c times U_x,"},{"from":4218.77,"to":4225.4,"location":2,"content":"and then in the denominator, we have um,"},{"from":4225.4,"to":4236.35,"location":2,"content":"the sum of W equals one to V of exp of U_w T V_c."},{"from":4236.35,"to":4240.03,"location":2,"content":"Um, okay, so we kind of get that."},{"from":4240.03,"to":4244.02,"location":2,"content":"Um, oh wait. Yeah. Yeah, I've got it."},{"from":4244.02,"to":4245.9,"location":2,"content":"Yeah, that's right. 
Um, okay."},{"from":4245.9,"to":4252.17,"location":2,"content":"We kind of get that and then we can sort of just re-arrange this a little."},{"from":4252.17,"to":4256.48,"location":2,"content":"So, we can have this sum right out front,"},{"from":4256.48,"to":4263.28,"location":2,"content":"and we can say that this is sort of a big sum of X equals one to V,"},{"from":4263.28,"to":4269.87,"location":2,"content":"and we can sort of take that U_x out at the end and say, okay."},{"from":4269.87,"to":4273.15,"location":2,"content":"Let's put that over here as a U_x,"},{"from":4273.15,"to":4275.09,"location":2,"content":"and if we do that,"},{"from":4275.09,"to":4280.3,"location":2,"content":"sort of an interesting thing has happened because look right here,"},{"from":4280.3,"to":4285.83,"location":2,"content":"we've rediscovered exactly the same form"},{"from":4285.83,"to":4291.43,"location":2,"content":"that we use as our probability distribution for predicting the probability of words."},{"from":4291.43,"to":4297.86,"location":2,"content":"So, this is now simply the probability of X given C according to our model."},{"from":4297.86,"to":4306.15,"location":2,"content":"Um, so we can rewrite this and say that what we're getting is U_o minus the sum of"},{"from":4306.15,"to":4314.8,"location":2,"content":"X equals one to V of the probability of X given C times U_x."},{"from":4314.8,"to":4318.76,"location":2,"content":"This has a kind of an interesting meaning if you think about it."},{"from":4318.76,"to":4321.36,"location":2,"content":"So, this is actually giving us, you know,"},{"from":4321.36,"to":4324.2,"location":2,"content":"our slope in this multi-dimensional space"},{"from":4324.2,"to":4327.22,"location":2,"content":"and how we're getting that slope is we're taking"},{"from":4327.22,"to":4331.28,"location":2,"content":"the observed representation of"},{"from":4331.28,"to":4338.45,"location":2,"content":"the context word and we're subtracting from that what our model 
thinks um,"},{"from":4338.45,"to":4340.95,"location":2,"content":"the context should look like."},{"from":4340.95,"to":4344.47,"location":2,"content":"What does the model think that the context should look like?"},{"from":4344.47,"to":4347.33,"location":2,"content":"This part here is formally an expectation."},{"from":4347.33,"to":4351.4,"location":2,"content":"So, what you're doing is you're finding the weighted average"},{"from":4351.4,"to":4356.38,"location":2,"content":"of the model's representations of each word,"},{"from":4356.38,"to":4359.99,"location":2,"content":"multiplied by the probability of it in the current model."},{"from":4359.99,"to":4365.31,"location":2,"content":"So, this is sort of the expected context word according to our current model,"},{"from":4365.31,"to":4367.46,"location":2,"content":"and so we're taking the difference between"},{"from":4367.46,"to":4372.17,"location":2,"content":"the expected context word and the actual context word that showed up,"},{"from":4372.17,"to":4375.56,"location":2,"content":"and that difference then turns out to exactly give"},{"from":4375.56,"to":4378.89,"location":2,"content":"us the slope as to which direction we should be"},{"from":4378.89,"to":4381.05,"location":2,"content":"walking, changing the word"},{"from":4381.05,"to":4386.72,"location":2,"content":"representation in order to improve our model's ability to predict."},{"from":4386.72,"to":4391.56,"location":2,"content":"Okay. 
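The result just derived, that the gradient is u_o minus the probability-weighted average of the u_x vectors, can be verified numerically on a tiny random model (all sizes and names below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 5, 4                      # toy vocabulary size and vector dimension
U = rng.normal(size=(V, d))      # outside-word vectors u_1 ... u_V
v_c = rng.normal(size=d)         # center-word vector
o = 2                            # index of the observed outside word

def log_prob(v):
    # log P(o | c) = u_o . v - log sum_w exp(u_w . v)
    s = U @ v
    return s[o] - np.log(np.exp(s).sum())

# Analytic gradient from the derivation: u_o minus the expected context vector
p = np.exp(U @ v_c)
p /= p.sum()                     # P(x | c) for every word x
grad = U[o] - p @ U

# Finite-difference check, one coordinate of v_c at a time
h = 1e-6
num = np.array([(log_prob(v_c + h * e) - log_prob(v_c - h * e)) / (2 * h)
                for e in np.eye(d)])
# grad and num agree to numerical precision
```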
Um, so we'll,"},{"from":4391.56,"to":4394.1,"location":2,"content":"um, assignment two, um, yeah."},{"from":4394.1,"to":4398.06,"location":2,"content":"So, um, it'll be a great exercise for you guys,"},{"from":4398.06,"to":4400.11,"location":2,"content":"um, to, um,"},{"from":4400.11,"to":4402.83,"location":2,"content":"to try and do that for the- wait,"},{"from":4402.83,"to":4406.64,"location":2,"content":"um, I did the center word; try doing the context words as well"},{"from":4406.64,"to":4411.13,"location":2,"content":"and show that you can do the same kind of math and have it work out."},{"from":4411.13,"to":4415.65,"location":2,"content":"Um, I've just got a few minutes left at the end."},{"from":4415.65,"to":4423.32,"location":2,"content":"Um, what I just wanted to show you if I can get all of this to work right."},{"from":4423.32,"to":4429.95,"location":2,"content":"Um, let's go [inaudible] this way."},{"from":4429.95,"to":4434.2,"location":2,"content":"Okay, find my."},{"from":4434.2,"to":4440.07,"location":2,"content":"Okay. Um, so I just wanted to just show you a quick example."},{"from":4440.07,"to":4441.94,"location":2,"content":"So, for the first assignment,"},{"from":4441.94,"to":4444.17,"location":2,"content":"um, again it's an iPython Notebook."},{"from":4444.17,"to":4449.02,"location":2,"content":"So, if you're all set up you sort of can do Jupyter Notebook."},{"from":4449.02,"to":4452.94,"location":2,"content":"Um, and you have some notebook."},{"from":4452.94,"to":4457.18,"location":2,"content":"Um, here's my little notebook I'm gonna show you,"},{"from":4457.18,"to":4470.94,"location":2,"content":"um, and the trick will be to make this big enough that people can see it."},{"from":4470.94,"to":4475.53,"location":2,"content":"That readable? 
[LAUGHTER] Okay, um,"},{"from":4475.53,"to":4479.21,"location":2,"content":"so right so, so Numpy is the sort of,"},{"from":4479.21,"to":4481.93,"location":2,"content":"um, do-the-math package in Python."},{"from":4481.93,"to":4483.12,"location":2,"content":"You'll want to know about that,"},{"from":4483.12,"to":4484.44,"location":2,"content":"if you don't know about it."},{"from":4484.44,"to":4486.44,"location":2,"content":"Um, Matplotlib is sort of the,"},{"from":4486.44,"to":4489.04,"location":2,"content":"one of the most basic graphing packages;"},{"from":4489.04,"to":4491.76,"location":2,"content":"if you don't know about that you're going to want to know about it."},{"from":4491.76,"to":4495.9,"location":2,"content":"This is sort of an IPython or Jupyter special that"},{"from":4495.9,"to":4499.76,"location":2,"content":"lets you have an interactive matplotlib um, inside."},{"from":4499.76,"to":4503.68,"location":2,"content":"And if you want to get fancy you can play it- play with your graphic styles."},{"from":4503.68,"to":4506.61,"location":2,"content":"Um, there's that."},{"from":4506.61,"to":4510.47,"location":2,"content":"Scikit-learn is kind of a general machine learning package."},{"from":4510.47,"to":4513.35,"location":2,"content":"Um, Gensim isn't a deep learning package."},{"from":4513.35,"to":4517.59,"location":2,"content":"Gensim is kind of a word similarity package which started off um,"},{"from":4517.59,"to":4520.76,"location":2,"content":"with um, methods like Latent Dirichlet Allocation."},{"from":4520.76,"to":4522.53,"location":2,"content":"If you know about that; from modelling word"},{"from":4522.53,"to":4525.94,"location":2,"content":"similarities it's sort of grown into a good package um,"},{"from":4525.94,"to":4528.57,"location":2,"content":"for doing um, word vectors as well."},{"from":4528.57,"to":4531.65,"location":2,"content":"So, it's quite often used for word vectors and"},{"from":4531.65,"to":4536.1,"location":2,"content":"word similarities 
that's sort of efficient for doing things at large scale."},{"from":4536.1,"to":4537.72,"location":2,"content":"Um, yeah."},{"from":4537.72,"to":4541.36,"location":2,"content":"So, I haven't yet told you about it- we will next time- but we have"},{"from":4541.36,"to":4546.4,"location":2,"content":"our own homegrown form of word vectors, which are the GloVe word vectors."},{"from":4546.4,"to":4551.27,"location":2,"content":"I'm using them not because it really matters for what I'm showing but, you know,"},{"from":4551.27,"to":4555.74,"location":2,"content":"these vectors are conveniently small."},{"from":4555.74,"to":4560.47,"location":2,"content":"It turns out that the vectors that Facebook and Google"},{"from":4560.47,"to":4565.94,"location":2,"content":"distribute have an extremely large vocabulary and are extremely high-dimensional."},{"from":4565.94,"to":4568.94,"location":2,"content":"So it'd take me just too long to load them in"},{"from":4568.94,"to":4572.86,"location":2,"content":"the last five minutes of this class, whereas conveniently, uh,"},{"from":4572.86,"to":4576.86,"location":2,"content":"in our Stanford vectors we have 100-dimensional vectors, um,"},{"from":4576.86,"to":4579.16,"location":2,"content":"and 50-dimensional vectors, which are kinda"},{"from":4579.16,"to":4581.76,"location":2,"content":"good for doing small things on a laptop, frankly."},{"from":4581.76,"to":4587.33,"location":2,"content":"Um, so, what I'm doing here is- Gensim doesn't natively support"},{"from":4587.33,"to":4590.21,"location":2,"content":"GloVe vectors, but they actually provide a utility that"},{"from":4590.21,"to":4593.39,"location":2,"content":"converts the GloVe file format to the word2vec file format."},{"from":4593.39,"to":4600.28,"location":2,"content":"So I've done that. 
And then I've loaded a pre-trained model of word vectors."},{"from":4600.28,"to":4604.43,"location":2,"content":"Um, and, so this is what they call a keyed vector."},{"from":4604.43,"to":4606.89,"location":2,"content":"And so, the keyed vector is nothing fancy."},{"from":4606.89,"to":4611.66,"location":2,"content":"It's just, you have words like potato, and there's a vector that hangs off each one."},{"from":4611.66,"to":4615.44,"location":2,"content":"So it's really just sort of a big dictionary with a vector for each thing."},{"from":4615.44,"to":4618.69,"location":2,"content":"But, so this model is a trained model where"},{"from":4618.69,"to":4622.23,"location":2,"content":"we just used the kind of algorithm we looked at and,"},{"from":4622.23,"to":4626.73,"location":2,"content":"you know, trained it billions of times, fiddling our word vectors."},{"from":4626.73,"to":4631.26,"location":2,"content":"Um, and once we have one, we can then, um,"},{"from":4631.26,"to":4634.27,"location":2,"content":"ask questions like, we can say,"},{"from":4634.27,"to":4637.11,"location":2,"content":"what is the most similar word to some other word?"},{"from":4637.11,"to":4639.65,"location":2,"content":"So we could take something like, um,"},{"from":4639.65,"to":4643.18,"location":2,"content":"what are the most similar words to Obama, let's say?"},{"from":4643.18,"to":4645.77,"location":2,"content":"And we get back Barack, Bush, Clinton,"},{"from":4645.77,"to":4649.04,"location":2,"content":"McCain, Gore, Hillary, Dole, Martin, Henry."},{"from":4649.04,"to":4651.43,"location":2,"content":"That seems actually kind of interesting."},{"from":4651.43,"to":4654.05,"location":2,"content":"These vectors are from a few years ago."},{"from":4654.05,"to":4657.15,"location":2,"content":"So we don't have post- post-Obama stuff."},{"from":4657.15,"to":4660.75,"location":2,"content":"I mean if you put in another word, um, you know,"},{"from":4660.75,"to":4664.1,"location":2,"content":"we can put 
in something like banana, and we get coconut,"},{"from":4664.1,"to":4666.6,"location":2,"content":"mango, bananas, potato, pineapple."},{"from":4666.6,"to":4669.43,"location":2,"content":"We get kind of tropical food."},{"from":4669.43,"to":4674.07,"location":2,"content":"So, you can actually- you can actually ask, uh,"},{"from":4674.07,"to":4676.99,"location":2,"content":"for what's dissimilar to words."},{"from":4676.99,"to":4679.7,"location":2,"content":"By itself, dissimilar isn't very useful."},{"from":4679.7,"to":4684.55,"location":2,"content":"So if I ask most similar and I say, um,"},{"from":4684.55,"to":4689.29,"location":2,"content":"negative equals, um, banana,"},{"from":4689.29,"to":4694.72,"location":2,"content":"um, I'm not sure what your concept of what's most dissimilar to,"},{"from":4694.72,"to":4696.62,"location":2,"content":"um, banana is, but you know,"},{"from":4696.62,"to":4702.65,"location":2,"content":"actually by itself you don't get anything useful out of this, um,"},{"from":4702.65,"to":4708,"location":2,"content":"because, um, you just sort of get these weird, really rare words, um,"},{"from":4708,"to":4711.44,"location":2,"content":"which, um, [LAUGHTER] definitely weren't the ones you were thinking of."},{"from":4711.44,"to":4717.57,"location":2,"content":"Um, but it turns out you can do something really useful with this negative idea"},{"from":4717.57,"to":4719,"location":2,"content":"which was one of"},{"from":4719,"to":4724.18,"location":2,"content":"the highly celebrated results of word vectors when they first started off."},{"from":4724.18,"to":4730.2,"location":2,"content":"And that was this idea that there are actually dimensions of meaning in this space."},{"from":4730.2,"to":4734.82,"location":2,"content":"And so this was the most celebrated example, um, which was: look,"},{"from":4734.82,"to":4739.98,"location":2,"content":"what we could do is we could start with the word king and 
subtract"},{"from":4739.98,"to":4745.35,"location":2,"content":"from it the meaning of man and then we could add to it the meaning of woman."},{"from":4745.35,"to":4749.11,"location":2,"content":"And then we could say which word in our vector space is"},{"from":4749.11,"to":4753.06,"location":2,"content":"most similar in meaning to that word."},{"from":4753.06,"to":4755.81,"location":2,"content":"And that would be a way of sort of doing analogies."},{"from":4755.81,"to":4758.64,"location":2,"content":"We'd be able to do the, um, analogy,"},{"from":4758.64,"to":4762.05,"location":2,"content":"man is to king as woman is to what?"},{"from":4762.05,"to":4766.5,"location":2,"content":"And so, the way we're gonna do that is to say we want to be similar to king"},{"from":4766.5,"to":4771.22,"location":2,"content":"and woman, because they're both positive ones, and far away from man."},{"from":4771.22,"to":4775.19,"location":2,"content":"And so, we could do that manually,"},{"from":4775.19,"to":4776.95,"location":2,"content":"here it is done manually:"},{"from":4776.95,"to":4781.05,"location":2,"content":"most similar, positive equals woman, king, negative equals man."},{"from":4781.05,"to":4785.41,"location":2,"content":"And we can run this and, lo and behold, it produces queen."},{"from":4785.41,"to":4788.57,"location":2,"content":"To make that a little bit easier I defined this analogy,"},{"from":4788.57,"to":4793.48,"location":2,"content":"um, analogy predicate, so I can run other ones."},{"from":4793.48,"to":4799.1,"location":2,"content":"And so I can run another one, like analogy: Japan, Japanese,"},{"from":4799.1,"to":4801.16,"location":2,"content":"Austria is to Austrian."},{"from":4801.16,"to":4803.13,"location":2,"content":"Um, and you know,"},{"from":4803.13,"to":4807.15,"location":2,"content":"I think it's fair to say that when people first"},{"from":4807.15,"to":4810.95,"location":2,"content":"saw that you could have this simple piece of math and run 
it,"},{"from":4810.95,"to":4812.95,"location":2,"content":"and learn meanings of words."},{"from":4812.95,"to":4818.47,"location":2,"content":"I mean it actually just sort of blew people's minds how effective this was."},{"from":4818.47,"to":4822.03,"location":2,"content":"You know, like there- there's no mirrors and strings here, right?"},{"from":4822.03,"to":4824.23,"location":2,"content":"You know it's not that I have a separate-"},{"from":4824.23,"to":4828.33,"location":2,"content":"a special sort of list in my Python where there's a dictionary I'm looking up,"},{"from":4828.33,"to":4830.24,"location":2,"content":"er, for Austria, Austrian,"},{"from":4830.24,"to":4831.91,"location":2,"content":"uh, and things like that."},{"from":4831.91,"to":4835.31,"location":2,"content":"But somehow these vector representations are"},{"from":4835.31,"to":4838.76,"location":2,"content":"such that they're actually encoding these semantic relationships,"},{"from":4838.76,"to":4840.92,"location":2,"content":"you know, so you can try different ones,"},{"from":4840.92,"to":4843.36,"location":2,"content":"you know, like it's not that only this one works."},{"from":4843.36,"to":4846.19,"location":2,"content":"I can put in France, it says French."},{"from":4846.19,"to":4849.78,"location":2,"content":"I can put in Germany, it says German,"},{"from":4849.78,"to":4854.59,"location":2,"content":"I can put in Australia, not Austria, and it says Australian,"},{"from":4854.59,"to":4859.48,"location":2,"content":"you know, somehow we've got these vector representations of words such that,"},{"from":4859.48,"to":4864.81,"location":2,"content":"for sort of these ideas like understanding the relationships between words,"},{"from":4864.81,"to":4870.6,"location":2,"content":"you're just doing this vector-space manipulation on these 100-dimensional numbers,"},{"from":4870.6,"to":4875.83,"location":2,"content":"and it actually knows about them. Not only the similarities of word meanings 
but"},{"from":4875.83,"to":4878.26,"location":2,"content":"actually different semantic relationships"},{"from":4878.26,"to":4881.64,"location":2,"content":"between words, like country names and their peoples."},{"from":4881.64,"to":4883.85,"location":2,"content":"And yeah, that's actually pretty amazing."},{"from":4883.85,"to":4891.34,"location":2,"content":"It really- you know, it's sort of surprising that running such a dumb algorithm on, um,"},{"from":4891.34,"to":4895.1,"location":2,"content":"vectors of numbers could capture so well the meaning of words."},{"from":4895.1,"to":4898.16,"location":2,"content":"And so that sort of became the foundation of a lot of sort"},{"from":4898.16,"to":4901.35,"location":2,"content":"of modern distributed neural representations of words."},{"from":4901.35,"to":4902.63,"location":2,"content":"Okay, I'll stop there."},{"from":4902.63,"to":4906.7,"location":2,"content":"Thanks a lot, guys, and see you on Thursday. [NOISE]"}]}
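The king − man + woman demo the lecture walks through is just vector addition plus cosine-similarity ranking. As a minimal sketch of what gensim's most_similar computes, here is a toy re-implementation over a hand-made four-word vocabulary (the 3-dimensional numbers are invented for illustration and are not real GloVe values):

```python
import numpy as np

# Toy vocabulary with hand-made 3-d vectors (invented numbers, not real
# GloVe values): dimension 0 roughly encodes "royalty", dimension 1 "gender".
vecs = {
    "king":  np.array([0.9,  0.8, 0.1]),
    "queen": np.array([0.9, -0.8, 0.1]),
    "man":   np.array([0.1,  0.8, 0.2]),
    "woman": np.array([0.1, -0.8, 0.2]),
}

def most_similar(positive, negative, vecs):
    """Mimic gensim's most_similar: sum the positive vectors, subtract the
    negative ones, then rank the rest of the vocabulary by cosine similarity
    to the resulting point, excluding the query words themselves."""
    target = sum(vecs[w] for w in positive) - sum(vecs[w] for w in negative)
    cos = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    skip = set(positive) | set(negative)
    return sorted(((w, cos(vecs[w], target)) for w in vecs if w not in skip),
                  key=lambda pair: -pair[1])

# man : king :: woman : ?
print(most_similar(["woman", "king"], ["man"], vecs)[0][0])  # queen
```

With real pre-trained vectors loaded as in the lecture, the corresponding gensim call is model.most_similar(positive=['woman', 'king'], negative=['man']), which returns ranked (word, similarity) pairs in the same way.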