# Developer's guide for GPDB

### Credits
This guide was developed in collaboration with Navneet Potti (@navsan) and
Nabarun Nag (@nabarunnag). Many thanks to Dave Cramer (@davecramer) and
Daniel Gustafsson (@danielgustafsson) for various suggestions to improve
the original version of this document. Alexey Grishchenko (@0x0FFF) also
helped improve the document and scripts.

## Who should read this document?
Anyone who wants to develop code for GPDB. This guide targets the
freelance developer who typically has a laptop and wants to develop
GPDB code on it. In other words, such a typical developer does not necessarily
have 24x7 access to a cluster, and needs a minimal stand-alone development
environment.

The instructions here were verified on the configurations below.

| **OS**        | **Date Tested**    | **Comments**                           |
| :------------ |:-------------------| --------------------------------------:|
| OSX v.10.10.5 | 2016-03-17         | Vagrant v. 1.8.1; VirtualBox v. 5.0.16 |
| OSX v.10.11.2 | 2015-12-29         | Vagrant v. 1.8.1; VirtualBox v. 5.0.12 |

## 1: Set up VirtualBox and Vagrant
You need to set up both VirtualBox and Vagrant. If you don't have these
installed already, then head over to https://www.virtualbox.org/wiki/Downloads
and http://www.vagrantup.com/downloads to download and then install them.

## 2: Clone GPDB code from GitHub
Go to the directory on your machine where you want to check out the GPDB code,
and clone the GPDB code by typing the following into a terminal window.

```shell
git clone https://github.com/greenplum-db/gpdb.git
```

## 3: Set up and start the virtual machine
Next go to the `gpdb/src/tools/vagrant` directory. This directory has virtual
machine configurations for different operating systems (for now there is only
one). Pick the distro of your choice, and `cd` to that directory. For this
document, we will assume that you pick `centos`.
So, issue the following command:

```shell
cd gpdb/src/tools/vagrant/centos
```

Next let us start a virtual machine using the Vagrantfile in that directory.
From the terminal window, issue the following command:

```shell
vagrant up gpdb
```

The last command will take a while as Vagrant works with VirtualBox to fetch
a box image for CentOS. This image is fetched only once and will be stored by
Vagrant in a directory (likely `~/.vagrant.d/boxes/`), so you won't incur
this network IO again if you repeat the steps above. A side-effect is that
Vagrant has now used a few hundred MiB of space on your machine. You can see
the list of boxes that Vagrant has downloaded using ``vagrant box list``. If
you need to drop some box images, follow the instructions posted
[here](https://docs.vagrantup.com/v2/cli/box.html "vagrant manage boxes").

If you are curious about what Vagrant is doing, then open the file
`Vagrantfile`. The `config.vm.box` parameter there specifies the
Vagrant box image that is being fetched. Essentially you are creating an
image of CentOS on your machine that will be used below to set up and run GPDB.

While you are viewing the Vagrantfile, a few more things to notice here are:
* The parameter `vb.memory` sets the memory to 8GB for the virtual machine.
  You could dial that number up or down depending on the actual memory in
  your machine.
* The parameter `vb.cpus` sets the number of cores that the virtual machine will
  use to 4. Again, feel free to change this number based on the machine that
  you have.
* Additional synced folders can be configured by adding a `vagrant-local.yml`
  configuration file in the following format:

```yaml
synced_folder:
    - local: /local/folder
      shared: /folder/in/vagrant
    - local: /another/local/folder
      shared: /another/folder/in/vagrant
```

Once the command above (`vagrant up gpdb`) returns, we are ready to log in to
the virtual machine.
Type the following command into the terminal window
(make sure that you are in the directory `gpdb/src/tools/vagrant/centos`):

```shell
vagrant ssh gpdb
```

Now you are in the virtual machine shell in a **guest** OS that is running on
your actual machine (the **host**). Everything that you do in the guest machine
will be isolated from the host.

That's it - GPDB is built, up and running. Before you can open a psql
connection, run the following:

```shell
# setup the environment
source /usr/local/gpdb/greenplum_path.sh
source ~/gpdb/gpAux/gpdemo/gpdemo-env.sh
# create a database to interact with (you only need to do this once)
createdb
# connect!
psql
```

To run the tests:

```shell
cd ~/gpdb
make installcheck-world
```

If you are curious how this happened, take a look at the following scripts:
* `vagrant/centos/vagrant-setup.sh` - this script installs all the packages
  that GPDB requires as dependencies
* `vagrant/centos/vagrant-build.sh` - this script builds GPDB. If you need to
  change build options, edit this file and re-create the VM by running
  `vagrant destroy gpdb` followed by `vagrant up gpdb`
* `vagrant/centos/vagrant-configure-os.sh` - this script configures the OS
  parameters required for running GPDB

You can go to `vagrant/centos/Vagrantfile` at any time and comment out the
calls to any of these scripts to skip the GPDB installation or the OS-level
configuration.

If you want to try out a few SQL commands, go back to the guest shell in which
you have the `psql` prompt, and issue the following SQL commands:

```sql
-- Create and populate a Users table
CREATE TABLE Users (uid INTEGER PRIMARY KEY,
                    name VARCHAR);
INSERT INTO Users
  SELECT generate_series, md5(random()::text)
  FROM generate_series(1, 100000);

-- Create and populate a Messages table
CREATE TABLE Messages (mid INTEGER PRIMARY KEY,
                       uid INTEGER REFERENCES Users(uid),
                       ptime DATE,
                       message VARCHAR);
INSERT INTO Messages
   SELECT generate_series,
          round(random()*100000),
          date(now() - '1 hour'::INTERVAL * round(random()*24*30)),
          md5(random()::text)
   FROM generate_series(1, 1000000);

-- Report the number of tuples in each table
SELECT COUNT(*) FROM Messages;
SELECT COUNT(*) FROM Users;

-- Report how many messages were posted on each day
SELECT M.ptime, COUNT(*)
FROM Users U NATURAL JOIN Messages M
GROUP BY M.ptime
ORDER BY M.ptime;
```

You just created a simple warehouse database that simulates users posting
messages on a social media network. The "fact" table (i.e. the `Messages`
table) has a million rows. The final query reports the number of messages
that were posted on each day. Pretty cool!

(Note: if you want to exit the `psql` shell above, type `\q`.)

## 4: Using GDB
If you are doing serious development, you will likely need to use a debugger.
Here is how you do that.

First, list the Postgres processes by typing the following command in a guest
terminal: `ps ax | grep postgres`. You should see a list that looks something
like:

![Postgres processes](/vagrant/pictures/gpdb_processes.png)

(You may have to click on the image to see it at a higher resolution.)
Here the key processes are the ones that were started as
`/usr/local/gpdb/bin/postgres`. The master is the process (pid 25486
in the picture above) that has the word "master" in the `-D`
parameter setting, whereas the segments have the word
"gpseg" in the `-D` parameter setting.

Next, start ``gdb`` from a guest terminal. Once you get a prompt in gdb, type
in the following (the pid you specify in the `attach` command will be
different for you):

```gdb
# keep debugging the child when the attached process forks
set follow-fork-mode child
# attach first so that gdb has the symbols needed to resolve the breakpoint
attach 25486
b ExecutorMain
```

Of course, you can change which function you want to break into, and change
whether you want to debug the master or the segment processes. Happy hacking!

## 5: GPDB without GPORCA
If you want to run GPDB without the GPORCA query optimizer, run
`vagrant up gpdb-without-gporca`.
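
As a companion to the debugger step described earlier, the rule for spotting the master process (started as `/usr/local/gpdb/bin/postgres`, with "master" in its `-D` setting) can be sketched as a small shell filter. This is only an illustration: the function name `master_pid` is hypothetical, and it assumes the default Linux `ps ax` layout in which the command is the fifth column.

```shell
# Hypothetical helper: read `ps ax` output on stdin and print the pid of the
# first process that was started as /usr/local/gpdb/bin/postgres and whose
# -D argument contains the word "master".
# Assumes the default `ps ax` columns: PID TTY STAT TIME COMMAND.
master_pid() {
  awk -v bin="/usr/local/gpdb/bin/postgres" '
    $5 == bin {
      # scan the arguments for "-D <dir>" and test the directory name
      for (i = 6; i < NF; i++) {
        if ($i == "-D" && $(i + 1) ~ /master/) {
          print $1
          exit
        }
      }
    }'
}

# Usage (inside the guest): prints the pid to pass to gdb's `attach`
#   ps ax | master_pid
```

Reading from stdin rather than calling `ps` inside the function keeps the filter easy to try against saved output. To pick out a segment instead, the same pattern could match "gpseg" in place of "master".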