# Developer's guide for GPDB

### Credits
This guide was developed in collaboration with Navneet Potti (@navsan) and
Nabarun Nag (@nabarunnag). Many thanks to Dave Cramer (@davecramer) and
Daniel Gustafsson (@danielgustafsson) for various suggestions to improve
the original version of this document. Alexey Grishchenko (@0x0FFF) has
also participated in improvement of the document and scripts.

## Who should read this document?
Anyone who wants to develop code for GPDB. This guide targets the
freelance developer who typically has a laptop and wants to develop
GPDB code on it. In other words, such a typical developer does not necessarily
have 24x7 access to a cluster, and needs a minimal stand-alone development
environment.

The instructions here were verified on the configurations below.

| **OS**        | **Date Tested**    | **Comments**                           |
| :------------ |:-------------------| --------------------------------------:|
| OSX v.10.10.5 | 2016-03-17         | Vagrant v. 1.8.1; VirtualBox v. 5.0.16 |
| OSX v.10.11.2 | 2015-12-29         | Vagrant v. 1.8.1; VirtualBox v. 5.0.12 |

## 1: Set up VirtualBox and Vagrant
You need to set up both VirtualBox and Vagrant. If you don't have these
installed already, then head over to https://www.virtualbox.org/wiki/Downloads
and http://www.vagrantup.com/downloads to download and then install them.
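
Before moving on, it can help to confirm that both tools are on your `PATH`. The sketch below is a minimal check and is not part of the GPDB scripts; `check_tools` is a hypothetical helper name, and `VBoxManage` is assumed to be the VirtualBox command-line tool that was installed alongside the GUI.

```shell
# Hypothetical helper: report whether each named tool is on PATH.
# Returns 1 if at least one tool is missing, 0 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

check_tools vagrant VBoxManage || echo "Install the missing tools before continuing."
```

If either tool is reported missing, install it from the links above and rerun the check.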

## 2: Clone GPDB code from GitHub
Go to the directory in your machine where you want to check out the GPDB code,
and clone the GPDB code by typing the following into a terminal window.

```shell
git clone https://github.com/greenplum-db/gpdb.git
```

## 3: Set up and start the virtual machine
Next go to the `gpdb/src/tools/vagrant` directory. This directory has virtual machine
configurations for different operating systems (for now there is only one).
Pick the distro of your choice, and `cd` to that directory. For this document,
we will assume that you pick `centos`. So, issue the following command:

```shell
cd gpdb/src/tools/vagrant/centos
```

Next let us start a virtual machine using the Vagrantfile in that directory.
From the terminal window, issue the following command:

```shell
vagrant up gpdb
```

The last command will take a while as Vagrant works with VirtualBox to fetch
a box image for CentOS. This image is fetched only once and will be stored by Vagrant in a
directory (likely `~/.vagrant.d/boxes/`), so you won't repeatedly incur this network IO
if you repeat the steps above. A side-effect is that Vagrant has now used a
few hundred MiBs of space on your machine. You can see the list of boxes that
Vagrant has downloaded using ``vagrant box list``. If you need to drop some
box images, follow the instructions posted [here](https://docs.vagrantup.com/v2/cli/box.html "vagrant manage boxes").

If you are curious about what Vagrant is doing, then open the file
`Vagrantfile`. The `config.vm.box` parameter there specifies the
Vagrant box image that is being fetched. Essentially you are creating an
image of CentOS on your machine that will be used below to set up and run GPDB.

While you are viewing the Vagrantfile, a few more things to notice here are:
* The parameter `vb.memory` sets the memory to 8GB for the virtual machine.
  You could dial that number up or down depending on the actual memory in
  your machine.
* The parameter `vb.cpus` sets the number of cores that the virtual machine will
  use to 4. Again, feel free to change this number based on the machine that
  you have.
* Additional synced folders can be configured by adding a `vagrant-local.yml`
  configuration file in the following format:

```yaml
synced_folder:
    - local: /local/folder
      shared: /folder/in/vagrant
    - local: /another/local/folder
      shared: /another/folder/in/vagrant
```
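
One way to create such a file from the host shell is sketched below. It assumes the file belongs next to the `Vagrantfile` (this guide does not say where it is read from, so check the `Vagrantfile` to confirm), and the folder paths are placeholders taken from the format above. `write_local_config` is a hypothetical helper name.

```shell
# Hypothetical helper: write a starter vagrant-local.yml.
# The default target (current directory) is an assumption; pass a path to override.
write_local_config() {
    target="${1:-vagrant-local.yml}"
    # Refuse to overwrite an existing configuration.
    if [ -e "$target" ]; then
        echo "$target already exists; leaving it untouched" >&2
        return 1
    fi
    cat > "$target" <<'EOF'
synced_folder:
    - local: /local/folder
      shared: /folder/in/vagrant
EOF
}
```

Calling `write_local_config` from the directory that holds the `Vagrantfile` writes the file there; replace the placeholder paths with real ones before starting the virtual machine.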

Once the command above (`vagrant up gpdb`) returns, we are ready to log in to the
virtual machine. Type the following command into the terminal window
(make sure that you are in the directory `gpdb/src/tools/vagrant/centos`):

```shell
vagrant ssh gpdb
```

Now you are in the virtual machine shell in a **guest** OS that is running in
your actual machine (the **host**). Everything that you do in the guest machine
will be isolated from the host.

That's it - GPDB is built, up and running.  Before you can open a psql connection, run the following:
```shell
# setup the environment
source /usr/local/gpdb/greenplum_path.sh
source ~/gpdb/gpAux/gpdemo/gpdemo-env.sh

# create a database to interact with (you only need to do this once)
createdb

# connect!
psql
```

To run the tests:
```shell
cd ~/gpdb
make installcheck-world
```

If you are curious how this happened, take a look at the following scripts:
* `vagrant/centos/vagrant-setup.sh` - this script installs all the packages
  required for GPDB as dependencies
* `vagrant/centos/vagrant-build.sh` - this script builds GPDB. In case you
  need to change build options you can change this file and re-create VM by
  running `vagrant destroy gpdb` followed by `vagrant up gpdb`
* `vagrant/centos/vagrant-configure-os.sh` - this script configures OS
  parameters required for running GPDB

You can edit `vagrant/centos/Vagrantfile` and comment out the calls to
any of these scripts at any time to prevent GPDB installation or OS-level
configuration.
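
After changing one of these scripts, the destroy-and-recreate cycle mentioned above can be wrapped in a small helper. This is a sketch, not part of the GPDB scripts: `rebuild_vm` and the `VAGRANT` variable are hypothetical names, and the `-f` flag is used so that `vagrant destroy` skips its confirmation prompt. Destroying the machine discards everything stored inside the guest.

```shell
# Hypothetical helper: destroy and recreate a Vagrant machine (default: gpdb).
# VAGRANT may be overridden, for example to do a dry run with `echo`.
VAGRANT="${VAGRANT:-vagrant}"

rebuild_vm() {
    vm="${1:-gpdb}"
    # -f skips the interactive confirmation; the guest's contents are lost.
    "$VAGRANT" destroy -f "$vm" && "$VAGRANT" up "$vm"
}
```

Run `rebuild_vm` from the directory that contains the `Vagrantfile`; setting `VAGRANT=echo` first prints the two commands without running them.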

If you want to try out a few SQL commands, go back to the guest shell in which
you have the `psql` prompt, and issue the following SQL commands:

```sql
-- Create and populate a Users table
CREATE TABLE Users (uid INTEGER PRIMARY KEY,
                    name VARCHAR);
INSERT INTO Users
  SELECT generate_series, md5(random()::text)
  FROM generate_series(1, 100000);

-- Create and populate a Messages table
CREATE TABLE Messages (mid INTEGER PRIMARY KEY,
                       uid INTEGER REFERENCES Users(uid),
                       ptime DATE,
                       message VARCHAR);
INSERT INTO Messages
   SELECT generate_series,
          round(random()*100000),
          date(now() - '1 hour'::INTERVAL * round(random()*24*30)),
          md5(random()::text)
   FROM generate_series(1, 1000000);

-- Report the number of tuples in each table
SELECT COUNT(*) FROM Messages;
SELECT COUNT(*) FROM Users;

-- Report how many messages were posted on each day
SELECT M.ptime, COUNT(*)
FROM Users U NATURAL JOIN Messages M
GROUP BY M.ptime
ORDER BY M.ptime;
```

You just created a simple warehouse database that simulates users posting
messages on a social media network. The "fact" table (i.e. the `Messages`
table) has a million rows. The final query reports the number of messages
that were posted on each day. Pretty cool!

(Note: to exit the `psql` shell above, type `\q`.)

## 4: Using GDB
If you are doing serious development, you will likely need to use a debugger.
Here is how you do that.

First, list the Postgres processes by typing in (a guest terminal) the following
command: `ps ax | grep postgres`. You should see a list that looks something
like: ![Postgres processes](/vagrant/pictures/gpdb_processes.png)

(You may have to click on the image to see it at a higher resolution.)
Here the key processes are the ones that were started as
`/usr/local/gpdb/bin/postgres`. The master is the process (pid 25486
in the picture above) that has the word "master" in the `-D` parameter setting,
whereas the segment hosts have the word "gpseg" in the `-D` parameter setting.
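
To pick out the master's pid without reading the full listing, the `ps` output can be filtered. The sketch below assumes the `ps ax` column layout (pid first, command fifth) and relies on the "master" naming described above; `master_pids` is a hypothetical helper name.

```shell
# Hypothetical helper: read `ps ax` output on stdin and print the pid of each
# postgres process whose -D argument contains the word "master".
master_pids() {
    awk '$5 ~ /(^|\/)postgres$/ {
        for (i = 6; i < NF; i++)
            if ($i == "-D" && $(i + 1) ~ /master/) print $1
    }'
}
```

In the guest, run it as `ps ax | master_pids`. Matching on the command column rather than the whole line keeps the `grep` or `awk` process itself out of the result.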

Next, start ``gdb`` from a guest terminal. Once you get a prompt in gdb, type
in the following (the pid you specify in the `attach` command will be
different for you):
```gdb
set follow-fork-mode child
b ExecutorMain
attach 25486
```
Of course, you can change which function you want to break into, and change
whether you want to debug the master or the segment processes. Happy hacking!

## 5: GPDB without GPORCA
If you want to run GPDB without the GPORCA query optimizer, run `vagrant up gpdb-without-gporca`.