Setting up ckan on Ubuntu 18.04 – Python 3 and Amazon Lightsail

Step 1: Create an Ubuntu 18.04 LTS instance in Amazon Lightsail

Create a new instance on Amazon Lightsail. I called mine “ckan” and initiated it on Ubuntu 18.04 LTS.

Once the instance is created, go to the Networking tab and create a static IP to attach to it. I called mine “ckan-static-ip” and attached it to the “ckan” instance. I also enabled automated snapshots at the ckan instance level.

Optionally, you may want to point a domain name at this static IP now. I pointed masudklabs.com at it in the GoDaddy DNS panel. Of course, nothing is serving your site on port 80 yet, so your browser won’t show anything.

At this point, go to your Lightsail account page, open the SSH keys tab, and download the default key for the region in which you created your instance. Mine was created in the Ireland (eu-west-1) region. Once downloaded (into my Downloads folder), copy it to the place you will ssh from. Give it a simpler name if you prefer (I named mine aws-key). The default username for an Ubuntu instance in Amazon Lightsail is ubuntu.

masud@MacBook-Pro ~ % cd Downloads 
masud@MacBook-Pro Downloads % cp LightsailDefaultKey-eu-west-1.pem ../aws-key 
masud@MacBook-Pro Downloads % cd ..                                          
masud@MacBook-Pro ~ % ssh -i aws-key ubuntu@34.252.6.57
The authenticity of host '34.252.6.57 (34.252.6.57)' can't be established.
ECDSA key fingerprint is SHA256:QhPcQd+5nzRl2AvoByGjRpXYJnnTAxp77uFwlX+3MsY.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '34.252.6.57' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-1021-aws x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Sun Jun 14 20:41:49 UTC 2020

  System load:  0.0               Processes:           84
  Usage of /:   2.6% of 38.71GB   Users logged in:     0
  Memory usage: 14%               IP address for eth0: 172.26.6.221
  Swap usage:   0%

  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud

0 packages can be updated.
0 updates are security updates.



The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

ubuntu@ip-172-26-6-221:~$ 

Great, so you have an Ubuntu 18.04 LTS instance running in Amazon Lightsail, and you can successfully ssh into this instance. This concludes step 1.
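
If ssh ever rejects the key with an “UNPROTECTED PRIVATE KEY FILE” warning, the permissions on the copied key are too open (OpenSSH insists that a private key is not accessible by other users). The small Python sketch below restricts the key to its owner. The helper name restrict_key is my own, and the aws-key path is simply the name I used above.

```python
import os
import stat


def restrict_key(path):
    """Make a private key readable and writable by its owner only (mode 0600).

    Returns the resulting permission bits so the caller can confirm them.
    """
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return stat.S_IMODE(os.stat(path).st_mode)


# Example (assumes the key was copied to ~/aws-key as above):
# restrict_key(os.path.expanduser("~/aws-key"))
```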

Step 2: Ckan installation and pre-requisites

Ckan documentation is generally good but outdated in places, and it does not make it entirely clear what “You want to install CKAN for development” means. If you want to change any page templates or change the theme in any way, I would recommend that you install from source rather than from package.

There is good documentation available on installing ckan from source here: https://docs.ckan.org/en/latest/maintaining/installing/install-from-source.html and I will follow this and see if it all works.

1. Install the relevant packages

sudo apt-get install python3-dev postgresql libpq-dev python3-pip python3-venv git-core solr-jetty openjdk-8-jdk redis-server

Straight away, I came across a series of errors.

ubuntu@ip-172-26-6-221:~$ sudo apt-get install python3-dev postgresql libpq-dev python3-pip python3-venv git-core solr-jetty openjdk-8-jdk redis-server
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Note, selecting 'git' instead of 'git-core'
Package python3-venv is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Unable to locate package python3-pip
E: Package 'python3-venv' has no installation candidate
E: Unable to locate package solr-jetty
E: Unable to locate package openjdk-8-jdk
E: Unable to locate package redis-server
ubuntu@ip-172-26-6-221:~$ 

The issue here was that I hadn’t refreshed the package index, so apt did not yet know about most of these packages. To fix this, simply run the following command.

ubuntu@ip-172-26-6-221:~$ sudo apt-get update
Hit:1 http://eu-west-1.ec2.archive.ubuntu.com/ubuntu bionic InRelease
Get:2 http://eu-west-1.ec2.archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Get:3 http://eu-west-1.ec2.archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
Get:4 http://eu-west-1.ec2.archive.ubuntu.com/ubuntu bionic/universe Sources [9051 kB]
Get:5 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]       
Get:6 http://eu-west-1.ec2.archive.ubuntu.com/ubuntu bionic/restricted Sources [5324 B]                    
Get:7 http://eu-west-1.ec2.archive.ubuntu.com/ubuntu bionic/multiverse Sources [181 kB]                             
Get:8 http://eu-west-1.ec2.archive.ubuntu.com/ubuntu bionic/main Sources [829 kB]                                   
Get:9 http://eu-west-1.ec2.archive.ubuntu.com/ubuntu bionic/universe amd64 Packages [8570 kB]                  
Get:10 http://eu-west-1.ec2.archive.ubuntu.com/ubuntu bionic/universe Translation-en [4941 kB]                       
Get:11 http://eu-west-1.ec2.archive.ubuntu.com/ubuntu bionic/multiverse amd64 Packages [151 kB]
...
...
...
Get:42 http://security.ubuntu.com/ubuntu bionic-security/multiverse Translation-en [2856 B]
Fetched 29.8 MB in 6s (5240 kB/s)               
Reading package lists... Done
ubuntu@ip-172-26-6-221:~$

Ok, now back to the previous install command, and this time it goes through. Let the packages install. I was asked whether services should be restarted automatically or manually; I selected Yes for automatic restarts so that the install could carry on while I was updating this blog post. This can easily take a couple of minutes depending on your server CPU speed and bandwidth.

Now that the packages are successfully installed, let’s install ckan into a python virtualenv.

2. Install ckan

Create a python virtual environment.

ubuntu@ip-172-26-6-221:~$ sudo mkdir -p /usr/lib/ckan/default
ubuntu@ip-172-26-6-221:~$ whoami
ubuntu
ubuntu@ip-172-26-6-221:~$ sudo chown `whoami` /usr/lib/ckan/default
ubuntu@ip-172-26-6-221:~$ python3 -m venv /usr/lib/ckan/default
ubuntu@ip-172-26-6-221:~$ . /usr/lib/ckan/default/bin/activate
(default) ubuntu@ip-172-26-6-221:~$ 

You can see from the (default) prefix in the prompt that the Python virtual environment was successfully created and activated.

For the next step, the documentation says to install the recommended version of setuptools. I went to https://pypi.org/project/setuptools/ and noticed that the current version passing all tests at the time of writing this post was 47.1.1. (Read the end of step 2 before you run this command to save yourself some pain: 44.1.0 is the version that ended up working.)

(default) ubuntu@ip-172-26-6-221:~$ pip install setuptools==47.1.1
Collecting setuptools==47.1.1
  Downloading https://files.pythonhosted.org/packages/95/95/f657b6e17f00c3f35b5f68b10e46c3a43af353d8856bd57bfcfb1dbb3e92/setuptools-47.1.1-py3-none-any.whl (583kB)
    100% |████████████████████████████████| 583kB 2.1MB/s 
Installing collected packages: setuptools
  Found existing installation: setuptools 39.0.1
    Uninstalling setuptools-39.0.1:
      Successfully uninstalled setuptools-39.0.1
Successfully installed setuptools-47.1.1
(default) ubuntu@ip-172-26-6-221:~$ pip install --upgrade pip
Collecting pip
  Downloading https://files.pythonhosted.org/packages/43/84/23ed6a1796480a6f1a2d38f2802901d078266bda38388954d01d3f2e821d/pip-20.1.1-py2.py3-none-any.whl (1.5MB)
    100% |████████████████████████████████| 1.5MB 922kB/s 
Installing collected packages: pip
  Found existing installation: pip 9.0.1
    Uninstalling pip-9.0.1:
      Successfully uninstalled pip-9.0.1
Successfully installed pip-20.1.1
(default) ubuntu@ip-172-26-6-221:~$

Now that setuptools is installed and pip is upgraded, let’s move on to installing ckan. If you (like me) intend to install the latest stable release of ckan (2.8.2 at the time of writing this post), then unfortunately I have bad news for you. The ckan documentation gives the false impression that this is possible on Python 3. However, in reality, this is what happens:

(default) ubuntu@ip-172-26-6-221:~$ pip install -e 'git+https://github.com/ckan/ckan.git@ckan-2.8.2#egg=ckan'
Obtaining ckan from git+https://github.com/ckan/ckan.git@ckan-2.8.2#egg=ckan
  Cloning https://github.com/ckan/ckan.git (to revision ckan-2.8.2) to /usr/lib/ckan/default/src/ckan
  Running command git clone -q https://github.com/ckan/ckan.git /usr/lib/ckan/default/src/ckan
  Running command git checkout -q d75edc844fadea285e479e69308faee3f1824509
    ERROR: Command errored out with exit status 1:
     command: /usr/lib/ckan/default/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/usr/lib/ckan/default/src/ckan/setup.py'"'"'; __file__='"'"'/usr/lib/ckan/default/src/ckan/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-rv_iu3d0
         cwd: /usr/lib/ckan/default/src/ckan/
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/usr/lib/ckan/default/src/ckan/setup.py", line 34, in <module>
        if parse_version(setuptools_version) < min_setuptools_version:
    TypeError: '<' not supported between instances of 'map' and 'map'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
(default) ubuntu@ip-172-26-6-221:~$ 

A little bit of googling tells me that this is a common issue, and the root cause is that ckan versions 2.8.* and lower do not work with Python 3. See this for example: https://github.com/ckan/ckan/issues/5284
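
The TypeError in the traceback is a typical Python 2 to Python 3 porting problem. The sketch below is not ckan’s actual setup.py. It is a hypothetical reconstruction, assuming the version parser was built on map(), to show why such a comparison works on Python 2 and fails on Python 3. All function names here are my own.

```python
def parse_version_py2_style(text):
    # On Python 2, map() returned a list, and lists compare element by element.
    # On Python 3, map() returns a lazy iterator that does not support "<".
    return map(int, text.split("."))


def parse_version(text):
    # A tuple of ints compares element by element on Python 3 as well.
    # Only plain numeric versions such as "44.1.0" are handled here.
    return tuple(int(part) for part in text.split("."))


def comparison_error(a, b):
    """Return the TypeError message raised by the Python 2 style comparison, or None."""
    try:
        parse_version_py2_style(a) < parse_version_py2_style(b)
    except TypeError as exc:
        return str(exc)
    return None
```

Calling comparison_error("47.1.1", "18.5") on Python 3 returns a message about 'map' instances, much like the one in the traceback, while the tuple-based parse_version compares without complaint.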

Right, let’s try the development version then, 2.9.0a, which is supposed to work with Python 3. Unfortunately, this was not straightforward either. After lots of package downloads and installs (which looked promising), this is what it ended with. On to more googling.

Collecting zope.interface==4.3.2; extra == "requirements"
  Downloading zope.interface-4.3.2.tar.gz (143 kB)
     |████████████████████████████████| 143 kB 44.0 MB/s 
    ERROR: Command errored out with exit status 1:
     command: /usr/lib/ckan/default/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-9jyectce/zope.interface/setup.py'"'"'; __file__='"'"'/tmp/pip-install-9jyectce/zope.interface/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-e9z6op84
         cwd: /tmp/pip-install-9jyectce/zope.interface/
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-9jyectce/zope.interface/setup.py", line 26, in <module>
        from setuptools import setup, Extension, Feature
    ImportError: cannot import name 'Feature'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
(default) ubuntu@ip-172-26-6-221:~$ 

This looks like a problem with the setuptools version: the Feature class that zope.interface 4.3.2 imports was deprecated and then removed in newer setuptools releases (from 46.0.0 onwards, as far as I can tell). As the ckan guidance mentions version 44.1.0, I will change my setuptools version to that to see if it makes a difference.

(default) ubuntu@ip-172-26-6-221:~$ pip uninstall setuptools
Found existing installation: setuptools 47.1.1
Uninstalling setuptools-47.1.1:
  Would remove:
    /usr/lib/ckan/default/bin/easy_install
    /usr/lib/ckan/default/bin/easy_install-3.6
    /usr/lib/ckan/default/lib/python3.6/site-packages/easy_install.py
    /usr/lib/ckan/default/lib/python3.6/site-packages/pkg_resources/*
    /usr/lib/ckan/default/lib/python3.6/site-packages/setuptools-47.1.1.dist-info/*
    /usr/lib/ckan/default/lib/python3.6/site-packages/setuptools/*
Proceed (y/n)? Y
  Successfully uninstalled setuptools-47.1.1
(default) ubuntu@ip-172-26-6-221:~$ pip install setuptools==44.1.0
Collecting setuptools==44.1.0
  Downloading setuptools-44.1.0-py2.py3-none-any.whl (583 kB)
     |████████████████████████████████| 583 kB 12.7 MB/s 
Installing collected packages: setuptools
Successfully installed setuptools-44.1.0
(default) ubuntu@ip-172-26-6-221:~$ 

And then:

(default) ubuntu@ip-172-26-6-221:~$ pip install -e 'git+https://github.com/ckan/ckan.git#egg=ckan[requirements,dev]'
...
...
...
Successfully installed Blinker-1.4 Flask-DebugToolbar-0.10.1 Pygments-2.6.1 Sphinx-1.8.5 alabaster-0.7.12 alembic-1.0.0 arrow-0.15.6 atomicwrites-1.4.0 attrs-19.3.0 babel-2.7.0 beaker-1.11.0 beautifulsoup4-4.5.1 binaryornot-0.4.4 bleach-3.1.4 certifi-2019.11.28 chardet-3.0.4 ckan click-6.7 cookiecutter-1.6.0 coverage-5.1 coveralls-2.0.0 decorator-4.4.1 docopt-0.6.2 docutils-0.12 dominate-2.4.0 factory-boy-2.1.1 fanstatic-1.1 feedgen-0.9.0 first-2.0.2 flask-1.1.1 flask-babel-1.0.0 flask-multistatic-1.0 freezegun-0.3.15 future-0.18.2 idna-2.8 imagesize-1.2.0 importlib-metadata-1.6.1 incremental-17.5.0 itsdangerous-1.1.0 jinja2-2.10.1 jinja2-time-0.2.0 lxml-4.4.2 mako-1.1.0 markdown-2.6.7 markupsafe-1.1.1 mock-2.0.0 more-itertools-8.4.0 nose-1.3.7 packaging-20.4 passlib-1.6.5 pastedeploy-2.0.1 pathtools-0.1.2 pbr-5.4.4 pip-tools-2.0.2 pluggy-0.13.1 polib-1.0.7 poyo-0.5.0 psycopg2-2.8.2 py-1.8.1 pycodestyle-2.5.0 pyfakefs-3.2 pyparsing-2.4.7 pysolr-3.6.0 pytest-4.6.5 pytest-cov-2.7.1 pytest-freezegun-0.4.1 pytest-rerunfailures-8.0 pytest-split-tests-1.0.9 python-dateutil-2.8.1 python-editor-1.0.4 python-magic-0.4.15 pytz-2016.7 pyutilib-5.7.1 pyyaml-5.3.1 redis-3.3.11 repoze.lru-0.7 repoze.who-2.3 requests-2.22.0 responses-0.10.6 routes-1.13 rq-1.0 shutilwhich-1.1.0 simplejson-3.10.0 six-1.13.0 snowballstemmer-2.0.0 sphinx-rtd-theme-0.3.1 sphinxcontrib-websupport-1.2.2 sqlalchemy-1.3.5 sqlalchemy-migrate-0.12.0 sqlparse-0.2.2 tempita-0.5.2 toml-0.10.1 towncrier-19.2.0 tzlocal-1.3 unicodecsv-0.14.1 urllib3-1.25.8 watchdog-0.10.2 wcwidth-0.2.4 webassets-0.12.1 webencodings-0.5.1 webob-1.8.5 werkzeug-1.0.0 whichcraft-0.6.1 zipp-3.1.0 zope.interface-4.3.2
(default) ubuntu@ip-172-26-6-221:~$ 

So this has worked. Moral of the story is to stick with official documentation 🙂
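
The setuptools problem can be spotted before starting a long install. As far as I know, the Feature class was removed in setuptools 46.0.0, which matches what happened above (47.1.1 failed, 44.1.0 worked). The sketch below simply encodes that assumed cutoff; the function name is my own.

```python
# Assumption: setuptools dropped the Feature class in release 46.0.0.
FEATURE_REMOVED_IN = (46, 0, 0)


def feature_expected(setuptools_version):
    """Return True if this setuptools version should still provide Feature.

    Only handles plain numeric versions such as "44.1.0".
    """
    parts = tuple(int(p) for p in setuptools_version.split("."))
    # Pad short versions like "46" to three components before comparing.
    parts = (parts + (0, 0, 0))[:3]
    return parts < FEATURE_REMOVED_IN
```

Inside the virtualenv, passing setuptools.__version__ to feature_expected tells you whether packages that import Feature, such as zope.interface 4.3.2, have a chance of building.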

Now let’s deactivate and activate the virtual environment again to make sure we are doing the rest of the installs in the virtualenv. I will create an alias so I don’t have to remember the activation path all the time.

(default) ubuntu@ip-172-26-6-221:~$ deactivate
ubuntu@ip-172-26-6-221:~$ alias activate=". /usr/lib/ckan/default/bin/activate"
ubuntu@ip-172-26-6-221:~$ activate
(default) ubuntu@ip-172-26-6-221:~$
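
If you are ever unsure whether the virtualenv is really the active Python, the interpreter can tell you. This is a minimal sketch that assumes the environment was created with python3 -m venv as above; the function name is my own.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a venv."""
    # In a venv created with "python3 -m venv", sys.prefix points at the
    # environment while sys.base_prefix points at the system installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)
```

With the environment from this post activated, sys.prefix should be /usr/lib/ckan/default.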

Step 3: Setup the PostgreSQL database

As we have already installed PostgreSQL (see the very first install command), now is the time to set it up properly. First, check that the default databases are listed.

(default) ubuntu@ip-172-26-6-221:~$ sudo -u postgres psql -l
                              List of databases
   Name    |  Owner   | Encoding | Collate |  Ctype  |   Access privileges   
-----------+----------+----------+---------+---------+-----------------------
 postgres  | postgres | UTF8     | C.UTF-8 | C.UTF-8 | 
 template0 | postgres | UTF8     | C.UTF-8 | C.UTF-8 | =c/postgres          +
           |          |          |         |         | postgres=CTc/postgres
 template1 | postgres | UTF8     | C.UTF-8 | C.UTF-8 | =c/postgres          +
           |          |          |         |         | postgres=CTc/postgres
(3 rows)

(default) ubuntu@ip-172-26-6-221:~$ 

Let’s create a default user for ckan called “ckan_default” and a default database for ckan called “ckan_default” as per the official documentation.

The flags for the createuser command are case sensitive, so it is worth reading up on them if you want to learn more about Postgres. See: https://www.postgresql.org/docs/9.3/app-createuser.html

-D = this user cannot create databases

-P = prompt for a password for the new user

-R = this user cannot create roles

-S = this user will not be a superuser

(default) ubuntu@ip-172-26-6-221:~$ sudo -u postgres createuser -S -D -R -P ckan_default
Enter password for new role: 
Enter it again: 
(default) ubuntu@ip-172-26-6-221:~$
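
createuser accepts any password, but this one later ends up inside a URL in ckan.ini, where characters such as @ or / are awkward. One option is to generate a password made only of URL-safe characters. This is just a sketch using Python’s standard secrets module; the function name and the length are my own choices.

```python
import secrets


def make_db_password(num_bytes=24):
    """Return a random password made only of letters, digits, '-' and '_'."""
    # token_urlsafe() base64-encodes num_bytes random bytes with a URL-safe alphabet.
    return secrets.token_urlsafe(num_bytes)
```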

Now let’s create the default database for ckan and check the list of databases again.

(default) ubuntu@ip-172-26-6-221:~$ sudo -u postgres createdb -O ckan_default ckan_default -E utf-8
(default) ubuntu@ip-172-26-6-221:~$ sudo -u postgres psql -l
                                 List of databases
     Name     |    Owner     | Encoding | Collate |  Ctype  |   Access privileges   
--------------+--------------+----------+---------+---------+-----------------------
 ckan_default | ckan_default | UTF8     | C.UTF-8 | C.UTF-8 | 
 postgres     | postgres     | UTF8     | C.UTF-8 | C.UTF-8 | 
 template0    | postgres     | UTF8     | C.UTF-8 | C.UTF-8 | =c/postgres          +
              |              |          |         |         | postgres=CTc/postgres
 template1    | postgres     | UTF8     | C.UTF-8 | C.UTF-8 | =c/postgres          +
              |              |          |         |         | postgres=CTc/postgres
(4 rows)

(default) ubuntu@ip-172-26-6-221:~$ 

Excellent, all looking good so far.

Step 4: Configuring ckan

Ckan is run using settings which reside in a configuration file. Let’s create this file and adjust the settings in there.

Firstly, let’s create the directory in which the configuration file(s) will reside (you can have different config files for a production or development instance). Give the directory the right ownership so that the ubuntu user can create files in it. Finally, use the ckan command (we are in the virtual environment, so this will work now) to generate a default config, and call it “ckan.ini”.

(default) ubuntu@ip-172-26-6-221:~$ sudo mkdir -p /etc/ckan/default
(default) ubuntu@ip-172-26-6-221:~$ sudo chown -R `whoami` /etc/ckan/
(default) ubuntu@ip-172-26-6-221:~$ ckan generate config /etc/ckan/default/ckan.ini
(default) ubuntu@ip-172-26-6-221:~$

Update the Postgres connection setting. Edit the newly created ckan.ini file and update the following setting, changing “pass” to the password you chose above.

## Database Settings
sqlalchemy.url = postgresql://ckan_default:pass@localhost/ckan_default
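
If your password contains characters that have a meaning inside a URL (such as @, : or /), the connection string will be misread unless they are percent-encoded. The sketch below builds the URL with Python’s urllib.parse.quote. The function name is my own, and I have not checked how ckan.ini treats a literal % sign, so a password without special characters remains the simplest option.

```python
from urllib.parse import quote


def build_sqlalchemy_url(user, password, host, database):
    """Build a PostgreSQL URL, percent-encoding characters that would break it."""
    return "postgresql://{}:{}@{}/{}".format(
        quote(user, safe=""), quote(password, safe=""), host, database
    )
```

For the plain password “pass” this produces exactly the line shown above; a password such as p@ss/word becomes p%40ss%2Fword in the URL.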

I will also update the site url.

ckan.site_url = http://masudklabs.com

Step 5: Setting up Solr

While the official guidance says that we need to create the relevant symlink for jetty9, it had already been created for me. It is easy to run Solr against Jetty, but you can also run it against (in my opinion a very bloated) Tomcat server. The output below confirms that the symlink exists.

(default) ubuntu@ip-172-26-6-221:~$ sudo ln -s /etc/solr/solr-jetty.xml /var/lib/jetty9/webapps/solr.xml
ln: failed to create symbolic link '/var/lib/jetty9/webapps/solr.xml': File exists
(default) ubuntu@ip-172-26-6-221:~$ ls -lart /var/lib/jetty9/webapps/solr.xml 
lrwxrwxrwx 1 jetty adm 24 Mar  4  2019 /var/lib/jetty9/webapps/solr.xml -> /etc/solr/solr-jetty.xml
(default) ubuntu@ip-172-26-6-221:~$ 

Change jetty to start on port 8983 instead of 8080.

(default) ubuntu@ip-172-26-6-221:~$ sudo nano /etc/jetty9/start.ini 

To make sure Jetty starts automatically at reboot, confirm that NO_START is set to 0 in the following file and then restart jetty9. Note that I have not changed the jetty host to 127.0.0.1 in the start.ini file (the host setting doesn’t exist in the /etc/default/jetty9 file), as I want to be able to reach jetty via my domain name as well as via localhost for now.

(default) ubuntu@ip-172-26-6-221:~$ sudo nano /etc/default/jetty9 
(default) ubuntu@ip-172-26-6-221:~$ sudo service jetty9 restart

At this point, I went back to my Amazon Lightsail ckan instance and added a new firewall rule for port 8983, which we configured above for Jetty.

Now to verify that everything is working as expected. Unfortunately, in my case it wasn’t: instead of the default Solr page, I saw an Error 404 Not Found page.

For the avoidance of doubt, I checked this in the terminal as well.

ubuntu@ip-172-26-6-221:~$ curl http://localhost:8983/solr
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /solr. Reason:
<pre>    Not Found</pre></p><hr><a href="http://eclipse.org/jetty">Powered by Jetty:// 9.4.15.v20190215</a><hr/>

</body>
</html>
ubuntu@ip-172-26-6-221:~$

Ok, back to debugging what may be the problem here. Right, so after some trial and error, I found the answer here: https://stackoverflow.com/questions/55939999/how-to-get-solr-and-ckan-to-run-on-ubuntu-18-04-after-recent-solr-jetty-updates.

Here is what I did to get Solr up and running on Jetty.

Firstly, create a systemd drop-in for jetty9 so that Solr is allowed to read and write under /var/lib/solr.

ubuntu@ip-172-26-6-221:/usr/lib/ckan/default/src/ckan$ sudo mkdir /etc/systemd/system/jetty9.service.d
ubuntu@ip-172-26-6-221:/usr/lib/ckan/default/src/ckan$ sudo nano /etc/systemd/system/jetty9.service.d/solr.conf

And add the following

[Service]
ReadWritePaths=/var/lib/solr

Then edit the file /etc/solr/solr-jetty.xml and comment out the addAliasCheck call, so that the section looks like this:

<!-- Enable symlinks -->
  <!-- Disabled due to being deprecated
  <Call name="addAliasCheck">
    <Arg>
      <New class="org.eclipse.jetty.server.handler.ContextHandler$ApproveSameSuffixAliases"/>
    </Arg>
  </Call>
  -->

Restart jetty9 after this.

(default) ubuntu@ip-172-26-6-221:~$ sudo systemctl daemon-reload
(default) ubuntu@ip-172-26-6-221:~$ sudo service jetty9 restart

Now we need to replace the default Solr schema with the one that ckan provides. To do this, run the following commands.

(default) ubuntu@ip-172-26-6-221:~$ sudo mv /etc/solr/conf/schema.xml /etc/solr/conf/schema.xml.bak
(default) ubuntu@ip-172-26-6-221:~$ sudo ln -s /usr/lib/ckan/default/src/ckan/ckan/config/solr/schema.xml /etc/solr/conf/schema.xml
(default) ubuntu@ip-172-26-6-221:~$ sudo service jetty9 restart

At this point, you can also change the solr_url setting in your ckan configuration file. I have changed mine to:

solr_url=http://127.0.0.1:8983/solr
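
Before moving on, it is worth confirming that Solr now answers at this URL rather than returning the 404 from earlier. The sketch below does roughly what the curl check did, using only the Python standard library. The function names and the wording of the messages are my own.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5):
    """Return the HTTP status code for url, or None if nothing answers."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar.
        return None


def describe(name, status):
    """Turn a status code from http_status() into a short human-readable line."""
    if status is None:
        return name + ": not reachable (is the service running?)"
    if status == 404:
        return name + ": server is up, but nothing is deployed at this path"
    if 200 <= status < 400:
        return name + ": OK (HTTP %d)" % status
    return name + ": unexpected HTTP %d" % status
```

For example, print(describe("Solr", http_status("http://127.0.0.1:8983/solr"))) run on the server should report OK once Jetty is serving Solr.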

Step 6: Linking to who.ini

I have no idea what this file is, and I will create the symlink as the documentation suggests. However, I will go read more about this later on or if you know what this is about, please feel free to tell me in the comments below.

(default) ubuntu@ip-172-26-6-221:~$ ln -s /usr/lib/ckan/default/src/ckan/who.ini /etc/ckan/default/who.ini
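
By now this post has relied on three symlinks (solr.xml, schema.xml and who.ini), and a broken or mistyped link is easy to miss. As far as I can tell, who.ini is the configuration file for repoze.who, the authentication library that appears in the pip output above. The sketch below checks that a link exists and resolves to the expected file; the function name is my own.

```python
import os


def symlink_points_to(link, target):
    """Return True if link is a symlink that resolves to the same file as target."""
    if not os.path.islink(link):
        return False
    return os.path.realpath(link) == os.path.realpath(target)


# Example with the paths used in this post:
# symlink_points_to("/etc/ckan/default/who.ini",
#                   "/usr/lib/ckan/default/src/ckan/who.ini")
```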

Step 7: Creating Database tables

Let’s initialise the database tables using the ckan.ini configuration file.

(default) ubuntu@ip-172-26-6-221:~$ cd /usr/lib/ckan/default/src/ckan
(default) ubuntu@ip-172-26-6-221:/usr/lib/ckan/default/src/ckan$ ckan -c /etc/ckan/default/ckan.ini db init
2020-06-14 23:54:52,370 INFO  [ckan.cli] Using configuration file /etc/ckan/default/ckan.ini
2020-06-14 23:54:52,370 INFO  [ckan.config.environment] Loading static files from public
2020-06-14 23:54:52,424 INFO  [ckan.config.environment] Loading templates from /usr/lib/ckan/default/src/ckan/ckan/templates
2020-06-14 23:54:52,719 INFO  [ckan.config.environment] Loading templates from /usr/lib/ckan/default/src/ckan/ckan/templates
2020-06-14 23:54:54,369 CRITI [ckan.lib.uploader] Please specify a ckan.storage_path in your config
                         for your uploads
2020-06-14 23:54:54,640 INFO  [ckan.cli.db] Initialize the Database
2020-06-14 23:54:56,191 INFO  [ckan.model] CKAN database version upgraded: base -> 19ddad52b500 (head)
2020-06-14 23:54:56,191 INFO  [ckan.model] Database initialised
Initialising DB: SUCCESS
(default) ubuntu@ip-172-26-6-221:/usr/lib/ckan/default/src/ckan$ 

While “Initialising DB: SUCCESS” indicates that everything worked, there is also a CRITICAL message about the storage path not being configured. We might as well configure that now while we are here.

(default) ubuntu@ip-172-26-6-221:~$ sudo mkdir -p /var/lib/ckan/default
(default) ubuntu@ip-172-26-6-221:~$ sudo chown `whoami` /var/lib/ckan/default
(default) ubuntu@ip-172-26-6-221:~$ sudo chmod u+rwx /var/lib/ckan/default
(default) ubuntu@ip-172-26-6-221:~$ nano /etc/ckan/default/ckan.ini 
(default) ubuntu@ip-172-26-6-221:~$ 

And change the setting in the file as follows:

## Storage Settings

ckan.storage_path = /var/lib/ckan/default

If you run the db init command again, this is what you get this time.

(default) ubuntu@ip-172-26-6-221:/usr/lib/ckan/default/src/ckan$ ckan -c /etc/ckan/default/ckan.ini db init
2020-06-15 00:01:26,954 INFO  [ckan.cli] Using configuration file /etc/ckan/default/ckan.ini
2020-06-15 00:01:26,954 INFO  [ckan.config.environment] Loading static files from public
2020-06-15 00:01:26,999 INFO  [ckan.config.environment] Loading templates from /usr/lib/ckan/default/src/ckan/ckan/templates
2020-06-15 00:01:27,341 INFO  [ckan.config.environment] Loading templates from /usr/lib/ckan/default/src/ckan/ckan/templates
2020-06-15 00:01:27,639 INFO  [ckan.cli.db] Initialize the Database
2020-06-15 00:01:27,745 INFO  [ckan.model] CKAN database version remains as: 19ddad52b500 (head)
2020-06-15 00:01:27,746 INFO  [ckan.model] Database initialised
Initialising DB: SUCCESS
(default) ubuntu@ip-172-26-6-221:/usr/lib/ckan/default/src/ckan$
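
The ckan.ini edits made so far (sqlalchemy.url, ckan.site_url, solr_url and ckan.storage_path) can also be applied by a script rather than by hand in nano. The sketch below uses Python’s configparser and assumes these settings live in the [app:main] section of the generated file. The function name is my own, and note the caveat about comments in the docstring.

```python
import configparser


def set_ckan_options(ini_path, options, section="app:main"):
    """Set key/value pairs in one section of an ini file and write it back.

    Caveat: configparser drops comments when it rewrites the file, so run
    this on a copy if the comments in ckan.ini matter to you.
    """
    # interpolation=None keeps literal "%" characters in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    with open(ini_path) as fh:
        parser.read_file(fh)
    if not parser.has_section(section):
        parser.add_section(section)
    for key, value in options.items():
        parser.set(section, key, value)
    with open(ini_path, "w") as fh:
        parser.write(fh)
```

A call such as set_ckan_options("/etc/ckan/default/ckan.ini", {"ckan.storage_path": "/var/lib/ckan/default"}) would make the same change as the manual edit above.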

Step 8: Datastore – Yes or No? and running ckan

While I am genuinely interested in the datastore plugin and will be enabling it shortly, I am keen to see whether ckan is up and running at this point. So I will skip enabling the datastore plugin for now and come back to it later.

Let’s start ckan in a development setting.

(default) ubuntu@ip-172-26-6-221:/usr/lib/ckan/default/src/ckan$ ckan -c /etc/ckan/default/ckan.ini run
2020-06-15 00:13:38,021 INFO  [ckan.cli] Using configuration file /etc/ckan/default/ckan.ini
2020-06-15 00:13:38,021 INFO  [ckan.config.environment] Loading static files from public
2020-06-15 00:13:38,063 INFO  [ckan.config.environment] Loading templates from /usr/lib/ckan/default/src/ckan/ckan/templates
2020-06-15 00:13:38,298 INFO  [ckan.config.environment] Loading templates from /usr/lib/ckan/default/src/ckan/ckan/templates
2020-06-15 00:13:38,576 INFO  [ckan.cli.server] Running server localhost on port 5000
2020-06-15 00:13:39,564 INFO  [ckan.cli] Using configuration file /etc/ckan/default/ckan.ini
2020-06-15 00:13:39,564 INFO  [ckan.config.environment] Loading static files from public
2020-06-15 00:13:39,608 INFO  [ckan.config.environment] Loading templates from /usr/lib/ckan/default/src/ckan/ckan/templates
2020-06-15 00:13:39,841 INFO  [ckan.config.environment] Loading templates from /usr/lib/ckan/default/src/ckan/ckan/templates
2020-06-15 00:13:40,123 INFO  [ckan.cli.server] Running server localhost on port 5000

And a curl call seems to work fine.

ubuntu@ip-172-26-6-221:~$ curl http://localhost:5000
<!DOCTYPE html>
<!--[if IE 9]> <html lang="en" class="ie9"> <![endif]-->
<!--[if gt IE 8]><!--> <html lang="en"> <!--<![endif]-->
  <head>
    <meta charset="utf-8" />
      <meta name="generator" content="ckan 2.9.0a" />
      <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Welcome - CKAN</title>

    
    <link rel="shortcut icon" href="/base/images/ckan.ico" />

...
...
...
<script src="/webassets/vendor/580fa18d_bootstrap.js" type="text/javascript"></script>
<script src="/webassets/base/5f5a82bb_main.js" type="text/javascript"></script>
<script src="/webassets/base/91df2ea0_ckan.js" type="text/javascript"></script>
  </body>
</html>

Step 9: Deploying ckan – Nginx and uwsgi

Right, so far so good. Ckan is up and running in a development setting, but we can’t access it through our domain. To do this, let’s deploy ckan behind a web server. Initially, I was going to use Apache with mod_wsgi, plus Nginx, but after some further work it was clear that there are some issues with the official documentation here, particularly the lack of activate_this.py in Python 3 virtual environments, and thread safety concerns.

Instead, we will follow the guidance here: https://github.com/ckan/ckan/wiki/CKAN-2.9---New-Web-Server-options and go for Nginx with uwsgi.

ubuntu@ip-172-26-6-221:~$ sudo apt-get install nginx
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  libgd3 libnginx-mod-http-geoip libnginx-mod-http-image-filter libnginx-mod-http-xslt-filter libnginx-mod-mail libnginx-mod-stream libwebp6 nginx-common nginx-core
Suggested packages:
...
...
...
Setting up libnginx-mod-http-image-filter (1.14.0-0ubuntu1.7) ...
Setting up nginx-core (1.14.0-0ubuntu1.7) ...
Not attempting to start NGINX, port 80 is already in use.
Setting up nginx (1.14.0-0ubuntu1.7) ...
Processing triggers for ureadahead (0.100.0-20) ...
Processing triggers for ufw (0.35-5) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...
ubuntu@ip-172-26-6-221:~$ 

Ckan also uses an email server to send crash reports and other emails. For this, we will install postfix.

ubuntu@ip-172-26-6-221:~$ sudo apt-get install postfix
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Suggested packages:
  procmail postfix-mysql postfix-pgsql postfix-ldap postfix-pcre postfix-lmdb postfix-sqlite sasl2-bin dovecot-common resolvconf postfix-cdb mail-reader postfix-doc
The following NEW packages will be installed:
  postfix
...
...
...
After modifying main.cf, be sure to run 'service postfix reload'.

Running newaliases
Processing triggers for ureadahead (0.100.0-20) ...
Processing triggers for systemd (237-3ubuntu10.3) ...
Processing triggers for rsyslog (8.32.0-1ubuntu4) ...
Processing triggers for ufw (0.35-5) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...
ubuntu@ip-172-26-6-221:~$ 

During postfix installation, I chose “Internet Site” and, for the system mail name, I entered my domain name, “masudklabs.com”.

The next step is to set up the WSGI script file (wsgi.py) and store it in /etc/ckan/default:

ubuntu@ip-172-26-6-221:/etc/nginx/sites-enabled$ cat /etc/ckan/default/wsgi.py 
import os
from ckan.config.middleware import make_app
from ckan.cli import CKANConfigLoader
from logging.config import fileConfig as loggingFileConfig
config_filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'ckan.ini')
abspath = os.path.join(os.path.dirname(os.path.abspath(__file__)))
loggingFileConfig(config_filepath)
config = CKANConfigLoader(config_filepath).get_config()
application = make_app(config)
ubuntu@ip-172-26-6-221:/etc/nginx/sites-enabled$ 

Then install uwsgi, making sure you are in the virtual environment.

(default) ubuntu@ip-172-26-6-221:~$ pip install uwsgi
Collecting uwsgi
  Downloading uWSGI-2.0.19.tar.gz (804 kB)
     |████████████████████████████████| 804 kB 11.7 MB/s 
Using legacy setup.py install for uwsgi, since package 'wheel' is not installed.
Installing collected packages: uwsgi
    Running setup.py install for uwsgi ... done
Successfully installed uwsgi-2.0.19
(default) ubuntu@ip-172-26-6-221:~$

After this, let’s create a uwsgi configuration file at /etc/ckan/default

(default) ubuntu@ip-172-26-6-221:~$ nano /etc/ckan/default/ckan-uwsgi.ini

And add the following settings in there. The key settings to adjust to your local setup are uid and gid (note that the uWSGI option for the group is spelled gid, not guid). My whole configuration is based on the ubuntu user and group, so that is what goes here. Another way to check is to see which user owns the /etc/ckan/default folder.

[uwsgi]
http            =  127.0.0.1:8080
uid             =  ubuntu
gid             =  ubuntu
wsgi-file       =  /etc/ckan/default/wsgi.py
virtualenv      =  /usr/lib/ckan/default
module          =  wsgi:application
master          =  true
pidfile         =  /tmp/%n.pid
harakiri        =  50
max-requests    =  5000
vacuum          =  true
callable        =  application  
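Since most of my early failures came down to a wrong path, here is a rough sketch that reads the file above and reports any path that does not exist. It uses Python's configparser, which is an assumption on my part: uwsgi has its own ini parser, so this only works for simple files like this one without repeated keys. The names load_uwsgi_settings and missing_paths are hypothetical helpers.

```python
import configparser
import os

# Settings in the [uwsgi] section whose values should be existing paths.
PATH_KEYS = ("wsgi-file", "virtualenv")


def load_uwsgi_settings(text):
    """Parse the [uwsgi] section of an ini file into a plain dict."""
    # interpolation=None leaves uwsgi magic variables such as %n untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser["uwsgi"])


def missing_paths(settings):
    """Return the configured paths that are not present on disk."""
    return [settings[key] for key in PATH_KEYS
            if key in settings and not os.path.exists(settings[key])]


if __name__ == "__main__":
    with open("/etc/ckan/default/ckan-uwsgi.ini") as handle:
        settings = load_uwsgi_settings(handle.read())
    print("runs as user:", settings.get("uid"))
    print("missing paths:", missing_paths(settings) or "none")
```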

At this point, you want to make sure that the uwsgi server is always running in the background, ready to receive requests from the frontend server (Nginx in our case). We will use Nginx’s proxy_pass to forward requests to uwsgi, which in turn hands them to the ckan application and returns its response. To ensure that uwsgi is always running, we will install supervisor.

(default) ubuntu@ip-172-26-6-221:~$ sudo apt-get install supervisor 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  apache2-bin apache2-data libaprutil1-dbd-sqlite3 libaprutil1-ldap liblua5.2-0 libpython2.7
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  python-meld3 python-pkg-resources
Suggested packages:
  python-setuptools supervisor-doc
The following NEW packages will be installed:
  python-meld3 python-pkg-resources supervisor
0 upgraded, 3 newly installed, 0 to remove and 231 not upgraded.
Need to get 415 kB of archives.
After this operation, 2138 kB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://eu-west-1.ec2.archive.ubuntu.com/ubuntu bionic/main amd64 python-pkg-resources all 39.0.1-2 [128 kB]
Get:2 http://eu-west-1.ec2.archive.ubuntu.com/ubuntu bionic/universe amd64 python-meld3 all 1.0.2-2 [30.9 kB]
Get:3 http://eu-west-1.ec2.archive.ubuntu.com/ubuntu bionic/universe amd64 supervisor all 3.3.1-1.1 [256 kB]
Fetched 415 kB in 0s (17.2 MB/s)
Selecting previously unselected package python-pkg-resources.
(Reading database ... 84977 files and directories currently installed.)
Preparing to unpack .../python-pkg-resources_39.0.1-2_all.deb ...
Unpacking python-pkg-resources (39.0.1-2) ...
Selecting previously unselected package python-meld3.
Preparing to unpack .../python-meld3_1.0.2-2_all.deb ...
Unpacking python-meld3 (1.0.2-2) ...
Selecting previously unselected package supervisor.
Preparing to unpack .../supervisor_3.3.1-1.1_all.deb ...
Unpacking supervisor (3.3.1-1.1) ...
Processing triggers for ureadahead (0.100.0-20) ...
Setting up python-meld3 (1.0.2-2) ...
Setting up python-pkg-resources (39.0.1-2) ...
Setting up supervisor (3.3.1-1.1) ...
Created symlink /etc/systemd/system/multi-user.target.wants/supervisor.service → /lib/systemd/system/supervisor.service.
Processing triggers for systemd (237-3ubuntu10.3) ...
Processing triggers for man-db (2.8.3-2) ...
Processing triggers for ureadahead (0.100.0-20) ...
(default) ubuntu@ip-172-26-6-221:~$ 

Next, we will create a configuration file so that supervisor starts uwsgi and respawns it if it dies. Create a new file called ckan-uwsgi.conf in /etc/supervisor/conf.d/ with the following settings. Make sure the paths are correct for your setup.

[program:ckan-uwsgi]

command=/usr/lib/ckan/default/bin/uwsgi -i /etc/ckan/default/ckan-uwsgi.ini

; Start just a single worker. Increase this number if you have many or
; particularly long running background jobs.
numprocs=1
process_name=%(program_name)s-%(process_num)02d

; Log files - change this to point to the existing CKAN log files
stdout_logfile=/etc/ckan/default/uwsgi.OUT
stderr_logfile=/etc/ckan/default/uwsgi.ERR

; Make sure that the worker is started on system start and automatically
; restarted if it crashes unexpectedly.
autostart=true
autorestart=true

; Number of seconds the process has to run before it is considered to have
; started successfully.
startsecs=10

; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600

; Required for uWSGI as it does not obey SIGTERM.
stopsignal=QUIT
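The process_name line in this file is a Python-style format string, which is why the process shows up with a numeric suffix in supervisor's status output. Here is a small sketch of how that template expands, assuming supervisor's default of numbering processes from 0. The function name process_names is mine and is for illustration only.

```python
TEMPLATE = "%(program_name)s-%(process_num)02d"


def process_names(program_name, numprocs, numprocs_start=0):
    """Expand the process_name template once per configured process."""
    return [TEMPLATE % {"program_name": program_name, "process_num": number}
            for number in range(numprocs_start, numprocs_start + numprocs)]


if __name__ == "__main__":
    # With numprocs=1, as in the file above, there is a single entry.
    print(process_names("ckan-uwsgi", 1))  # ['ckan-uwsgi-00']
```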

Restart supervisor.

(default) ubuntu@ip-172-26-6-221:~$ sudo service supervisor restart
(default) ubuntu@ip-172-26-6-221:~$ 
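Before moving on to Nginx, it is worth checking that uwsgi is actually answering on 127.0.0.1:8080, the address from ckan-uwsgi.ini. Below is a minimal sketch using only the standard library; probe is a hypothetical helper name. Give supervisor a few seconds after the restart before trying it.

```python
import urllib.error
import urllib.request


def probe(url, timeout=5):
    """Return the HTTP status code from url, or None if nothing answers."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, just not with a success status.
        return error.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or similar: nothing is listening.
        return None


if __name__ == "__main__":
    status = probe("http://127.0.0.1:8080/")
    print("uwsgi answered with", status if status is not None else "nothing")
```

If this prints nothing useful, the uwsgi.ERR log configured in the supervisor file is the first place I would look.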

Now it is time to configure Nginx to proxy requests and serve the site on the web. Remove the default file from the /etc/nginx/sites-enabled folder and create a new file there called ckan.conf. Then add the following (changing server_name to your own domain).

proxy_cache_path /tmp/nginx_cache levels=1:2 keys_zone=cache:30m max_size=250m;
proxy_temp_path /tmp/nginx_proxy 1 2;

server {
    listen 80;
    listen [::]:80;
    server_name masudklabs.com;
    client_max_body_size 100M;
    location / {
        proxy_pass http://127.0.0.1:8080/;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header Host $host;
        proxy_cache cache;
        proxy_cache_bypass $cookie_auth_tkt;
        proxy_no_cache $cookie_auth_tkt;
        proxy_cache_valid 30m;
        proxy_cache_key $host$scheme$proxy_host$request_uri;
        # In emergency comment out line to force caching
        # proxy_ignore_headers X-Accel-Expires Expires Cache-Control;
    }

}
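The caching lines in this file are the least obvious part, so here is a rough Python model of what they do, as I understand the Nginx documentation. A response is neither taken from nor saved to the cache when the auth_tkt cookie is present with a value other than empty or “0” (I am assuming this is the cookie ckan sets on login), and the cache key is simply the four variables joined together. The function names are hypothetical and this is not how Nginx is implemented, only a sketch of the rules.

```python
def cache_key(host, scheme, proxy_host, request_uri):
    """Model of: proxy_cache_key $host$scheme$proxy_host$request_uri;"""
    return host + scheme + proxy_host + request_uri


def skips_cache(cookies):
    """Model of proxy_cache_bypass / proxy_no_cache with $cookie_auth_tkt."""
    # Nginx treats the condition as true when the value is not empty and not "0".
    value = cookies.get("auth_tkt", "")
    return value not in ("", "0")


if __name__ == "__main__":
    # $proxy_host is the host and port given in proxy_pass.
    print(cache_key("masudklabs.com", "http", "127.0.0.1:8080", "/dataset"))
    print(skips_cache({}))                      # anonymous visitor: cached
    print(skips_cache({"auth_tkt": "abc123"}))  # logged-in user: not cached
```

In other words, anonymous visitors get pages cached for up to 30 minutes, while logged-in users always hit ckan directly.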

Restart Nginx and voilà!

(default) ubuntu@ip-172-26-6-221:/etc/ckan/default$ sudo service nginx restart
(default) ubuntu@ip-172-26-6-221:/etc/ckan/default$ 

It took me about 4.5 hours to get this working from beginning to end. Hopefully this will save someone else some time and get you up and running quickly with ckan on Ubuntu 18.04 and Python 3.

When I get some time, I will be doing some further configuration, and start to look into theming.

2 comments On Setting up ckan on Ubuntu 18.04 – Python 3 and Amazon Lightsail

  • Great article, I am starting out with CKAN and this and the official docs have eased the stress levels…

    • That’s great to hear, I am glad it was useful. Let me know how you find working with it. I am currently working on customising the homepage and will write a separate blog post on that too.

Blog of Masud Khokhar
