Monday, November 30, 2015

Impala Build Instructions

Cloudera/Apache Impala Build Instructions
CentOS 6.4


The build instructions on the GitHub wiki are underspecified.

https://github.com/cloudera/Impala/wiki
Build Instructions from above wiki: https://github.com/cloudera/Impala/wiki/How-to-build-Impala

The best source seems to be the Dockerfile posted to the Cloudera Impala user group.

https://02310473242028226360.googlegroups.com/attach/9a37759db195a/Dockerfile.txt?part=0.1&view=1&vt=ANaJVrHBf3w5z6BMg4OU_77TeUPNNjnfpKsxFOCjpIGDBi1A2LXXaMsBhGugKTlXCHfhvhLky_hDaArwCL_o2UHZtE8OxzBa4qf2tugb3zLHJT2qKPQr3YU

In case that post disappears, the Dockerfile.txt looks like this:
FROM centos:6

ENV HOSTNAME localhost

RUN rm /etc/yum.repos.d/*.repo
COPY container_root/etc/yum.repos.d/mirror.repo /etc/yum.repos.d/

# Remove this flag otherwise man-pages won't be generated.
RUN sed -i /tsflags=nodocs/d /etc/yum.conf

RUN yum clean all
RUN yum -y update

RUN yum -y groupinstall "Development Tools"
RUN yum -y install \
    ant \
    ant-nodeps \
    automake \
    bash-completion \
    bison \
    bzip2-devel \
    cmake \
    curl \
    cyrus-sasl-gssapi \
    cyrus-sasl-plain \
    db4-devel \
    doxygen.x86_64 \
    emacs \
    flex \
    gcc-c++ \
    gdb \
    git \
    glib-devel \
    groff \
    krb5-workstation \
    libevent-devel \
    libtool \
    lsof \
    lzo-devel \
    lzop \
    make \
    man \
    man-pages \
    net-tools \
    openldap-devel \
    openssh-server \
    openssl-devel \
    postgresql \
    postgresql-devel \
    postgresql-server \
    psmisc \
    python-argparse \
    python-devel \
    python-ipython \
    python-pip \
    python-setuptools \
    redhat-lsb \
    subversion \
    sudo \
    svn \
    vim \
    wget \
    zsh

COPY container_root/tmp/oracle-j2sdk1.7-1.7.0+update67-1.x86_64.rpm /tmp/
RUN yum -y localinstall --nogpgcheck /tmp/oracle-j2sdk1.7-1.7.0+update67-1.x86_64.rpm
RUN update-alternatives --install \
    /usr/bin/java java /usr/java/jdk1.7.0_67-cloudera/bin/java 999
RUN update-alternatives --set java /usr/java/jdk1.7.0_67-cloudera/bin/java
ENV JAVA_HOME /usr/java/jdk1.7.0_67-cloudera

COPY container_root/opt/apache-maven.tar.gz /opt/
RUN tar --directory /opt -xzf /opt/apache-maven.tar.gz
RUN ln -s $(find /opt -name mvn) /usr/bin/

# For some reason hgdistver needs to be installed first.
RUN pip install hgdistver
RUN pip install \
    allpairs \
    git-review \
    impyla \
    paramiko \
    pexpect \
    prettytable \
    psutil==0.7.1 \
    psycopg2 \
    pyhive \
    pytest \
    pytest-xdist \
    pywebhdfs \
    sqlparse \
    texttable

RUN echo '%wheel ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
RUN sed -i '/requiretty/d' /etc/sudoers
ENV SUDO_GROUP wheel

RUN sed -i 114d /etc/init.d/postgresql
RUN service postgresql initdb
RUN sed -i s:ident:trust:g /var/lib/pgsql/data/pg_hba.conf
RUN service postgresql start \
    && sleep 5 \
    && sudo -u postgres psql -c " \
        CREATE ROLE hiveuser LOGIN PASSWORD 'password'; \
        ALTER ROLE hiveuser WITH CREATEDB;"

RUN unlink /etc/localtime && \
    ln -s /usr/share/zoneinfo/America/Los_Angeles /etc/localtime

RUN echo root:cloudera | chpasswd

ENV BASH_COMPLETION /etc/bash_completion

COPY container_root/opt/llvm.tar.gz /opt/llvm/
WORKDIR /opt/llvm
RUN tar -xzf llvm.tar.gz
RUN rm llvm.tar.gz
WORKDIR llvm-3.3.src/tools
RUN svn co http://llvm.org/svn/llvm-project/cfe/tags/RELEASE_33/final/ clang
WORKDIR ../projects
RUN svn co http://llvm.org/svn/llvm-project/compiler-rt/tags/RELEASE_33/final/ \
    compiler-rt
WORKDIR ..
RUN ./configure --with-pic
RUN make -j$(nproc) REQUIRES_RTTI=1
RUN make install

COPY container_root/opt/boost.tar.bz2 /opt/boost/
WORKDIR /opt/boost
RUN tar -xjf boost.tar.bz2
RUN rm boost.tar.bz2
WORKDIR /opt/boost/boost_1_46_1
RUN ./bootstrap.sh
# Impala has a strange setup where it expects the regex, system, and filesystem libs to
# be tagged with "mt" but date_time is not tagged with "mt".
RUN ./bjam threading=multi --layout=tagged --with-regex --with-system --with-filesystem \
    --with-thread install
RUN ./bjam threading=multi --with-date_time install

# The default is way too small; the limit is basically reached after run-all.sh.
RUN sed -i s:1024:10240:  /etc/security/limits.d/90-nproc.conf

COPY container_root/bin/docker-boot /bin/
COPY container_root/bin/docker-boot-daemon /bin/
COPY container_root/bin/docker-ip /bin/
CMD /bin/docker-boot-daemon

End of Dockerfile.

The tested instructions for CentOS 6.4, based on the above Dockerfile, are:

1) Install prerequisites
yum -y groupinstall "Development Tools"
yum -y install \
    ant \
    ant-nodeps \
    automake \
    bash-completion \
    bison \
    bzip2-devel \
    cmake \
    curl \
    cyrus-sasl-gssapi \
    cyrus-sasl-plain \
    db4-devel \
    doxygen.x86_64 \
    emacs \
    flex \
    gcc-c++ \
    gdb \
    git \
    glib-devel \
    groff \
    krb5-workstation \
    libevent-devel \
    libtool \
    lsof \
    lzo-devel \
    lzop \
    make \
    man \
    man-pages \
    net-tools \
    openldap-devel \
    openssh-server \
    openssl-devel \
    postgresql \
    postgresql-devel \
    postgresql-server \
    psmisc \
    python-argparse \
    python-devel \
    python-ipython \
    python-pip \
    python-setuptools \
    redhat-lsb \
    subversion \
    sudo \
    svn \
    vim \
    wget \
    zsh
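
Once the packages are installed, a quick check that the main build tools are actually on the PATH can catch a silently failed install before the build does. This is a minimal sketch: the function name missing_tools and the particular list of tools checked are illustrative assumptions, not part of the Dockerfile.

```shell
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Check a few of the tools installed above (this list is illustrative).
missing=$(missing_tools gcc g++ make cmake ant git svn bison flex)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All checked tools are present."
fi
```

Note that package names and command names differ in places: gcc-c++ provides g++, and subversion provides svn.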

Install JDK 1.7 and set JAVA_HOME. I already had this done. The commands from the Dockerfile are listed below; I didn't test this sequence of commands, but it looks correct.


yum -y localinstall --nogpgcheck /tmp/oracle-j2sdk1.7-1.7.0+update67-1.x86_64.rpm

update-alternatives --install \
    /usr/bin/java java /usr/java/jdk1.7.0_67-cloudera/bin/java 999

update-alternatives --set java /usr/java/jdk1.7.0_67-cloudera/bin/java

export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
export PATH=$PATH:$JAVA_HOME/bin

Test by:

[root@r2341-d5-us04 boost]# java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
[root@r2341-d5-us04 boost]#
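
If the version check needs to go into a script, the version string can be pulled out of the java -version output. This is a sketch only; the function name parse_java_version is made up for illustration, and it assumes the first line of output has the form shown above.

```shell
# Extract the quoted version string from `java -version` output.
# `java -version` writes to stderr, hence the 2>&1 when calling it for real.
parse_java_version() {
    printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Example using the output shown above. On a real host use:
#   parse_java_version "$(java -version 2>&1)"
version=$(parse_java_version 'java version "1.7.0_67"')
case "$version" in
    1.7.*) echo "JDK 7 detected: $version" ;;
    *)     echo "Unexpected Java version: $version" ;;
esac
```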


Install apache-maven-3.3. I already had this installed, so the instructions below are not tested.


mv apache-maven-3.3-bin.zip /opt
cd /opt
unzip apache-maven-3.3-bin.zip
export MAVEN_HOME=/opt/apache-maven-3.3
export PATH=$PATH:$MAVEN_HOME/bin

The Dockerfile takes a different approach, which is probably better: it symlinks the mvn binary into /usr/bin, so mvn is found without adding MAVEN_HOME/bin to each user's PATH. Otherwise mvn may not be found, depending on which user the build is run as.

cp container_root/opt/apache-maven.tar.gz /opt/
tar --directory /opt -xzf /opt/apache-maven.tar.gz
ln -s $(find /opt -name mvn) /usr/bin/
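
One caveat with the ln -s $(find ...) line: if find matches nothing, or matches more than one mvn under /opt, the ln call fails or does something unexpected. A slightly more defensive sketch follows. The function name link_tool is hypothetical, and "first match" simply means whichever file find reports first.

```shell
# Symlink the first regular file named $1 found under $2 into directory $3.
# Pass an absolute search directory so the symlink target is absolute.
# Returns non-zero if nothing is found.
link_tool() {
    name=$1; search_dir=$2; dest_dir=$3
    target=$(find "$search_dir" -type f -name "$name" | head -n 1)
    [ -n "$target" ] || { echo "no $name under $search_dir" >&2; return 1; }
    ln -sf "$target" "$dest_dir/$name"
}

# Usage on the build host (needs root):
#   link_tool mvn /opt /usr/bin
```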



Test by:

[root@r2341-d5-us04 boost]# mvn -version
Apache Maven 3.2.3 (33f8c3e1027c3ddde99d3cdebad2656a31e8fdf4; 2014-08-11T13:58:10-07:00)
Maven home: /home/build/java-tools/apache-maven-3.2.3
Java version: 1.7.0_67, vendor: Oracle Corporation
Java home: /usr/java/jdk1.7.0_67-cloudera/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-504.el6.x86_64", arch: "amd64", family: "unix"

Install the Python packages. As in the Dockerfile, hgdistver is installed first:

pip install hgdistver
You are using pip version 6.0.8, however version 7.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Collecting hgdistver
  Downloading hgdistver-0.25-py2.py3-none-any.whl
Installing collected packages: hgdistver

Successfully installed hgdistver-0.25

pip install \
    allpairs \
    git-review \
    impyla \
    paramiko \
    pexpect \
    prettytable \
    psutil==0.7.1 \
    psycopg2 \
    pyhive \
    pytest \
    pytest-xdist \
    pywebhdfs \
    sqlparse \
    texttable

I had an old version of pip that wasn't compatible with Python 2.6, which resulted in this error message:

 Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-00Unmk/psycopg2

Upgrade pip and run it with Python 2.7:

[root@r2341-d5-us34 ~]# pip install --upgrade pip
You are using pip version 6.0.8, however version 7.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Collecting pip from https://pypi.python.org/packages/py2.py3/p/pip/pip-7.1.2-py2.py3-none-any.whl#md5=5ff9fec0be479e4e36df467556deed4d
  Downloading pip-7.1.2-py2.py3-none-any.whl (1.1MB)
    100% |################################| 1.1MB 193kB/s
Installing collected packages: pip
  Found existing installation: pip 6.0.8
    Uninstalling pip-6.0.8:
      Successfully uninstalled pip-6.0.8

Successfully installed pip-7.1.2
[root@r2341-d5-us34 ~]#

[root@r2341-d5-us34 ~]# /opt/tools/bin/python2.7 -m pip install psycopg2

Downloading/unpacking psycopg2
  Downloading psycopg2-2.6.1.tar.gz (371kB): 371kB downloaded
  Running setup.py egg_info for package psycopg2

    Error: pg_config executable not found.

    Please add the directory containing pg_config to the PATH
    or specify the full executable path with the option:

        python setup.py build_ext --pg-config /path/to/pg_config build ...

    or with the pg_config option in 'setup.cfg'.
    Complete output from command python setup.py egg_info:
    running egg_info
    creating pip-egg-info/psycopg2.egg-info
    writing pip-egg-info/psycopg2.egg-info/PKG-INFO
    writing top-level names to pip-egg-info/psycopg2.egg-info/top_level.txt
    writing dependency_links to pip-egg-info/psycopg2.egg-info/dependency_links.txt
    writing manifest file 'pip-egg-info/psycopg2.egg-info/SOURCES.txt'
    warning: manifest_maker: standard file '-c' not found

    Error: pg_config executable not found.

    Please add the directory containing pg_config to the PATH
    or specify the full executable path with the option:

        python setup.py build_ext --pg-config /path/to/pg_config build ...

    or with the pg_config option in 'setup.cfg'.

----------------------------------------
Cleaning up...
Command python setup.py egg_info failed with error code 1 in /tmp/pip_build_root/psycopg2
Storing complete log in /root/.pip/pip.log
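
The psycopg2 build needs pg_config, which on CentOS normally comes from the postgresql-devel package in the yum list above. If it is installed but not on the PATH, something like the following can locate it. This is a hedged sketch: the function name find_pg_config_dir is made up, and the candidate directories are guesses that depend on how PostgreSQL was installed.

```shell
# Print the first directory from the argument list that contains an
# executable pg_config; return non-zero if none does.
find_pg_config_dir() {
    for dir in "$@"; do
        if [ -x "$dir/pg_config" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Candidate locations are assumptions; adjust for your PostgreSQL install.
if pg_dir=$(find_pg_config_dir /usr/bin /usr/pgsql-*/bin /usr/local/pgsql/bin); then
    export PATH="$pg_dir:$PATH"
    echo "pg_config found in $pg_dir"
else
    echo "pg_config not found; check that postgresql-devel is installed"
fi
```

With pg_config on the PATH, rerunning the pip install of psycopg2 should get past the egg_info step.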



[root@r2341-d5-us34 ~]# pip install pyhive \
>     pytest \
>     pytest-xdist \
>     pywebhdfs \
>     sqlparse \
>     texttable


Collecting pyhive
/usr/lib/python2.6/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:90: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
  Downloading PyHive-0.1.6.tar.gz
Collecting pytest
  Downloading pytest-2.8.3-py2.py3-none-any.whl (149kB)
    100% |████████████████████████████████| 151kB 1.3MB/s
Collecting pytest-xdist
  Downloading pytest_xdist-1.13.1-py2.py3-none-any.whl
Collecting pywebhdfs
  Downloading pywebhdfs-0.4.0.tar.gz
Collecting sqlparse
  Downloading sqlparse-0.1.18.tar.gz (58kB)
    100% |████████████████████████████████| 61kB 3.0MB/s
Collecting texttable
  Downloading texttable-0.8.4.tar.gz
Collecting argparse (from pytest)
  Downloading argparse-1.4.0-py2.py3-none-any.whl
Collecting py>=1.4.29 (from pytest)
  Downloading py-1.4.30-py2.py3-none-any.whl (81kB)
    100% |████████████████████████████████| 86kB 2.3MB/s
Collecting execnet>=1.1 (from pytest-xdist)
  Downloading execnet-1.4.1-py2.py3-none-any.whl (40kB)
    100% |████████████████████████████████| 40kB 4.1MB/s
Collecting requests (from pywebhdfs)
  Downloading requests-2.8.1-py2.py3-none-any.whl (497kB)
    100% |████████████████████████████████| 499kB 492kB/s
Collecting six (from pywebhdfs)
  Downloading six-1.10.0-py2.py3-none-any.whl
Collecting apipkg>=1.4 (from execnet>=1.1->pytest-xdist)
  Downloading apipkg-1.4-py2.py3-none-any.whl
Installing collected packages: pyhive, argparse, py, pytest, apipkg, execnet, pytest-xdist, requests, six, pywebhdfs, sqlparse, texttable
  Running setup.py install for pyhive
  Running setup.py install for pywebhdfs
  Running setup.py install for sqlparse
  Running setup.py install for texttable
Successfully installed apipkg-1.4 argparse-1.4.0 execnet-1.4.1 py-1.4.30 pyhive-0.1.6 pytest-2.8.3 pytest-xdist-1.13.1 pywebhdfs-0.4.0 requests-2.8.1 six-1.10.0 sqlparse-0.1.18 texttable-0.8.4



Modify sudo. The Dockerfile does this:

RUN echo '%wheel ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
RUN sed -i '/requiretty/d' /etc/sudoers
ENV SUDO_GROUP wheel

On my system this translated to uncommenting the line "# %wheel        ALL=(ALL) NOPASSWD: ALL" in /etc/sudoers.

#ENV SUDO_GROUP wheel
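
The uncommenting can also be scripted rather than done by hand. This is a sketch under assumptions: the function name uncomment_wheel_nopasswd is hypothetical, and it expects the commented line to look like the stock CentOS one quoted above. It prints the edited file to stdout and leaves the input untouched, so the result can be checked with visudo before it replaces the live file.

```shell
# Print the given sudoers file with the
# "# %wheel ALL=(ALL) NOPASSWD: ALL" line uncommented.
uncomment_wheel_nopasswd() {
    sed 's/^#[[:space:]]*\(%wheel[[:space:]][[:space:]]*ALL=(ALL)[[:space:]][[:space:]]*NOPASSWD:[[:space:]]*ALL\)[[:space:]]*$/\1/' "$1"
}

# Usage on the build host (as root):
#   uncomment_wheel_nopasswd /etc/sudoers > /tmp/sudoers.new
#   visudo -c -f /tmp/sudoers.new && cp /tmp/sudoers.new /etc/sudoers
```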

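Set up PostgreSQL, using the Dockerfile commands with the RUN prefix dropped:

# 114d deletes line 114 of the init script; check what that line is in your copy first.
sed -i 114d /etc/init.d/postgresql
service postgresql initdb
sed -i s:ident:trust:g /var/lib/pgsql/data/pg_hba.conf
service postgresql start \
    && sleep 5 \
    && sudo -u postgres psql -c " \
        CREATE ROLE hiveuser LOGIN PASSWORD 'password'; \
        ALTER ROLE hiveuser WITH CREATEDB;"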
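
The fixed sleep 5 assumes the server is up within five seconds. On a slow machine, polling until psql answers may be more reliable. This is a generic sketch, not something from the Dockerfile; the retry helper and the attempt count are my own assumptions.

```shell
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry() {
    attempts=$1; delay=$2; shift 2
    i=1
    while :; do
        "$@" && return 0
        [ "$i" -ge "$attempts" ] && return 1
        i=$((i + 1))
        sleep "$delay"
    done
}

# Usage on the build host: wait for the server instead of a fixed sleep.
#   service postgresql start
#   retry 10 1 sudo -u postgres psql -c 'SELECT 1' >/dev/null
```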

unlink /etc/localtime && \
    ln -s /usr/share/zoneinfo/America/Los_Angeles /etc/localtime

# Do NOT run this command! It changes your root password to "cloudera".
#RUN echo root:cloudera | chpasswd
# This doesn't do anything on our system; not necessary.
#ENV BASH_COMPLETION /etc/bash_completion



2) Download the LLVM 3.3 source (llvm-3.3.src) from
http://llvm.org/releases/download.html#3.3
Copy it to /opt, unarchive it, and build it.
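
For reference, the LLVM steps from the Dockerfile can be collected into one function. This is a sketch that mirrors the Dockerfile rather than a tested recipe: the run and build_llvm names and the DRY_RUN switch are my own additions, and the svn URLs are the ones in the Dockerfile, which may no longer resolve.

```shell
# Echo commands instead of running them when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# LLVM 3.3 build steps, mirroring the Dockerfile above.
# $1 = directory holding the unpacked llvm-3.3.src tree (e.g. /opt/llvm).
build_llvm() {
    run cd "$1/llvm-3.3.src/tools" &&
    run svn co http://llvm.org/svn/llvm-project/cfe/tags/RELEASE_33/final/ clang &&
    run cd ../projects &&
    run svn co http://llvm.org/svn/llvm-project/compiler-rt/tags/RELEASE_33/final/ compiler-rt &&
    run cd .. &&
    run ./configure --with-pic &&
    run make -j"$(nproc)" REQUIRES_RTTI=1 &&
    run make install
}

# Preview the commands without executing anything:
#   DRY_RUN=1 build_llvm /opt/llvm
```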






4) Install Boost and llvm-3.3-src following the pattern in the Dockerfile above. For Boost, the requirement is any version >1.46, but not all versions seem to be compatible; I used 1.59. Download the tar file and build it the same way the Dockerfile does.
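
Following the Dockerfile pattern, the Boost build would look roughly like the sketch below. The run and build_boost names and the DRY_RUN switch are my own, and the boost_1_59_0 directory in the usage line assumes the 1.59.0 tarball was unpacked under /opt/boost. The bjam options are copied from the Dockerfile, which builds 1.46.1, so treat them as unverified for 1.59. As the Dockerfile comment explains, regex, system, filesystem and thread are built with the tagged layout so they get the "mt" suffix, while date_time is installed untagged.

```shell
# Echo commands instead of running them when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Boost build steps, mirroring the Dockerfile above.
# $1 = unpacked Boost source directory.
build_boost() {
    run cd "$1" &&
    run ./bootstrap.sh &&
    run ./bjam threading=multi --layout=tagged --with-regex --with-system \
        --with-filesystem --with-thread install &&
    run ./bjam threading=multi --with-date_time install
}

# Preview the commands without executing anything:
#   DRY_RUN=1 build_boost /opt/boost/boost_1_59_0
```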







