
Installing Intel Compiler on a compute node

The compute nodes of an HPC cluster are often on a private network with no access to the internet, and their OS installations are often very lean. So how do we install the Intel Compiler on a compute node?

After untarring the Intel Compiler package, running install.sh and registering the serial number, you will notice that an essential prerequisite is missing. How do you find out which prerequisite it is?

Look at the Intel installation log file /tmp/intel.pset.root........; there you will see "missing g++". To install g++, do the following:

# yum install gcc-c++

Problem solved!
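As a quick sketch of how to find the culprit (an assumption: the installer log sits under /tmp with the intel.pset prefix shown above, the exact file name varying per install), list the newest log and search it for missing prerequisites:

# ls -t /tmp/intel.pset.root*
# grep -i "missing" /tmp/intel.pset.root*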

Building OpenMPI with Intel Compiler (Ver 2)

This is a follow-up to the Thursday, April 2, 2009 blog entry Building OpenMPI with Intel Compiler.

Step 1: Download the OpenMPI software from http://www.open-mpi.org/. The current stable version at the time of writing is OpenMPI 1.3.2.

Step 2: Download and install the Intel compilers from the Intel website. More information can be found in Free Non-Commercial Intel Compiler Download.

Step 3: Add the Intel compiler binary path to the Bash startup file
In my ~/.bash_profile, I've added:
PATH=$PATH:/opt/intel/Compiler/11.0/081/bin/intel64
At the command prompt:
# source ~/.bash_profile

Step 4: Configure and build
# gunzip -c openmpi-1.3.2.tar.gz | tar xf -
# cd openmpi-1.3.2
# ./configure --prefix=/usr/local CC=icc CXX=icpc F77=ifort FC=ifort
# make all install
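To confirm the build picked up the Intel compilers, ompi_info reports the compilers OpenMPI was configured with (a quick check, assuming /usr/local/bin is on your PATH):

# ompi_info | grep -i compiler

You should see icc, icpc and ifort listed as the C, C++ and Fortran compilers.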

Step 5: Setting the PATH environment for OpenMPI
In my ~/.bash_profile, I've added:
export PATH=/usr/local/bin:${PATH}
export LD_LIBRARY_PATH=/opt/intel/Compiler/11.0/081/lib/intel64:${LD_LIBRARY_PATH}
(The LD_LIBRARY_PATH must include the directory containing /opt/intel/Compiler/11.0/081/lib/intel64/libimf.so)

Step 6: mpicc ........

Step 7: Repeat the procedures on the Compute Nodes
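If the compute nodes are reachable over SSH and can see the build directory (for example over NFS), a sketch like the following can automate Step 7. The node names and the /tmp/openmpi-1.3.2 path are placeholders; adapt them to your cluster:

for node in node01 node02 node03 node04; do
    ssh $node "cd /tmp/openmpi-1.3.2 && make install"
done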

How to fix -fPIC errors

A very good article on -fPIC errors: see section "3. HOWTO fix -fPIC errors" by Gentoo Linux.

If you have a problem like "relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC .libs/assert.o: could not read symbols: Bad value", this article will help.

The article lists 4 cases of -fPIC errors:

Case 1: Broken Compiler
At least GCC 3.4 is known to have a broken implementation of the -fvisibility-inlines-hidden flag. The use of this flag is therefore highly discouraged; reported bugs are usually marked as RESOLVED INVALID. See bug 108872 for an example of a typical error message caused by this flag.
Case 2: Broken `-fPIC' support checks in configure
Many configure tools check whether the compiler supports the -fPIC flag or not. They do so by compiling a minimalistic program with the -fPIC flag and checking stderr. If the compiler prints *any* warnings, it is assumed that the -fPIC flag is not supported by the compiler and is therefore abandoned. Unfortunately, if the user specifies a non-existing flag (i.e. C++-only flags in CFLAGS or flags introduced by newer versions of GCC but unknown to older ones), GCC prints a warning too, resulting in borkage.

To prevent this kind of breakage, the AMD64 profiles use a bashrc that filters out invalid flags in C[XX]FLAGS.
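You can reproduce this kind of false negative by putting a C++-only flag in CFLAGS; GCC then warns even though -fPIC itself is fine. A minimal illustration (the conftest.c name mimics what configure generates; the exact warning text varies by GCC version):

$ echo 'int main(void) { return 0; }' > conftest.c
$ gcc -fPIC -fvisibility-inlines-hidden -c conftest.c
cc1: warning: command line option "-fvisibility-inlines-hidden" is valid for C++/ObjC++ but not for C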

Case 3: Lack of `-fPIC' flag in the software to be built
This is the most common case. It is a real bug in the build system and should be fixed in the ebuild, preferably with a patch that is sent upstream. Assuming the error message looks like this:


Code Listing 6.1: A sample error message
.libs/assert.o: relocation R_X86_64_32 against `a local symbol' can not be used
when making a shared object; recompile with -fPIC .libs/assert.o: could not
read symbols: Bad value

This means that the file assert.o was not compiled with the -fPIC flag, although it should have been. When you fix this kind of error, make sure only the objects that are used in shared libraries are compiled with -fPIC.
In this case, globally adding -fPIC to C[XX]FLAGS resolves the issue, although this practice is discouraged because the executables end up being PIC-enabled, too.
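Concretely, the fix is to compile the objects that go into the shared library with -fPIC and link as usual. A minimal sketch (the library name libfoo.so is hypothetical; assert.c and .libs/assert.o are from the error message above):

$ gcc -fPIC -c assert.c -o .libs/assert.o
$ gcc -shared -o libfoo.so .libs/assert.o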

 Case 4: Linking dynamically against static archives
Sometimes a package tries to build shared libraries using statically built archives which are not PIC-enabled. There are two main reasons why this happens:
Often it is the result of mixing USE=static and USE=-static. If a library package can be built statically by setting USE=static, it usually doesn't create a .so file but only a .a archive. However, when GCC is given the -l flag to link to said (dynamic or static) library, it falls back to the static archive when it can't find a shared lib. In this case, the preferred solution is to build the static library using the -fPIC flag too.

Sometimes it is also the case that a library isn't intended to be a shared library at all, e.g. because it makes heavy usage of global variables. In this case the solution is to turn the to-be-built shared library into a static one.
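For the first reason, building the static archive itself with -fPIC looks like this (a sketch; foo.c and libfoo.a are placeholder names):

$ gcc -fPIC -c foo.c
$ ar rcs libfoo.a foo.o

The resulting libfoo.a can then be linked into a shared object without relocation errors.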

    Resolving Single and Double Precision Discrepancy between pre-Nehalem Chipsets and Nehalem Chipsets

    One of our researchers was running a job on an SMP machine with older Intel processors such as the Intel(R) Xeon(R) CPU X7460 @ 2.66GHz (code-named "Dunnington"), and we noticed that the single- and double-precision results agreed to about 5 decimal places.

    For example:
    0.623291xxxxxxx (Single Precision Code)
    0.623290xxxxxxx (Double Precision Code)

    One important thing to note is that the Intel Compiler used was version 11.x.

    But if we ran the same code on the newer Intel Nehalem architecture, the discrepancy between single and double precision was quite large: the results differed at the first decimal place.

    For example:
    0.523667xxxxx (Single Precision Code)
    0.4353836xxxxx (Double Precision Code)

    Again, the compiler used was Intel Compiler 11.x.

    If we compare the results between the Dunnington Chipsets and the Nehalem Architecture, the discrepancy is really quite unacceptable.

    Well, the solution is actually quite easy: update the Intel Compiler to the latest Intel® Parallel Studio XE 2011 for Linux*, and the discrepancy should be eliminated, with the Nehalem results matching those from the older processors. Intel® Parallel Studio XE 2011 for Linux* ships the latest libraries for the Nehalem architecture.
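    Before upgrading, you can check which compiler version is currently on the system (both Intel compilers print a version banner with -V):

    # icc -V
    # ifort -V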

    For more information on where to download, do look at the Free Non-Commercial Intel Compiler Download

    Intel® Optimized LINPACK Benchmark for Linux OS

    This blog entry is taken from the Intel(R) Math Kernel Library for the Linux* OS User's Guide, which comes with the Math Kernel Library when you download and install it.

    To download, see Intel® Math Kernel Library – LINPACK Download

    Intel® Optimized LINPACK Benchmark is a generalization of the LINPACK 1000 benchmark.
    It solves a dense (real*8) system of linear equations (Ax=b), measures the amount of
    time it takes to factor and solve the system, converts that time into a performance rate, and tests the results for accuracy. The generalization is in the number of equations (N) it can solve, which is not limited to 1000. It uses partial pivoting to assure the accuracy of the results.

    Intel provides optimized versions of the LINPACK benchmark to make it easier than using HPL to obtain high LINPACK benchmark results on systems based on genuine Intel® processors. Use this package to benchmark your SMP machine.



    1. Running the Software
    To run predetermined sample problem sizes on a given system:
    # ./runme_xeon32
    # ./runme_xeon64
    To run other problem sizes, you can amend lininput_xeon32 or lininput_xeon64. However, each input file requires the following amount of memory:
    • lininput_xeon32: 2GB
    • lininput_xeon64: 16GB
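    A simple way to run the 64-bit benchmark and keep a copy of the results, assuming you run from the directory the benchmark was unpacked into:

    # ./runme_xeon64 | tee linpack_xeon64.out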



    2. Known Limitations
    • The Intel LINPACK Benchmark is threaded to use multiple processors effectively. On multi-processor systems, best performance is obtained with Hyper-Threading Technology turned off.
    • If an incomplete data input file is given, the binaries may either hang or fault.

    Free Non-Commercial Intel Compiler Download

    The Intel® Software Development Products listed below are available for free non-commercial download. Click on a product to initiate the download process.
    Non-Commercial Software Download

    Changing Compilers from Intel to PGI for OpenMPI

    This extends Building OpenMPI with Intel Compiler (Ver 2), assuming you are using the Intel Compiler as the default compiler.

    Step 1: If you wish to switch from the Intel Compiler to another compiler such as PGI (in this example), override the compilers used by the OpenMPI wrapper scripts through the OMPI_* environment variables (note: the wrappers read OMPI_CC and friends, not plain CC):
    # export OMPI_CC=pgcc
    # export OMPI_CXX=pgCC
    # export OMPI_F77=pgf77
    # export OMPI_FC=pgf90

    To check that OpenMPI is now using the PGI compiler:
    # /usr/local/openmpi/bin/mpicc --showme
    # /usr/local/openmpi/bin/mpiCC --showme
    # /usr/local/openmpi/bin/mpif77 --showme
    # /usr/local/openmpi/bin/mpif90 --showme
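    If the override took effect, the wrapper output should now begin with pgcc; for example (illustrative only, the exact include and library flags depend on your build):

    pgcc -I/usr/local/openmpi/include -pthread -L/usr/local/openmpi/lib -lmpi -lopen-rte -lopen-pal -ldl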


    Step 2: If you wish to "return" to the default Intel compilers:
    # unset OMPI_CC
    # unset OMPI_CXX
    # unset OMPI_F77
    # unset OMPI_FC


    To check that OpenMPI has reverted to the default Intel compiler:
    # /usr/local/openmpi/bin/mpicc --showme
    # /usr/local/openmpi/bin/mpiCC --showme
    # /usr/local/openmpi/bin/mpif77 --showme
    # /usr/local/openmpi/bin/mpif90 --showme

    Installing Cluster OpenMP* for Intel® Compilers



    Overview
    OpenMP* is a high level, pragma-based approach to parallel application programming. Cluster OpenMP is a simple means of extending OpenMP parallelism to 64-bit Intel® architecture-based clusters. It allows OpenMP code to run on clusters of Intel® Itanium® or Intel® 64 processors, with only slight modifications. 


    Prerequisite
    Cluster OpenMP use requires that you already have the latest version of the Intel® C++ Compiler for Linux* and/or the Intel® Fortran Compiler for Linux*.


    Benefits of Cluster OpenMP
    1. Simplifies porting of serial or OpenMP code to clusters.
    2. Requires few source code modifications, which eases debugging.
    3. Allows slightly modified OpenMP code to run on more processors without requiring investment in expensive Symmetric Multiprocessing (SMP) hardware.
    4. Offers an alternative to MPI that is easier to learn and faster to implement.

    How to Install Cluster OpenMP
    1. Installing Cluster OpenMP is simple. First, install the Intel compilers. For more information, see the blog entry Free Non-Commercial Intel Compiler Download.
    2. After installing the compilers, download the Cluster OpenMP license file from the Cluster OpenMP download site.
    3. Place the Cluster OpenMP license file in the license directory, usually /opt/intel/licenses.
    4. With the Cluster OpenMP license file in place, you can use either the "-cluster-openmp" or "-cluster-openmp-profile" compiler option when compiling a program, as sketched below.
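    Compiling with these options then looks like the following sketch (the source file name my_omp_prog.c is a placeholder):

    # icc -cluster-openmp my_omp_prog.c -o my_omp_prog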

    Using Intel Compilers with Eclipse IDE on Linux

    Intel has published a somewhat dated document on using the Intel compilers with the Eclipse IDE on Linux. It is still useful for an idea of how to integrate the Intel compilers with Eclipse:

    Intel® C++ Compiler for Linux* - Using Intel® Compilers with the Eclipse* IDE [PDF]

    A Hello World OpenMPI program with Intel

    I compiled a simple parallel hello world program to test whether OpenMPI is working well with the Intel compilers, using the example taken from https://wiki.mst.edu/nic/how_to/compile/openmpi-intel-compile.

    Step 1: Ensure your OpenMPI is compiled with the Intel compilers. Read Building OpenMPI with Intel Compiler (Ver 2) for more information.


    Step 2: Cut and paste the parallel program from https://wiki.mst.edu/nic/how_to/compile/openmpi-intel-compile. Compile the C++ program with the MPI wrapper:
    $ mpicxx -o openmpi-intel-hello mpi_hello.cpp


    Step 3: Test on an SMP machine
    $ mpirun -np 8 ./openmpi-intel-hello


    Step 4: Test on a distributed cluster
    $ mpirun -np 8 -hostfile hostfile.file ./openmpi-intel-hello
    You should see output something like:
    Returned: 0 Hello World! I am 1 of 8
    Returned: 0 Hello World! I am 6 of 8
    Returned: 0 Hello World! I am 3 of 8
    Returned: 0 Hello World! I am 0 of 8
    Returned: 0 Hello World! I am 2 of 8
    Returned: 0 Hello World! I am 5 of 8
    Returned: 0 Hello World! I am 4 of 8
    Returned: 0 Hello World! I am 7 of 8
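    For Step 4, hostfile.file simply lists the machines to run on, one per line. A minimal example, with hypothetical node names and slots set to the number of cores per node:

    $ cat > hostfile.file << EOF
    node001 slots=4
    node002 slots=4
    EOF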

    Compiling ScaLAPACK

    ScaLAPACK is a library of high-performance linear algebra routines for distributed-memory message-passing MIMD computers and networks of workstations supporting PVM [68] and/or MPI [64, 110].
    There are two ways to compile ScaLAPACK. The first is to download scalapack.tgz and compile it manually; do look at the excellent article ScaLAPACK, LAPACK, BLACS and ATLAS on OpenMPI Linux installation tutorial.

    One challenge you might face: if the ScaLAPACK dependencies are compiled with different Fortran compilers, completing the compilation becomes quite difficult.

    Alternatively, you can use the ScaLAPACK installer from http://www.netlib.org/scalapack/.
    Do look at the README to see the flags you will need.
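    A typical invocation of the installer looks something like this (the flag names follow the installer's README; run ./setup.py --help to confirm them for your version). Here --downall asks the installer to download and build any missing dependencies:

    $ ./setup.py --mpicc=mpicc --mpif90=mpif90 --downall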

    forrtl: severe (24): end-of-file during read & forrtl

    If you hit "forrtl: severe (24): end-of-file during read" or "forrtl: severe (67): input statement requires too much data", read this forum entry for more information: http://software.intel.com/en-us/forums/showthread.php?t=63981

    Solving error while loading shared libraries: libmpi_f77.so.0

    You may encounter the error "error while loading shared libraries: libmpi_f77.so.0: cannot open shared object file: No such file or directory".

    This is usually because the libraries under /usr/local/lib are not accessible from the compute nodes. After you have made the libraries available, you may have to add /usr/local/lib to your LD_LIBRARY_PATH environment variable, for example:
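    # export LD_LIBRARY_PATH=/usr/local/lib:${LD_LIBRARY_PATH}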


    You should be able to eliminate the error.

    Using Intel® MKL with Threaded Applications

    Calling threaded Intel MKL routines from multiple application threads can lead to conflicts (including incorrect answers or program failures), or at best to unexpectedly long CPU times.

    A good and thorough description of the issue, as well as the workaround, is given on the Intel website:
    Using Intel® MKL with Threaded Applications.

    The crux of the problem, according to Intel, is as follows:

    Intel MKL can be aware that it is in a parallel region only if the threaded program and Intel MKL are using the same threading library. If the user program is threaded by some other means, Intel MKL may operate in multithreaded mode and the computations may be corrupted. Here are several cases and Intel's recommendations (a minimal example follows the list):


    1. User threads the program using OS threads (pthreads on Linux*, Win32* threads on Windows*). If more than one thread calls Intel MKL and the function being called is threaded, it is important that threading in Intel MKL be turned off. Set OMP_NUM_THREADS=1 in the environment.
    2. User threads the program using OpenMP directives and/or pragmas and compiles the program using a compiler other than a compiler from Intel. This is more problematic because setting OMP_NUM_THREADS in the environment affects both the compiler's threading library and the threading library with Intel MKL. In this case, the safe approach is to set OMP_NUM_THREADS=1.
    3. Multiple programs are running on a multiple-CPU system. In cluster applications, the parallel program can run separate instances of the program on each processor. However, the threading software will see multiple processors on the system even though each processor has a separate process running on it. In this case OMP_NUM_THREADS should be set to 1.
    4. If the OMP_NUM_THREADS environment variable is not set, the default number of threads is assumed to be 1.
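    As a minimal sketch for case 3, force single-threaded MKL before launching one MPI process per core (the binary name my_mkl_app is a placeholder):

    # export OMP_NUM_THREADS=1
    # mpirun -np 8 ./my_mkl_app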


    Setting the Number of Threads for OpenMP* (OMP)

    The OpenMP* software responds to the environment variable OMP_NUM_THREADS:
    1. Windows*: Open the Environment panel of the System Properties box in the Control Panel on Microsoft* Windows NT*, or set the variable in the shell the program runs in with the command: set OMP_NUM_THREADS=<number>.
    2. Linux*: Set and export the variable with: export OMP_NUM_THREADS=<number>.
    This issue was mentioned by Axel Kohlmeyer in this forum thread on parallelization issues.

    Building Open MPI* with the Intel® compilers

    A good hands-on tutorial from Intel on how to build Open MPI with the Intel compilers:
    Building Open MPI* with the Intel Compilers

    A few things to note:
    1. Make sure the Intel compiler binaries are already on your PATH.
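    A quick check that they are (each command should print a path under your Intel install directory, e.g. /opt/intel/Compiler/...):

    # which icc icpc ifort
    # icc --version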