
Linpack on Zynq

 

I was browsing the ZedBoard forums and saw a post about the performance of the ARM processor, so I thought I would run some linpack tests and post the results here.

 

Here is a quick explanation of what linpack is:

 

http://en.wikipedia.org/wiki/LINPACK

 

I have put the zynq-linpack repo on github for easy downloading/forking.  The repo lives here:

 

https://github.com/zynqgeek/zynq-linpack

 

So let's get this thing running!

 

First, we need to make sure we have git installed.  git is a revision control system written by Linus Torvalds, and is maintained here.  I am using Ubuntu 12.04 LTS.

 

zynqgeek@beth:~/arm-devel# sudo apt-get install git

 

Now that we have git installed we can 'clone', or make a local copy of, the zynq-linpack repo.

 

 

zynqgeek@beth:~/arm-devel# git clone https://github.com/zynqgeek/zynq-linpack

Cloning into 'zynq-linpack'...

remote: Counting objects: 7, done.

remote: Compressing objects: 100% (6/6), done.

remote: Total 7 (delta 0), reused 4 (delta 0)

Unpacking objects: 100% (7/7), done.

zynqgeek@beth:~/arm-devel# cd zynq-linpack/

zynqgeek@beth:~/arm-devel/zynq-linpack# ls

create_bins.sh  linpack.c  README.md

zynqgeek@beth:~/arm-devel/zynq-linpack#

 

We need to change the permissions on the create_bins.sh file so we can execute it.  We will use chmod to do this, and give it execution rights.

 

zynqgeek@beth:~/arm-devel/zynq-linpack# chmod +x create_bins.sh

zynqgeek@beth:~/arm-devel/zynq-linpack# ls -al

total 44

drwxr-xr-x 3 root root  4096 Aug 12 19:30 .

drwxrwxr-x 7 tim  tim   4096 Aug 12 19:30 ..

-rwxr-xr-x 1 root root   406 Aug 12 19:30 create_bins.sh

drwxr-xr-x 8 root root  4096 Aug 12 19:30 .git

-rw-r--r-- 1 root root 21133 Aug 12 19:30 linpack.c

-rw-r--r-- 1 root root   137 Aug 12 19:30 README.md

 

Now we just need to execute the script, and it will create four files.  These files are single precision and double precision floating point versions of the linpack code, as well as both rolled and unrolled versions.
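To give a feel for how one source file can turn into four binaries, here is a minimal sketch of compile-time variant selection. It assumes the variants are picked with preprocessor macros passed on the compiler command line (for example -DSP -DROLL), which is how the widely distributed linpack.c does it. I have not reproduced create_bins.sh or linpack.c here, and the names real_t, PREC_NAME, LOOP_NAME, variant_name and real_size are hypothetical, made up for this illustration.

```c
#include <stddef.h>

/* Assumption of this sketch: fall back to double precision, rolled loops
   when no variant macro is given on the command line. */
#if !defined(SP) && !defined(DP)
#define DP
#endif
#if !defined(ROLL) && !defined(UNROLL)
#define ROLL
#endif

/* Precision: SP selects float, anything else selects double. */
#ifdef SP
typedef float real_t;
#define PREC_NAME "Single"
#else
typedef double real_t;
#define PREC_NAME "Double"
#endif

/* Loop style: only the label is chosen here; the loop bodies themselves
   would be guarded by the same macros. */
#ifdef ROLL
#define LOOP_NAME "Rolled"
#else
#define LOOP_NAME "Unrolled"
#endif

/* Human-readable name of the variant this file was compiled as. */
const char *variant_name(void)
{
    return LOOP_NAME " " PREC_NAME " Precision";
}

/* Size in bytes of the floating point type selected above. */
size_t real_size(void)
{
    return sizeof(real_t);
}
```

Compiling the same file four times with the four macro combinations would then give one binary per precision and loop style.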

 

The difference between rolled and unrolled code comes into play with loops.  Rolled code implements a loop with a branch instruction that sets the program counter back to the top of the loop body each time the end of the body is reached.  Unrolled code repeats the loop body several times in a row, so the program counter advances linearly through more of the work and fewer branches are taken.  This results in larger code, but for some loop structures and processor architectures it can execute faster.
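As an illustration, here is a small sketch of the same vector update (y = y + a*x, the kind of inner loop linpack spends most of its time in) written rolled and unrolled by four. This is my own example in the spirit of the benchmark, not code copied from linpack.c; the function names axpy_rolled and axpy_unrolled4 are hypothetical.

```c
#include <stddef.h>

/* Rolled: one multiply-add per iteration, one branch back per element. */
void axpy_rolled(size_t n, double a, const double *x, double *y)
{
    size_t i;
    for (i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Unrolled by 4: four multiply-adds per iteration, so roughly a quarter
   of the branches.  The first n % 4 elements are handled one at a time
   so that the main loop always has a full group of four left. */
void axpy_unrolled4(size_t n, double a, const double *x, double *y)
{
    size_t m = n % 4;
    size_t i;

    for (i = 0; i < m; i++)
        y[i] += a * x[i];

    for (; i < n; i += 4) {
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }
}
```

Both functions compute the same result. Whether the unrolled one is actually faster depends on the compiler, the optimization flags and the processor, so it is worth measuring rather than assuming.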

 

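The ARM Cortex-A9 processors within the Zynq-7000 EPP device each include a NEON media engine together with a VFPv3 floating point unit, which allows them to handle both single and double precision floating point numbers in hardware.  NEON itself covers single precision; double precision goes through the VFP unit.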

 

Note: You will need to follow the instructions here if you do not have your development environment set up yet for compiling for ARM processors.

 

Well, let's execute our binary creation script!

 

root@beth:~/arm-devel/zynq-linpack# ./create_bins.sh

Creating linpack binaries for Zynq-7000 EPP ...

... Done

zynqgeek@beth:~/arm-devel/zynq-linpack# ls -al

total 124

drwxr-xr-x 3 root root  4096 Aug 12 19:42 .

drwxrwxr-x 7 tim  tim   4096 Aug 12 19:30 ..

-rwxr-xr-x 1 root root   396 Aug 12 19:38 create_bins.sh

drwxr-xr-x 8 root root  4096 Aug 12 19:41 .git

-rw-r--r-- 1 root root 21133 Aug 12 19:30 linpack.c

-rwxr-xr-x 1 root root 17329 Aug 12 19:42 linpack_dp_rolled_arm

-rwxr-xr-x 1 root root 17464 Aug 12 19:42 linpack_dp_unrolled_arm

-rwxr-xr-x 1 root root 17345 Aug 12 19:42 linpack_sp_rolled_arm

-rwxr-xr-x 1 root root 17502 Aug 12 19:42 linpack_sp_unrolled_arm

-rw-r--r-- 1 root root   137 Aug 12 19:30 README.md

 

Now that we have our binaries created we need to put them onto our ZedBoard.  We can do this using the FTP server that is running on the ZedBoard.

 

Note: the IP address of my ZedBoard may be different from yours.  Look here for a networking howto.

 

zynqgeek@beth:~/arm-devel/zynq-linpack# ftp 192.168.2.210

Connected to 192.168.2.210.

220 Operation successful

Name (192.168.2.210:tim): root

230 Operation successful

Remote system type is UNIX.

Using binary mode to transfer files.

ftp> cd root

250 Operation successful

ftp> mput *

 

...

 

ftp>

 

Now on our ZedBoard we can go to our root directory and set the execute permission on the binary files so we can run them.

 

zynq> chmod +x linpack_*

zynq> ls -al

total 86

drwxr-xr-x    2 12319    300           1024 Jan  3 07:37 .

drwxr-xr-x   17 12319    300           1024 Jan  3 06:58 ..

-rw-------    1 root     0              257 Jan  3 07:38 .ash_history

-rwxr-xr-x    1 root     0             7797 Aug 10  2012 helloworld

-rwxr-xr-x    1 root     0            17329 Jan  3 07:35 linpack_dp_rolled_arm

-rwxr-xr-x    1 root     0            17464 Jan  3 07:35 linpack_dp_unrolled_arm

-rwxr-xr-x    1 root     0            17345 Jan  3 07:35 linpack_sp_rolled_arm

-rwxr-xr-x    1 root     0            17502 Jan  3 07:35 linpack_sp_unrolled_arm

-rw-r--r--    1 root     0              512 Jul 12  2012 logo.bin

 

That's it!  Cool, huh!?  OK, now here are my results:

 

Single Precision, Rolled Loops

zynq> ./linpack_sp_rolled_arm

Rolled Single Precision Linpack

 

Rolled Single Precision Linpack

 

     norm. resid      resid           machep         x[0]-1        x[n-1]-1

       1.6        3.80277634e-05  1.19209290e-07 -1.38282776e-05 -7.51018524e-06

    times are reported for matrices of order   100

      dgefa      dgesl      total       kflops     unit      ratio

 times for array with leading dimension of  201

       0.00       0.00       0.00        inf       0.00       0.00

       0.01       0.00       0.01      68667       0.03       0.18

       0.00       0.00       0.00        inf       0.00       0.00

       0.00       0.00       0.00     137333       0.01       0.09

 times for array with leading dimension of 200

       0.01       0.00       0.01      68667       0.03       0.18

       0.00       0.00       0.00        inf       0.00       0.00

       0.01       0.00       0.01      68667       0.03       0.18

       0.01       0.00       0.01     137333       0.01       0.09

Rolled Single  Precision 137333 Kflops ; 10 Reps

Double Precision, Rolled Loops

zynq> ./linpack_dp_rolled_arm

Rolled Double Precision Linpack

 

Rolled Double Precision Linpack

 

     norm. resid      resid           machep         x[0]-1        x[n-1]-1

       1.7        7.41628980e-14  2.22044605e-16 -1.49880108e-14 -1.89848137e-14

    times are reported for matrices of order   100

      dgefa      dgesl      total       kflops     unit      ratio

 times for array with leading dimension of  201

       0.00       0.00       0.00        inf       0.00       0.00

       0.01       0.00       0.01      68667       0.03       0.18

       0.01       0.00       0.01      68667       0.03       0.18

       0.01       0.00       0.01     114444       0.02       0.11

 times for array with leading dimension of 200

       0.01       0.00       0.01      68667       0.03       0.18

       0.00       0.00       0.00        inf       0.00       0.00

       0.01       0.00       0.01      68667       0.03       0.18

       0.01       0.00       0.01     114444       0.02       0.11

Rolled Double  Precision 114444 Kflops ; 10 Reps

Single Precision, Unrolled Loops

zynq> ./linpack_sp_unrolled_arm

Unrolled Single Precision Linpack

 

Unrolled Single Precision Linpack

 

     norm. resid      resid           machep         x[0]-1        x[n-1]-1

       1.6        3.80277634e-05  1.19209290e-07 -1.38282776e-05 -7.51018524e-06

    times are reported for matrices of order   100

      dgefa      dgesl      total       kflops     unit      ratio

 times for array with leading dimension of  201

       0.01       0.00       0.01      68667       0.03       0.18

       0.01       0.00       0.01      68667       0.03       0.18

       0.00       0.00       0.00        inf       0.00       0.00

       0.00       0.00       0.01     114444       0.02       0.11

 times for array with leading dimension of 200

       0.00       0.00       0.00        inf       0.00       0.00

       0.01       0.00       0.01      68667       0.03       0.18

       0.00       0.00       0.00        inf       0.00       0.00

       0.00       0.00       0.01     114444       0.02       0.11

Unrolled Single  Precision 114444 Kflops ; 10 Reps

Double Precision, Unrolled Loops

zynq> ./linpack_dp_unrolled_arm

Unrolled Double Precision Linpack

 

Unrolled Double Precision Linpack

 

     norm. resid      resid           machep         x[0]-1        x[n-1]-1

       1.7        7.41628980e-14  2.22044605e-16 -1.49880108e-14 -1.89848137e-14

    times are reported for matrices of order   100

      dgefa      dgesl      total       kflops     unit      ratio

 times for array with leading dimension of  201

       0.01       0.00       0.01      68667       0.03       0.18

       0.00       0.00       0.00        inf       0.00       0.00

       0.01       0.00       0.01      68667       0.03       0.18

       0.00       0.00       0.00     171667       0.01       0.07

 times for array with leading dimension of 200

       0.00       0.00       0.00        inf       0.00       0.00

       0.01       0.00       0.01      68667       0.03       0.18

       0.00       0.00       0.00        inf       0.00       0.00

       0.01       0.00       0.01     114444       0.02       0.11

Unrolled Double  Precision 114444 Kflops ; 10 Reps
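For anyone wondering where the kflops numbers come from, here is a small sketch of the arithmetic. It assumes the usual LINPACK operation count of 2n^3/3 + 2n^2 for a matrix of order n, divided by the total time; the function names linpack_ops and linpack_kflops are hypothetical. Under that assumption the values in the tables above (68667, 137333, 114444, 171667) correspond to total times of about 10 ms, 5 ms, 6 ms and 4 ms for n = 100, and a measured time of zero gives inf.

```c
/* Floating point operations counted for one n x n solve:
   2n^3/3 for the factorization (dgefa) plus 2n^2 for the solve (dgesl). */
double linpack_ops(int n)
{
    double dn = (double)n;
    return (2.0 * dn * dn * dn) / 3.0 + 2.0 * dn * dn;
}

/* Thousands of floating point operations per second for a solve of
   order n that took `seconds` of wall time. */
double linpack_kflops(int n, double seconds)
{
    return linpack_ops(n) / (1.0e3 * seconds);
}
```

Because a solve of order 100 only takes a few milliseconds here, a small change in the measured time moves the kflops figure a lot, which is probably part of why only a handful of distinct values show up.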

 

I will note here that I ran each of these several times and they did hop around a bit ... I'll have to write a script to run them each 100 times and collect some max/min/avg numbers.
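Here is a rough sketch of the bookkeeping such a script would need: pull the kflops figure out of the final summary line of each run and keep a running min/max/average. It assumes the summary line always looks like the ones above ("... Precision 137333 Kflops ; 10 Reps"). The names parse_kflops, struct stats, stats_init and stats_add are hypothetical, and actually launching the binary 100 times is left out.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Extract the kflops value from a summary line such as
   "Rolled Single  Precision 137333 Kflops ; 10 Reps".
   Returns 1 on success, 0 if the line is not a summary line. */
int parse_kflops(const char *line, long *kflops)
{
    const char *p = strstr(line, "Precision");
    if (p == NULL)
        return 0;
    return sscanf(p, "Precision %ld Kflops", kflops) == 1;
}

/* Running minimum, maximum and mean over the samples seen so far. */
struct stats {
    long min;
    long max;
    double mean;
    size_t count;
};

void stats_init(struct stats *s)
{
    s->min = LONG_MAX;
    s->max = LONG_MIN;
    s->mean = 0.0;
    s->count = 0;
}

void stats_add(struct stats *s, long v)
{
    if (v < s->min)
        s->min = v;
    if (v > s->max)
        s->max = v;
    s->count++;
    /* Incremental mean: avoids storing every sample. */
    s->mean += ((double)v - s->mean) / (double)s->count;
}
```

A driver would read each run's output line by line, pass every line to parse_kflops, and call stats_add whenever it returns 1. The header line "Rolled Single Precision Linpack" is skipped automatically because no number follows "Precision".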

 

Hope that sheds some light on the NEON performance of the ARM within the Zynq-7000 AP SoC!

 
