little research on false sharing [long post]

by Wojciech Waga <adun_wywal@[EMAIL PROTECTED] > Oct 7, 2008 at 06:56 PM

Hello,

can anyone confirm or deny my conclusions about execution times on
different CPUs?

There's a tiny program which does nothing except increment a
particular variable. There is one such variable per thread. I'm running
this program on different CPUs to observe the false sharing phenomenon.

computers:

1. single core [AMD Athlon XP]
2. single core HT [Pentium IV]
3. dual core, separate caches [AMD Turion X2]
4. dual core, common L2 cache [Core 2 Duo]
5. 2x dual core, separate caches [2x Opteron]

notation: ./a.out N means that N threads are spawned. The amount of work is
proportional to the number of threads, so execution time should increase
with N.

1.
./a.out 1
real    0m1.447s
user    0m1.440s

./a.out 2
real    0m2.758s
user    0m2.744s

./a.out 4
real    0m5.382s
user    0m5.364s

note: linear scaling, as expected

2.
./a.out 1
real    0m0.617s
user    0m0.616s

./a.out 2
real    0m1.164s
user    0m2.244s

./a.out 4
real    0m2.277s
user    0m4.464s

note: same as above

3.
./a.out 1
real    0m1.011s
user    0m1.008s

./a.out 2
real    0m5.108s
user    0m8.817s

./a.out 4
real    0m10.790s
user    0m19.573s

note: non-linear growth - false sharing
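
One way to put a number on "non-linear" is to compare each measured real time with an idealized one. The sketch below assumes an ideal model that is not part of the measurements: all threads do equal work, a core runs one thread at a time, and the cores do not interfere with each other. The function names `ideal_real` and `slowdown` are made up for this illustration.

```c
/* Ideal wall-clock time for n equal threads on `cores` cores, given the
   single-thread time t1. Assumes no interference between cores.
   Returns -1.0 for invalid arguments. */
double ideal_real(double t1, int n, int cores)
{
    int rounds;

    if (n < 1 || cores < 1 || t1 <= 0.0)
        return -1.0;
    rounds = (n + cores - 1) / cores;   /* ceil(n / cores) */
    return t1 * rounds;
}

/* Ratio of the measured real time to the ideal one; a value near 1.0
   means the run scaled as the ideal model predicts.
   Returns -1.0 for invalid arguments. */
double slowdown(double measured, double t1, int n, int cores)
{
    double ideal = ideal_real(t1, n, cores);

    if (ideal <= 0.0)
        return -1.0;
    return measured / ideal;
}
```

With the numbers for machine 3 (t1 = 1.011, two cores), `slowdown(5.108, 1.011, 2, 2)` and `slowdown(10.790, 1.011, 4, 2)` both come out at roughly 5, so under this model the two-core runs are about five times slower than independent threads would be.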

4.
./a.out 1
real    0m0.692s
user    0m0.692s

./a.out 2
real    0m1.025s
user    0m1.588s

./a.out 4
real    0m1.993s
user    0m3.524s

note: interesting example. Time grows at a slower rate than in
example 3. Does it mean that in CPUs with a common L2 cache false sharing
takes place only at the L1 level? And is a common cache better than separate
ones? What are the drawbacks of a common cache? (reduced transfer rate per core?)

5.
./a.out 1
real    0m0.544s
user    0m0.540s

./a.out 2
real    0m1.787s
user    0m2.952s

./a.out 4
real    0m7.460s
user    0m22.817s

note: why is there such a big jump from 2 to 4? Is it because false sharing
occurs between two separate chips? (unfortunately I do not have access
to a 2x single-core machine)

*****************************

And now the thing that concerns me most. I realized that I had forgotten to
set the gcc optimization flag "-O3". Every architecture responded to this
change exactly as it should, except one: Pentium IV HT.

2'.
./a.out 1
real    0m0.269s
user    0m0.268s

note: time drop due to -O3, everything's ok

./a.out 2
real    0m2.767s
user    0m5.436s

note: what the hell??

./a.out 4
real    0m5.704s
user    0m11.261s

thank you in advance for any help
Wojtek

the code:

#include <stdio.h>
#include <string.h>
#include <pthread.h>
#include <stdlib.h>

typedef unsigned char uchar;

typedef struct
{
   unsigned long long pos;
   uchar *in;
   uchar *out;
} obj;

int NUM=1;

pthread_attr_t attr;

// each thread increments its own byte, out[pos], 10,000,000 times
void* cryptThread(void* ptr)
{
   int i;
   obj *cS=(obj*)ptr;
   int s=1;
   for ( i=0; i<10000000; i++)
     cS->out[cS->pos]+=s;
   return NULL; // result is unused; the stores to out[] are the loop's only effect
}

void test(int *in,int *out, int len)
{
   int rV,i;
   obj cs[NUM];

   // thread i gets pos=i, so the threads write adjacent bytes of out[]
   for (i=0; i<NUM; i++)
     {
       cs[i].in=(uchar*)in;
       cs[i].out=(uchar*)out;
       cs[i].pos=i;
     }

   pthread_t threads[NUM];

   for (i=0; i<NUM; i++)
     {
       rV=pthread_create(&threads[i],&attr,cryptThread,&cs[i]);
       if (rV) exit(-1);
     }

   for (i=0; i<NUM; i++)
     {
       rV = pthread_join(threads[i], NULL);
       if (rV)  exit(-1);
     }
}


int main(int argc, char* argv[])
{
   int i;

   if (argc==2) NUM=atoi(argv[1]);

   pthread_attr_init(&attr);
   pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

   // calloc so the incremented bytes start from a defined value
   int *plaintext=(int*)calloc(NUM,sizeof(int));
   int *ciphertext=(int*)calloc(NUM,sizeof(int));

   for (i=0; i<10; i++)
     test(plaintext,ciphertext,NUM);

   free(plaintext);
   free(ciphertext);

   pthread_attr_destroy(&attr);
   pthread_exit(NULL);
   return 0;
}
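
To check whether the slowdown really is false sharing, a control experiment could give every thread a counter in its own cache line and see whether the timings go back to linear. The following is only a sketch of that idea, not part of the program above: the 64-byte line size is an assumption (check it for each CPU), and the names `padded_counter`, `job`, `increment_padded`, `run_padded` and `MAX_THREADS` are invented for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64   /* assumed cache line size in bytes; verify per CPU */
#define MAX_THREADS 64  /* arbitrary limit for this sketch */

/* One counter per thread, padded to fill a whole (assumed) cache line. */
typedef struct {
    volatile unsigned char value;  /* volatile: every increment is a real store */
    char pad[CACHE_LINE - 1];
} padded_counter;

typedef struct {
    padded_counter *counter;
    long iterations;
} job;

static void *increment_padded(void *ptr)
{
    job *j = ptr;
    volatile unsigned char *v = &j->counter->value;
    long n = j->iterations;
    long i;

    for (i = 0; i < n; i++)
        (*v)++;
    return NULL;
}

/* Runs n threads, each incrementing its own padded counter `iterations`
   times. Returns 0 if every counter ends at iterations mod 256, else -1. */
int run_padded(int n, long iterations)
{
    pthread_t threads[MAX_THREADS];
    job jobs[MAX_THREADS];
    padded_counter *counters;
    void *raw;
    int i, started = 0, ok = 1;

    assert(sizeof(padded_counter) == CACHE_LINE);
    if (n < 1 || n > MAX_THREADS || iterations < 0)
        return -1;

    /* Over-allocate by one line, then round up to a line boundary. */
    raw = calloc((size_t)n + 1, sizeof(padded_counter));
    if (raw == NULL)
        return -1;
    counters = (padded_counter *)(((uintptr_t)raw + CACHE_LINE - 1)
                                  & ~(uintptr_t)(CACHE_LINE - 1));

    for (i = 0; i < n; i++) {
        jobs[i].counter = &counters[i];
        jobs[i].iterations = iterations;
        if (pthread_create(&threads[i], NULL, increment_padded, &jobs[i]) != 0) {
            ok = 0;
            break;
        }
        started++;
    }
    for (i = 0; i < started; i++)
        if (pthread_join(threads[i], NULL) != 0)
            ok = 0;
    for (i = 0; i < started; i++)
        if (counters[i].value != (unsigned char)iterations)
            ok = 0;

    free(raw);
    return ok ? 0 : -1;
}
```

Calling `run_padded(2, 10000000L)` ten times from a small main() and timing it the same way as ./a.out 2 gives a comparison point. If the padded version scales like the single-thread run while the original does not, false sharing is the likely cause. The volatile counter forces a store on every iteration at any optimization level, so -O0 and -O3 timings of this sketch are not directly comparable with those of the original program.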
 



