cuda - Separating out .cu and .cpp (using C++11 library)


I am trying to convert a C++ program that uses the random library, which is a C++11 feature. After reading through a couple of similar posts here, I tried separating the code into three files. At the outset I should say that I am not very conversant with C/C++ and mostly use R at work.

The main file looks as follows.

    #ifndef _kernel_support_
    #define _kernel_support_

    #include <complex>
    #include <random>
    #include <iostream>
    #include "my_code_header.h"

    using namespace std;

    std::default_random_engine generator;
    std::normal_distribution<double> distribution(0.0,1.0);

    const int rand_mat_length = 24561;
    double rand_mat[rand_mat_length];// = {0};

    // Fill the global array with standard normal draws.
    void create_std_norm(){
      for(int i = 0 ; i < rand_mat_length ; i++)
        ::rand_mat[i] = distribution(generator);
    }
    .
    .
    .
    int main(void) {
      ...
      ...
      call_global();
      return 0;
    }

    #endif

The header file looks as follows.

    #ifndef mykernel_h
    #define mykernel_h

    void call_global();
    void two_d_example(double *a, double *b, double *my_result, size_t length, size_t width);

    #endif

And the .cu file looks like the following.

    #ifndef _my_kernel_
    #define _my_kernel_

    #include <iostream>
    #include "my_code_header.h"

    #define tile_width 8

    using namespace std;

    __global__ void two_d_example(double *a, double *b, double *my_result, size_t length, size_t width)
    {
      unsigned int row = blockIdx.y*blockDim.y + threadIdx.y;
      unsigned int col = blockIdx.x*blockDim.x + threadIdx.x;

      // An index equal to the size is already out of range, hence >= rather than >.
      if ((row>=length) || (col>=width)) {
        return;
      }
      ...
    }

    void call_global()
    {
      const size_t imagelength = 528;
      const size_t imagewidth = 528;

      const dim3 threadsperblock(tile_width,tile_width);
      const dim3 numblocks(((imagelength) / threadsperblock.x), ((imagewidth) / threadsperblock.y));

      double *d_a, *d_b, *mys ;
      ...
      cudaMalloc((void**)&d_a, sizeof(double) * imagelength);
      cudaMalloc((void**)&d_b, sizeof(double) * imagewidth);
      cudaMalloc((void**)&mys, sizeof(double) * imagelength * imagewidth);

      two_d_example<<<numblocks,threadsperblock>>>(d_a, d_b, mys, imagelength, imagewidth);
      ...
      cudaFree(d_a);
      cudaFree(d_b);
    }

    #endif

Please note that __global__ has been removed from the .h, since I was getting the following error owing to the header being compiled by g++.

    In file included from my_code_main.cpp:12:0:
    my_code_header.h:5:1: error: ‘__global__’ does not name a type

When I compile the .cu file with nvcc, all is fine and it generates my_code_kernel.o. But since I am using C++11 in the .cpp, I am trying to compile it with g++, and I am getting the following error.

    /tmp/ccr2rxzf.o: In function `main':
    my_code_main.cpp:(.text+0x1c4): undefined reference to `call_global()'
    collect2: ld returned 1 exit status

I understand this might not have anything to do with CUDA as such, and may just be the wrong use of including the header in both places. Also, what is the right way to compile and, most importantly, link my_code_kernel.o and my_code_main.o (hopefully)? Sorry if this question is too trivial!

It looks like you are not linking with my_code_kernel.o. You have used -c for your nvcc command (which causes it to compile but not link, i.e. to generate the .o file). I'm going to guess that you're not using -c with your g++ command, in which case you need to add my_code_kernel.o to the list of inputs alongside the .cpp file.

The separation you are trying to achieve is completely possible; it just looks like you are not linking properly. If you still have problems, add the compilation commands to your question.

FYI: you don't need to declare two_d_example() in your header file, since it is only used within your .cu file (from call_global()).
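
To make the linking and header points concrete, here is a minimal single-listing sketch of the layout, not your actual code. The file boundaries are marked in comments. two_d_example_stub is a hypothetical CPU stand-in for the kernel, and its a[row] * b[col] body is only a placeholder because the real kernel body is elided in the question. The build commands in the top comment are assumptions as well: the output name my_code, the -std=c++11 flag (older g++ releases spell it -std=c++0x) and the CUDA library path all depend on your setup.

```cpp
// Assumed build steps for the real three files (paths vary by install):
//   nvcc -c my_code_kernel.cu -o my_code_kernel.o
//   g++ -std=c++11 -c my_code_main.cpp -o my_code_main.o
//   g++ my_code_main.o my_code_kernel.o -o my_code -L/usr/local/cuda/lib64 -lcudart

#include <cassert>
#include <cstddef>
#include <vector>

// ---- my_code_header.h: only the host wrapper is visible to the .cpp ----
void call_global();

// ---- my_code_kernel.cu: the kernel stays private to this file ----
namespace {
// Hypothetical CPU stand-in for the __global__ kernel. The product below is a
// placeholder computation, not the kernel body from the question.
void two_d_example_stub(const double *a, const double *b, double *my_result,
                        std::size_t length, std::size_t width) {
    for (std::size_t row = 0; row < length; ++row) {
        for (std::size_t col = 0; col < width; ++col) {
            my_result[row * width + col] = a[row] * b[col];
        }
    }
}
}  // namespace

// Host wrapper: the only symbol the linker has to resolve for main().
void call_global() {
    const std::size_t imagelength = 528;
    const std::size_t imagewidth = 528;

    std::vector<double> a(imagelength, 1.0);
    std::vector<double> b(imagewidth, 2.0);
    std::vector<double> my_result(imagelength * imagewidth, 0.0);

    two_d_example_stub(a.data(), b.data(), my_result.data(),
                       imagelength, imagewidth);

    // Every entry is 1.0 * 2.0 with the placeholder inputs above.
    assert(my_result.front() == 2.0 && my_result.back() == 2.0);
}
```

In the real project the stub would be the __global__ kernel launched with <<<numblocks, threadsperblock>>> inside call_global(), and main() in the .cpp would simply call call_global() as it does now. The undefined reference goes away once my_code_kernel.o is passed to the final link step.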

