cuda - Separating out .cu and .cpp (using C++11 library)
I am trying to convert to CUDA a C++ program I have that uses the random library, which is a C++11 feature. After reading through a couple of similar posts here, I tried separating the code out into three files. At the outset, I should say that I am not very conversant with C/C++ and mostly use R at work.
The main file looks as follows.
    #ifndef _kernel_support_
    #define _kernel_support_
    #include <complex>
    #include <random>
    #include <iostream>
    #include "my_code_header.h"
    using namespace std;

    std::default_random_engine generator;
    std::normal_distribution<double> distribution(0.0,1.0);

    const int rand_mat_length = 24561;
    double rand_mat[rand_mat_length];// = {0};

    void create_std_norm(){
      for(int i = 0 ; i < rand_mat_length ; i++)
        ::rand_mat[i] = distribution(generator);
    }
    .
    .
    .
    int main(void)
    {
      ...
      ...
      call_global();
      return 0;
    }
    #endif
The header file looks as follows.
    #ifndef mykernel_h
    #define mykernel_h
    void call_global();
    void two_d_example(double *a, double *b, double *my_result, size_t length, size_t width);
    #endif
And the .cu file looks like the following.
    #ifndef _my_kernel_
    #define _my_kernel_
    #include <iostream>
    #include "my_code_header.h"
    #define tile_width 8
    using namespace std;

    __global__ void two_d_example(double *a, double *b, double *my_result, size_t length, size_t width)
    {
      unsigned int row = blockIdx.y*blockDim.y + threadIdx.y;
      unsigned int col = blockIdx.x*blockDim.x + threadIdx.x;
      // valid indices run from 0 to length-1 and 0 to width-1
      if ((row>=length) || (col>=width)) {
        return;
      }
      ...
    }

    void call_global()
    {
      const size_t imagelength = 528;
      const size_t imagewidth = 528;
      const dim3 threadsperblock(tile_width,tile_width);
      const dim3 numblocks(((imagelength) / threadsperblock.x), ((imagewidth) / threadsperblock.y));
      double *d_a, *d_b, *mys ;
      ...
      cudaMalloc((void**)&d_a, sizeof(double) * imagelength);
      cudaMalloc((void**)&d_b, sizeof(double) * imagewidth);
      cudaMalloc((void**)&mys, sizeof(double) * imagelength * imagewidth);
      two_d_example<<<numblocks,threadsperblock>>>(d_a, d_b, mys, imagelength, imagewidth);
      ...
      cudaFree(d_a);
      cudaFree(d_b);
    }
    #endif
Please note that __global__ has been removed from the declaration in the .h file, since I was getting the following error owing to the header being compiled by g++.
    In file included from my_code_main.cpp:12:0:
    my_code_header.h:5:1: error: ‘__global__’ does not name a type
When I compile the .cu file with nvcc, it is all fine and generates my_code_kernel.o. But since I am using C++11 in my .cpp file, I am trying to compile it with g++, and I am getting the following error.
    /tmp/ccr2rxzf.o: In function `main':
    my_code_main.cpp:(.text+0x1c4): undefined reference to `call_global()'
    collect2: ld returned 1 exit status
I understand this might not have anything to do with CUDA as such, and may just be a wrong use of including the same header in both places. Also, what is the right way to compile and, most importantly, link my_code_kernel.o and my_code_main.o (hopefully)? Sorry if this question is too trivial!
It looks like you are not linking with my_code_kernel.o. You have used -c for your nvcc command (which causes it to compile but not link, i.e. to generate the .o file). I'm going to guess that you're not using -c with your g++ command, in which case you need to add my_code_kernel.o to the list of inputs alongside the .cpp file.
The separation you are trying to achieve is completely possible; it just looks like you are not linking properly. If you still have problems, add the compilation commands to your question.
FYI: you don't need to declare two_d_example() in your header file, since it is only used within your .cu file (from call_global()).
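To make the linking advice concrete, here is a minimal sketch of how the pieces fit together. It is collapsed into a single C++ file so it builds without CUDA: the body of call_global() is a host-only stand-in, and call_global_count, run_host_side() and the seed value are hypothetical names introduced only for illustration. The build commands in the leading comment are assumptions too: the name my_code_kernel.cu is inferred from my_code_kernel.o, my_program is a made-up output name, and the CUDA library path depends on the installation.

```cpp
// Assumed build steps for the real three-file project:
//   nvcc -c my_code_kernel.cu -o my_code_kernel.o
//   g++ -std=c++11 -c my_code_main.cpp -o my_code_main.o
//   g++ my_code_main.o my_code_kernel.o -o my_program \
//       -L/usr/local/cuda/lib64 -lcudart
// The last step is the one missing in the question: both object files
// must be passed to the linker, together with the CUDA runtime library.

#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// ---- my_code_header.h ----
// Only plain C++ declarations. two_d_example() is not declared here
// because it is used only inside the .cu file.
void call_global();

// ---- my_code_kernel.cu (host-only stand-in, hypothetical) ----
// The real file would allocate device memory and launch the kernel.
// Here it only counts calls so the sketch builds without nvcc.
int call_global_count = 0;
void call_global() { ++call_global_count; }

// ---- my_code_main.cpp ----
// Fills a vector with standard-normal samples using C++11 <random>.
std::vector<double> create_std_norm(std::size_t n, unsigned seed) {
    std::default_random_engine generator(seed);
    std::normal_distribution<double> distribution(0.0, 1.0);
    std::vector<double> samples(n);
    for (std::size_t i = 0; i < n; ++i)
        samples[i] = distribution(generator);
    return samples;
}

// Stand-in for the body of main(): prepare host data, then call into
// the code that nvcc compiled. Returns the number of samples generated.
std::size_t run_host_side(std::size_t n) {
    std::vector<double> samples = create_std_norm(n, 42u);
    assert(samples.size() == n);
    call_global();
    return samples.size();
}
```

The point of the split is that the header exposes only an ordinary function, so g++ never sees __global__ or the <<<...>>> launch syntax, while nvcc never needs the C++11 flags. Older g++ releases may need -std=c++0x instead of -std=c++11. Linking with nvcc instead of g++ in the final step should also work, and then -lcudart does not have to be given explicitly.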