"Cannot connect to OmniSci Server" error when using a GPU on an Ubuntu instance

Comments

  • Candido Dessanti

    Hi @Youan_Lu,

    I suggest you check whether there are errors in the logs, which are located in /var/lib/omnisci/mapd_logs.
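    As a rough aid for that log check, here is a minimal Python sketch that lists lines by severity. It assumes every log line starts with a timestamp followed by a one-letter severity field (such as F for fatal), which is only inferred from the single log line quoted in this thread; the function name find_fatal_lines is made up for illustration.

```python
from pathlib import Path


def find_fatal_lines(log_dir, severities=("F",)):
    """Return (file name, line) pairs whose second field is in `severities`.

    Assumed line layout (inferred from one sample line, not from any spec):
        <timestamp> <severity letter> <rest of message>
    """
    hits = []
    for path in sorted(Path(log_dir).glob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with path.open(errors="replace") as handle:
            for line in handle:
                fields = line.split(maxsplit=2)
                if len(fields) >= 2 and fields[1] in severities:
                    hits.append((path.name, line.rstrip("\n")))
    return hits
```

    Calling find_fatal_lines("/var/lib/omnisci/mapd_logs") would then return the fatal entries, if the layout assumption holds for your log files.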

    You probably ran out of free disk space while loading the data, but I can't be sure; you may also be hitting a bug. I can't say without knowing the DDL of the table, the query, or the logs. If you think the problem is on the GPUs, you can try to run the query in CPU mode using the /*+ cpu_mode */ hint or the \cpu command in omnisql.

    e.g.

    select /*+ cpu_mode */ col1,sum(col2) from table1 group by col1;
    

    or

    \cpu
    select col1,sum(col2) from table1 group by col1;
    

    regards.

  • Youan Lu

    If I switch to CPU, everything works. But the GPU version hits this error, and I need to compare CPU and GPU. Here's the error log:

    2020-08-16T03:20:33.642922 F 13654 4 NvidiaKernel.cpp:122 Check failed: cuLinkAddData_v2(link_state, CU_JIT_INPUT_PTX, static_cast(const_cast(ptx.c_str())), ptx.length() + 1, 0, 0, nullptr, nullptr) == CUDA_SUCCESS (218 == 0)

    I use the sample flight data with 10k rows and this query:

    SELECT origin_city AS "Origin", dest_city AS "Destination", AVG(airtime) AS "Average Airtime" FROM flights_2008_10k WHERE distance < 175 GROUP BY origin_city, dest_city;

  • Candido Dessanti

    Hi,

    Which GPU are you using, and which driver version? From 5.2 onwards, the required driver version is 418.39 (or later).
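    A small Python sketch of that version comparison, for anyone who wants to script it. The 418.39 minimum comes from this thread; the helper names are hypothetical, and the installed version string is whatever nvidia-smi reports as "Driver Version".

```python
def parse_driver_version(text):
    """Turn a dotted driver version such as '450.51.06' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Minimum driver version quoted in this thread for 5.2 onwards.
MIN_DRIVER = parse_driver_version("418.39")


def driver_meets_minimum(installed, minimum=MIN_DRIVER):
    """Compare versions field by field rather than as plain strings."""
    return parse_driver_version(installed) >= minimum
```

    Comparing tuples of integers avoids the trap of string comparison, where "95.1" would sort after "418.39".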

  • Youan Lu

    NVIDIA-SMI 450.51.06   Driver Version: 450.51.06   CUDA Version: 11.0
    Product name: GRID K520

    I think the driver version meets the requirement?

  • Candido Dessanti

    I am not sure about the K520 and CUDA 11, but the problem is probably the GRID configuration; I have never tried to run our database in such an environment.

    I have asked internally, by the way.

  • Youan Lu

    What environment do you use? Is there any tutorial for installation?

  • Candido Dessanti

    Hi @Youan_Lu,

    Our head engineer pointed me in the right direction: the K520 isn't supported because the card, while based on the Kepler architecture, uses the older of the two Kepler generations and is limited to CUDA compute capability 3.0. We generate code for at least compute capability 3.5, hence the crash when you try to run the query.
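    To make the compute capability check concrete, here is a minimal Python sketch. The 3.5 minimum is the figure given above; the two table entries are the compute capabilities NVIDIA publishes for these cards, and the names KNOWN_CARDS and meets_minimum_capability are made up for illustration. As far as I know, the 218 in the log is the CUDA driver API code CUDA_ERROR_INVALID_PTX, which would be consistent with code generated for a newer target than the card supports.

```python
# Minimum compute capability the generated code targets, per this thread.
MIN_COMPUTE_CAPABILITY = (3, 5)

# Compute capability as (major, minor) for the two cards discussed here.
KNOWN_CARDS = {
    "GRID K520": (3, 0),  # older Kepler generation
    "Tesla K80": (3, 7),  # newer Kepler generation
}


def meets_minimum_capability(capability, minimum=MIN_COMPUTE_CAPABILITY):
    """Return True if a (major, minor) capability is at least `minimum`."""
    return tuple(capability) >= tuple(minimum)


def report(cards=KNOWN_CARDS):
    """Build one status line per card; 'too old' means below the minimum."""
    lines = []
    for name, capability in sorted(cards.items()):
        status = "ok" if meets_minimum_capability(capability) else "too old"
        lines.append(f"{name}: {capability[0]}.{capability[1]} {status}")
    return lines
```

    Meeting the minimum only says the generated code can load; it does not tell you whether every feature runs on the GPU.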

    To try out the database with GPU acceleration, you can use any instance with a Kepler K80 or any Pascal, Turing, or Volta card. I strongly recommend skipping the K80 and going for Pascal (GTX, Quadro, or Tesla, it doesn't matter), and upgrading to Volta (Tesla) if you plan to use geo functions heavily (because of the increased number of FP64 cores).

    My personal hardware spans from a small Pascal GPU in my gaming notebook to several Turing GPUs in my workstation.

    Hope this helps.

  • Youan Lu

    That helps a lot. I'm not that familiar with GPUs, though. Do these GPUs work with the database? NVIDIA V100, NVIDIA K80, NVIDIA M60, NVIDIA T4.

  • Candido Dessanti

    Except for the M60 (the Maxwell architecture isn't fully supported, so it is better to skip it), all the other cards work. The Kepler card (K80) is the least capable of the group (some queries would fall back to CPU) and its days are numbered; NVIDIA is going to end support for such old cards.

  • Youan Lu

    Thank you so much! I'm using a K80 now and the problem is solved!

  • Candido Dessanti

    I'm happy everything is working, but with the K80 it is likely that some aggregates will fall back to CPU, because it lacks some features, such as atomic operations on double data types.
