Advice on running an API

Comments

2 comments

  • Candido Dessanti

    Hi @why_not,

    Thanks for your interest in OmniSci Database and for joining our community.

    About your BIG question, I suggest reading this message written by one of our engineers:

    https://community.heavy.ai/t/concurrent-queries/2399/2?u=candido.dessanti

    Depending on your current configuration, you can change the num_executors parameter and check whether the response time of the batch of queries improves.

    Anyway, we are working hard to overcome the limitation of running one query at a time on a system. Still, I wouldn't expect too much from this feature, because the performance increase isn't huge and depends heavily on the kind of queries you are running against the database (I did some benchmarks myself).
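
    To make that check concrete, here is a minimal timing sketch. It assumes the pymapd Python client, a local server with the default admin/HyperInteractive credentials, a database named omnisci, and the sample flights_2008 table; all of these are assumptions to adjust to your own setup. Run it before and after changing num_executors and compare the wall-clock time of the batch.

        # Sketch: time a small batch of concurrent queries (table, credentials, and host are assumptions).
        import time
        from concurrent.futures import ThreadPoolExecutor

        from pymapd import connect

        QUERIES = [
            "SELECT carrier_name, COUNT(*) FROM flights_2008 GROUP BY carrier_name",
            "SELECT origin_state, AVG(arrdelay) FROM flights_2008 GROUP BY origin_state",
            "SELECT dest_state, MAX(depdelay) FROM flights_2008 GROUP BY dest_state",
        ]

        def run_query(sql):
            # One connection per worker thread; a single connection is not shared across threads.
            con = connect(user="admin", password="HyperInteractive",
                          host="localhost", port=6274, dbname="omnisci")
            try:
                return list(con.execute(sql))
            finally:
                con.close()

        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=len(QUERIES)) as pool:
            results = list(pool.map(run_query, QUERIES))
        print(f"batch finished in {time.perf_counter() - start:.2f}s")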

    To reply more precisely to your questions, starting with sharding:

    Sharding a table can make your queries faster in lots of scenarios, even on a single system, because it gives you better control over the number of rows sent to each GPU. In our database, the table data is subdivided into fragments, and the fragments are sent to the GPUs for processing; if the number of fragments is equal to the number of GPUs, you will get the best performance from the query, but if the numbers differ, the performance will be worse.

    Imagine having 2 GPUs and a table of 90 million records (2 fragments of 32 million records each and a fragment of 26 million); fragments 1 and 2 will be processed by the GPUs at the same time, while the 3rd will be processed by the first GPU with the second one idling. Creating a sharded table with as many shards as there are GPUs in the server will increase the performance, because during execution all GPUs will be crunching data.

    So while sharding won't make your queries run in parallel (it's the same on a cluster or a single server), it can reduce the response time of your queries, so the reply to your first question is... it's not correct.
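
    As a concrete sketch of that, the DDL below creates a table sharded on a key with shard_count = 2, matching the 2-GPU example above; fragment_size is set explicitly to the 32-million-row default for clarity. Table and column names are illustrative, and the statement is issued here through the pymapd Python client, though you can run the same SQL from omnisql just as well.

        # Sketch: create a sharded table (table/column names and credentials are illustrative).
        from pymapd import connect

        con = connect(user="admin", password="HyperInteractive",
                      host="localhost", port=6274, dbname="omnisci")
        con.execute("""
            CREATE TABLE trips_sharded (
                device_id BIGINT,
                pickup_time TIMESTAMP,
                fare DOUBLE,
                SHARD KEY (device_id)
            ) WITH (shard_count = 2, fragment_size = 32000000)
        """)
        con.close()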

    1. You have to use clustering when you run out of CPU or GPU RAM; typically, on an on-premise server, you can't use more than 8 GPUs, so assuming you are using a bunch of RTX 8000s that come with 48GB of VRAM each, each query is limited to 384GB. The total amount of data that fits in a dual-socket server is around 2-4TB (assuming you can mount 8x256GB DIMMs per socket or Intel's Optane memory), so depending on your hardware and the data volume you need to process, you may be forced to move to a cluster. Loading data won't stop your application from querying the database; the response time will be a little higher because the new data has to be moved to CPU or GPU RAM to be processed.

    2. Clustering is available with Enterprise Edition only, as is the ODBC driver; this table will give you an idea of the features you are missing with the open-source version: [feature comparison table image]

    I suggest you ask for a temporary enterprise key (here) and try out the DB with Immerse; it's an amazing experience.

    3. I'm not sure about JavaScript, but I used Python and JDBC without big problems, and if Java is your development platform, maybe it's better to go with JDBC (a minimal Python connection sketch follows below).

    The driver can be extended and has a dedicated repository on GitHub.
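
    For completeness, here is a minimal Python connection sketch with pymapd, the client mentioned above; host, credentials, database name, and my_table are placeholders for your own deployment.

        # Sketch: connect and run a query with pymapd (names and credentials are placeholders).
        from pymapd import connect

        con = connect(user="admin", password="HyperInteractive",
                      host="localhost", port=6274, dbname="omnisci")
        cur = con.execute("SELECT COUNT(*) FROM my_table")  # my_table is a placeholder
        print(cur.fetchone())
        con.close()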

    If you have further questions, feel free to ask ;)

    Regards

  • why_not

    @candido.dessanti Thanks for an excellent answer; this helps a lot. I did not mean to make it sound like I wanted to run parallel queries at the database level; I meant to give you a general idea of the number of simultaneous requests that could be made with "reasonable" queries. Either way, you answered my question, and your extensive response is very much appreciated. We already have a test environment, and I am currently profiling the performance with various schema configurations. I'm sure we will need to test the enterprise-level setup at some point, but not just yet.

