Table Partition on Date Field --- Missing Feature

Comments

8 comments

  • anon23740792

    Sumit, thanks for the input. These are good points — some of them we are already at work on, around improving deletes and also optimizing how we handle epochs on disk. While we don't support date-based partitions yet, we are also looking into how we can use filter pushdowns in queries to improve performance for date-based queries that result in wide scans (e.g., two years' worth of data). More to come in the next few months.

    All that said, a better path to loading OmniSci is to avoid too many loads with small batches — this causes metadata to grow. As I have pointed out earlier, as a column store, it is hard for us (and for many other column stores) to handle high-throughput small-batch data loads. It might be better in the short term to redo your import process to use batches larger than 16 MB.
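    As an illustration only (the table name and file path are hypothetical), accumulating many small batches into one larger file and loading it in a single pass keeps the number of epochs down, compared with many small INSERTs:

    ```sql
    -- Consolidate small batches into one large CSV on the staging host,
    -- then load it in a single COPY rather than many small inserts.
    COPY cdr_events FROM '/staging/cdr_batch.csv' WITH (header='true');
    ```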

  • Sumit

    Hi @anon23740792,

    I agree with your point about writing in large chunks to utilize the 16 MB DB blocks. However, many real-time use cases require continuous data ingestion and commits to make data available. As I mentioned, in this case we only get the next day to consolidate the data and reduce DB blocks. Right now the OmniSci database offers no option other than CTAS on the complete table, and as we all know, CTAS on billions of records is a time-consuming job that may also fail.

    I think we can discuss the use cases separately.

    Regards, Sumit

  • Sam Carroll

    Sumit, one other thing you can try, to avoid having to run deletes, is to set a max_rows property in the DDL of the table in question. This property automatically removes rows on a first-in, first-out basis.

    Here’s a link to the max_rows property: https://docs.omnisci.com/latest/5_tables.html#encoding-spec
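    For illustration only — the table and column names here are hypothetical, and the exact property syntax is in the linked docs — a capped table might be created like this:

    ```sql
    -- Keep at most ~600M rows; once the cap is reached,
    -- the oldest rows are evicted first (FIFO).
    CREATE TABLE cdr_events (
      event_time TIMESTAMP,
      msisdn TEXT ENCODING DICT(32),
      bytes_used BIGINT
    ) WITH (max_rows = 600000000);
    ```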

    Sam

  • Sumit

    Hi @sam.carroll,

    That would remove rows without our knowledge, which creates a bigger problem: the requirement is to drop all records that are x days old, to keep uniformity in reports and analytics.

    Regards, Sumit

  • anon23740792

    Sumit, these are all good inputs and suggestions, but as you can imagine, we cannot immediately work on date partitions because there are customer priorities on our roadmap that take precedence.

    For now, you'll need to make the adjustments on your end to figure out the right batch size. Like I said, we're working on ways to improve this.

  • Sumit

    Hi @anon23740792,

    Thank you for sharing the feedback; I completely understand the development pipeline.

    For the time being, I will use the CTAS method so we can make data available as soon as possible and manage the data blocks.
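    Roughly, the CTAS-based compaction I have in mind looks like the following (table names are examples, and I am assuming the rename statement is available):

    ```sql
    -- Rewrite the table to consolidate the many small blocks,
    -- then swap the compacted copy in place of the original.
    CREATE TABLE cdr_events_compact AS SELECT * FROM cdr_events;
    DROP TABLE cdr_events;
    ALTER TABLE cdr_events_compact RENAME TO cdr_events;
    ```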

    Regards, Sumit

  • Candido Dessanti

    A warm welcome to @sam.carroll on our forum.

    @Sumit, adding the feature you requested also took Oracle a very long time, and for them it was very important for performance. That isn't as true for our database, because we record metadata for each column of each fragment of data, so if you run a query over a specific range of time, only the fragments containing data in that range are read from disk and cached in CPU and GPU memory for further processing.

    When we fix the way we delete data — or better, optimize data after deletes — the need for horizontal partitioning will be less obvious.

  • Sumit

    Hi @candido.dessanti and Omnisci team,

    Yes, I agree with you that adding a new feature is complex.

    Since the OmniSci team is active in the telecom domain, let me share the most frequent use case in that domain; it should give you an idea of what could be a useful feature.

    1. Telecom data analytics depends on CDRs (Call Detail Records), BDRs (Balance Detail Records), data-usage detail records, and syslogs.
    2. A telecom system generates 200 million to 800 million records per day for a mid-size network.
    3. Daily data consumption is in the range of 10 GB to 25 GB, depending on the number of fields in the records.
    4. In most cases a telecom operator keeps 30 days of data in a table for analytics and uses the system 24x7.
    5. On a FIFO (first-in, first-out) basis, the system needs to delete data older than 30 days.
    6. The backup team takes a daily backup before deleting records from the main table. This backup is stored on a historical database server and on tape drives for future reference.
    7. The system's HDD/SSD dimensioning is [30 days x daily volume] + 50% additional usable space.
    8. CPU and RAM are based on product requirements and product specifications.

    I see the following challenges:

    • How to manage 30 days of data in a table efficiently
    • How to take a daily partition backup
    • How to manage disk space

    Please share your suggestions for the above challenges. I firmly believe one problem has many solutions; my suggested daily partition is one of them, but a new-generation database may offer other, more efficient options.
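    To make the rolling-window requirement concrete (the table and column names are hypothetical), the daily cleanup job would amount to something like:

    ```sql
    -- Daily housekeeping: after the backup team has archived the day,
    -- drop all rows older than 30 days from the main table.
    DELETE FROM cdr_events
    WHERE event_time < TIMESTAMPADD(DAY, -30, NOW());
    ```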

