Unable to get Output from MAPD

Comments

  • Candido Dessanti

    Hi @Raj_Kiran,

    To get an idea: when did you start having this issue? Is it confined to this particular table, or to a particular database?

    A select count(*) from the table wouldn't even access the data, just the metadata. What happens if you run select(field_name_nullable) from the table?

    Thanks in advance, Candido

  • Candido Dessanti

    Then you can check the status of the files in the filesystem this way.

    Run:

    heavysql> show databases;
    Database|Owner
    omnisci|admin
    adsb|admin
    asof|admin
    

    Starting from the omnisci database, which has an id of 1, count down the list until you reach your database. In my example I connected to the asof database, which has an id of 3.

    Then run show table details WDBS_ZONE and take the first number, which is the table_id. In your data directory (typically /var/lib/omnisci), check the status of the table's directory and files with the ls command. So, if the table_id is 10:

    ls -la /var/lib/omnisci/data/mapd_data/table_3_10

    You should get output like this:

    drwxr-xr-x   2 mapd mapd      4096 lug 21 12:02 .
    drwxrwxr-x 401 mapd mapd     20480 lug 21 11:53 ..
    -rw-r--r--   1 mapd mapd 536870912 giu 30  2019 0.2097152.mapd
    -rw-r--r--   1 mapd mapd  16777216 giu 30  2019 1.4096.mapd
    -rw-r--r--   1 mapd mapd         4 giu 30  2019 epoch
    -rw-rw-r--   1 mapd mapd        16 lug 21 12:02 epoch_metadata
    -rw-rw-r--   1 mapd mapd         4 lug 21 12:02 filemgr_version
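
    As a rough sketch of the naming convention used above (table_<database id>_<table id> under mapd_data), the path can be assembled in the shell. The function name table_dir is made up for illustration and is not part of the product:

```shell
# Build the storage directory for a table from the data directory,
# the database id and the table id (assumed layout: table_<db>_<table>).
table_dir() {
  data_dir=$1
  db_id=$2
  table_id=$3
  printf '%s/mapd_data/table_%s_%s\n' "$data_dir" "$db_id" "$table_id"
}

# Example with the ids used in this reply (database asof = 3, table_id = 10).
table_dir /var/lib/omnisci/data 3 10
# prints /var/lib/omnisci/data/mapd_data/table_3_10
```

    The result can be passed straight to ls -la.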
    

    After that, try this command: xxd /var/lib/omnisci/data/mapd_data/table_3_10/filemgr_version

    and share the output of the commands with us.
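
    If xxd is not installed on the server, od (part of coreutils) shows the same bytes. A minimal sketch: the helper name dump_bytes is made up for illustration, and the demo runs on a temporary file rather than a real table directory.

```shell
# Print the size of a file and its bytes as space-separated hex pairs.
dump_bytes() {
  size=$(wc -c < "$1" | tr -d ' ')
  hex=$(od -An -tx1 "$1" | tr -s ' \n' ' ' | sed 's/^ //; s/ $//')
  printf '%s bytes: %s\n' "$size" "$hex"
}

# Demo on a scratch file holding the four bytes 01 00 00 00.
tmp=$(mktemp)
printf '\001\000\000\000' > "$tmp"
dump_bytes "$tmp"
# prints: 4 bytes: 01 00 00 00
rm -f "$tmp"
```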

    Can I ask whether you tried an upgrade to 6.0 that failed, and then did some sort of rollback?

    Regards, Candido

  • Raj Kiran

    Hello @candido.dessanti

    This happens only for a few tables. Sorry, we are unable to run even the show table details command, as shown below.

    omnisql> show databases;
    Database|Owner
    mapd|mapd
    wdbsreportdb|wdbsreport
    omnisql>
    omnisql> show table details WDBS_ZONE
    ..> ;

    When we run the above commands, MAPD freezes and does not even allow a login. It prints the error below when we try to log in from another terminal.

    /opt/omnisci/bin/omnisql XXXX -u XXXXXX -p XXXXXXXX

    Thrift: Thu Jul 21 16:18:01 2022 TSocket::open() connect() : Connection refused
    Thrift error: No more data to read.
    Thrift connection error: No more data to read.
    Retrying connection
    Thrift: Thu Jul 21 16:18:18 2022 TSocket::write_partial() send() : Broken pipe
    Thrift error: write() send(): Broken pipe
    Thrift connection error: write() send(): Broken pipe
    Retrying connection
    Thrift: Thu Jul 21 16:18:22 2022 TSocket::write_partial() send() : Broken pipe
    Thrift error: write() send(): Broken pipe
    Thrift connection error: write() send(): Broken pipe
    Retrying connection
    Thrift: Thu Jul 21 16:18:30 2022 TSocket::write_partial() send() : Broken pipe
    Thrift error: write() send(): Broken pipe
    Thrift connection error: write() send(): Broken pipe
    Retrying connection

    We have tried to extract the filesystem details, as below:

    sqlite> select dbid,name from mapd_databases;
    1|mapd
    2|wdbsreportdb

    sqlite> select name,tableid from mapd_tables where name='WDBS_ZONE';
    WDBS_ZONE|19
    sqlite>

    [wdbs@pcrfreporting mapd_data]$ ll|grep _19
    drwxr-xr-x 2 root root 56 Feb 27 2019 DB_1_DICT_19
    drwxr-xr-x 2 root root 63 May 26 11:30 table_2_19
    [wdbs@pcrfreporting mapd_data]$

    [wdbs@pcrfreporting DB_1_DICT_19]$ ls -la /opt/data/data/mapd_data/DB_1_DICT_19
    total 8208
    drwxr-xr-x 2 root root 56 Feb 27 2019 .
    drwxr-xr-x 307 root root 12288 Jul 21 15:45 ..
    -rw-r--r-- 1 root root 4194304 Feb 27 2019 DictOffsets
    -rw-r--r-- 1 root root 4194304 Feb 27 2019 DictPayload
    [wdbs@pcrfreporting DB_1_DICT_19]$ ls -la /opt/data/data/mapd_data/table_2_19
    total 24
    drwxr-xr-x 2 root root 63 May 26 11:30 .
    drwxr-xr-x 307 root root 12288 Jul 21 15:45 ..
    -rw-r--r-- 1 root root 16 May 26 11:30 epoch_metadata
    -rw-r--r-- 1 root root 5 May 26 11:30 filemgr_version
    [wdbs@pcrfreporting DB_1_DICT_19]$

  • Raj Kiran

    I have uploaded the traces as an attachment: debug.txt|attachment (2.4 KB)

  • Raj Kiran

    @candido.dessanti sorry, answering your other two queries:

    No, we have not tried to upgrade to 6.0.

    The xdd command is not available on our production server where MAPD is running.

  • Candido Dessanti

    Well,

    from what I can see here, the table is empty (has it perhaps been truncated?) and the filemgr_version file is badly formed. Can you post the output of this command?

    xxd /opt/data/data/mapd_data/table_2_19/filemgr_version

    (You can also try this: back up the directory containing the table with cp -r /opt/data/data/mapd_data/table_2_19/ /opt/data/data/mapd_data/table_2_19_backup and then run echo -n -e '\x1\x0\x0\x0' >/opt/data/data/mapd_data/table_2_19/filemgr_version )
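
    The same backup-then-rewrite step can be sketched with printf, which takes octal escapes and does not depend on how echo -e behaves in a given shell. Note the plain ASCII single quotes. This sketch works on a scratch directory created with mktemp; the function name reset_filemgr_version is hypothetical, and on a real system the argument would be the table directory (without a trailing slash).

```shell
# Back up a table directory, then rewrite filemgr_version as the
# four bytes 01 00 00 00 (hypothetical helper, not an official tool).
reset_filemgr_version() {
  table_dir=$1
  cp -r "$table_dir" "${table_dir}_backup" || return 1
  printf '\001\000\000\000' > "$table_dir/filemgr_version"
}

# Demo on a scratch copy instead of /opt/data/data/mapd_data/table_2_19.
scratch=$(mktemp -d)
mkdir "$scratch/table_2_19"
printf 'junk!' > "$scratch/table_2_19/filemgr_version"   # placeholder bad content
reset_filemgr_version "$scratch/table_2_19"
wc -c < "$scratch/table_2_19/filemgr_version"            # size in bytes after the rewrite
od -An -tx1 "$scratch/table_2_19/filemgr_version"        # hex dump of the new content
rm -rf "$scratch"
```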

  • Raj Kiran

    Hello,

    We have tried the suggested commands, as below:

    [root@xxxxxxx ~]# cp -r /opt/data/data/mapd_data/table_2_19/ /opt/data/data/mapd_data/table_2_19_backup

    [root@xxxxxxx ~]# echo -n -e ‘\x1\x0\x0\x0’ >/opt/data/data/mapd_data/table_2_19/filemgr_version

    [wdbs@xxxxxxx ~]$ xxd /opt/data/data/mapd_data/table_2_19/filemgr_version
    0000000: e280 9878 3178 3078 3078 30e2 8099 ...x1x0x0x0...
    [wdbs@xxxxxxx ~]$

    Sorry, we still have the same error.

  • Candido Dessanti

    Hi,

    I have been able to reproduce the error by setting a negative number in filemgr_version, so with the value set to 1 it should be impossible to get the error.

    Perhaps you are querying another table?

    Could you run the command xxd /opt/data/data/mapd_data/table_2_19/filemgr_version

    and also on another table that is working (18, maybe): xxd /opt/data/data/mapd_data/table_2_18/filemgr_version
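
    To see which number a given filemgr_version holds, the four bytes can be decoded as an integer. This sketch assumes the file is a 32-bit signed integer in the host's byte order (little-endian on x86), which is an inference from 01 00 00 00 being read as 1 and not something confirmed in this thread. read_version is a made-up helper name.

```shell
# Decode the first four bytes of a file as a signed 32-bit integer
# in host byte order (assumption: this matches how the file is written).
read_version() {
  od -An -td4 -N4 "$1" | tr -d ' \n'
}

tmp=$(mktemp)
printf '\001\000\000\000' > "$tmp"
read_version "$tmp"; echo     # 1 on a little-endian machine
printf '\377\377\377\377' > "$tmp"
read_version "$tmp"; echo     # -1, a negative value
rm -f "$tmp"
```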

  • Raj Kiran

    Hi @candido.dessanti

    Please see the attached document with details of a working and a non-working table: debug1.txt|attachment (1.8 KB)

  • Candido Dessanti

    Hi @raj,

    Looking at your data:

    [wdbs@pcrfreporting log]$ xxd /opt/data/data/mapd_data/table_2_19/filemgr_version
    0000000: e280 9878 3178 3078 3078 30e2 8099 ...x1x0x0x0...

    this file looks corrupted. When you run the command echo -n -e '\x1\x0\x0\x0' >/opt/data/data/mapd_data/table_2_19/filemgr_version

    the resulting file should be 4 bytes and look like this one:

    0000000: 0100 0000
    

    Have you moved the database to other disks lately? Could you try unmounting and remounting the filesystem where your data is located?
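
    One observation on the 14-byte dump quoted above, offered as a possible explanation rather than a confirmed diagnosis: e2 80 98 and e2 80 99 are the UTF-8 encodings of the typographic quotes ‘ and ’, and the bytes between them spell x1x0x0x0. That is what a shell such as bash writes when the echo command is pasted with typographic quotes instead of plain ASCII ones, because the unquoted backslashes are consumed by the shell. The sketch below just turns the hex pairs back into text; hex_to_text is a hypothetical helper.

```shell
# Convert hex byte values (one per argument) back into raw bytes.
hex_to_text() {
  for byte in "$@"; do
    # Turn each hex pair into a printf octal escape such as \342.
    printf "\\$(printf '%03o' "0x$byte")"
  done
}

# Bytes from the xxd output of table_2_19/filemgr_version.
hex_to_text e2 80 98 78 31 78 30 78 30 78 30 e2 80 99; echo
# prints ‘x1x0x0x0’ on a UTF-8 terminal
```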

  • Raj Kiran

    Hi @candido.dessanti

    Thanks for your feedback. We have not moved data to any other disks, and all the tables within MAPD reside on the same disk and mount point. Do you think it would help if we dropped and recreated the corrupted tables?

  • Candido Dessanti

    Hi,

    I am not sure the tables are corrupted, but it looks like the filesystem is, because if you run the echo command you should get a 4-byte file with the content 01000000, not the random number you are getting. You can try removing the filemgr_version file of table 2_19, restarting the database, and seeing what happens.

    It looks like filesystem corruption to me; maybe some SSDs are failing in places. It happened to me once.

  • Raj Kiran

    @candido.dessanti Okay. We shall seek a maintenance window from the customer and do the following:

    rm -f /opt/data/data/mapd_data/table_2_19/filemgr_version

    Then restart the database. I assume this is what you meant by "try removing".

    Thank you

  • Raj Kiran

    Also, please see the attached text file, where I ran xxd on the backup file that I took before running the echo command, and then on the latest file after the echo was run: debug2.txt|attachment (311 Bytes)

  • Candido Dessanti

    Did you run the echo on the 2_19 table?

    I'm seeing this.

    Backup:

    [wdbs@pcrfreporting ~]$ xxd /opt/data/data/mapd_data/table_2_19/filemgr_version_210722
    0000000: 0000 00ff ff .....

    After running echo:

    [wdbs@pcrfreporting ~]$ xxd /opt/data/data/mapd_data/table_2_46/filemgr_version
    0000000: 0100 0000

  • Raj Kiran

    Hi, sorry.

    Please refer to this debug3 file:

    debug3.txt|attachment (385 Bytes)

  • Candido Dessanti

    Try to run

    echo -n -e '\x1\x0\x0\x0' >/opt/data/data/mapd_data/table_2_19/filemgr_version

    and then this on the same file:

    xxd /opt/data/data/mapd_data/table_2_19/filemgr_version

    The database is crashing because an unexpected value is read, and the server aborts to limit possible corruption.

    So the possible solutions are fixing the filemgr_version files with the echo -n -e '\x1\x0\x0\x0' command, or removing them and letting the system re-create them. I'm not sure that is going to work, though, because the values in the files cannot have come from the software, so check your disk and filesystem to be sure that you don't have corruption.
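
    Since more than one table may be affected, a quick scan can list every filemgr_version file that is not exactly 4 bytes long. This is a sketch under the assumption that a healthy file is always 4 bytes, as in the 0100 0000 example; the function name find_bad_versions and the scratch directory are illustrative only.

```shell
# List filemgr_version files under a mapd_data directory whose size
# is not 4 bytes (assumed here to be the only valid size).
find_bad_versions() {
  for f in "$1"/table_*/filemgr_version; do
    [ -f "$f" ] || continue
    size=$(wc -c < "$f" | tr -d ' ')
    [ "$size" -eq 4 ] || printf '%s (%s bytes)\n' "$f" "$size"
  done
}

# Demo with two fake table directories in a scratch location.
scratch=$(mktemp -d)
mkdir "$scratch/table_2_18" "$scratch/table_2_19"
printf '\001\000\000\000' > "$scratch/table_2_18/filemgr_version"
printf '\000\000\000\377\377' > "$scratch/table_2_19/filemgr_version"
find_bad_versions "$scratch"    # reports only table_2_19 (5 bytes)
rm -rf "$scratch"
```

    On the server in this thread the argument would be /opt/data/data/mapd_data.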

  • Raj Kiran

    Thanks @candido.dessanti. We shall try this and update you by tomorrow.

  • Candido Dessanti

    Hi Raj,

    I will wait for your feedback, then.

  • Raj Kiran

    Hi @candido.dessanti. We have executed the echo command again and it looks good now. Somehow MAPD seems to have restarted automatically, so we are not able to tell what helped to resolve it. But now it looks good and we are able to query the table. Many thanks for your time and help.
