Introduction & Data Setup
Choropleth maps shade geographic regions according to a data value, which makes them useful for visualizing complex datasets and deriving insights from thematic maps. They use geographic POLYGON data to present information in an engaging and informative way.
For instance, if you want to show the number of airline flights and their arrival delays within the United States on a state-by-state basis, a choropleth map is a natural fit. To experiment with this, you'll need to load the sample data provided by Heavy AI.
Use the following terminal command to import the sample data:
sudo docker exec -it <container-id> \
./insert_sample_data --data /var/lib/heavyai/storage
Enter dataset number to download, or 'q' to quit:
#   Dataset                  Rows   Table Name            File Name
1)  Flights (2008)           7M     flights_2008_7M       flights_2008_7M.tar.gz
2)  Flights (2008)           10k    flights_2008_10k      flights_2008_10k.tar.gz
3)  NYC Tree Census (2015)   683k   nyc_trees_2015_683k   nyc_trees_2015_683k.tar.gz
*The command above assumes a Docker deployment with the default installation and data storage paths. For other setups, see the "Final Checks" section of the installation documentation relevant to your deployment for instructions.
Using Heavy Immerse, connect to the heavyai database and open the Data Manager. You'll find that the system comes preloaded with a set of foundational "heavyai_*" tables, such as "heavyai_us_states". These tables provide essential reference information for geographical entities.
If these tables are not already in your database, open the Data Manager, click Add Table, and choose "Data Catalog". On the subsequent page, import the "Geospatial: US States and Territories" data, which is used in this tutorial. To match this tutorial, be sure to name the table heavyai_us_states and rename the "STUSPS" column to "abbr".
Geo-Joins in Immerse
Heavy Immerse lets you combine a table that holds string keys and the corresponding polygon boundaries, such as the foundational tables above, with a separate table that holds the data you want to analyze. In other words, a Geo-Join uses string values shared between a table with geographic POLYGON data and a table with data for analysis.
Here are the generic steps to set up a Geo-Join in Immerse:
- Select a data source containing your data for analysis.
- Set the DIMENSION to a column in your data for analysis whose values match a column in your polygon table.
- On the right side of the screen, in the "Geo Join" section, select the table containing polygon data.
- In the drop-down immediately below, select a column in your polygon data table that has values matching those of the DIMENSION column you selected in step two.
- After completing the above step, the "Geo" measure is automatically filled in with the geometry-containing column in your polygon data.
- Edit your Color measure as desired, using any column in your data for analysis.
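To make the matching step concrete, here is a minimal Python sketch of the idea behind a Geo-Join: rows from the analysis table pick up a geometry by looking up their DIMENSION value in the key column of the polygon table. It is only a conceptual illustration, not how Immerse implements the join. The function name geo_join, the "geom" column name, the sample rows, and the square placeholder polygons are all hypothetical, and collecting unmatched keys in a separate list is a choice made for this sketch.

```python
# Illustrative rows only: placeholder squares, not real state boundaries,
# and made-up delay values, not records from the sample dataset.
polygon_table = [
    {"abbr": "AZ", "geom": "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"},
    {"abbr": "MO", "geom": "POLYGON ((2 0, 3 0, 3 1, 2 1, 2 0))"},
]
analysis_table = [
    {"origin_state": "AZ", "arrdelay": 12},
    {"origin_state": "MO", "arrdelay": -3},
    {"origin_state": "ZZ", "arrdelay": 5},  # no matching polygon key
]


def geo_join(rows, dimension, polygons, key, geo_column):
    """Attach a geometry to each row where row[dimension] equals a polygon's key.

    Returns (joined_rows, unmatched_values). Hypothetical helper for illustration.
    """
    # Build a lookup from the shared string key to the geometry.
    lookup = {p[key]: p[geo_column] for p in polygons}
    joined, unmatched = [], []
    for row in rows:
        geom = lookup.get(row[dimension])
        if geom is None:
            unmatched.append(row[dimension])
        else:
            joined.append({**row, geo_column: geom})
    return joined, unmatched


joined, unmatched = geo_join(
    analysis_table, "origin_state", polygon_table, "abbr", "geom"
)
print(len(joined), unmatched)
```

The sketch also shows why the two columns must hold the same kind of values: a key such as "ZZ" that never appears in the polygon table has no geometry to draw.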
Tutorial: Average Delay by State (Using flights 2008 sample data)
Let's use the Geo-Join tool to visualize average flight arrival delay by state.
Begin by selecting the dataset, in this case "flights_2008_7M". Then set the "origin_state" column as the dimension. It contains US postal state abbreviations such as AZ, MO, FL, and OR. Do not worry about setting Measures at this time.
Next, move to the right side of the page. In the Geo Join box, first select the table that contains the geometry data, then choose the column that will be used for the lookup.
For our example, select the "heavyai_us_states" table, which holds geographic (POLYGON) data for each US state or territory, as well as additional columns.
Then select the "abbr" column. This column contains the US state abbreviations and serves as the key for the lookup against the "origin_state" dimension.
Back on the right side of the screen, under Measures, notice that the "Geo" measure was filled in automatically. To complete your visualization, choose a Color measure.
The Color measure can be any aggregation, from a simple count of records to the average delay of airline flights. In our case, we'll choose the arrdelay column with AVG as our aggregation method.
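As a rough picture of what the AVG aggregation produces for the Color measure, the sketch below groups rows by the dimension and averages the delay in each group. The rows are invented for illustration, and the helper name average_by_dimension is hypothetical. Skipping missing values mirrors how SQL AVG conventionally ignores NULLs; whether the sample data actually contains missing delays is an assumption here.

```python
from collections import defaultdict

# Made-up rows for illustration; not taken from flights_2008_7M.
flights = [
    {"origin_state": "AZ", "arrdelay": 10},
    {"origin_state": "AZ", "arrdelay": 20},
    {"origin_state": "FL", "arrdelay": -4},
    {"origin_state": "FL", "arrdelay": None},  # missing delay, skipped below
]


def average_by_dimension(rows, dimension, measure):
    """Return {dimension value: mean of measure}, ignoring missing values."""
    totals = defaultdict(lambda: [0.0, 0])  # running [sum, count] per group
    for row in rows:
        value = row[measure]
        if value is None:
            continue
        acc = totals[row[dimension]]
        acc[0] += value
        acc[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


avg_delay = average_by_dimension(flights, "origin_state", "arrdelay")
print(avg_delay)
```

Each resulting value is what the map would translate into a color for the matching state polygon.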
The result is an interactive choropleth map of your measure.
The map shows patterns, trends, and variations in the chosen measure, and lets you explore the geographical distribution of your data.
Changing the Map Style, Colors, Opacity, and other attributes can enrich the visual experience while also revealing information from the underlying map layers. Some examples:
Lowering the opacity to reveal the contents of the underlying map layers.
Changing the map style to get different visual attributes in the map.
Changing the map style and color palette to get a nice artistic touch.