About this visualization

1. Introduction

I used the dataset "Population by language, sex and urban/rural residence", published by the United Nations Statistics Division (http://data.un.org/Data.aspx?d=POP&f=tableCode%3a27#POP). I cleaned the data to exclude records that are not meaningful for visualization, such as rows with 0 speakers, rows with totals and languages marked as "Other". The dataset combines data from several censuses, and filtering by a single census seemed to produce incomplete data. I therefore used the entire dataset. As a result, any numerical results should be treated as relative, because the number of speakers may be overestimated. The dataset also lacks data on Chinese, which has more speakers than any other language, and on Japanese. I could have added these figures from other public sources. For the reasons stated above, however, they would not be comparable with the rest of the data, so I decided against it.
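The cleaning step can be sketched as a filter followed by an aggregation. This is a minimal sketch, not the project's actual code. The row shape (`LanguageRow`), the function names and the label strings such as "Both Sexes", "Not stated" and "Unknown" are all assumptions about the export format. The sketch pools every census instead of filtering by one, as described above.

```typescript
// Hypothetical, simplified shape of one row of the UN export. The field names
// and label strings below are assumptions, not the dataset's exact headers.
interface LanguageRow {
  country: string;
  language: string;
  sex: string;   // assumed labels: "Both Sexes", "Male", "Female"
  area: string;  // assumed labels: "Total", "Urban", "Rural"
  value: number; // number of speakers
}

// Language labels that carry no meaning for the charts (assumed spellings).
const EXCLUDED_LANGUAGES = new Set<string>(["Total", "Other", "Not stated", "Unknown"]);

// A row is kept only if it names a real language and has at least one speaker.
function isMeaningful(row: LanguageRow): boolean {
  if (!Number.isFinite(row.value) || row.value <= 0) return false;
  return !EXCLUDED_LANGUAGES.has(row.language);
}

// Pools all censuses (no per-census filter) and sums only the overall
// "Both Sexes"/"Total" rows per language, so that the male/female and
// urban/rural breakdown rows are not counted twice.
function totalSpeakers(rows: LanguageRow[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    if (!isMeaningful(row)) continue;
    if (row.sex !== "Both Sexes" || row.area !== "Total") continue;
    totals.set(row.language, (totals.get(row.language) ?? 0) + row.value);
  }
  return totals;
}
```

Summing across censuses counts a country once for every census year in which it reported. This is one plausible source of the overestimation mentioned above.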

In addition, I applied minimum cutoffs to the languages with the smallest number of speakers (10 to 100 speakers depending on the case). The goal was to avoid, as far as possible, languages with 0 speakers in any of the four assessed categories (female, male, urban and rural). The main purpose of this project is to demonstrate the visualization capabilities of D3.js. Apart from this, I also tried to bring the work closer to data analysis, with the aim of establishing a relation between the number of speakers of a natural language and its female vs. male or rural vs. urban ratios. Overall, the data is presented so that the 30 to 50 or so largest and smallest languages can be compared by these ratios in an attempt to establish this correlation.
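The cutoff and the two ratios can be sketched as follows. This illustration rests on several assumptions. `Breakdown`, `shares` and `largestAndSmallest` are hypothetical names. A language's total is taken as female plus male speakers. The cutoff is applied to that total, which reduces the number of empty categories but does not eliminate them.

```typescript
// Hypothetical per-language counts after aggregation (names are assumptions).
interface Breakdown {
  language: string;
  female: number;
  male: number;
  urban: number;
  rural: number;
}

// Shares rather than ratios: a share stays finite when one category is 0,
// whereas female / male would divide by zero.
function shares(r: Breakdown): { femaleShare: number; urbanShare: number } {
  const bySex = r.female + r.male;
  const byArea = r.urban + r.rural;
  return {
    femaleShare: bySex > 0 ? r.female / bySex : NaN,
    urbanShare: byArea > 0 ? r.urban / byArea : NaN,
  };
}

// Drops languages whose total is below minTotal (the text uses 10 to 100
// speakers depending on the chart), then takes the n largest and n smallest.
function largestAndSmallest(
  rows: Breakdown[],
  n: number,
  minTotal: number,
): { largest: Breakdown[]; smallest: Breakdown[] } {
  const total = (r: Breakdown): number => r.female + r.male;
  const kept = rows
    .filter((r) => total(r) >= minTotal)
    .sort((a, b) => total(b) - total(a)); // descending by number of speakers
  return {
    largest: kept.slice(0, n),
    // Math.max clamps the start index at 0 when n exceeds the list length;
    // reverse() puts the smallest language first.
    smallest: kept.slice(Math.max(kept.length - n, 0)).reverse(),
  };
}
```

Using shares keeps every language plottable, including the small ones where one category is empty. A share of 0 or 1 then shows the imbalance directly.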

2. Overview

My visualization uses the so-called hybrid structure: the ordering of scenes is mostly user-directed, with soft author-driven recommendations on ordering. From the main overview page, the user can drill down into one of two detailed aspects of the dataset, each consisting of two scenes. In each scene, the user is free to go forward or back, or to return to the main overview page. In addition, every page has an About Visualization button that opens this essay.

According to E. Segel and J. Heer in Narrative Visualization: Telling Stories with Data [2], "the drill-down story visualization structure presents a general theme and then allows the user to choose among particular instances of that theme to reveal additional details and backstories." My visualization follows exactly this pattern. The first, overview scene shows the number of speakers of each language as a pie chart. From there, the viewer can drill down into two more detailed aspects. The first is the female vs. male distribution of speakers per language, shown in two scenes: one for the largest and one for the smallest languages. The second is the urban vs. rural distribution, shown in two similar scenes. The type of hybrid structure I selected is therefore the drill-down story visualization.
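The drill-down structure described above can be expressed as a small navigation function. This sketch is hypothetical, and the scene ids are invented. It also assumes three behaviors: each branch starts with the largest languages, "go back" from the first scene of a branch returns to the overview, and "go forward" on the last scene of a branch leaves the viewer where they are.

```typescript
// Hypothetical scene ids: one overview plus two drill-down branches.
type SceneId = "overview" | "sex-largest" | "sex-smallest" | "area-largest" | "area-smallest";
type Branch = "sex" | "area";
type Action =
  | { kind: "drill"; branch: Branch }
  | { kind: "forward" }
  | { kind: "back" }
  | { kind: "home" };

// Assumed order inside each branch: largest languages first, then smallest.
const BRANCHES: Record<Branch, SceneId[]> = {
  sex: ["sex-largest", "sex-smallest"],
  area: ["area-largest", "area-smallest"],
};

function navigate(current: SceneId, action: Action): SceneId {
  if (action.kind === "home") return "overview";
  if (action.kind === "drill") {
    // Drilling down is only offered on the overview page.
    return current === "overview" ? BRANCHES[action.branch][0] : current;
  }
  const branch = [BRANCHES.sex, BRANCHES.area].find((b) => b.indexOf(current) >= 0);
  if (branch === undefined) return current; // overview: no forward/back within a branch
  const i = branch.indexOf(current);
  if (action.kind === "forward") {
    return branch[Math.min(i + 1, branch.length - 1)]; // stay on the last scene
  }
  return i === 0 ? "overview" : branch[i - 1]; // back from a branch's first scene
}
```

Keeping navigation as a pure function of the current scene and the clicked button makes the hybrid structure explicit. The branch lists encode the author's suggested order, while the viewer decides which branch to enter and when to leave it.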

Thus, my visualization contains 5 scenes: a pie chart as an overview of the dataset and four side-by-side bar charts of a categorical vs. a quantitative variable. The layout includes the charts themselves and buttons that act as triggers for transitions from one scene to another. Arcs on the pie chart, bars on the bar charts and labels on all charts have a hover effect that makes them stand out on mouseover. I tried to use a consistent visual design to keep the audience oriented through transitions:

The annotations are applied as follows in my visualization:

For visual consistency, I used exactly the same styles and templates for annotations of the same type in every scene (e.g. font size, font color and hover behavior). This way, the annotations would not distract the viewer from the data analysis.

As defined by A. Satyanarayan and J. Heer in Authoring narrative visualizations with Ellipsis [1], parameters are name-value pairs that set the visualization state and provide a means of decoupled control. In my visualization, the parameters include:

When the value of any of these parameters changes, the visualization is redrawn. I do not bind any parameters to widgets such as sliders or drop-down menus, but this could be added in the future. Each parameter controls the current state of a scene type in my narrative visualization; the corresponding scene types are given in parentheses above.

In my case, the triggers are the Go forward and Go back buttons. They are activated by a user click, and they change the parameters. For example, the viewer can move from the chart showing the female vs. male distribution for the largest languages to the one showing the same distribution for the smallest languages. The viewer becomes aware of this through the evident change of scene and the corresponding page title. The data on the chart and the chart scale also change to a completely different range: millions of speakers for the largest languages, individual speakers for the smallest.
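Parameters and triggers can be sketched as a small store. A trigger sets a named parameter, and any real change redraws the chart. The parameter names (`distribution`, `subset`), the `ParamStore` class and the `axisUnit` helper are all hypothetical. The switch to millions at one million speakers is also an assumed threshold.

```typescript
// Hypothetical parameters for the four bar-chart scenes, in the Ellipsis
// sense of name-value pairs that set the visualization state.
interface Params {
  distribution: "sex" | "area";   // female vs. male, or urban vs. rural
  subset: "largest" | "smallest"; // which end of the language ranking
}

type Redraw = (params: Params) => void;

// Minimal store: triggers call set(), and every actual change redraws.
class ParamStore {
  private params: Params;
  private readonly listeners: Redraw[] = [];

  constructor(initial: Params) {
    this.params = { ...initial };
  }

  get(): Params {
    return { ...this.params };
  }

  onChange(listener: Redraw): void {
    this.listeners.push(listener);
  }

  set<K extends keyof Params>(name: K, value: Params[K]): void {
    if (this.params[name] === value) return; // unchanged: no redraw
    const next: Params = { ...this.params };
    next[name] = value;
    this.params = next;
    for (const listener of this.listeners) listener(this.get());
  }
}

// Large languages are shown in millions, small ones in plain counts.
// The one-million threshold is an assumption.
function axisUnit(maxValue: number): { divisor: number; label: string } {
  return maxValue >= 1e6
    ? { divisor: 1e6, label: "Speakers (millions)" }
    : { divisor: 1, label: "Speakers" };
}
```

In the page itself, a trigger could be wired with D3's selection API, for example `d3.select("#next-button").on("click", () => store.set("subset", "smallest"))`, where `#next-button` is a hypothetical element id. Separating the parameter change from the drawing code is what provides the decoupled control described in [1].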

3. Web Page Interaction Design

Since my visualization is not overly complex, I did a simple task analysis and dialog design. First of all, the dialog is initiated by the viewer. The task consists of displaying various subsets of the data through onclick events, which are the means of communication between the user and the computer. Actions are connected to interface elements in the form of control widgets. These are buttons that work through a button-press callback: the event is a click, and the system effect is a transition to another scene.

In terms of the interactive dynamics, the data are:

4. Conclusions Deduced from Dataset Analysis

For the largest languages, the numbers of male and female speakers vary only slightly. The differences in the rural vs. urban ratio are more noticeable. The urban population prevails for languages spoken in Europe and America, and the rural population prevails for Asian languages (e.g. compare English and Hindi). Overall, however, the differences are not critical.

The imbalance between the categories becomes much more evident for the languages with the smallest number of speakers. Some languages have no speakers, or very few, in one of the two categories. There is an evident correlation between the number of speakers and the two distributions in question: the fewer speakers a language has, the more imbalanced its female vs. male and rural vs. urban distributions become. A more detailed analysis may reveal deeper patterns.
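One way to check this correlation numerically, rather than by eye, is a rank correlation between the number of speakers and an imbalance score. This is a sketch of a possible follow-up analysis, not something the visualization computes. The imbalance score (the distance of a share from 0.5, scaled to [0, 1]) and the choice of Spearman's rho are both assumptions. A negative rho would mean that smaller languages tend to be more imbalanced.

```typescript
// Imbalance of a two-way split: 0 for an even split (share 0.5),
// 1 when every speaker falls into one category (share 0 or 1).
function imbalance(share: number): number {
  return Math.abs(share - 0.5) * 2;
}

// 1-based ranks; tied values receive the average of the ranks they span.
function ranks(values: number[]): number[] {
  const order = values.map((_, i) => i).sort((a, b) => values[a] - values[b]);
  const result: number[] = new Array<number>(values.length);
  let start = 0;
  while (start < order.length) {
    let end = start;
    while (end + 1 < order.length && values[order[end + 1]] === values[order[start]]) {
      end++;
    }
    const averageRank = (start + end) / 2 + 1;
    for (let k = start; k <= end; k++) result[order[k]] = averageRank;
    start = end + 1;
  }
  return result;
}

// Pearson correlation; returns NaN when either input is constant.
function pearson(x: number[], y: number[]): number {
  const n = x.length;
  const meanX = x.reduce((s, v) => s + v, 0) / n;
  const meanY = y.reduce((s, v) => s + v, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    sxy += (x[i] - meanX) * (y[i] - meanY);
    sxx += (x[i] - meanX) ** 2;
    syy += (y[i] - meanY) ** 2;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Spearman's rho is the Pearson correlation of the ranks.
function spearman(x: number[], y: number[]): number {
  return pearson(ranks(x), ranks(y));
}
```

A rank-based coefficient suits this data because speaker counts range from tens of speakers to many millions, and Pearson's r on raw counts would be dominated by the largest languages. Languages with an empty category all receive an imbalance of 1, so ties are common, which is why tied values share an average rank.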

5. References

  1. Satyanarayan, A., & Heer, J. (2014). Authoring narrative visualizations with Ellipsis. Proc. EuroVis.
  2. Segel, E., & Heer, J. (2010). Narrative visualization: Telling stories with data. Proc. InfoVis.
  3. Creating a horizontal bar graph using D3.js. http://hdnrnzk.me/2012/07/04/creating-a-bar-graph-using-d3js
  4. Pie chart with relaxed labels. https://jsfiddle.net/thudfactor/HdwTH
  5. D3.js documentation (3.x API reference). https://github.com/d3/d3-3.x-api-reference