world time zones
word count of TIDES text, plotted by world time zones and date
Visualization process in italics. Technical process in normal text.
The major challenges with the TIDES data were that the data was entirely text in emails, and that the customer wanted some kind of visualization of time and location information, gleaned from bylines.
I did this with almost no new coding. (I finished a small feature in the mkRingsScript program so I could use it.) Mostly I just hand-massaged data files and used existing tools.
    Country     City                      TZ
    ----------- ------------------------- ------
    USA         Austin, TX                -06:00
    Canada      Toronto, Ontario          -05:00
    USA         Boston, MA                -05:00
    UK          London                    +00:00
    Nigeria     Lagos                     +01:00
    Egypt       Cairo                     +02:00
    Israel      Jerusalem                 +02:00
    Israel      Palestine                 +02:00
    Israel      Tel-Aviv                  +02:00
    Jordan      Amman                     +02:00
    Lebanon     Beirut                    +02:00
    Russia      Moscow                    +03:00
    Uganda      Kampala                   +03:00
    Yemen       Sana'a                    +03:00
    Iran        Tehran                    +03:30
    Afghanistan Kabul                     +04:30
    Pakistan    Islamabad                 +05:00
    Pakistan    Karachi                   +05:00
    Pakistan    Lahore                    +05:00
    Pakistan    Peshawar                  +05:00
    Uzbekistan  Tashkent                  +05:00
    India       Mumbai                    +05:30
    Indonesia   Jakarta                   +07:00
    China       Hong Kong                 +08:00
    Malaysia    Kuala Lumpur              +08:00
    Morocco     Rabat                     +08:00
    Philippines Manila                    +08:00
    Japan       Tokyo                     +09:00
    Australia   Sydney                    +10:00
I started with TIDES email numbers 81 through 88, dated 24 Feb to 4 March 2002. By hand, I cut and pasted the text into files by date and time zone of publication. A typical filename would be 2002.02.24_+00.txt, for text dated 24 Feb 2002 from time zone Zulu+0, which is Greenwich Mean Time (GMT), the time zone of London, UK. The filename 2002.02.26_+04.txt is for text dated 26 Feb 2002 from time zone Zulu+4, which includes Afghanistan. (Strictly, Afghanistan is at Zulu+4:30, but I have ignored the half-hour offsets for simplicity.)
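The naming convention fits in a few lines of Python. This is a minimal sketch, not anything used for TIDES (the files were made by hand). It assumes UTC offsets are written as in the table above, such as "+04:30", and it drops the half hour by keeping only the hour field. The helper names zone_hours and zone_filename are hypothetical.

```python
from datetime import date


def zone_hours(offset: str) -> int:
    """Whole-hour part of a "+HH:MM" or "-HH:MM" offset; minutes are dropped."""
    sign = -1 if offset.startswith("-") else 1
    hours = int(offset.lstrip("+-").split(":")[0])
    return sign * hours


def zone_filename(day: date, offset: str) -> str:
    """Build a name such as 2002.02.24_+00.txt from a date and a UTC offset."""
    return f"{day:%Y.%m.%d}_{zone_hours(offset):+03d}.txt"


print(zone_filename(date(2002, 2, 24), "+00:00"))  # London -> 2002.02.24_+00.txt
print(zone_filename(date(2002, 2, 26), "+04:30"))  # Kabul  -> 2002.02.26_+04.txt
```

Truncating this way also maps Tehran (+03:30) to +03 and Mumbai (+05:30) to +05. That is one reading of "ignoring the half-hour offsets"; rounding up would be an equally defensible choice.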
Given the files I'd created, it was a simple matter to use the UNIX utility wc (word count) to count the words in each file, and then manually paste the resulting numbers into a 2-dimensional AVS field file. Here is the output from the command wc -w 2*.txt:
       993 2002.02.23_+02.txt
       504 2002.02.23_+03.txt
      2620 2002.02.23_+05.txt
      5507 2002.02.23_+08.txt
      1662 2002.02.24_+01.txt
      3433 2002.02.24_+02.txt
      1222 2002.02.24_+03.txt
      4084 2002.02.24_+05.txt
      5297 2002.02.24_+08.txt
       694 2002.02.24_+09.txt
      1753 2002.02.25_+00.txt
      3912 2002.02.25_+02.txt
      6322 2002.02.25_+03.txt
      1097 2002.02.25_+05.txt
       667 2002.02.25_+07.txt
      1460 2002.02.25_+08.txt
      2163 2002.02.25_-06.txt
       685 2002.02.26_+00.txt
      1111 2002.02.26_+02.txt
      2037 2002.02.26_+03.txt
      2134 2002.02.26_+05.txt
      9339 2002.02.26_+08.txt
      2184 2002.02.26_-05.txt
       668 2002.02.26_-06.txt
       466 2002.02.27_+00.txt
      2870 2002.02.27_+02.txt
      5226 2002.02.27_+03.txt
      3745 2002.02.27_+05.txt
      2497 2002.02.27_+08.txt
       552 2002.02.27_+10.txt
       735 2002.02.28_+02.txt
      3918 2002.02.28_+03.txt
      4321 2002.02.28_+05.txt
      1281 2002.02.28_+08.txt
       630 2002.02.28_-05.txt
      1031 2002.02.28_-06.txt
       938 2002.03.01_+02.txt
      1069 2002.03.01_+03.txt
      6698 2002.03.01_+05.txt
      4986 2002.03.01_+08.txt
      1672 2002.03.02_+02.txt
       471 2002.03.02_+03.txt
      1113 2002.03.02_+04.txt
      1793 2002.03.02_+05.txt
       550 2002.03.02_+07.txt
      3163 2002.03.02_+08.txt
      2469 2002.03.02_+10.txt
      1993 2002.03.02_-05.txt
      7539 2002.03.03_+02.txt
       742 2002.03.03_+04.txt
      3562 2002.03.03_+05.txt
       873 2002.03.03_+08.txt
       682 2002.03.04_+00.txt
       956 2002.03.04_+02.txt
      1405 2002.03.04_+03.txt
      8712 2002.03.04_+05.txt
       522 2002.03.04_+07.txt
      3783 2002.03.04_+08.txt
      1799 2002.03.04_+09.txt
      6945 2002.03.04_-05.txt
    153255 total
...and here is the resulting AVS field data (in two files):
tz_freqs.fld
    # AVS field file
    #
    ndim = 2
    dim1 = 19
    dim2 = 10
    nspace = 2
    veclen = 1
    data = float
    field = uniform
    variable 1 file=tz_freqs.txt filetype=ascii skip=2 stride=1
tz_freqs.txt
      -8   -7   -6   -5   -4   -3   -2   -1    0    1    2    3    4    5    6    7    8    9   10
    ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
       0    0    0 6945    0    0    0    0  682    0  956 1405    0 8712    0  522 3783 1799    0
       0    0    0    0    0    0    0    0    0    0 7539    0  742 3562    0    0  873    0    0
       0    0    0 1993    0    0    0    0    0    0 1672  471 1113 1793    0  550 3163    0 2469
       0    0    0    0    0    0    0    0    0    0  938 1069    0 6698    0    0 4986    0    0
       0    0 1031  630    0    0    0    0    0    0  735 3918    0 4321    0    0 1281    0    0
       0    0    0    0    0    0    0    0  466    0 2870 5226    0 3745    0    0 2497    0  552
       0    0  668 2184    0    0    0    0  685    0 1111 2037    0 2134    0    0 9339    0    0
       0    0    0    0    0    0    0    0 1753    0 3912 6322    0 1097    0  667 1460    0    0
       0    0 2163    0    0    0    0    0    0 1662 3433 1222    0 4084    0    0 5297  694    0
       0    0    0    0    0    0    0    0    0    0  993  504    0 2620    0    0 5507    0    0
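The counting and hand-pasting steps could also be scripted. The sketch below is a hypothetical Python stand-in, not the process used here. count_words approximates wc -w by splitting on whitespace, and parse_name, build_grid, format_fld and format_txt are illustrative names. It assumes the YYYY.MM.DD_+HH.txt naming convention, one column per zone from -8 to +10 (dim1 = 19), and one row per date from the newest down to the oldest (dim2 = 10), matching the layout of tz_freqs.txt.

```python
from datetime import date

ZONES = list(range(-8, 11))  # 19 columns: Z-08 .. Z+10


def count_words(text: str) -> int:
    """Approximate `wc -w`: count maximal runs of non-whitespace characters."""
    return len(text.split())


def parse_name(name: str) -> tuple[date, int]:
    """Split a name like '2002.02.26_+08.txt' into (date(2002, 2, 26), 8)."""
    day_part, zone_part = name[: -len(".txt")].split("_")
    year, month, day = (int(x) for x in day_part.split("."))
    return date(year, month, day), int(zone_part)


def build_grid(counts: dict[str, int], first: date, last: date) -> list[list[int]]:
    """Place each file's count at (row = days before `last`, column = zone)."""
    ndays = (last - first).days + 1
    grid = [[0] * len(ZONES) for _ in range(ndays)]
    for name, n in counts.items():
        day, zone = parse_name(name)
        row = (last - day).days  # newest date -> row 0
        if not 0 <= row < ndays:
            raise ValueError(f"{name} is outside {first}..{last}")
        grid[row][ZONES.index(zone)] += n
    return grid


def format_fld(dim1: int, dim2: int, data_file: str) -> str:
    """AVS field header pointing at an ASCII data file with 2 header lines."""
    return (
        "# AVS field file\n#\n"
        f"ndim = 2\ndim1 = {dim1}\ndim2 = {dim2}\n"
        "nspace = 2\nveclen = 1\ndata = float\nfield = uniform\n"
        f"variable 1 file={data_file} filetype=ascii skip=2 stride=1\n"
    )


def format_txt(grid: list[list[int]]) -> str:
    """Zone header line, dashed rule, then one line of counts per date."""
    lines = [" ".join(f"{z:4d}" for z in ZONES), " ".join("----" for _ in ZONES)]
    lines += [" ".join(f"{n:4d}" for n in row) for row in grid]
    return "\n".join(lines) + "\n"
```

To use it, read each 2*.txt file, store count_words(text) under the file's name, and call build_grid(counts, date(2002, 2, 23), date(2002, 3, 4)). Then write format_fld(19, 10, "tz_freqs.txt") and format_txt(grid) to tz_freqs.fld and tz_freqs.txt. The two header lines that format_txt emits correspond to skip=2 in the field header.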
I produced the animation above in AVS. The field_to_mesh module let me easily control the height and color, experimenting until I liked the result. The most difficult part was positioning the date and time zone labels correctly, which I did mostly by trial and error, editing this label file:
zones.label
    colors dropshadow left
    5 29
    Z-08 0.02 0.0 9.0 0.0 0.00 0.00 0.00 0.9 0.9 0.9
    Z-07 0.02 1.0 9.0 0.0 0.00 0.00 0.00 0.9 0.9 0.9
    Z-06 0.02 2.0 9.0 0.0 0.00 0.00 0.00 0.9 0.9 0.9
    Z-05 0.02 3.0 9.0 0.0 0.00 0.00 0.00 0.9 0.9 0.9
    Z-04 0.02 4.0 9.0 0.0 0.00 0.00 0.00 0.9 0.9 0.9
    Z-03 0.02 5.0 9.0 0.0 0.00 0.00 0.00 0.9 0.9 0.9
    Z-02 0.02 6.0 9.0 0.0 0.00 0.00 0.00 0.9 0.9 0.9
    Z-01 0.02 7.0 9.0 0.0 0.00 0.00 0.00 0.9 0.9 0.9
    Z+00 0.02 8.0 9.0 0.0 0.00 0.00 0.00 0.9 0.9 0.9
    Z+01 0.02 9.0 9.0 0.0 0.00 0.00 0.00 0.9 0.9 0.9
    Z+02 0.02 10.0 9.0 0.0 0.00 0.00 0.00 0.9 0.9 0.9
    Z+03 0.02 11.0 9.0 0.0 0.00 0.00 0.00 0.9 0.9 0.9
    Z+04 0.02 12.0 9.0 0.0 0.00 0.00 0.00 0.9 0.9 0.9
    Z+05 0.02 13.0 9.0 0.0 0.00 0.00 0.00 0.9 0.9 0.9
    Z+06 0.02 14.0 9.0 0.0 0.00 0.00 0.00 0.9 0.9 0.9
    Z+07 0.02 15.0 9.0 0.0 0.00 0.00 0.00 0.9 0.9 0.9
    Z+08 0.02 16.0 9.0 0.0 0.00 0.00 0.00 0.9 0.9 0.9
    Z+09 0.02 17.0 9.0 0.0 0.00 0.00 0.00 0.9 0.9 0.9
    Z+10 0.02 18.0 9.0 0.0 0.00 0.00 0.00 0.9 0.9 0.9
    04-Mar-2002 0.03 0.0 9.0 0.0 -0.20 0.00 0.00 0.9 0.9 0.9
    03-Mar-2002 0.03 0.0 8.0 0.0 -0.20 0.00 0.00 0.9 0.9 0.9
    02-Mar-2002 0.03 0.0 7.0 0.0 -0.20 0.00 0.00 0.9 0.9 0.9
    01-Mar-2002 0.03 0.0 6.0 0.0 -0.20 0.00 0.00 0.9 0.9 0.9
    28-Feb-2002 0.03 0.0 5.0 0.0 -0.20 0.00 0.00 0.9 0.9 0.9
    27-Feb-2002 0.03 0.0 4.0 0.0 -0.20 0.00 0.00 0.9 0.9 0.9
    26-Feb-2002 0.03 0.0 3.0 0.0 -0.20 0.00 0.00 0.9 0.9 0.9
    25-Feb-2002 0.03 0.0 2.0 0.0 -0.20 0.00 0.00 0.9 0.9 0.9
    24-Feb-2002 0.03 0.0 1.0 0.0 -0.20 0.00 0.00 0.9 0.9 0.9
    23-Feb-2002 0.03 0.0 0.0 0.0 -0.20 0.00 0.00 0.9 0.9 0.9
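The label positions follow a regular pattern: zones sit one unit apart along the top edge, and dates sit one unit apart down the left edge. That means the label lines can be generated instead of typed. This is a minimal sketch under stated assumptions. It reads each line as label text, a size, an x/y/z position, an x/y/z offset and an RGB colour, an interpretation inferred from the values in zones.label rather than from any documentation. The helper names are hypothetical, and the file's two header lines are not generated (the 29 there appears to be the label count, which is also an assumption).

```python
from datetime import date, timedelta

GREY = "0.9 0.9 0.9"  # label colour used throughout zones.label


def zone_labels(first_zone: int = -8, last_zone: int = 10, y: float = 9.0) -> list[str]:
    """One label per zone, one unit apart in x, along the top edge (y = 9)."""
    return [
        f"Z{zone:+03d} 0.02 {x:.1f} {y:.1f} 0.0 0.00 0.00 0.00 {GREY}"
        for x, zone in enumerate(range(first_zone, last_zone + 1))
    ]


def date_labels(first: date, last: date) -> list[str]:
    """One label per date down the left edge, newest at the top, shifted -0.20 in x."""
    ndays = (last - first).days
    lines = []
    for i in range(ndays + 1):
        day = last - timedelta(days=i)
        lines.append(
            f"{day:%d-%b-%Y} 0.03 0.0 {ndays - i:.1f} 0.0 -0.20 0.00 0.00 {GREY}"
        )
    return lines


labels = zone_labels() + date_labels(date(2002, 2, 23), date(2002, 3, 4))
print(len(labels))  # 29 labels in all: 19 zones + 10 dates
print("\n".join(labels))
```

Generating the lines keeps the spacing consistent, so only the few constants (sizes, the -0.20 nudge) are left to tune by eye.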
For this test I chose to show word frequencies only for the two highest peaks. I also simply "impaled" each set of rings on its associated peak.
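Picking the two highest peaks amounts to taking the largest entries of the wc -w listing. A minimal Python sketch, assuming the listing is available as text in wc's usual "count filename" layout with a trailing total line. top_peaks is a hypothetical helper, and the sample is an excerpt of the listing above.

```python
import heapq


def top_peaks(wc_output: str, k: int = 2) -> list[tuple[int, str]]:
    """Return the k largest (count, filename) pairs from `wc -w` output."""
    pairs = []
    for line in wc_output.splitlines():
        fields = line.split()
        if len(fields) != 2 or fields[1] == "total":
            continue  # skip blank lines and wc's grand-total line
        pairs.append((int(fields[0]), fields[1]))
    return heapq.nlargest(k, pairs)


listing = """\
   9339 2002.02.26_+08.txt
   8712 2002.03.04_+05.txt
   7539 2002.03.03_+02.txt
 153255 total
"""
print(top_peaks(listing))
# [(9339, '2002.02.26_+08.txt'), (8712, '2002.03.04_+05.txt')]
```

Applied to the full listing, it picks the same two files, 2002.02.26_+08.txt and 2002.03.04_+05.txt, which are the files whose word counts are given next.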
I wrote shell scripts that use the UNIX utilities grep (global regular expression print) and wc -l (line count) to count the occurrences of specific words in the files. This yielded these word counts:
    2002.02.26_+08.txt 30  2  0  0  0
    2002.03.04_+05.txt  9 11 21 12  0
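The search scripts themselves are not reproduced here. As a hypothetical Python stand-in, the sketch below counts each search word in one file. It assumes literal, case-sensitive matching (as with grep -F) and offers both readings of a grep-then-wc -l pipeline. Plain grep piped to wc -l counts matching lines, while grep -o piped to wc -l counts individual occurrences. The function names are illustrative, and the caller must supply the search words, since the list used here is not given.

```python
from pathlib import Path


def search_counts(text: str, words: list[str], per_line: bool = True) -> list[int]:
    """Count each literal, case-sensitive word in `text`.

    per_line=True  ~ `grep -F WORD FILE | wc -l`   (lines that contain WORD)
    per_line=False ~ `grep -oF WORD FILE | wc -l`  (individual non-overlapping hits)
    """
    lines = text.splitlines()
    if per_line:
        return [sum(1 for line in lines if word in line) for word in words]
    return [sum(line.count(word) for line in lines) for word in words]


def count_row(path: str, words: list[str], per_line: bool = True) -> str:
    """One 'filename n1 n2 ...' row, in the layout of the counts above."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    counts = search_counts(text, words, per_line)
    return " ".join([Path(path).name] + [str(n) for n in counts])
```

With the five search words, count_row("2002.02.26_+08.txt", words) would yield a row in the same filename-then-counts form as the two rows above. The per_line switch matters: a word repeated within one line counts once in the first mode and several times in the second.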
I pasted the resulting counts into a text file formatted so that my C program mkRingsScript could read it. This program generates a shell script, which in turn runs other programs to create the data files. After running everything, I ended up with rings superimposed on the total word counts, representing the word frequency information.
I also changed the words I was counting to this list of 6 words:
This was easy to do with the pre-existing tools; I just modified my search scripts to pass more data in the input file to mkRingsScript, and selected a Z offset greater than zero.
Last update 28-Mar-2002 by ABS.