Local content generation for Wikipedia, Google
The new Google in Sub-Saharan Africa resource is a mix of standard Google tools and uniquely African assets. Included in the slew of new features are resources for educators (University programs) and developers (g-Africa). More features will undoubtedly arise as time goes on, so be sure to bookmark the page.
Of interest is the Google Africa Community Translation project, which irons out linguistic issues on African domains and helps research volunteers organize their activities. The site provides a tidy list of the African nations that have country-specific Google search engines.
In October 2010, Google Africa Community Translation published an insightful presentation for the World Wide Web Consortium (W3C). Included are:
- highlights of the language density found in Africa
- low demand for local language services due to the high prevalence of a handful of languages for education, training, ICT, oral history
- tables and charts showing the gap between Internet user growth and the number of Wikipedia articles for a given language. Amharic and Swahili are specifically listed.
- ways to grow the online community through publishing contests, tools, and standards
- finally, the question: Do users first generate content, or does content draw in users?
The document references a Wikipedia Stats page, which was last updated in January 2012. When the list is narrowed to languages spoken in Africa, we can see the number of speakers, editors, and articles, along with usage of this content. The most obvious trend is the lack of content in languages other than English, French, Spanish, Portuguese, and Arabic.
More editors are needed. Hopefully, with the measures outlined by Google, volunteers can be encouraged to contribute to the minority language content on the Web:
Code | Language | Speakers (Prim + Sec) | Editors per million speakers | Visits/hr | Articles |
---|---|---|---|---|---|
en | English | 1500 M | 24 | 9,889,432 | 3,455,258 |
fr | French | 200 M | 24 | 864,971 | 1,025,634 |
es | Spanish | 500 M | 8 | 1,396,322 | 667,680 |
pt | Portuguese | 290 M | 6 | 496,775 | 615,648 |
ar | Arabic | 530 M | 1 | 73,754 | 130,833 |
simple | Simple English | 1500 M | 0.1 | 8,963 | 65,811 |
sw | Swahili | 50 M | 0.3 | 1,671 | 21,020 |
af | Afrikaans | 13 M | 2 | 2,761 | 16,412 |
yo | Yoruba | 25 M | 0.1 | 700 | 10,182 |
arz | Egyptian Arabic | 76 M | 0.2 | 1,466 | 6,984 |
am | Amharic | 25 M | 0.3 | 513 | 5,519 |
mg | Malagasy | 20 M | 0.1 | 262 | 2,609 |
so | Somali | 14 M | 0.5 | 232 | 1,485 |
ln | Lingala | 25 M | 0 | 204 | 1,290 |
wo | Wolof | 4 M | 0.3 | 210 | 1,081 |
kab | Kabyle | 8 M | 0.1 | 130 | 895 |
ig | Igbo | 22 M | 0 | 158 | 663 |
kg | Kongo | 7 M | 0 | 100 | 573 |
bm | Bambara | 6 M | 0 | 118 | 348 |
ss | Siswati | 3 M | 0.7 | 83 | 256 |
ee | Ewe | 4 M | 0 | 109 | 252 |
om | Oromo | 26 M | 0 | 55 | 209 |
zu | Zulu | 26 M | 0.1 | 75 | 195 |
ts | Tsonga | 3 M | 0 | 81 | 172 |
ha | Hausa | 39 M | 0 | 66 | 147 |
ve | Venda | 875 k | 0 | 50 | 140 |
rw | Kinyarwanda | 12 M | 0 | 46 | 135 |
ti | Tigrinya | 7 M | 0 | 40 | 131 |
sg | Sango | 3 M | 0 | 67 | 126 |
ki | Kikuyu | 5 M | 0 | 53 | 112 |
st | Sesotho | 5 M | 0 | 65 | 112 |
xh | Xhosa | 8 M | 0 | 52 | 107 |
ak | Akan | 19 M | 0 | 61 | 97 |
tn | Setswana | 4 M | 0 | 45 | 97 |
rn | Kirundi | 5 M | 0 | 26 | 90 |
ny | Chichewa | 9 M | 0 | 27 | 75 |
tum | Tumbuka | 2 M | 0 | 30 | 66 |
ff | Fulfulde | 13 M | 0 | 36 | 58 |
sn | Shona | 7 M | 0 | 33 | 56 |
tw | Twi | 15 M | 0 | 37 | 52 |
lg | Ganda | 10 M | 0 | 24 | 47 |
ng | Ndonga | 690 k | 0 | 11 | 25 |
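
One way to make the content gap concrete is to divide each language's article count by its number of speakers. The Python sketch below does this for a handful of rows copied from the table above. The metric (articles per million speakers) and the function names are illustrative assumptions, not something taken from the Wikipedia Stats page or the Google presentation.

```python
# Rows copied from the table above:
# (code, language, speakers in millions, number of articles)
ROWS = [
    ("en", "English", 1500, 3_455_258),
    ("fr", "French", 200, 1_025_634),
    ("sw", "Swahili", 50, 21_020),
    ("af", "Afrikaans", 13, 16_412),
    ("am", "Amharic", 25, 5_519),
    ("ha", "Hausa", 39, 147),
]


def articles_per_million(speakers_millions, articles):
    """Hypothetical coverage metric: articles per million speakers."""
    return articles / speakers_millions


def rank_by_coverage(rows):
    """Return (language, articles per million speakers), highest first."""
    scored = [
        (language, articles_per_million(speakers, articles))
        for _code, language, speakers, articles in rows
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for language, ratio in rank_by_coverage(ROWS):
        print(f"{language:10s} {ratio:10.1f} articles per million speakers")
```

On these rows, Swahili works out to roughly 420 articles per million speakers and Hausa to fewer than 4, against several thousand for French. Because the speaker counts include secondary speakers, the ratios are only a rough guide.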