Sticky: How can I help?

Nov. 26th, 2022 06:42 pm
doranwen: picture of a book with the word logophile (logophile)
[personal profile] doranwen

Who can help?



To finish getting all of the groups tagged and sorted will take a lot of volunteers. We can find a role for anyone who is willing to devote time to this. We can especially use people with at least one of the following skills or traits:

- detail-oriented
- know a language other than English
- have an area of expertise (such as a fandom or a genre of music)
- good at performing difficult searches of the Internet to track down information
- comfortable with installing a program and learning to use it
- good at recruiting others


Later on we may be able to use someone who is skilled with scripts.


Can I help out right now?



Absolutely! Tagging is in full swing and we need all the help we can get. Just head over to our Discord server and follow the directions to get started.

For more information on what tagging involves, or to find text you can share to boost this, see the following post: https://yahoogroups.dreamwidth.org/6497.html
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
We've got some great taggers at work, but we still need far more to finish. If you're interested in seeing it get done, please consider tagging a tab if you haven't already. Joining the Discord server and following the instructions is the easiest way to do it, but you can also PM me here, message me on IRC's Hackint network (find me in #yahoosucks), or even message me on Reddit (Doranwen there as well).

If you can't tag a tab, maybe you can post the boost text from this post somewhere to get more eyes on it? Or just point people to this comm and say "hey, they could really use some help"? :)

If you know of anyone who might enjoy working on one of the smaller languages, that would be great as well. Many of the single tabs have only a handful of groups on them, making it a fairly easy and quick job for a native speaker. We're happy to help them get started!


Actual stats:

Now up to 3.73% tagged.

Available tabs (by language, in descending order of count):

English: 3219
Unknown: 544
Spanish: 388
Portuguese: 336
French: 146
Indonesian/Malay: 131
Italian: 89
German: 77
Turkish: 64
Chinese: 59
Arabic: 54
Romanian: 36
Spam: 32
Persian: 16
Dutch: 12
Filipino: 12
Swedish: 8
Hungarian: 7
Polish: 7
Vietnamese: 6
Bosnian: 3
Finnish: 3
Catalan: 2
Danish: 2
Esperanto: 2
Lithuanian: 2
Norwegian: 2
Russian: 2

Single tabs available:
African: Afrikaans, Chichewa, Hausa, Kinyarwanda, Malagasy, Somali, Swahili, Yoruba
Asian: Acehnese, Armenian, Azerbaijani, Batak Toba, Bengali, Georgian, Gujarati, Hebrew, Hindi/Urdu, Javanese, Kannada, Kapampangan, Kazakh, Korean, Kurdish, Malayalam, Marathi, Mongolian, Sundanese, Tamil, Telugu, Tetum, Thai, Turkmen, Uzbek, Uyghur
European: Albanian, Basque, Breton, Croatian, Czech, Estonian, Galician, Greek, Icelandic, Ido, Interlingua, Latin, Latvian, Maltese, Occitan, Slovak, Slovenian, Welsh

All compilation tabs are still available as well.

Note: The "Unknown" groups are by and large not unknown as far as language goes, but they can't be tagged without looking at the actual messages. Most are clearly in English.

The spam tabs will also need to be looked at a little to confirm spam status, but many are in clear patterns and most of the groups won't need to be looked at once one or two are. I have no idea what language, if any, most of those are in. My suspicion is that they were created just for email address harvesting, but we may never know for sure.
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
The tagging process is moving out of beta mode; while minor improvements may be made over time, we are hopeful that no big changes will need to be made from here on.

We therefore invite any and all willing volunteers to come help. Attention to detail, the ability to work in Google Sheets/Docs, and the willingness to ask questions when in doubt are all you need.


What does tagging involve?



Tagging is the process of reading through group metadata (names, descriptions, and a couple other useful fields) and deciding what the group is about. Then you copy/paste the correct information into specific columns. You will be assigned a 'tab' of groups to keep the process manageable. Each tab has between 100 and 300 groups on it, from one or more categories on Yahoo. You can work at your own pace and drop in when you have time.

The guidelines for the process are in a Google Doc and cover most of the situations you'll encounter. If you run across something difficult, we encourage you to paste it in the correct channel on our Discord server and ask for advice.


Can I ask to tag something specific?



Yes, but you're limited to a) categories that Yahoo actually had, and b) things that haven't been tagged yet. Not all fandoms that were popular during that time had their own categories, and except for the very large (such as Harry Potter, Star Trek, Dragon Ball, or Backstreet Boys), most will be on a tab with other fandoms. (When you volunteer to claim a tab, you're volunteering to tag all of the groups on that tab, not only the categories you're interested in.) Nonfandom areas also vary wildly in their size; there are enormous quantities of classmate and alumni groups, software groups, genealogy groups, romance groups, recycling groups, and adoption groups for instance, but there may only be one or two groups for a particular automotive make or health condition.

Tagging isn't exceptionally difficult, however, and once you successfully complete one tab, you will probably find the second one even easier. Plus, you never know what gems you may run across while you do it!


What if I'm not comfortable tagging groups? Is there any other way to help?



Yes, there are many other ways to help.

1. Importing
We need people who are willing to install an email program called Sylpheed on their computers to import Yahoo Groups into a format that will help us tag. Not everyone can install the program, so if you can, importing is a great way to help. That way we have a steady flow of groups for people to tag.

2. Languages
If you can read a language other than English, you can help on the server by opting into a role for your language. Then if someone is trying to tag groups in that language, they can call on you for help with anything confusing.

3. Boost!
We also need more visibility on this project, which brings us to…


Can I advertise this somewhere I know?



Yes, please! There are about a million groups left to tag, on over 5000 tabs. The more volunteers we have, the fewer tabs each person has to tag.

If it's useful, feel free to share this text in quotes:

Five years ago, with little notice, Yahoo announced Yahoo Groups was being deleted. An army of archivists swung into action and saved nearly a million groups in all - 14 terabytes of data.

The next step of the Save Yahoo Groups project is tagging the groups.

We need volunteers who:

* are detail-oriented and careful

* ask lots of questions when in doubt

* can use Google Sheets/Docs at a basic level

Also helpful:

* able to read languages other than English

* able to install a simple program and follow a visual guide to importing mbox files

* extensive knowledge about a particular subject

If this interests you, check out our Dreamwidth community and volunteer to help: https://yahoogroups.dreamwidth.org/profile
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
The metadata sorting is finally done! You have no idea how happy I am to have that part off my plate.

So what's next? Well, for the next week or so you won't see anything here, as I encourage beta taggers to finish up outstanding tabs (and deal with a few tasks of my own that I've been putting off). There are also several main categories that haven't had a thorough testing by beta taggers (Recreation & Sports, Hobbies & Crafts, Health & Wellness, and Schools & Education), so if you've been waiting to help, here's your invitation to pick one of those and try tagging a tab of it.

Once the main categories have been tested, then I'll post a message that you can link to, so you can invite people you know who might be interested in doing this. The more volunteers who help, the sooner this will be done and can be uploaded to the Internet Archive.

The three key things we need in volunteers (besides the time to complete the task) are:
  • carefulness / attention to detail
  • a willingness to ask questions when in any doubt
  • the ability to use Google Sheets / Docs at a basic level.


Specialty knowledge in an area (including being able to read another language) is a bonus but not necessary. Discord is the easiest way to connect with us but Dreamwidth PMs, Google Chat, or even IRC can work instead if need be.

Actual stats:

Now up to 2.89% tagged.

Available tabs (by language, in descending order of count):

English: 3264
Unknown: 544
Spanish: 389
Portuguese: 336
French: 146
Indonesian/Malay: 131
Italian: 89
German: 77
Turkish: 64
Chinese: 59
Arabic: 54
Romanian: 36
Spam: 32
Persian: 16
Dutch: 12
Filipino: 12
Swedish: 8
Hungarian: 7
Polish: 7
Vietnamese: 6
Bosnian: 3
Finnish: 3
Catalan: 2
Danish: 2
Esperanto: 2
Lithuanian: 2
Norwegian: 2
Russian: 2


Besides the list of available tabs above, we have one tab each (often well under 100 groups - many don't even have 10!) of the following languages:

African: Afrikaans, Chichewa, Hausa, Kinyarwanda, Malagasy, Somali, Swahili, Yoruba
Asian: Acehnese, Armenian, Azerbaijani, Batak Toba, Bengali, Georgian, Gujarati, Hebrew, Hindi/Urdu, Javanese, Kannada, Kapampangan, Kazakh, Korean, Kurdish, Malayalam, Marathi, Mongolian, Sundanese, Tamil, Telugu, Tetum, Thai, Turkmen, Uzbek, Uyghur
European: Albanian, Basque, Breton, Croatian, Czech, Estonian, Galician, Greek, Icelandic, Ido, Interlingua, Latin, Latvian, Maltese, Occitan, Slovak, Slovenian, Welsh


There are also special compilation tabs with one or two groups each of a variety of lesser-used languages, one tab for each of Africa, Asia, Europe, North America, Oceania, and South America. These tabs are very small, all being 20 groups or fewer.

Of special note is a tab of around 200 groups in a family of related languages (Zo, Tedim, Hakha, Mara, etc.). If you know of anyone who can read any of these, please put them in contact with us! Google Translate can only recognize and understand Mizo, so the rest must be manually identified (a very difficult task for someone who doesn't speak or read any of them). Translating them is next to impossible. (Some don't even have dictionaries available online.)
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
I finished all the German categories and have been going through the Portuguese ones. Oddly enough, unlike the other categories, the Portuguese ones started out somewhat jumbled, though the order seems to have mostly stabilized into top-level categories one at a time (but in reverse alphabetical order, chunk by chunk). After the Portuguese, the only language directory that's left is French, which I suspect will be relatively smaller overall.

I did find a new language—Cape Verdean Creole—I think! Google Translate thought it was Brazilian Portuguese but it's definitely not and all research that volunteers did for me seems to turn up Cape Verdean Creole, so I'm assuming that must be it. Hopefully we can find someone who can read it in order to tag it properly.

This will be, I believe, the final update for metadata sorting until I'm finished. If my calculations are correct, the 95% mark will occur in the middle of the final category, which is the massive NULL one (groups for which we have no information on category path, category, or categoryid). Sorting this category will go remarkably fast, because there are only three possibilities: we have at least a partial description which can be used to tag (likely to be very rare), we have no info but we have the GMD so the mboxes can be looked at (also probably rare), or we have no info at all (the most likely). The first can stay on the appropriate language tab, the second and third go to Unknown. After that, I have only the final 1-2% to handle - finalizing all the smaller language tabs (including some that turned out to have far more groups than I imagined and will require splitting into multiple tabs). I can't wait to be done with this stage!
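As a rough sketch of the three-way rule for the NULL category described above: the function name, its parameters, and the note strings are made up for illustration and are not part of any actual tooling used on the project.

```python
from typing import Optional, Tuple


def sort_null_group(description: Optional[str],
                    has_gmd: bool,
                    language_tab: str) -> Tuple[str, str]:
    """Decide where a NULL-category group goes.

    Hypothetical helper: returns (tab, note). `language_tab` is the tab
    for the language the description was identified as (an assumption;
    identifying the language is not shown here).
    """
    # Case 1: at least a partial description that can be used to tag.
    if description and description.strip():
        return (language_tab, "tag from description")
    # Case 2: no info, but the GMD exists so the mboxes can be looked at.
    if has_gmd:
        return ("Unknown", "mboxes can be looked at")
    # Case 3: no info at all (expected to be the most common case).
    return ("Unknown", "no information")
```

Only the first case keeps a group on a language tab; the other two both land on Unknown and differ only in whether anything can be checked later.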


Actual stats:

Now up to 90.00% sorted and 2.89% tagged.

Available tabs:

English: 3254
Spanish: 388
Portuguese: 269
Italian: 88
German: 76
French: 8
Chinese: 59
Indonesian/Malay: 130
Arabic: 54
Persian: 16
Turkish: 63
Romanian: 35
Unknown: 281
Spam: 31


Something fun:

Yahoo Groups were used to host hundreds of groups containing downloads of custom content for The Sims. Such groups could be found in many languages, as the group "the_sims_downloads" shows:

Comunidade destinada à downloads do jogo The Sims 1,2,3 e no futuro 4.
Este grupo aqui no yahoo, é para fazer downloads dos jogos da série The Sims(TS1,TS2 e no futuro TS3).
Aqui terão vários tipos de downloads e também dicas sobre os jogos da série The Sims.
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
I'm continuing to sort jumbled categories—mostly English, with patches of Spanish and the occasional Chinese category. I've hit quite a few categories for fanfic for specific fandoms as well as categories for fanfic for broad media types (TV shows, movies, comics & animation, music artists, etc.).

I've also hit the German categories, which are very easy to sort, as they have little to pull off (a little English, a little other languages, a little Unknown - but very, very little). I've sorted all of the German categories that were in order and am currently going through the jumbled ones they added afterwards.


Actual stats:

Now up to 85.00% sorted and 2.89% tagged.

Available tabs:

English: 3254
Spanish: 388
Portuguese: 14
Italian: 88
German: 61
French: 8
Chinese: 59
Indonesian/Malay: 130
Arabic: 54
Persian: 16
Turkish: 63
Romanian: 35
Unknown: 279
Spam: 31


Something fun:

The group "dr202" was just one of many that offered manuals or other downloads for keyboards (at least some of which may be the only places such files can be found anymore):

Information, Mods, Patches, pictures, manuals etc. A place for people to post information regarding electronic instruments. sh-32, dw8000, dx200, an200, dr202, sk-1, theremin, arp, yamaha, korg, moog, casio, ANYTHING! :)
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
I'm still sorting jumbled categories, often sets of related ones, such as sports by location, or individual family groups by letter. I've had sets of dog breeds, soccer/football categories, and schools in various countries to sort, along with scattered TV shows and movies, and lots of actors and music artists/bands. There have also been quite a few role playing categories, as well as categories specifically for fanfic for various fandoms.

I found a couple new natural languages—Luganda and Ladino—as well as some constructed languages.


Actual stats:

Now up to 80.01% sorted and 2.81% tagged.

Available tabs:

English: 3112
Spanish: 349
Portuguese: 13
Italian: 88
German: 5
French: 8
Chinese: 52
Indonesian/Malay: 127
Arabic: 52
Persian: 15
Turkish: 61
Romanian: 35
Unknown: 252
Spam: 31


Something fun:

Anyone into history or costuming might find "HistoricCostuming_EdwardianWW1" interesting (it was one out of a whole series, with groups for every era imaginable):

This group is for the discussion of, sharing research about and reconstruction of Historic Costume/Fashion during the Edwardian period through World War 1 (1900-1920AD). This is a public group that any serious person is welcome to take part in. I encourage using the resources available on the Yahoo Groups site to share information.
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
I've continued sorting through jumbles of categories - mostly English, with the occasional Spanish or Chinese category thrown in. (Oddly enough, I can't remember seeing any Italian categories mixed in - perhaps they'd planned enough room in the categoryid numbers to fit them all in before the English ones began?) I've been seeing lots of actors and music artists, quite a few movies and TV shows, random nonfandom categories that hadn't been created originally, and role playing categories for various fandoms that were created either after the fact or at the same time as the category for the fandom.

Found more new languages—Konkani and Gilbertese!


Actual stats:

Now up to 75.00% sorted and 2.71% tagged.

Available tabs:

English: 2920
Spanish: 315
Portuguese: 12
Italian: 88
German: 5
French: 7
Chinese: 44
Indonesian/Malay: 119
Arabic: 49
Persian: 15
Turkish: 57
Romanian: 34
Unknown: 228
Spam: 30


Something fun:

For anyone who loves fonts, the group "fontmaniacs" sounded like fun:

If you're one of those crazy folks who spends hours on the web and in newsgroups searching for more and more and more wonderfully diverse ways of making letters on paper, then this club is for you. Come here to trade fonts, talk about creating them, or just talk about collecting them.
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
This has still been a big jumble of English categories - everything from military groups to dog breed groups to various actors, music artists, and authors (of which they had a set by letter, like the previous sets for actors, music artists, movies, and TV shows). I also hit the Chinese categories, which are slower to sort (I have to untangle the Unknown groups from groups that are in both English and Chinese, and those in turn from groups that are only in Chinese but use English for fandom titles). Since I can't easily tell what they are unless I copy/paste them into Google Translate, any Chinese speakers offering to tag (and I hope there are some eventually!) will definitely need to look up which cat_ids they want, so I can identify the correct tab. Fortunately there weren't many Chinese groups initially - mostly in fandom categories, especially computer & video games - so I got through them and back to the jumble of mostly English categories with the occasional Spanish one. They do pop up now and then later on, but in smaller and more manageable batches.

Found a few new languages - Venetian, Xhosa, Lojban, Monda, Aranese, and Coptic!


Actual stats:

Now up to 70.00% sorted and 2.67% tagged.

Available tabs:

English: 2692
Spanish: 296
Portuguese: 11
Italian: 88
German: 5
French: 7
Chinese: 44
Indonesian/Malay: 114
Arabic: 47
Persian: 14
Turkish: 53
Romanian: 33
Unknown: 208
Spam: 30


Something fun:

Some group descriptions just make me chuckle, like that of "Cat_Trek":

These are the voyages of the starship Catstongue. Their mission: to boldly meow where no cat has meowed hitherto. Come join the crew as we explore the deepest darkest furtherest reaches of space and time in our neverending quest for the lost planet of Catalonia. You will be assigned a Trek identity, and you can contribute to the continuing adventures being recorded in the Captain's Log. OR you can just join to have a good read! The latest cast list can be found in the shared files, under the cast list folder! New members might like to look here before they choose a character for themselves.
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
The last batch had me sorting through Spanish categories for Business & Finance, Computers & Internet, and Arts & Entertainment. This batch had the Spanish equivalents of Music, more Arts & Entertainment, Family & Home, Games, Government & Politics, Health, Hobbies & Crafts, Sports & Recreation, Religion & Beliefs, Schools & Education, Science, and Romance & Relationships (which definitely had some explicit content). And then it went back to jumbled English categories (with the rare Spanish one), with lots of blocks of regional categories on various topics, as well as sets of categories by letter for movies and for TV shows.

I found some new languages—Aymara, Occitan, Waray, and something I couldn't identify whatsoever. If anyone knows what language this is in, let me know!
FREECOM - Kayuttuq ulsa Ulinda karg Seesco irfdy FREE COMMUNICATING WORLD!
Asklad i fardad um in!

It was in an Australian cultural category, but that doesn't necessarily mean it has anything to do with Australia, given the high percentage of groups that were miscategorized.


Actual stats:

Now up to 65.00% sorted and 2.67% tagged.

Available tabs:

English: 2491
Spanish: 286
Portuguese: 10
Italian: 87
German: 4
French: 6
Chinese: 8
Indonesian/Malay: 107
Arabic: 44
Persian: 13
Turkish: 50
Romanian: 31
Unknown: 189
Spam: 30


Something fun:

As an LOTR fan, the group "ainulindale" sounds absolutely fascinating:

Esta lista pretende ser una herramienta de trabajo para la creación de un corpus de música tolkienista, así como creaciónn en danzas tradicionales o medievales.
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
Among the many varied categories this time was a whole series of categories for music artists in general, by letter. A had lots of Aaliyah and Avril Lavigne, for instance, while M had My Chemical Romance and Michael Jackson among many, many others. There was also a similar series of categories for actors, and for computer & video games, though the last set was much smaller overall.

I also finally hit the Spanish categories, as you can tell by how the number of available Spanish tabs shot up suddenly. These are far easier to sort than English tabs. Although I have to be careful to spot the occasional Catalan and Basque groups mixed in (and once in a blue moon, Galician), there's little Unknown (I can't skim Spanish as easily as I do English, so I'm less certain whether someone will be able to tag a group based on what's there, and I've chosen to err on the side of assuming they can), no spam so far, and virtually none of the languages so prevalent in the English categories (Indonesian, Arabic, Turkish, Persian, etc.); there are very few other languages at all. The Spanish fandom categories are almost entirely Spanish with a very rare Catalan group; I mostly see the odd other languages in nonfandom categories.

I did find a new language - Mayan!


Actual stats:

Now up to 60.01% sorted and 2.67% tagged.

Available tabs:

English: 2390
Spanish: 126
Portuguese: 10
Italian: 87
German: 4
French: 6
Chinese: 8
Indonesian/Malay: 102
Arabic: 43
Persian: 13
Turkish: 49
Romanian: 31
Unknown: 184
Spam: 30


Something fun:

Sometimes the interesting part about a group was who created it or was part of it, like with "heroesiiimapmakers":

Greetings. This club is for Heroes of Might and Magic III, IV and V fan's that like making their own maps. Here you can get and exchange tips and tricks on how to make a great map for the game. I am a former Level Designer for New World Computing's Heroes of Might and Magic II and III series. Though I no longer do it professionally but I still love making maps as a hobby. This club is not affiliated with New World Computing, The 3DO Company or Ubisoft in any way. This is strictly group for fans.
Now that Heroes V is here let's meet here often and share our initial impressions of the new game.
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
I'm still wading through a wild variety of categories, many highly specific. There was, for instance, a category under Star Wars/Characters that was just for Chewbacca. Mostly, however, this batch was a lot of categories under Regional, for subcategories along various lines such as Cultures & Community, Religion & Beliefs, Government & Politics, or Schools & Education. There were also large numbers of small categories under Business & Finance, automotive makes, music artists, and actors, as well as various fandoms that must have had their categories created later.

I also found a couple new languages—Nepali and Chuukese! And a couple new spam types in the /Government & Politics/Intelligence/ category (less interesting, lol).


Actual stats:

Now up to 55.01% sorted and 2.65% tagged.

Available tabs:

English: 2228
Spanish: 32
Portuguese: 10
Italian: 87
German: 4
French: 6
Chinese: 8
Indonesian/Malay: 99
Arabic: 42
Persian: 13
Turkish: 48
Romanian: 30
Unknown: 173
Spam: 30


Something fun:

Anyone old enough to have had a typewriter (or whose parents did) may find the group "ibmselectrics" interesting:

Finally!! A place where people can worship, complain, and just plain mingle about the machine that hogged 75% of the typewriter market at one time, the one and only, IBM Selectric!
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
Halfway done with sorting! I hope the second half goes quicker than the first.

These categories have been quite a wild jumble of mostly smaller categories that Yahoo created later. More Cultures & Community, more Regional, more Health & Wellness, more Music, more Entertainment & Arts, more Computers & Internet, more Science… and yes, more Schools & Education and Romance & Relationships. I've even run across a new language: Georgian!

As I sort, I'm reminded afresh of some of the reasons why this project is important. True, some groups, like the ones filling the Technical Support category (for long-obsolete devices and operating systems), are relics of another time and are largely useless to modern users. But others - such as the ones sharing tubes for Paint Shop Pro (which, as far as I can tell, can still use the old files) - may still be relevant to people today. Not to mention the sheer amount of creative work of all types that is preserved only in the messages and files of various groups.


Actual stats:

Now up to 50.01% sorted and 2.45% tagged.

Available tabs:

English: 2001
Spanish: 29
Portuguese: 16
Italian: 87
German: 3
French: 5
Chinese: 8
Indonesian/Malay: 88
Arabic: 39
Persian: 12
Turkish: 43
Romanian: 29
Unknown: 155
Spam: 26


Something fun:

Someone will surely find the group "lostcities" interesting:

This group explores the legends and reports of lost cities, lost continents, lost communities and lost peoples around the world, and highlights the real-life expeditions that have set out to find them.

Did Plato's Atlantis exist, and if so where was it? What happened to British explorer Percy Fawcett, who vanished in the Amazon while searching for a lost city? Is it time for a revival of "lost race" novelists like Canada's James DeMille ("A Strange Manuscript Found in a Copper Cylinder" - 1888) and America's William Starbuck Mayo ("Kaloolah, Or Journeyings to the Djebel Kumri" - 1849)?

The illustration at right is Maxfield Parrish's "City of Brass" (1909), a bookplate rendition of the Saharan lost city depicted in the Thousand and One Nights.

Facebook group: http://www.facebook.com/group.php?gid=247655422288
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
This update has been a bit slow in coming, mainly due to life busyness. However, I at least finally finished sorting through Schools & Education groups, have sorted through Science groups, and then waded through Romance & Relationships (which had an inordinate amount of porn, naturally). The Romance & Relationships/Adult category, in particular, had approximately a 2:1 ratio of porn to non-porn, the latter being a mix of alternative lifestyle groups, fandom groups (which might have explicit fanfic or fanart), and random other groups that somehow ended up in there (such as a Nigerian ladies' golf association group!). Unfortunately, not all of the non-porn groups were saved, but at least a decent number were, mainly the fandom ones.

The Romance & Relationships/Romance category was possibly the most tedious so far of all categories, because not only did I have to pull off at least two to three times the number of groups that got to stay on the sheet I worked off (so much Unknown, porny Unknown, and Arabic, plus a good number with encoding issues), I couldn't tell apart many of those types from a quick scan due to the sheer quantity of HTML tags present. I frequently had to double-click to expand the description in order to tell if it was purely Unknown, if there was Arabic script or encoding symbols, etc.

Given that, I was relieved to finally finish the Romance & Relationships categories and move on to all the categories that Yahoo created after the fact. When put in sequential order, the earlier categoryid numbers belonged to categories that, next to each other, were all related under the same main category. However, Yahoo's people in charge of Groups clearly realized after creating them that more categories were needed. From that point on, the categories are in smaller blocks - occasionally a stretch of 10 or 20 related categories, but often just one or two categories, completely isolated from anything else related. It's meant a lot more variety in sorting, which is delightful after the tedium of Schools & Education and Romance & Relationships. I even found a group in a new language - Tongan.

I also discovered a new type of spam group - found only in the Schools & Education/Other category so far. The groups have a 5-14 character keysmash (including digits) type name and summary, and a description that reads "Dont know anything". I find it somewhat ironic that such a description is found in a spam group type that's only in a Schools & Education category…
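For anyone curious how that pattern might be checked automatically, here is a minimal sketch. It assumes each group is available as three plain strings (name, summary, description), and the function and constant names are invented for this example. A regular expression can only test length and character set, not whether a name really is a keysmash, so this would flag candidates for a human to confirm rather than decide anything on its own.

```python
import re

# 5-14 characters, letters and digits only (an approximation of "keysmash").
KEYSMASH = re.compile(r"[A-Za-z0-9]{5,14}")
SPAM_DESCRIPTION = "Dont know anything"


def looks_like_dont_know_spam(name: str, summary: str, description: str) -> bool:
    """Flag groups matching the spam pattern described above.

    Hypothetical helper: both name and summary must be 5-14 alphanumeric
    characters, and the description must be the fixed spam text.
    """
    return (
        KEYSMASH.fullmatch(name) is not None
        and KEYSMASH.fullmatch(summary) is not None
        and description.strip() == SPAM_DESCRIPTION
    )
```

A real word of the right length (say, "chemistry") would also pass the name check, which is why the fixed description does most of the work here.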


Actual stats:

Now up to 45.06% sorted and 1.75% tagged.

Available tabs:

English: 1803
Spanish: 26
Portuguese: 15
Italian: 86
German: 3
French: 4
Chinese: 8
Indonesian/Malay: 75
Arabic: 37
Persian: 11
Turkish: 38
Romanian: 27
Unknown: 144
Spam: 22


Something fun:

The group "MEDTC-DISCUSS" looked a bit out of place in the Teaching and Methods category, but would almost certainly be of interest to someone…

This is a moderated discussion group about publications and other resources on medieval and early modern clothing, dress accessories, and textiles (including tools and processes).

Selections specifically concern clothing and textiles as a subset of material culture (not furniture or pottery, for example). Time and place focus is Europe and the Mediterranean, approximately 500 to 1600 CE. Emphasis is on scholarly and academic work, as opposed to "craft" or theatrical resources. Works under discussion include monographs, journal articles, theses, archaeology reports, and other published (and unpublished) resources, in any language, as well as events such as symposia, conferences, and museum exhibits substantially devoted to clothing and textiles of this period. Posts may be in any language, though English is preferred.

Membership requires moderator permission. Spam will not be tolerated. List members are invited to send notifications of any sources that they have encountered. Active scholars are encouraged to send announcements of their own publications and presentations. Please keep posts on-topic; discussion of personal projects (other than publications), reproduction techniques, supplies, social or re-enactment events, etc. should be taken to more appropriate fora.

This is NOT a SCA or reenactment list but that of academia. References to reenactment organizations... and one's membership in them... are strongly discouraged.
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
Well, I hit some of the real tedium: Schools & Education, which includes two of the five largest categories and is generally for class groups, whether it's the "everyone taking this section of Chemistry" or the "everyone graduating from this school in this year" type. They weren't very interesting to sort and will probably be less interesting to actually tag, but they have to be gone through. The smaller categories specifically for educators will be more interesting, I suspect, but the classmates/alumni sorts are really, really not - and there are a LOT of them. That, plus real life projects and events, is why this update has been later than usual.

I didn't find any new languages, but I did find a group in Armenian which used the Armenian alphabet. That was interesting! The other two Armenian groups I'd seen didn't, and I haven't seen Armenian enough to recognize it without the alphabet, but with it, I was able to ID it quickly; the alphabet is quite distinctive.


Actual stats:

Now up to 40.04% sorted and 1.57% tagged.

Available tabs:

English: 1614
Spanish: 24
Portuguese: 13
Italian: 86
German: 3
French: 4
Chinese: 7
Indonesian/Malay: 67
Arabic: 28
Persian: 10
Turkish: 34
Romanian: 25
Unknown: 108
Spam: 20


Something fun:

While most of the classmates groups were quite boring, there was the occasional oddity, like the group called "school-wedgies":
Do you get wedgies at school? Do you give wedgies at school?
If you do this is the group for you!
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
It's been a busy few weeks! First I sorted through hobbies & craft groups, everything from collecting autographs to soapmaking to model trains to ham radio to knitting. Then I sorted categories of groups related to various issues and causes, from human rights to the environment to community service/volunteering. Then it was sports and outdoor hobbies - cars, hiking, baseball, soccer/football, etc. After that it was a whole lot of religion & belief-related groups - atheism, Buddhism, Christianity, Islam, and much more. I still have more religion & belief categories to sort through before I'm done with them.

Being out of the cultural categories, I'm not encountering many new languages (only Yiddish this time), though the religious categories had the occasional language which I've seen fewer than ten groups for.



Actual stats:

Now up to 35.07% sorted and 1.52% tagged. (Yay for being over 1/3 done with the metadata sorting now!)

Available tabs:

English: 1409
Spanish: 20
Portuguese: 10
Italian: 86
German: 2
French: 4
Chinese: 7
Indonesian/Malay: 46
Arabic: 27
Persian: 9
Turkish: 25
Romanian: 10
Unknown: 101
Spam: 20



Something fun:

There was a group called "potatocannons", with the following description:
This is a club for all you potato projectile loving people. This club is dedicated to the furtherment of potato cannon science.
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
I've been sorting through games, games, and more games. Particularly notable was the sheer number of groups for Sims CC and Freedom Force meshes and skins. (I was one of the ones who specifically hunted for Sims CC groups so that at least does not surprise me one bit.) And there were an enormous number of RPGs, more than I ever knew existed before. I have literally dozens of tabs of purely RPGs awaiting tagging.

After the games were a host of categories for law and lawyers, military, and politics. Then a set of categories for health-related groups—doctors, support groups for medical conditions, fitness & weight-related, pregnancy-related, etc. I've just finished those and have begun sorting groups for hobbies and collecting.


Actual stats:

Now up to 30.01% sorted and 1.40% tagged.

Available tabs:

English: 1169
Spanish: 18
Portuguese: 9
Italian: 85
German: 2
French: 3
Chinese: 7
Indonesian/Malay: 38
Arabic: 22
Persian: 8
Turkish: 22
Romanian: 9
Unknown: 93
Spam: 17


Something fun:

The group "allyourbasearebelongtous" made me chuckle, with description below:

In A.D. 2101
War was beginning.
Captain: What happen ?
Operator: Somebody set up us the bomb
Operator: We get signal
Captain: What !
Operator: Main screen turn on
Captain: It's You !!
Cats: How are you gentlemen !!
Cats: All your base are belong to us
Cats: You are on the way to destruction
Captain: What you say !!
Cats: You have no chance to survive make your time
Cats: HA HA HA HA ....
Cats: Take off every 'zig'
Captain: You know what you doing
Captain: Move 'zig'
Captain: For great justice
Put related pictures in the Photos section.


Fortunately, we were able to save this group's Photos section. (This was only true of ~5% of the groups we saved.)

(There was also a group devoted to the Zero Wing game, titled "someonesetupusthebomb".)
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
Lately I've sorted through lots of genealogy groups, family groups, recipe groups, pet-focused groups (so many dog breeds, lol), and heaps more cultural/language groups… It's been particularly slow going due to those, as they required a lot more scrutiny to tag and sort the languages properly. It was a relief to start into the games.

New languages I've found groups in are Ilocano, Chichewa, Uyghur, Kapampangan, Aragonese, Dakota, Hmong, Igbo, Malagasy, Soninke, Jingpho, Maltese, Kazakh, Northern Kankanaey, and Fulani, as well as several individual conlangs/auxlangs. (And maybe you'd count the groups for studying Old English, Pennsylvania Dutch, Papiamento, Cherokee, or Kristang, or the one dedicated to trying to revive a version of Phoenician, of all things!)

Next up are a whole lot of categories for games, mainly computer & video games. I just finished sorting the general category for them and spotted so many Sims and Freedom Force groups! It will be interesting to see what there is in the smaller, more specific categories (such as 3D gaming or various genres).


Actual stats:

Now up to 25.10% sorted and 1.33% tagged.

Available tabs:

English: 939
Spanish: 16
Portuguese: 7
Italian: 84
German: 2
French: 3
Chinese: 7
Indonesian/Malay: 34
Arabic: 20
Persian: 8
Turkish: 17
Romanian: 7
Unknown: 88
Spam: 12


Something fun:

This is the description for the group "non_current_German", an interesting group for a very niche interest:

This is a group for specialists -- translators; scholars, academics, and graduate students in various fields; and professional genealogists, among others -- who translate or work with non-current (18th- to 20th-century) German, including old handwritten or typed letters and diaries, printed documents, and historical or literary material of all sorts. Its purpose is to pool members' expertise and resources in terminology, regional terms and variants, non-standard grammar, official jargon, social and cultural conventions, and other elements that make a given text difficult to convey in English.
Guidelines for posting:
1. Queries must provide context. This should include:
* At least the sentence in which the word being queried;
* A description of the document in terms of approximate date, type of text, provenance, and anything else that would point list members in the right direction or provide clues, including your own hunches, reasons for rejecting the obvious, etc. Be explicit!
2. Any resources consulted, including published works, Web sites, and experts should be cited (title, URL, etc). This way we can all expand our knowledge of what is out there. Do some homework before querying the list.
3. All responses should be cited (title, URL, etc.; for the same reasons as in 2.).
4. Do not send in guesses, unless they are based on sufficiently solid experience. If you must send a guess, send it to the querier, not to the list.
5. Please don't send "thank-yous" to the list -- they should go to the person answering the query. If we see that a response is going to be helpful, let us all say a private "thank you."
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
It may seem all quiet on this front but the project is indeed progressing! I've found groups in new languages (Scots Gaelic, Irish Gaelic, Hausa, Bengali, Kannada) and sorted through heaps of categories of music artists and bands. (So many Backstreet Boys and NSYNC groups!)

More recently, I've been sorting through TV shows. Anyone wanting to tag a tab of their favorite show might just be able to now. Especially large were Buffy (over 800 groups!), Star Trek (at least 500), X-Files and General Hospital (300-400 each), with dedicated tabs (between 100 and 300 groups) also for Stargate, Xena, Power Rangers, Days of Our Lives, and The Simpsons. Other shows have to share tabs but there are still decent-sized categories for many. It's wonderful to see how many of the groups we saved!

Next up are a few more general/miscellaneous TV shows categories and then I'll be heading into a few Family & Home categories (family-specific groups, genealogy, and home building, for a sample) and the food & drink categories (which will mostly be lots and lots of recipes).


A note on spam groups:

A small but definite percentage of what got saved in the rush of everything was actually a whole bunch of spam groups. These are easily identifiable because their descriptions are what you might call "keysmash" - a mishmash of characters all jumbled together. After sorting out hundreds of these, I've observed there are really two distinct types.

Groups of the most common type were all created in 2011-2012, and follow this pattern:
Name is 5-6 characters.
Descriptions are a string of 13-19 characters (most are on the shorter side).
Summaries are 5-6 characters again, a different string from the name.
Characters are a mix of letters and numbers.

Groups of the less common type were all created in 2009, and follow this pattern:
Name is 13-14 characters.
Descriptions are often quite long and have spaces between "words" of spam of varying length (though a few occasionally have two or three short "words").
Summaries are identical to the group name.
Characters are only letters, no numbers involved.

I've run across a couple groups that look very much like the first pattern described - except the group name and summary are 4-7 characters, and the groups were created in 2010. Only two that I know of, though there are likely a few more already sorted onto Spam tabs early on, before I realized there were multiple distinct patterns at all.
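
For anyone curious how the patterns above might be checked automatically, here is a rough sketch in Python. It is not how the sorting is actually done (that is by eye); the `Group` class, its field names, and the pattern labels are all made up for illustration, and the thresholds are simply the numbers listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Group:
    # Hypothetical field names; the real spreadsheet columns may differ.
    name: str
    description: str
    summary: str
    year_created: int


def spam_pattern(g: Group) -> Optional[str]:
    """Return a label for the spam pattern a group resembles, or None."""
    desc = g.description.strip()
    words = desc.split()

    # Shape shared by the most common type and its 2010 look-alike:
    # one unbroken 13-19 character description, letters and digits only,
    # and a summary that differs from the name.
    short_shape = (
        len(words) == 1
        and 13 <= len(desc) <= 19
        and desc.isalnum()
        and g.name.isalnum()
        and g.summary.isalnum()
        and g.summary != g.name
    )
    if short_shape:
        n, s = len(g.name), len(g.summary)
        if g.year_created in (2011, 2012) and 5 <= n <= 6 and 5 <= s <= 6:
            return "2011-2012 type"
        if g.year_created == 2010 and 4 <= n <= 7 and 4 <= s <= 7:
            return "2010 variant"

    # Less common type: letters only, summary identical to the name.
    if (
        g.year_created == 2009
        and 13 <= len(g.name) <= 14
        and g.summary == g.name
        and g.name.isalpha()
        and len(words) > 0
        and all(w.isalpha() for w in words)
    ):
        return "2009 type"

    return None
```

A check like this only looks at lengths, dates, and character types, so it cannot tell real words from keysmash. A legitimate 2009 group with a 13-14 letter name, a matching summary, and a punctuation-free description would match too, so anything it flags would still need a human glance before going to a Spam tab.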


Actual stats:

Now up to 20.00% sorted and 1.17% tagged.

Available tabs:

English: 743
Spanish: 14
Portuguese: 6
Italian: 84
German: 1
French: 2
Chinese: 2
Indonesian/Malay: 25
Arabic: 15
Persian: 5
Turkish: 14
Romanian: 3
Unknown: 62
Spam: 12


Something fun:

I was amused to see that, among the three groups in the Rick Astley category, one was titled "Rick_Roll", and its description was, primarily - you guessed it - the words to "Never Gonna Give You Up." XD
doranwen: female nerds, rare and precious (Default)
[personal profile] doranwen
The title sounds more grand than it really is, but I figure I ought to let people know how it's progressing every so often, and my previous updates were posted only to my personal journal at first and later reposted here.


Possibly-helpful explanation:

When I talk about "sorting metadata", I mean that I pull blocks of groups at a time (just names, creation dates, descriptions, and a couple of other useful fields that had user-created text in them) and put them on tabs in a spreadsheet. Then I move off all the rows that don't belong: groups with descriptions in a language other than English (which go to the correct tab for their language), groups which are pure spam, groups with encoding errors (meaning the description needs an extra fixing step before it can be tagged), and any group which needs a volunteer to look in the messages themselves in order to tag content and/or language. Some categories are quick to sort because there is little to pull off, but others are cluttered with non-English and/or spam, and it takes some time to get everything to the appropriate spreadsheet.

Once the blocks of groups (taken from one or more Yahoo categories, depending on size) are sorted through, the remaining groups can be split into tabs of ~200 groups apiece, and made available for volunteers to tag. Currently I'm going through English categories, but which language is kept vs. removed will naturally shift when I get to categories for Spanish or German, for instance.
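
To make the two steps above concrete, here is a minimal sketch in Python. It is only an illustration: the real work happens by hand in a spreadsheet, and the function names, the `route` callback (standing in for the human decision about where a row belongs), and the bucket labels are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Row = Dict[str, str]


def sort_block(rows: List[Row], route: Callable[[Row], str]) -> Dict[str, List[Row]]:
    """Send each row to a bucket such as "keep", "spam", or "language:Spanish".

    `route` stands in for the human judgment call made on each row.
    """
    buckets: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        buckets[route(row)].append(row)
    return dict(buckets)


def split_into_tabs(rows: List[Row], tab_size: int = 200) -> List[List[Row]]:
    """Cut the remaining rows into consecutive tabs of at most `tab_size` groups."""
    if tab_size <= 0:
        raise ValueError("tab_size must be positive")
    return [rows[i:i + tab_size] for i in range(0, len(rows), tab_size)]
```

For example, if 450 groups remain in the "keep" bucket after sorting, `split_into_tabs` yields tabs of 200, 200, and 50. The real tabs are only approximately 200 groups each, so the actual splitting may balance the sizes differently.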

Once all of the metadata has been fully sorted, then the tabs for all the smaller languages will be made available, and the tagging will move from beta status to "come one, come all". :)

I've just finished sorting a category for DJs, and will next be sorting through a lot of categories for various music artists and genres.


Actual stats:

Now up to 15.24% sorted and 1.09% tagged.

Available tabs:

English: 514
Spanish: 12
Portuguese: 5
Italian: 83
German: 1
French: 1
Chinese: 1
Indonesian/Malay: 26
Arabic: 14
Persian: 5
Turkish: 12
Romanian: 3
Unknown: 53
Spam: 4


Something fun:

And a sample of one of the group descriptions that I thought intriguing, for the group "ethelsmith":
This site is dedicated to the memory and the music of the legendary organist Ethel Smith. Through her films, Hammond organ arrangements, personal appearances and over 25 albums for Decca Records, Miss Smith was a driving force in introducing the Hammond organ to the world over 50 years ago. Her music publishing company produced literally hundred’s of Hammond arrangements, and one of the first instructional courses for the Hammond. This site has become a virtual museum of her record covers, movie posters, sheet music and music book covers. There is no other place on the web that contains more pictures and information on the incredible Ethel Smith. We hope your time here is interesting, informative, and fun. To check out details of Miss Smith’s film career, simply hit in message number 11, 12, and 13. Also check out the MP3 files of some of her hottest recordings. We welcome member’s contributions, especially personal stories and photos.