I’ve spent large portions of the past few weeks working with Google’s product feeds, the big data files that supply Google with the items that appear under Google Shopping.
(Fun fact: everything you see under both Google Shopping and Bing’s product listings is a paid ad. Neither service will show items that don’t directly kick back money to the search company for the click, which is why the shopping results seem so much more limited than the corresponding web searches.)
A less fun fact is that Google has extensive algorithms for blocking content that “violates their system policies”. Since I’ve been attempting to upload simple lists of all the comics for sale on Atomic Avenue, it’s been interesting, to say the least, to discover which title names qualify instantly for blocking. In most cases the block is apparently immune to any sort of human review, and not for lack of asking: we have spent numerous hours begging for just that. The repeated answer is always that we have no choice but to remove the “offensive” material.
Google refuses to explicitly confirm their algorithms for filtering, but after hand-reconciling the banned items from the more than 1.6 million comics we are attempting to list, I can say with some confidence that these are some of the low-lights of their often ludicrously broad (I’d go so far as to say “utterly defective”) algorithm:
- Ridiculously, Google treats as offensive any comic title containing the word “Black”. (E.g. “The Black Knight”, “Black Ops”, “Black Panther”, “Black Widow”, “Batman: Blackgate”, etc.)
- Same story for a variety of miscellaneous “angry” nouns: “Rage”, “Malice”, etc. Several issues of The Avengers, for instance, have been rejected since they include appearances by a character named “Rage”.
- Sword and sorcery titles may be huge, but they’ll apparently have to get by on the sorcery alone. Any title containing the word “Sword” (e.g. “Savage Sword of Conan”) or “Dagger” (e.g. “Cloak & Dagger”) is denied. The same goes for “Gun” and “Rifle”, and even “Mace” (sorry, Mace Windu!).
- Google seems to have a fear of books or products which mention armed combat. “Revolution” is consistently banned, as is “Uprising”. (Goodbye Marvel crossover events, as well as science-fiction titles.) “Frontline Combat”, the classic E.C. war series, also can’t be posted.
- “Muse” and “Pandora”? They’re not just bitchin’ female comic characters. Apparently they’re also now the names of prescription meds and are automatically banned from posting (although apparently you can file an appeal to have these reconsidered).
- Mental issues seem to be a touchy subject: “Bedlam” is banned, as is all mention of the Watchmen character Rorschach.
- And then there are the weird and strangely ominous ones. For instance, “Sentinel” is an automatic disapproval, whether it refers to the Marvel robots, the English sci-fi mag, or “Captain America: Sentinel of Liberty”. All banned.
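For anyone wrestling with the same problem, here is a rough sketch of how a feed could be pre-screened for the terms listed above before uploading. To be clear about what is assumed: the term list is only what I reconciled by hand, not anything Google has published, and the case-insensitive substring matching is my guess at their behavior (suggested by “Blackgate” tripping the “Black” rule). The names `OBSERVED_TERMS`, `flag_title`, and `split_feed` are made up for illustration.

```python
# Terms observed to trigger disapproval, per the list above.
# This is a hand-reconciled guess, NOT a list confirmed by Google.
OBSERVED_TERMS = [
    "black", "rage", "malice", "sword", "dagger", "gun", "rifle", "mace",
    "revolution", "uprising", "frontline combat", "muse", "pandora",
    "bedlam", "rorschach", "sentinel",
]


def flag_title(title, terms=OBSERVED_TERMS):
    """Return the observed terms that appear anywhere in the title.

    Assumption: matching is a case-insensitive substring test, so
    "Blackgate" is flagged for "black". The real filter may differ.
    """
    lowered = title.casefold()
    return [term for term in terms if term in lowered]


def split_feed(titles):
    """Split titles into (clean list, {flagged title: matched terms})."""
    clean = []
    flagged = {}
    for title in titles:
        hits = flag_title(title)
        if hits:
            flagged[title] = hits
        else:
            clean.append(title)
    return clean, flagged


if __name__ == "__main__":
    sample = ["Batman: Blackgate", "Cloak & Dagger", "Amazing Spider-Man"]
    clean, flagged = split_feed(sample)
    print("Likely fine:", clean)
    print("Likely blocked:", flagged)
```

Substring matching is deliberately over-eager here, since it errs on the side of flagging too much, which seems to match the spirit of the filter being described.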
It’s too early to say whether Microsoft (Bing/Yahoo) shares a similar (and rather odd, to say the least) set of exclusionary rules. I’ll update the post once I know how I fare going through their catalog process.
As a postscript, Google has recently floated the idea of changing search rankings of news sites based on their deemed “truthfulness”.
It’s worth noting that this proposal for filtering the world’s news sites comes from the same folks who are currently banning the word “Black” from product names.