Thoughts on the challenges of spider-IDing on iNaturalist

Long effortpost TL;DR: I am proposing that the people who help identify spiders on iNat (though this probably also applies to many other groups of small arthropods) make more liberal use of the Data Quality Assessment flag "No, it's as good as it can get" when we believe it is unlikely an observation will ever get a more specific ID. This is intended to be a conversation starter among the handful of people who follow spiders and help with identification. Again, this is not entirely specific to spiders but that's my area of study/interest so that's how I wrote this out.


Recently I spent a couple of weeks trying to review "all" of the Texas spider observations in Needs-ID state and give reasonable IDs where I could (even just moving from Order -> Family for later review). If you have spent a lot of time sorting through these buckets, you are probably familiar with some of the headaches. For a variety of reasons, many observations are simply not identifiable to any reasonable level (say, Family). After some discussion with a few of the other active identifiers, we decided it would be helpful to start tagging "unidentifiable" observations as such, using the Data Quality Assessment flag labeled "No, it's as good as it can get." This will turn the observation to either Research Grade or Casual depending on the community taxon level and (maybe?) the number of people who agree with that assessment, so it will be filtered out of most people's (default) Identify criteria.

My main motivation for doing this is that the steady increase in observations is outpacing the ability of the identifiers to keep up. There is a limited number of people actively reviewing spiders in North America (my main focus, but also the bulk of observations) while iNat's popularity is increasing. So the number of observations is growing rapidly, and the number of identifiers seems to be either flat or dropping. I don't have exact numbers (although certainly the data are out there) but I recall from previous conversations that in 2019 the number of Needs-ID (Araneae / United States) was approaching 200k. By mid-2020 it was over 300k, and at the end of 2020 it was over 400k. Over 40% of the total observations in iNat's history were uploaded in 2020 alone. Going through these buckets is fairly time consuming if you are trying to be accurate and (even better) helpful. Over a 2-week period (probably averaging ~8 hours a day) I reviewed something like 20 thousand observations (rough guess) and made about 5-6 thousand IDs - mostly to family or genus. Probably fewer than 10% of those IDs were to species, and probably only a couple percent actually resulted in an observation reaching RG. Of course I spent a lot of time consulting the literature and BugGuide, trying to include helpful comments where I could, etc. - so I was not going for maximum speed, but I was still attempting to get through as much as I could in the time I had. I'm just trying to give a rough idea of the time it would take an average(ish) person to work through a pile of a given size. A month of work and 10,000+ IDs later, I feel like I'm about where I started.
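To put rough numbers on that pace, here is a minimal Python sketch of the arithmetic. It uses only the estimates quoted above (about 20,000 observations reviewed over 2 weeks at ~8 hours a day, and a US Needs-ID pile of about 400,000). The function names are made up for illustration, and it assumes work on all 14 days and a backlog that stops growing, which it clearly does not.

```python
def review_rate(observations_reviewed, days, hours_per_day):
    """Observations reviewed per hour of work."""
    return observations_reviewed / (days * hours_per_day)


def days_to_clear(backlog, rate_per_hour, hours_per_day):
    """Working days needed to review a backlog, assuming no new uploads."""
    return backlog / (rate_per_hour * hours_per_day)


# Rough estimates taken from the paragraph above (all approximate).
rate = review_rate(observations_reviewed=20_000, days=14, hours_per_day=8)
days = days_to_clear(backlog=400_000, rate_per_hour=rate, hours_per_day=8)

print(f"{rate:.0f} observations reviewed per hour")   # about 179
print(f"{days:.0f} eight-hour days for one reviewer")  # about 280
```

So even at that pace, one person would need something like 280 full days just to look at each US Needs-ID spider observation once, before counting anything uploaded in the meantime. The rate comes from my Texas pass, so treat the US figure as a ballpark only.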

Anyway, before long I decided to start marking observations I considered plainly "unidentifiable," to remove them from Needs-ID status. The rough criterion I initially used was that, due to the photo quality, I couldn't confidently place the observation in any particular family, and I doubted anyone else could either. I did not apply this to anything with clear photos that I simply wasn't familiar with, nor did I apply it to confusing taxa like the many similar-looking Agelenidae, Philodromidae, Thomisidae, Dictynidae, etc., where the photos were good enough but I couldn't identify the spider further. There is probably someone out there who studies the Dictynidae and is familiar enough with the patterns to make better IDs (even if that happens years later), and I don't want to get in the way of that. Basically, I only marked photos where the quality/focus/angles could not justify even a family-level ID. One example would be a photo of a "typical" orb web with no spider - so you could give an ID of Araneoidea (could be Araneidae, Tetragnathidae, maybe Uloboridae?) - but is it really necessary to keep that as "Needs ID"? In most cases I left a copy/paste comment along the lines of "Unfortunately there is likely not enough detail to give a more specific ID" or "It is an orbweaver but I can't be sure which kind" so the observer at least knew that someone reviewed it.

Examples of where I have been applying this:

  • Photos that are plainly too blurry or distant to even suggest a family
  • Photos that are too dark and I could not improve sufficiently with basic photo editing software
  • Night photos lit with flash (mostly orbweavers in webs) where only the rough shape is visible
  • (Most) shed exoskeletons that do not seem to have identifying features other than 8 legs
  • Partial/abandoned webbing with no animal visible
  • Multiple possible species/genera and definitely not enough detail in the photos to be more specific

A lot of these are cell phone images from users who made an iNat account, posted an observation or ten, then never came back. Many of them seem to be what we call "duress users" - students who had to make X number of observations for a school assignment. I definitely support that (we want more people to discover iNat) but it leaves a lot of "frass" as BugGuide calls it. Also I want to make clear that I fully appreciate the challenge of making good photos of tiny (often moving) animals and I am not trying to criticize anyone's photography. Spiders are difficult to photograph well, even with dedicated equipment (I still suck at it) and I don't want to discourage people from submitting these observations. But at this point iNat has a rapidly growing pile of spider photos that I feel will never even be reviewed, and I think removing "unidentifiable" things from Needs-ID as we go will eventually help the small group of people who are willing to spend their time on this. Of course I know it is often not possible to make a definite ID without the specimen in hand, and for that reason many observations may never reach RG, and that's fine. But there are thousands of cases where we have the same photos being reviewed by the same 5 or 6 people over the course of several years, each individually making the determination that "It looks like some type of orbweaver maybe but that's the best I can do" and then leaving it there for the next person. That eats up a lot of time and seems unproductive/frustrating to IDers. So I am trying to find a way to make things better without being too aggressive/critical or accidentally "hiding" something that could be scientifically interesting.

Some other ideas I have had in parallel with this:

  • An Observation Field indicating the observation has (multiple) high quality photos - for easier review, maybe by more seasoned arachnologists.

    Could be particularly helpful for the smaller or more cryptic spiders like Erigoninae/Linyphiinae, Thomisidae, uncommon Theridiids, etc. The idea being that we could present a more curated subset of high quality observations (e.g. all of wildcarrot's photos :) ) and request help from outside experts.
  • An Observation Field indicating the observation contains microscope photos - this is uncommon but I think would be useful.
  • "Holding bins" to help sort easily-confused or similar-looking taxa (like many Clubionidae/Cheiracanthiidae/Anyphaenidae) for later review

    Joe Lapp did some initial work on this while he was more active on iNat (I think he stepped back partly because of the stuff I'm hoping to improve)
  • Observation fields or some other way to tag things like egg sacs/webs/spiderlings for further review, but get them out of Needs-ID
  • Some easy way to tag-team other IDers on observations that need more people to correct the community ID

    A common example is: Computer Vision said *Oecobius* (it's not), some other person agreed, so now we need 4 votes to fix it. This could take years to happen naturally, especially on older observations.

I ran some quick numbers while I was working and found that almost 40% of the total observations in iNat's history (Spiders / Texas) were made in 2020. Almost 40,000 observations, just spiders, just Texas. For USA it was well over 40%. Over 2/3 of all US spider observations (400,000+) are Needs-ID. I expect iNat will continue to grow at a steady pace, or at least I don't see any reason why its popularity would suddenly fall off. This is awesome, but is overwhelming for the limited number of volunteers we have to try and sort through everything. So that's pretty much it - I am looking at this as a way to make Spider-IDing-on-iNat better for us, without upsetting observers or obscuring any potentially-interesting observations. I welcome anyone's thoughts. I chose a journal entry because many people are not active on the iNat forum and this seemed the best way to involve everyone who might have input. It might not be the best forum for an active conversation but we'll see.

I did save a bunch of representative examples of things I would or wouldn't treat as "unidentifiable" for various reasons, but I didn't include them here because I didn't want this to seem like a call-out post - more a group problem solving thing. But if there is interest I can include some examples. I have had this basic conversation with several people individually so I thought a sort of group discussion might be productive.

Thanks for reading (sorry for the wall of words) and for any opinions you would like to share about this, and thanks for the work you do to make iNat so awesome!

-Justin

Posted on January 22, 2021 08:33 AM by jgw_atx

Comments

I'm not an identifier of spiders; that said, this conversation starter is more than a "wall of words" and very important. Every time I post a spider or any photo, I have the identifier and the individual features of each insect in mind. I am thinking: what does the identifier need to see to make the necessary distinctions? I am fairly new to iNaturalist, but perhaps a reminder to those posting cellphone images to use the editing/crop options on their phone is in order, making the insect more easily identifiable. Some folks just don't realize what you need to see in a photo. I appreciate everyone here on iNaturalist taking the time to help us learn. Thank you, Justin

Posted by kneubaue about 3 years ago

Justin, great post! I also like the “Observation Field” ideas, some way for a spider identifier to at least tag an observation that they personally are not able to identify. And while the "No, it's as good as it can get" assessment flag is not something I have used up to this point, you make an excellent case for its use going forward. Thanks!

Posted by tim1009 about 3 years ago

@kneubaue Excellent points! As far as what a spider identifier is looking for, images of both the dorsal and ventral sides are always appreciated when it's possible to get both. An image or description of the web's design can often narrow the ID to family level by itself. And if possible, an image of the eye arrangement is very telling. An observation with images covering all or most of these should be identifiable by one of the spider identifiers on iNat.

Posted by tim1009 about 3 years ago

@kneubaue I appreciate when people keep identification in mind when they make observations. I have been loosely planning to put together another post, maybe a wiki-type post on the iNat forum, about how to take better pictures of spiders. My plan is to take my phone, point&shoot, and SLR cameras and go find some spiders - then post examples of how different devices and techniques produce different images. For example, cell phone images of a spider found indoors with and without the flash on, to illustrate which one produces a more "identifiable" photo. And how much detail is lost when iNat resizes images (as a suggestion for people to crop their photos), that sort of thing. The iNat forum allows wiki-style posts that are editable by multiple users, so hopefully others could share their techniques and examples too.

Posted by jgw_atx about 3 years ago

Fantastic post, Justin! I love your data-driven, streamlining approach. I had totally forgotten how useful "No, it's as good as it can get" can be and will start using it more effectively right away. Thanks for all the amazing ways you contribute to iNat!

Posted by tigerbb almost 3 years ago

@jgw_atx This is a very, very random time to reply. However, I just came across this thread, and if you haven't done that comparison already I think you should! It sounds like a fantastic idea.

Posted by mbwildlife almost 2 years ago
