A new Computer Vision Model (v2.1) including 1,770 new taxa

We released a new computer vision model today. It has 71,286 taxa, up from 69,966. This new model (v2.1) was trained on data exported last month on January 15th and added 1,770 new taxa.

Why v2.1 and not v1.7? As we mentioned in August, we have been training our new computer vision models using a transfer learning strategy.

All of the models that we have released since August (v1.1 - v1.6), one every month, were trained based on the same source model. We call that source model v1.0. The v1.0 model started training in 2021 on 55,000 taxa, 27 million photos, and trained for about 4 months (80 epochs).

While we've been transfer learning new production models, we've also been working on a new source model. We call this new source model v2.0. The v2.0 model started training in 2022 on 60,000 taxa, 30 million photos, and trained for about 9 months (200 epochs). All of the additional data and training time have produced a better source model, which in turn is making better final production models. The model we released today (v2.1) is the first model based on the v2.0 source model. Note from the figure below that v2.0 itself won't ever be released, since it was trained on data and a taxonomy that are now over 9 months out of sync. That is why we are releasing v2.1, which was trained (via transfer learning) on data and a taxonomy from January 15th. Also note that we trained v1.7 as a backup in case v2.1 didn't evaluate well. But since v2.1 performed significantly better, we won't be releasing v1.7 and will continue releasing models derived from the v2.0 base until we move to v3.0 in the next 9 to 12 months.
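To make the idea of transfer learning from a source model concrete, here is a minimal sketch in plain Python. It assumes one common variant, where the source model's feature extractor is frozen and only a new classification head is trained on fresh data; the post does not say whether iNaturalist freezes layers or fine-tunes the whole network, so treat that as an assumption. The names `SOURCE_WEIGHTS`, `frozen_features`, and `train_head`, and the toy data, are all hypothetical.

```python
import math

# Hypothetical stand-in for a pretrained source model: fixed weights that
# are never updated while the new head is trained.
SOURCE_WEIGHTS = ((1.0, 0.0), (0.0, 1.0), (1.0, 1.0))


def frozen_features(x):
    """Map a raw input to features using the frozen source weights."""
    feats = [math.tanh(w[0] * x[0] + w[1] * x[1]) for w in SOURCE_WEIGHTS]
    return feats + [1.0]  # constant feature acts as a bias term


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scores(head, feats):
    """Linear scores of each class for one feature vector."""
    return [sum(w * f for w, f in zip(row, feats)) for row in head]


def train_head(examples, num_classes, epochs=300, lr=0.5):
    """Train only a new linear head with SGD on cross-entropy loss."""
    dim = len(frozen_features(examples[0][0]))
    head = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(epochs):
        for x, label in examples:
            feats = frozen_features(x)
            probs = softmax(scores(head, feats))
            for c in range(num_classes):
                grad = probs[c] - (1.0 if c == label else 0.0)
                for j in range(dim):
                    head[c][j] -= lr * grad * feats[j]
    return head


def predict(head, x):
    """Return the index of the highest-scoring class."""
    s = scores(head, frozen_features(x))
    return max(range(len(s)), key=s.__getitem__)


# Toy "new export": three classes standing in for an updated taxonomy.
new_export = [
    ((0.0, 0.0), 0), ((0.1, 0.2), 0),
    ((1.0, 0.0), 1), ((0.9, 0.1), 1),
    ((0.0, 1.0), 2), ((0.1, 0.9), 2),
]
head = train_head(new_export, num_classes=3)
predictions = [predict(head, x) for x, _ in new_export]
```

Because only the small head is trained, each new export can produce a production model much faster than retraining the source model from scratch, which is the trade-off the post describes.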

Thanks to NVIDIA for the generous hardware grant that made all of this training possible!

Taxa differences to previous model

The charts below summarize these 1,770 new taxa using the same groupings we described in past release posts.

By category, most of these 1,770 new taxa were insects and plants

Here are species-level examples of new taxa added for each category:

Click on the links to see these samples rendered as species lists on the Explore page. Remember, to see if a particular species is included in the currently live computer vision model, you can look at the “About” section of its taxon page.

We couldn't do it without you

Thank you to everyone in the iNaturalist community who makes this work possible! Sometimes the computer vision suggestions feel like magic, but it’s truly not possible without people. None of this would work without the millions of people who have shared their observations and the knowledgeable experts who have added identifications.

In addition to adding observations and identifications, here are other ways you can help:

Share your Machine Learning knowledge: iNaturalist’s computer vision features wouldn’t be possible without learning from many colleagues in the machine learning community. If you have machine learning expertise, these are two great ways to help:
Participate in the annual iNaturalist challenges: Our collaborators Grant Van Horn and Oisin Mac Aodha continue to run machine learning challenges with iNaturalist data as part of the annual Computer Vision and Pattern Recognition conference. By participating you can help us all learn new techniques for improving these models.
Start building your own model with the iNaturalist data now: If you can’t wait for the next CVPR conference, thanks to the Amazon Open Data Program you can start downloading iNaturalist data to train your own models now. Please share with us what you’ve learned by contributing to iNaturalist on GitHub.
Donate to iNaturalist: For the rest of us, you can help by donating! Your donations help offset the substantial staff and infrastructure costs associated with training, evaluating, and deploying model updates. Thank you for your support!

Posted on February 23, 2023 06:27 PM by loarie

Comments

Hi Scott, many thanks for sharing and illustrating iNat's CV training approach in such an easily understandable way, while I can only figure that the actual process must be darn challenging! Your general approach sounds like a really smart one, and I look forward to testing the new CV with examples.

Posted by jakob about 1 year ago

Very Nice. Many thanks!
(please add the unpin from dashboard option)

Posted by tonyrebelo about 1 year ago

Great news and explanation!

Posted by cthawley about 1 year ago

Thanks! It always brings a smile to my face when I see these updates.

Posted by wildlife13 about 1 year ago

Wow, that's awesome! I'm always happy to see these!

Posted by gatorhawk about 1 year ago

I really appreciate these consistent, brief, clear updates about what's happening with the CV.

Posted by leptonia about 1 year ago

This is awesome. Thank you, volunteers and staff (and sponsors). I've been wondering: does it help train the model for me to upload multiple photos of the same specimen from different angles? Or is having 20 photos of the same beetle in one iNat record more of a drain on server resources than a help to training the computer vision algorithms? I mostly mean for the somewhat rarer taxa, as I'm pretty sure the database already has images of mallards and dandelions from every imaginable angle!

Posted by hmheinz about 1 year ago

Great! Looks like a big step forward, and personally happy to see three more Christmas Beetle species incorporated into the model now.

Posted by hauke_koch about 1 year ago

One of my observations helped! :D

Posted by observerjosh about 1 year ago

Wonderful! Thanks for the detailed details!!

Posted by schizoform about 1 year ago

Nice to have another model, glad that my photos of 21 species were used, hope that even more species will be added after new season in a v3 model!

Posted by marina_gorbunova about 1 year ago

Very cool, I always love hearing about these updates. @marina_gorbunova: How did you determine that your photos of 21 species were used? (Did you just count up all the species that you knew you had observations of?)

Interesting that casual observations are included. I noticed this because I saw Atelopus zeteki (Panamanian Golden Frog) on the list, which only has casual observations since it is considered extinct in the wild.

Posted by sullivanribbit about 1 year ago

@sullivanribbit there're links with new species added to the model in text. Of course overall in the whole model there're a couple of thousands species that I have photos of, but you can't know if they were used or not, with new ones it's pretty clear as there's not much to choose from.

@marina_gorbunova: Thanks -- so it sounds like you followed the links, and manually counted up species for which you had supplied photos. That's what I figured. It's great that your photos helped with so many of these species! I was proud of my measly 4 lizards, though I confess I didn't attempt to look through all the new plants.

@sullivanribbit you don't need to look through, just click "my observations" in filters.

Thanks Marina - even not so good photos help I see
https://www.inaturalist.org/observations/63477472

Posted by dianastuder about 1 year ago

Is the trained model or the training source code available for download? I will probably be training my own model on the INaturalist Open data, but if the pretrained model was available it could save a lot of electricity and CO2 emissions :)

Posted by eturpin about 1 year ago

The not so good ones, provided properly identified, probably help tune the model the best!! Esp. given that most beginners will be posting suboptimal pictures.

Congratulations on the launch. I'm constantly impressed by the accuracy of your model, and excited to see the continued progress. Computer vision is not an easy problem to crack, so kudos to your team. :)

Posted by borala about 1 year ago

@marina_gorbunova: Aha! I didn't think of that -- thanks!

The transfer learning has already been working well and I'm even happier that there's a new source model to use as a basis.

@loarie: When you say that "v2.1 performed significantly better" than v1.7, what kind of tests are used to determine the quality of the model? Would this be a set of test photos with "known" IDs that you run the model against and count the percentage of correct IDs that are the #1 guess or that place in the top 3?

Posted by rupertclayton about 1 year ago

Yes - we use a test dataset of observations not used to train the model and look at the percentage where the top suggestion was correct. We look at this globally, and also taxonomically (e.g. just reptiles) and regionally (e.g. just Europe), and also with and without the geographic weighting.

Posted by loarie about 1 year ago
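The evaluation described in the reply above (top-suggestion accuracy on held-out observations, sliced globally, by taxonomic group, and by region) can be sketched roughly as follows. This is an illustration only, not iNaturalist's evaluation code: the record fields (`true_taxon`, `suggestions`, `group`, `region`) and the four sample records are hypothetical, and the geographic weighting is not modeled.

```python
from collections import defaultdict


def top_k_accuracy(records, k=1):
    """Fraction of records whose true taxon is among the first k suggestions."""
    if not records:
        return 0.0
    hits = sum(1 for r in records if r["true_taxon"] in r["suggestions"][:k])
    return hits / len(records)


def accuracy_by(records, key, k=1):
    """Top-k accuracy computed separately for each value of `key`."""
    buckets = defaultdict(list)
    for r in records:
        buckets[r[key]].append(r)
    return {name: top_k_accuracy(rs, k) for name, rs in buckets.items()}


# Hypothetical held-out observations (never used for training), each with
# the model's ranked suggestions.
test_set = [
    {"true_taxon": "Protea cynaroides",
     "suggestions": ["Protea cynaroides", "Protea repens"],
     "group": "plants", "region": "Africa"},
    {"true_taxon": "Lumbricus rubellus",
     "suggestions": ["Lumbricus terrestris", "Lumbricus rubellus"],
     "group": "worms", "region": "Europe"},
    {"true_taxon": "Lumbricus terrestris",
     "suggestions": ["Lumbricus terrestris", "Lumbricus rubellus"],
     "group": "worms", "region": "Europe"},
    {"true_taxon": "Atelopus zeteki",
     "suggestions": ["Atelopus zeteki", "Atelopus varius"],
     "group": "amphibians", "region": "Central America"},
]

overall_top1 = top_k_accuracy(test_set)          # 0.75
overall_top3 = top_k_accuracy(test_set, k=3)     # 1.0
by_group = accuracy_by(test_set, "group")        # worms: 0.5
by_region = accuracy_by(test_set, "region")      # Europe: 0.5
```

Comparing two candidate models (say v1.7 and v2.1) would then amount to running both over the same test set and comparing these numbers slice by slice.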

Great info @loarie. That seems very comprehensive.

Good to see Plantae is still on top! :)

Posted by yerbasanta about 1 year ago

I would argue that insects should be on top if we want a balanced representation.
But it is probably true that, of all groups, plants are the best suited to virtual museums: they don't run or fly away - they pose for photos; they don't have hidden sex organs - they display their "species" to pollinators; they don't hide (not when in flower anyway) - and can easily be revisited later on; photos show their true shape and 3-D features (which get squashed to oblivion when pressed and dried - although small flowers can be rehydrated - but not all organs); they don't have unrelated life stages - special stages (seeds, pollen) are readily associated with adults; and many are too large for standard herbaria - but fit perfectly onto photos. So yes, plants probably should be way "on top" in virtual museums.

What would be nice is a graph showing the proportional representation of different floras (apart from vertebrates, do we have comprehensive lists of species for biogeographic realms?). I suspect that at present the representation (for 401 new plant species) is basically North American (148) and European (81), and it would be nice to see how the African (91), South American (82), Asian (102) and Australian (107) species are progressing relative to their total floras.

Great!
A new species of earthworms in the vision model!
Lumbricus rubellus
All worms will no longer be automatically identified as Lumbricus terrestris

Posted by max_carabus about 1 year ago

Great!
What happens to the observations used for the training but for which the identification is changed/corrected afterwards? How long will those influence the suggested IDs of the computer vision?
I corrected the identifications of some taxa (~10% were wrongly identified even in research grade), unfortunately after the deadline for v2.1.
Those observations will potentially be used for the correct taxa and discarded for the wrong ones in v2.2, right?

Posted by karsten_s about 1 year ago

Updated every month recently

PS I would like to find your earlier blog posts on Computer Vision - but this

https://www.inaturalist.org/posts/search?utf8=%E2%9C%93&q=computer+vision+&post%5Bparent_type%5D=Site&post%5Bparent_id%5D=1&commit=Search

doesn't get me there.

A label or tag for Computer Vision on your blog posts?

Thanks, CV and I have learnt together, since I started on iNat in 2018.
Better and more useful suggestions, which I learn to evaluate.
Going back to my first CNC when the CV couldn't recognise Protea cynaroides.

Data from 15th February is being used for current training?

Then we have a March and April chance to add species ahead of CNC22.
