Podcast: Upleveling Image Segmentation with Segment Anything

March 28, 2024

Christina Cardoza

Paula Ramos

Across all industries, businesses actively adopt computer vision to improve operations, elevate user experiences, and optimize overall efficiency. Image segmentation stands out as a key approach, enabling various applications such as recognition, localization, semantic understanding, augmented reality, and medical imaging. To make it easier to develop these types of applications, Meta AI released the Segment Anything Model (SAM)—an algorithm for identifying and segmenting any objects in an image without prior training.

In this podcast, we look at the evolution of image segmentation, what the launch of Meta AI’s Segment Anything model means to the computer vision community, and how developers can leverage OpenVINO™ to optimize for performance.

Our Guest: Intel

Our guest this episode is Paula Ramos, AI Evangelist at Intel. Paula has worked in the computer vision field since the early 2000s. At Intel, Paula works to build and empower developer communities with Intel® AI Inference Software.

Podcast Topics

Paula answers our questions about:

(1:28) The importance of image segmentation to computer vision
(2:54) Traditional challenges building image segmentation solutions
(5:50) What value the Segment Anything Model (SAM) brings
(8:29) Business opportunities for image segmentation and SAMs
(11:43) The power of OpenVINO to image segmentation
(16:36) The future of OpenVINO and Segment Anything

Transcript

Christina Cardoza: Hello and welcome to the IoT Chat, where we explore the latest technology advancements and trends. Today we’re talking about image segmentation with our good friend Paula Ramos from Intel. Paula, welcome back to the show.

Paula Ramos: Thank you, Christina. Thank you for having me here. I’m so excited to have this conversation with you.

Christina Cardoza: Yeah, absolutely. For those listeners who haven’t listened to any of your past episodes, like the one we recently did on artificial intelligence and looking at the next generation of solutions and technologies to come there, what can you tell us and them about what you do at Intel and what you’re working on these days?

Paula Ramos: Yes, this is so exciting. So, a little bit of my background: I have a PhD in computer vision and I’m working as an AI evangelist at Intel, bridging the gap between technology and users. By users I mean developers, business leaders, people that want to have a specific journey in AI. I’m trying to bring this technology closer to them so they can start the AI journey easily.

Christina Cardoza: Yeah, very exciting stuff. AI and computer vision: they are becoming hugely important across all different industries, transforming operations and everything from frontend to backend. So that’s where I wanted to start this conversation today, looking at the importance of image segmentation in computer vision and the opportunities that businesses get with this field.

Paula Ramos: That is really good, because I think that image segmentation is the most important task in computer vision. I would say there are multiple computer vision tasks—classification, object detection—but I think that image segmentation is playing a crucial role in computer vision because we can create object detection, recognition, and analysis there.

And maybe the question is, why is this so important? And the answer’s very simple: image segmentation helps us to isolate individual objects from the background or from other objects. We can localize important information, we can create some metrics around specific objects, we can extract some features that can also help us to understand one specific scenario. And this is really, really important in the computer vision land.

Christina Cardoza: Yeah, that’s great to hear Paula. And of course we have different technologies and advancements coming together that are making it easier for developers to add image segmentation to their solutions, and implement this and develop for this a little bit more seamlessly.

But before we get into some of the new technologies that are helping out, I’m curious: what have been the challenges previously with traditional image segmentation that developers face when building and deploying these types of solutions?

Paula Ramos: That is a perfect question for my background, because in the past I was working in agriculture, I already mentioned that. And I was working on my doctoral thesis with image segmentation and different techniques, and I was facing a lot of challenges with that, because we have multiple techniques to segment objects but there is no one-size-fits-all approach.

So we can see thresholding, edge detection, or region growing, but depending on what is the technique that we are using, we need to carefully define our best approach. I was working detecting coffee beans, and coffee beans are so similar, are so close. Maybe I have red colors around and I could see the over segmentation, merging objects, when I was running my image-segmentation algorithm. Or under segmentation: I was missing some fruits.

And that is a challenge related with data, because it is difficult to work in that environment when you are changing the light, when you have different kinds of camera resolution—basically you are moving the camera so you can see some blurry images or you can see noise in the images. And detecting the boundaries is also challenging.

And this is also part of the thing that we need to put in the map, or the challenges, for traditional image segmentation, is the scalability and the efficiency. Because depending on the resolution of the images or how large are the data sets, we can see that the computational cost will be higher, and that can limit the real-time application.

And this is a balance that you need to have—what is the image resolution that I want to put here to have a real-time application? And for sure if you reduce the resolution you limit the accuracy the most. And in most of the cases you need to have human intervention for these traditional methods. And I think that right now with the newest technologies in image segmentation I could have saved a lot of time in the past.
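The merging problem Paula describes with traditional methods can be reproduced in a few lines. The following is a minimal sketch, not her thesis pipeline: it applies a fixed threshold to a tiny, hypothetical grayscale patch and counts connected regions. The pixel values, the cutoffs, and the function names are assumptions made for illustration.

```python
from collections import deque

def threshold(image, cutoff):
    """Return a binary mask: True where the pixel value is >= cutoff."""
    return [[value >= cutoff for value in row] for row in image]

def count_regions(mask):
    """Count 4-connected foreground regions with a breadth-first flood fill."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    regions = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            regions += 1
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions

# Hypothetical patch: two bright "fruits" separated by a dim column.
separated = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 9, 9, 2, 9, 9, 0],
    [0, 9, 9, 2, 9, 9, 0],
    [0, 9, 9, 2, 9, 9, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
# Same scene, but the gap is brighter (for example, changed light).
touching = [row[:] for row in separated]
for y in (1, 2, 3):
    touching[y][3] = 6

print(count_regions(threshold(separated, 5)))  # 2
print(count_regions(threshold(touching, 5)))   # 1: the two objects merge
print(count_regions(threshold(touching, 7)))   # 2: a hand-tuned cutoff separates them
```

The last two lines show why these methods tend to need human intervention: the cutoff that works under one lighting condition has to be retuned under another.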

Christina Cardoza: Yeah, absolutely, I’m sure. It’s great to see these advancements happening, especially to help you further your career. But it also hurts a little bit that this could have made your life a lot easier years ago if you had this, but things are always moving so fast in this space, and that brings me to my next question.

I know Meta AI, they recently released this Segment Anything Model for being able to identify and segment those objects without prior training, making things a lot easier—like we were just talking about. So I’m curious what you think about this model and the value that it’s bringing to developers in this space.

Paula Ramos: Yes, I think that I would have liked to have Segment Anything Model seven years ago. Because all the problems that I was facing, I could have improved that with the model that Meta released last year, Segment Anything Model. So, basically it improves the performance on complexity: we can demonstrate with SAM that we have strong performance on complex data sets. So that problem with noise, blurry images, low contrast is something that is in the past with SAM. For sure, there are some limitations: if we have nothing in the image because the image is totally blurry, it is impossible for SAM to do that. So we need also to balance the limitations, but for sure we are improving the performance on those complexities.

Another good thing SAM has is the versatility and the prompt-based control. Unlike traditional methods that require specific techniques for different scenarios, as I mentioned before, SAM has this versatility and it allows users to specify what they want to segment through prompts. And prompts could be points, boxes, or even natural-language descriptions.

I would have loved to say in the past, “Hey, I want to see just mature coffee beans” or “immature coffee beans,” and have this flexibility. And that could also empower developers to handle diverse segmentation tasks. I was also talking about scalability and efficiency: with SAM we can process the information faster than with the traditional methods, so we can make these real-time applications more sustainable, and the accuracy is also higher.

Christina Cardoza: You mentioned the coffee bean example. Obviously coffee beans are very important—near and dear to probably a lot of people’s hearts listening to this—and it’s great to see how this model is coming in and helping being able to identify the coffee beans better and simpler so that we get high-quality coffee in the end result.

I’m curious what other types of business opportunities does this Segment Anything Model present that developers are able to—with still some limitations—but be able to streamline their development and their image segmentation?

Paula Ramos: I think that Segment Anything Model, from my perspective, presents several potential business opportunities across all the different image segmentation processes that we know until now. For example, we can create content or edit content in an easy way, trying to automatically manipulate the images, remove some objects, or create some real-time special effects. Augmented reality or virtual reality is also one of the fields that is heavily impacted with SAM, with the real-time object detection. And also for augmented reality, enabling interactive experiences with virtual elements is one of the things.

So another thing is maybe (I’m thinking aloud) product segmentation, for example in retail. SAM can automatically segment product images in online stores, enabling more efficient product sales. Categorization based on the specific object features is also one of the topics. I can also see big potential in robotics and automation: improving the object-detection part, so that robots equipped with SAM can achieve more precise object identification and manipulation in various tasks. And autonomous vehicles, for sure. This is also something that I have in mind.

Top of mind for me (and I also see that there is a lot of research around that) is how medical imaging and healthcare can improve medical diagnosis, because SAM has the potential to assist medical professionals in tasks like tumor segmentation, leading to more accurate diagnoses. But for sure, those are examples; I don’t want to say that those business problems will be solved with SAM; we have it as a potential application. I think that SAM is still under development, and we are still improving Segment Anything Model.

Christina Cardoza: Yeah, that’s a great point. And I love the examples and use cases you provided, because image segmentation—it just highlights that it is so important and, really, it’s one piece of a whole bigger puzzle and solution and making things really valuable for businesses. It’s a very important piece, and it is important to make some of these other capabilities or features happen for them within their industries and solutions. So, very interesting to hear all these different use cases and upcoming use cases.

You mentioned the limitations and things are still being worked on. I’m curious, because obviously you work at Intel and I know OpenVINO™ is an AI toolkit that you guys use to help performance and optimization. So how can developers make the best use of SAM and really be able to overcome those limitations with other resources out there, especially OpenVINO?

Paula Ramos: I think that one of the good things that we have right now in these AI trends is that so many models are open source, and this is also the capability that we have with SAM. OpenVINO is also open source, and developers can access this toolkit easily. And we already have good news for developers, because we have SAM integrated with OpenVINO. What does that mean? We already have optimization pipelines for SAM in the OpenVINO Notebooks repository, and I think that this is great.

And something developers also need to know (maybe they already know) is that we have this repository where we are putting multiple AI trends every day. And this is great, because something happens in the AI field and two or three days after that we have the notebook there. So right now we have a series of SAM examples in the OpenVINO Notebooks repository. I think that you will have access to the URL, and you can take a look at these and try them on your own.

The good thing is you don’t need to have a specific machine to run these notebooks; you can also run this on your laptop, and you can see the potential to have image segmentation with some models in your laptop. Basically we have a series of four notebooks right now: we have the Segment Anything Model, this is the most common. And a good resource that we have from OpenVINO is that you can compile the model and use OpenVINO directly, and also you can optimize the model using the neural network compression framework, NNCF. And this is a great example of how you can optimize your process also in constrained hardware resources because you can quantize these models as well.
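To make the quantization idea concrete, here is a minimal sketch of what mapping floating-point weights to 8-bit integers involves. It is plain Python for illustration only and is not NNCF's algorithm or API; tools such as NNCF do considerably more (for example, they calibrate on sample data). The weight values and function names are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to signed 8-bit integers using one symmetric scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [q * scale for q in quantized]

weights = [0.82, -1.27, 0.05, 0.40, -0.66]  # hypothetical layer weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q)                       # [82, -127, 5, 40, -66]
print(max_error <= scale / 2 + 1e-12)  # True: rounding error is at most half a step
```

Storing each weight in one byte instead of four (int8 versus float32) is where the savings on constrained hardware come from, at the cost of the small rounding error measured above.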

Also, we have three more resources. We have Fast Segment Anything Model. So, basically we are taking this from the open source community; we are using that model that is addressing the limitation of Segment Anything Model. Segment Anything Model has a heavy transformer model, and that model substantially requires a lot of computational resources. We can solve the problem with the quantization, for sure, but Fast SAM decouples the Segment Anything task in two sequential stages. So it’s using YOLOv8, the segmentation part, to produce the segmentation part of the model.
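The two-stage idea can be pictured with a toy sketch. It assumes that the first stage returns a mask for every object in the image and that the second stage uses the prompt to pick among those masks. The masks below are hard-coded stand-ins rather than output from YOLOv8, and all names are hypothetical.

```python
def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) for a mask given as a set of (x, y) pixels."""
    xs = [x for x, _ in mask]
    ys = [y for _, y in mask]
    return min(xs), min(ys), max(xs), max(ys)

def box_area(box):
    """Area of an inclusive pixel box."""
    return (box[2] - box[0] + 1) * (box[3] - box[1] + 1)

def box_iou(a, b):
    """Intersection over union of two inclusive pixel boxes."""
    overlap_w = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    overlap_h = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    intersection = overlap_w * overlap_h
    return intersection / (box_area(a) + box_area(b) - intersection)

def select_by_point(masks, point):
    """Point prompt: keep the masks that contain the clicked pixel."""
    return [name for name, mask in masks.items() if point in mask]

def select_by_box(masks, box):
    """Box prompt: pick the mask whose bounding box overlaps the prompt box most."""
    return max(masks, key=lambda name: box_iou(bounding_box(masks[name]), box))

# Stage 1 stand-in: masks a segment-everything model might return (hypothetical).
masks = {
    "bean_a": {(x, y) for x in range(0, 3) for y in range(0, 3)},
    "bean_b": {(x, y) for x in range(5, 9) for y in range(1, 4)},
}

# Stage 2: prompt-guided selection.
print(select_by_point(masks, (6, 2)))      # ['bean_b']
print(select_by_box(masks, (0, 0, 3, 3)))  # bean_a
```

The appeal of this split is that the expensive step runs once per image, while each prompt only triggers a cheap lookup over the precomputed masks.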

We have also Efficient SAM in OpenVINO. Basically what we are doing here is we are using a lightweight SAM model that exhibits the SAM performance with largely reduced complexity. And the last resource that we have, something that was just posted in the repository, is Grounding DINO plus SAM, which is called Grounded SAM. And the idea is to find the bounding boxes and at the same time segment everything in those bounding boxes.

And I invite the developers to visit that repository and visit those notebooks to better understand what SAM is and how SAM works. And the most important thing also: how you can utilize OpenVINO to improve the performance on your own hardware.

Christina Cardoza: Of course, and we’ll make sure that we provide those links for anybody listening in the description so that you have them and you can learn more about them.

One thing that I love that you mentioned is obviously this space is constantly changing; so, something can change and you guys will address it and have it a couple days later. So it’s great, because developers can really stay on top of the latest trends and leverage the latest technologies.

We mentioned SAM: there’s still a lot of work to do obviously, and this is just the beginning and obviously we have all of these different resources coming from OpenVINO and from the Intel side. So I’m wondering if there’s anything else that you think of that developers can look forward to, or anything that you can mention of what we can expect with SAM moving forward, and how OpenVINO will continue to evolve along with the model.

Paula Ramos: So, for sure we’ll continue creating more examples and more notebooks for you. We have a talented group of engineers also working on that. We are also trying to create meaningful examples for you, proofs of concept that you can utilize also in your day by day. I think that OpenVINO is a great tool in that you can reduce the complexity of deep learning into applications. So if you have expertise in AI it’s also a great place to learn more about these AI trends and also understand how OpenVINO can improve your day-by-day routine. But if you are a new developer, or you are a developer but you are no expert in AI, it’s a great starting point as well, because you can see the examples that we have there and you can follow up every single cell in the Jupyter Notebooks.

I think that something that is important is that people need to understand, developers need to understand, is that we are working in our community: we are also looking for what is your need, what kind of things you want to do. Also, we are open to contributions. You can take a look at the OpenVINO Notebooks repository and see how you can contribute to the repository.

The second thing is also (and Christina, you know about that) that last December the AI PC was launched, and I think that this is a great opportunity to understand the capabilities that we can increase day by day, improving also the hardware that developers can utilize so we don’t need to have any specific hardware to run the latest AI trends. It is possible to run on your laptop and improve also your performance and your activities day by day.

So basically I just want to invite the people to stay tuned with all the things that Intel has and that Intel is having for you in the AI land, and this is really good for all of us. I was a start-up developer some years ago—I don’t want to say how long was that—but I think that for me it was really important to understand how AI was moving at that moment and understand the gaps in the industry and stay one step ahead of that, to show improvements and try to create new things—in that case at that moment for farmers, then also for developers.

Christina Cardoza: And I think that open source community that you mentioned, it’s so powerful, because developers get a chance to not only learn from other developers, ask questions, but they can also help contribute to projects and features and capabilities and really immerse themselves into this AI community. So I can’t wait to see what other use cases and what other features come out of SAMs and OpenVINO.

This has been a great conversation, Paula. Before we go, I just want to ask if there’s anything else you would want to add.

Paula Ramos: Yes, Christina. Just I wanted to say something before saying “Bye.” I wanted to say that we have a great opportunity—to developers, maybe if they are interested to participate in that. You can also contribute through the Google Summer of Code program that we have. We have around 20 proposals, and I think that also we can share the link with you, Christina, and developers can see how they can contribute and what kind of amazing projects we have around Intel, around OpenVINO.

And thank you for having me here! It’s always a pleasure to talk to you.

Christina Cardoza: Absolutely, thank you. Thank you for joining and for the insightful conversation. I can’t wait to see developers—hopefully they’re inspired from this conversation—they start working with SAMs and optimizing it with OpenVINO. So, appreciate all of the insights, again. Until next time, this has been the IoT Chat.

The preceding transcript is provided to ensure accessibility and is intended to accurately capture an informal conversation. The transcript may contain improper uses of trademarked terms and as such should not be used for any other purposes. For more information, please see the Intel® trademark information.

This transcript was edited by Erin Noble, copy editor.