App request for developers: An AI-driven photo description app that adds captions to images, either in Apple Photos etc. or in its own album

By Unregistered User (not verified), 12 April, 2024

Forum

iOS and iPadOS

Hi,

I know ChatGPT, Be My Eyes and other such AI-driven apps can describe images. What I'm looking for, though, is a way for us blind folk to take pictures from family and friends, Facebook, Instagram etc., and be able to easily mark them up with descriptions. The workflow, to my mind, would be:

Choose picture > Share > AI Photo Description App (AIDI) > automatically adds description > Save button > Sends image back to origin/album in app.

There are expansions on this: the ability to inquire more of the image, to change the way it is described (focusing more on the building, for example, than the people around it), and to pull images out of its own album and ask questions about them.

I'm really looking for something to simplify the way we mark up and catalogue images. We live in a wonderful world now where we're not excluded from such things any more; it's just a matter of getting the technology in a format that works best for us. Going through many photographs, copying and pasting, etc., is a pain in the bottom.

I know there is a shortcut kicking around that uses AI to describe a screen, but we need to be able to caption these photographs so we can flick through them quickly and then, if we want, interrogate their content.

Thanks...

Comments

Integrating this feature into an already-existing app

Integrating this feature into an already-existing app sounds more feasible than making an entirely new app for this task. I know there are apps that provide image descriptions, and others that let you label images or search your photos for a particular thing by entering keywords. What has to be done is to combine these two in a single app, and it's most likely much easier to begin with an app that already does one or the other.

It wouldn't be bad if Object Voice had these features.

Object Voice is an app that has one simple function: describe whatever it detects in taken or imported photos, or when you point the camera at something. Plus, it's offline, so everything should be more privacy-oriented, smoother and faster. Another thing is that other apps that use online image processing often omit certain details, especially when describing people, and people are one of the most essential things to identify in images so that we know who is featured in each photo. It would be great to be able to hear names instead of long descriptions, without having to work around the technical limitations and measures that make the current models slower and their descriptions less detailed. Object Voice should gain new and better capabilities if it can make use of the AI expected to be incorporated into iOS 18.

An Issue

I just added a caption to a picture of a hawk in my camera roll. VO describes it as an owl sitting on a pile of trash. Close enough, it was on a bird it had just preyed upon.
It turns out that this caption is not detected and read by VO, so it still says it's an owl on a pile of trash. The only way I can find to read the caption is to double tap on the picture, then go into the picture info, as if I were going to add the caption.
That's a big ordeal if I'm browsing through an album, almost useless. But hey, if there's a way to directly display the picture captions, please tell me.
I think a better way would be for this hypothetical AI app to rename the picture file with a short description, then save it wherever it's going to end up.
It's a little like what I do now, manually. Share it to Files app or my Dropbox then rename it and keep it in a folder either on my computer or on my phone.
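The renaming step could be sketched roughly like this in Python. This is only an illustration: `described_name` and `max_words` are made-up names, and the description is assumed to come from whichever AI service produces it.

```python
import re
from pathlib import Path


def described_name(original: Path, description: str, max_words: int = 8) -> Path:
    """Return a new path whose file name is built from a short description.

    Hypothetical helper: it only computes the new name, it does not rename.
    """
    # Keep lowercase letters and digits only, so the name is safe on
    # iOS, Windows, Linux and Dropbox alike.
    words = re.findall(r"[a-z0-9]+", description.lower())[:max_words]
    # Fall back to the original name if the description has no usable words.
    stem = "-".join(words) or original.stem
    return original.with_name(stem + original.suffix)


new_path = described_name(Path("photos/IMG_0042.jpg"), "Hawk perched on a bird it just caught!")
print(new_path.name)  # hawk-perched-on-a-bird-it-just-caught.jpg
```

Actually renaming would then be `path.rename(described_name(path, text))`. Two photos with the same description would collide, so a real tool would need to add a number or the date to the name.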

Slight progress for me

That caption feature in the Photos app is useless with VoiceOver, and it seems that sighted people find it almost useless too, going by some of my Google searches. However, some of the terms you used in the OP led me to dig deeper in the info tab, such as the markup option. Somewhere in all the options (I hope I can find it again) there is an "add description" option. This does get announced by VO right after the photo date when browsing through the thumbnails, even though the process of adding it is a huge pain, as you point out. Thank you, thank you, thank you, Ollie! I've been trying to figure this out for years!
I think I understand where you're coming from now, and it would be a good app, if it could do all that for the blind user in an easy way.
As far as the native VO image description goes... it isn't too bad, but I usually have it turned off for most parts of iOS. It gives better image descriptions and text recognition than the alt text Facebook inserts into its pictures, so I like it there, and in the Photos app.

@Ollie

Ah, I would never have found that.

A little while ago I started writing a Python script that would go through a load of image files and add captions generated by OpenAI. I hadn't found the description field and was using the caption instead, which you get to through Info.
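A script along those lines might be shaped something like the sketch below. This is not the actual script: `caption_folder` is a made-up name, and `describe` is a placeholder for whatever function sends a picture to the AI service and returns its text, so no real OpenAI call is shown.

```python
from pathlib import Path
from typing import Callable, Dict

# Assumed list of file types worth captioning; adjust to taste.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".heic"}


def caption_folder(folder: Path, describe: Callable[[Path], str]) -> Dict[str, str]:
    """Return {file name: caption} for every image directly inside folder.

    `describe` is supplied by the caller (for example, a wrapper around an
    AI description service) and is hypothetical here.
    """
    captions = {}
    for path in sorted(folder.iterdir()):
        # Skip sub-folders and anything that is not a known image type.
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            captions[path.name] = describe(path).strip()
    return captions
```

Keeping the AI call behind a plain function means the same loop works whichever service is used, and it can be tried out with a dummy function before spending money on real requests.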

There is a standard (I think) way of adding metadata to images using EXIF. I found that captions were not stored there and had to be retrieved using a third-party library, which I think was probably simulating some Apple SDK. I chickened out of trying to write the data back.

But it's possible the description is different. I will have to have another play around and see.

I'm guessing it will be an Apple proprietary thing. In which case it might be better to write the descriptions using EXIF and then have something else that reads them and adds them to the photos in an Apple way. That way the data is still available if you are on Windows or Android and can find a way to get at it.
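For the EXIF half of that idea, here is a minimal sketch using only the Python standard library. It writes the standard EXIF `ImageDescription` tag (0x010E) into a JPEG. The assumptions are large: the function names are made up, the file is assumed to have no EXIF block yet (most phone photos already do, so a real tool would use a library such as Pillow to merge with the existing data), and nothing here says whether Apple Photos would read the result.

```python
import struct

EXIF_HEADER = b"Exif\x00\x00"
IMAGE_DESCRIPTION_TAG = 0x010E  # standard EXIF/TIFF "ImageDescription" tag
ASCII_TYPE = 2                  # EXIF type code for NUL-terminated ASCII text


def build_exif_segment(description: str) -> bytes:
    """Build a JPEG APP1 segment holding one ImageDescription entry."""
    # EXIF ASCII values end with a NUL byte; non-ASCII characters become "?".
    value = description.encode("ascii", errors="replace") + b"\x00"
    # TIFF header: big-endian ("MM"), magic number 42, first IFD at offset 8.
    tiff_header = b"MM" + struct.pack(">HI", 42, 8)
    if len(value) <= 4:
        # Short values are stored inside the entry itself, padded to 4 bytes.
        value_field, extra = value.ljust(4, b"\x00"), b""
    else:
        # Longer values follow the IFD: 8 (header) + 2 (count) + 12 (entry) + 4 (next IFD).
        value_field, extra = struct.pack(">I", 26), value
    ifd = (
        struct.pack(">H", 1)  # one entry in this IFD
        + struct.pack(">HHI", IMAGE_DESCRIPTION_TAG, ASCII_TYPE, len(value))
        + value_field
        + struct.pack(">I", 0)  # no further IFDs
    )
    payload = EXIF_HEADER + tiff_header + ifd + extra
    if len(payload) + 2 > 0xFFFF:
        raise ValueError("description too long for a single JPEG segment")
    # APP1 marker, then a length that counts itself but not the marker.
    return b"\xff\xe1" + struct.pack(">H", len(payload) + 2) + payload


def add_description(jpeg: bytes, description: str) -> bytes:
    """Insert the segment right after the JPEG start-of-image marker."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    return jpeg[:2] + build_exif_segment(description) + jpeg[2:]
```

Usage would be along the lines of `Path("out.jpg").write_bytes(add_description(Path("in.jpg").read_bytes(), "Hawk on a branch"))`. A separate step would still be needed to copy the text into the Photos description field.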

Apple is Maddening

It's unclear to me whether the description in the markup is added to the metadata of the actual photo file. I shared the photo of the hawk to my Dropbox and looked at the properties over in Linux. The caption that VO does not announce is there, but not the markup description. There's also some other information missing from the shared photo that can be accessed in the information tab in the Photos app.
I'm going to guess that this information is stored in iCloud and on Apple devices, but doesn't get shared outside of the "garden." I've also read that on the Mac, you can add a title, but that's not on iOS, or at least not easily accessible on iOS. *Slams face into pillow and screams*

Re: caption

I think EXIF is the MP3 tagging thing for photos. I definitely couldn't find the caption from my Python script when I looked.

However I must admit I get totally lost in the Mac Photos app.

I think what I would really want for this sort of thing is for it to be aware of who was in the photo and where it was taken. I guess OpenAI won't know who anyone is. I did see in Photos that you can select faces in a photo and name them, but I presume you would have to do that for every photo individually which is a bit tricky when you can't see the faces. The location should be available somewhere though.

I think it wouldn't be too much of a stretch to think that AI will soon be able to recognise faces it has seen before in your photos and be able to automatically detect them for you. This feels like the kind of thing Apple could bring in if it can do it on-device.

My gut feeling is that although the AI descriptions are amazing, and in their current form add a hell of a lot of detail to a photo that I would love to be able to get to more easily, they wouldn't be sufficient for generating something I could flip through quickly. For example, as I swipe I might want something like "Bob and Dave leaning against a stile in the Lake District", but I wouldn't want to know about every blade of grass or what else was going on unless I specifically asked for more details. Whereas I think the full Be My AI description would get a bit tiring after the first few swipes.

Sorry, I realise I am going off on a little tangent. But I think the three problems we have are getting the right info, being able to apply it in batch to our photos, and then being able to easily browse it later.