Hello guys,
I have created the free app PiccyBot that speaks out the description of the photo/image you give it. And you can then ask detailed questions about it.
I have adjusted the app to make it as low vision friendly as I could, but I would love to receive feedback on how to improve it further!
The App Store link can be found here:
https://apps.apple.com/us/app/piccybot/id6476859317
I am really hoping it will be of use to some. I previously created the app 'Talking Goggles', which was well received by the low vision community, but PiccyBot is a lot more powerful and hopefully even more useful!
Thanks and best regards,
Martijn van der Spek
Comments
Feature suggestions and a bug report
- Full-fledged conversation history: I would like to see all questions and answers I have asked about an image, like in most other apps of this kind.
- "Clear conversation" button: A button to clear all data about the current image/conversation.
- Share URL or similar: Whatever "magic" Be My Eyes has that makes it possible to share almost anything from e.g. Safari and have it described.
- AI customizations: The ability to set your own "system prompt", that is sent with all requests. I don't know if all models have that feature, but OpenAI has it at least. A bit more advanced would be to also be able to control "temperature", i.e. how "random" a model response is (as I understand it the recommendation is to keep it low for scenarios like these where a more predictable answer is preferred).
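To make the suggestion concrete, here is a minimal sketch of what a user-configurable system prompt and temperature could look like in a request. The field names follow the OpenAI Chat Completions format mentioned above; the function and values are purely illustrative, not PiccyBot's actual internals.

```python
def build_request(user_question: str, system_prompt: str,
                  temperature: float = 0.2) -> dict:
    """Assemble a Chat Completions-style payload (illustrative only).

    A low temperature (close to 0) makes responses more deterministic,
    which suits descriptive tasks where predictable answers are preferred.
    """
    return {
        "model": "gpt-4o",
        "temperature": temperature,
        "messages": [
            # The user-defined system prompt is sent with every request.
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_question},
        ],
    }

payload = build_request(
    "Describe this image for a blind user.",
    "You assist blind users; describe layout, text and colours precisely.",
)
```

The key point is that the system prompt rides along with every request, so the user sets it once and all subsequent descriptions honour it.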
I also have a bug to report. It seems that, even if the length is set to 100%, responses are cut off. I noticed this just now, so I haven't done extensive testing, but sending a few images to GPT-4o it clearly cut off somewhere in the middle. I also tried typing "continue", as you do in e.g. ChatGPT, to get the rest of the response, but it re-generated it instead. Thanks again for all the hard work!
the video stuff is brilliant for the ring doorbell. any way u can get it to build up a database of people it can then recognise? so u would know who u missed or whatever.
be my eyes/ai doesn't do videos, right? so piccybot has a big plus point there.
don't think there's any easy way to get facial recognition from a video.
Recognizing specific people
I guess you'd need on-device AI to have that implemented, because of the associated privacy concerns. But that'd definitely improve my life if it were there, as I deal with a lot of staff every day and it'd be brilliant if I could know which one of them is coming into my cabin. Just one use-case.
umm.. whoops
ok yes piccybot is great, but it has its limitations.
i trusted it too much, and it told me somebody was breaking into my home!
so i dialed 999, and yes, should obviously have gotten some sighted input first.
the sweeney arrived post haste and umm, whoops!
i showed them the video and they said, err... this is yourself putting the ring doorbell back on! i'd recharged it earlier that day, and lost track of time.
ah, if i had a quid for every banana skin i ever trod on :-)
ok, my apologies to the tax payers and boys in blue for the f*kwittery, but it is very probably worth mentioning that even nowadays ai has its limitations.
however, with reference to gokul's point about on-device ai:
it is obviously possible to arrange for off-device storage and code execution to be conducted securely, but u'd need procedures/practices to be put in place that would make such a thing trustworthy.
yes u r obviously right, since be my eyes has hit this problem as well.
Update: Longer videos and sharing from instagram
Hi guys,
Thanks for all the feedback! Sorry to hear about the scare Neil, but still glad you used PiccyBot for it :-)
Recognizing contacts would be hard privacy-wise, Gokul. Maybe I can integrate it with Apple Intelligence later on, which should have the data.
Thanks Ollie and blindpk for the suggestions. I have actually adjusted the base prompt to be more in line with what you would need as a blind or low vision user. Hope it helps. The Instagram share is one of the first 'generic' share options. I will need a paid service to expand it fully, so I am trying to balance it.
PiccyBot Pro users now get longer video descriptions (set video quality to 'high' in settings). And the Instagram video share to PiccyBot is hopefully useful. Do note it will take longer as I need to both download and then upload the video to generate the description.
Thanks for using the app, looking forward to improving it further!
Meta Ray-bans
Love the interview on Double Tap. The thought of having the pixies in my meta ray-bans is incredibly exciting. I really hope you can make that happen.
Thanks Mr. Grieves
Was great to talk on Double Tap. I have been trying to get PiccyBot integrated into the Meta Raybans, but no luck so far. It looks like the only possible way to get it done is through a Whatsapp service. You could then say 'Hey Meta, Whatsapp last photo to PiccyBot' and then get a description back. Much slower than the Meta AI though and unfortunately you can't do video descriptions this way. I love my Meta Raybans but I really wish they had opened it up for us developers.
@Martijn
Ah, I was optimistically hoping you had discovered some secret to make it all work. I can see how Meta might not want apps like yours integrated, because the glasses promise the same sort of thing using Meta AI and they seem to be pushing it very heavily. However, it's not a patch on the other AI models that your app uses.
When I am out and about I do sometimes ask Meta AI to describe what I'm looking at, but it's always a bit disappointing. I personally would love to be able to send the image via WhatsApp to the pixies and get those high quality descriptions. I don't tend to get a huge number of WhatsApp messages, so I don't mind keeping notifications on, but I can imagine many people not wanting to do that.
It feels like we are agonisingly close to the perfect solution.
Does anyone know if the AIRA
plans to work on the Ray-Ban Meta glasses include Access AI? I just realised I should probably see if the Envision glasses work with Access AI.
I heard from AIRA
It is just the live service, even on the Envision glasses. The next question is to find out about the ARX headset - that works with Seeing AI, and an iPhone version comes out in two months.
Linking smart glass camera
Has anyone explored the possibility of linking some random smartglass camera with one of these apps, say PiccyBot? Is that even possible in the iOS environment?
GPT-4o Mini added
Hi guys, I added GPT-4o Mini to the list of PiccyBot models. It is supposedly very close to GPT-4o but faster. It's also a lot cheaper, so I may consider using it as the default model for the free version of PiccyBot if performance is good enough. Please try it out and let me know what you think?
I downloaded this app and then deleted it again; it's not for me.
But the reason is the love and care that goes into it.
You really should consider a donate button or perhaps a monthly/yearly payment. I'd pay yearly and then probably delete the app :)
New update today
Hello guys,
Good to have the AppleVis forum back online!
I released an update of PiccyBot today. The main improvements are in localization; I fixed some language support issues.
I also switched the home screen buttons around, after feedback from some of you.
The chat window will give brief and fast responses now. It doesn't remember earlier questions yet; I am working on that and will add it in the next update.
Hope you guys will be back with feedback in this forum, it was sorely missed!
Thanks,
Martijn
Current models
Thanks for the conversation function, really nice to see that implemented. I have a question though: is the app using the latest models from OpenAI and Google, i.e. "chatgpt-4o-latest" or "gpt-4o-2024-08-06" for OpenAI and "gemini-pro-exp-0827" (or something similar) for Gemini? Especially the Gemini model seems promising after testing it on Chatbot Arena.
Latest models
Blindpk, regarding your question about the models, PiccyBot currently uses gpt-4o-2024-08-06 for the main OpenAI model and gemini-1.5-pro-latest for Google Gemini. In addition, you can use the GPT-4o Mini and Gemini Flash models as faster but more limited alternatives to these two.
As soon as new models surface I'll try to include them. Looking at Reflection and Grok 2 at the moment.
Excellent and fun app
This is a really fascinating app. My daughter took a boat ride down the Mississippi and sent me videos from it, and PiccyBot described them just fine. It was awesome to get them described in real time, as it were.
Re: Latest models
Thank you for the fast response and really nice that you stay on top of things. This is one of the strengths of this app, that you can try out different models for the descriptions.
The app isn't working for me
I downloaded this app today, but I'm having some trouble sharing images from other apps to get descriptions with PiccyBot.
I was able to get descriptions of an image I shared from my camera roll and also one from WhatsApp, but when I shared one from Safari and another from Discord, the app continued to play the loading sound even after a whole minute had elapsed. When I tried to share a couple of images from the Dystopia app for Reddit, I was taken to a screen with no elements on it, and nothing further happened, even after waiting a whole minute.
I'd have no problem paying the subscription to help cover the development and operational costs of the app, but it doesn't seem to be able to meet my use case.
Am I doing something wrong? Why is the app able to only receive images shared with it from certain apps? This is not an issue with other image description apps.
TJT 2001 - Sharing from apps
Correct, the app can describe videos and images from your phone library, Whatsapp, and Messenger. I am still working on supporting more apps. But for now, either open PiccyBot and select the media from there, or save it from the app to your library and share it to PiccyBot from there.
Can you confirm which image description app does this properly directly? I can then study their method and hopefully implement it in PiccyBot as well.
The other apps
Thanks for your prompt reply. I didn't realise that there were differences in how image recognition apps receive data from other apps.
I'm able to get descriptions from the apps I mentioned using Seeing AI and the Be My AI feature in Be My Eyes.
It makes stuff up!
I was on holiday in July and sent it a montage of photos put together into an MP4 video to describe. It did pretty well, but then started talking about 'we can hear the wind whispering through the leaves of the trees and the sound of the waves'. What nonsense! The video was silent, i.e. no audio! Not that it matters enormously, but to paraphrase and channel the spirit of cricket commentator Fazeer Mohammed: Why did it do that? Unbelievable! I do not have the personality thing turned on, so that doesn't explain it.
Great app though. Really fantastic.
How are people doing now getting it to describe the whole of a video? I still find that a bit hit and miss.
last question inadvertently get carried to the next image
for example, i was discussing baroque furniture and specifically talking about the fitness of the grand piano in the photo. then i uploaded a timelapse video of a city street, and the ai explained that it is a city scene and there is no piano in sight. this behavior started in the latest version, and it happens 100% of the time (that is, if you ask a question and then send a new photo or video).
Question carry over
LaBoheme, thanks for reporting this. Checking it out. I suspect it is a side effect of the chat mode, which normally only deals with questioning a single image or video. Should be a minor fix, hopefully backend only.
Question carry over: further feedback
LaBoheme, I checked the question carry over issue. It is a result of the feature to start with a specific question. It will then continue with that same question unless you clear it. I could clear it automatically, but I can imagine you have a specific question like 'Is there a house in the picture' while going through a number of images one by one. If I clear in-between that would not be practical anymore.
So right now, if you start 'blank' PiccyBot will give you a general description of the image or video. If you enter a question, PiccyBot will then continue to use this question for any further images or videos until you clear it or edit it. Separately, you can go into chat mode and ask specific follow up questions on the same image or video.
If anyone has any suggestions for a better approach, please let me know?
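To make the current behaviour easier to reason about, here is a toy sketch of the carry-over logic as I read the post above. This is my interpretation, not PiccyBot's actual code; the class and names are invented for illustration.

```python
class QuestionBox:
    """Models a question that persists across images until cleared."""

    def __init__(self):
        self.question = ""  # empty means "give a general description"

    def prompt_for(self, image_name: str) -> str:
        if self.question:
            # The same question is reused for every subsequent image.
            return f"{self.question} [{image_name}]"
        return f"General description of {image_name}"

    def clear(self) -> None:
        self.question = ""


box = QuestionBox()
box.question = "Is there a house in the picture?"
first = box.prompt_for("photo1.jpg")   # question applied
second = box.prompt_for("photo2.jpg")  # same question carries over
box.clear()
third = box.prompt_for("photo3.jpg")   # back to a general description
```

Seen this way, the trade-off is clear: auto-clearing after each image would break the "same question over many images" workflow, while persisting it surprises users who expected a one-off question.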
Excellent app, but I have a small suggestion.
It would be nice if we could upload our picture along with our question at the same time, to skip the full description and go straight to the important stuff. I once tried uploading a question alongside the picture, but the app ignored my question and gave me the full description. It seems like you can only really ask questions once the picture is already uploaded.
Suggestion for question
My suggestion would be a change of workflow. Instead of having the question box on the page before you choose an image/video, put a screen (or use an existing one) before you send the image/video to the model (the first screen would be empty except for the buttons). There you have the question box, the history, etc. When you have gotten your description, the ability to chat and so on, and you choose another image/video, have the question box pre-filled with the question that was asked last, so that the flow will be speedy if you ask the same question about multiple images. If an image is shared to the app, put the user on the "question screen".
The only downside I see is that it might be one more button press before you get the description, but in my opinion the better logic outweighs that little inconvenience. As it is now, it is a bit hard to grasp how the question box works.
simple solution for the question carry over issue
a clear button to clear the question. right now, one has to tap the text field for the clear button to appear; the clear button should be visible whether or not the user is editing the text field. that would make life ten times easier.
Batch Video Processing?
I've been enjoying and loving PiccyBot, especially for describing videos, which no other app can do! I wanted more, so I paid for the full version.
Most of the videos that I've been having described are from my Meta Ray-Bans, which are shorter clips, less than 3 minutes each. I've been having to either count the videos or remember the timestamp of the one I just had described, find the next one to have it described, and sometimes I lose count in all of this.
A feature request would be to select a day's videos or a set of videos, have the first one processed and described, then batch-process the others in the background, and play the descriptions in sequence:
1. Select videos for a day/set.
2. Process and describe the first.
3. While the first is being described, batch-process the other videos in the set.
4. After the first is done being described, play the descriptions of the other videos in sequence.
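The steps above amount to a simple pipeline: submit every clip at once, let later clips process in parallel while the first description plays, and keep playback in submission order. A minimal sketch, assuming a hypothetical describe_video() that stands in for the real model call:

```python
import concurrent.futures


def describe_video(path: str) -> str:
    # Placeholder for the real (slow) model call on one clip.
    return f"Description of {path}"


def batch_describe(paths: list[str]) -> list[str]:
    """Describe a set of clips concurrently, preserving their order."""
    descriptions = []
    with concurrent.futures.ThreadPoolExecutor() as pool:
        # map() submits all clips immediately but yields results in
        # submission order, so the user hears descriptions in sequence
        # while later clips are still being processed in the background.
        for text in pool.map(describe_video, paths):
            descriptions.append(text)
    return descriptions


day_clips = ["clip1.mp4", "clip2.mp4", "clip3.mp4"]
results = batch_describe(day_clips)
```

Since the descriptions come back ordered, a player could start speaking the first result as soon as it arrives without waiting for the rest of the set.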
Never mind my previous comment
I have used the app quite some time ago, and it appears that many things have since been fixed, so my bad.
Clear button
I like the idea of having a Clear button. This seems to be a common way of handling the issue of clearing a text edit box in other apps throughout the OS.
--Pete
South Indian language image descriptions?
As I am from India, I would like to have image descriptions in South Indian languages when I share a photo to this app. Please improve this, and also add a feature for PDF reading in our South Indian languages; that would be better.
Pixtral?
So I've been hearing that Pixtral, a vision-oriented model from Mistral, has been doing really well in the image description department. I've never checked it out, but the demos I've seen are quite good. They show it being able to solve captchas etc. So I thought it'd be interesting to check out and include as a model if it's worth it.
Feature request
Speaking of which, apart from the regular image descriptions, I was thinking it'd be nice to have certain specific functions tailored to the models which do them really well, with a preset prompt making things easy. For example, a function that's just designed for easy solving of captchas, which is something that's never addressed properly, or a function that helps us match colors of clothing, which'd really help fully blind folks. Just throwing out random thoughts.
Pixtral added
Gokul, thanks for the suggestion. I have added Mistral's Pixtral model to the model list in settings. Please try it out? So far, I have seen it working well, but making some mistakes (switching left and right, for example). Let me know what you think?
Clear button
LaBoheme, Pete, based on your input I added a clear button that is always available on the input text field. I released a new update with that change just now. Hope it makes the process at least one step easier.
Possible bug
When sharing a video from Instagram, PiccyBot no longer seems to describe it. It's just stuck on "please wait" for ages.
Pixtral
So I played around with Pixtral for a bit; interestingly, it seems to describe colors more vividly(?) than any other model around. Maybe it's just my impression, but there it is...
I think you are right
This seems quite funky!
"The image presents a captivating digital illustration of a human eye, which is the central focus of the composition. The eye is depicted in a close-up view, with the iris and pupil clearly visible. The iris is a vibrant mix of blue and purple hues, while the pupil is a deep black color.
Surrounding the eye is a network of circuit-like lines and dots, suggesting a technological or futuristic theme. These elements are rendered in shades of blue, purple, and orange, adding a sense of depth and complexity to the image.
The background of the image is a dark blue color, which contrasts with the bright colors of the eye and circuit lines, making them stand out prominently. The overall effect is a striking blend of organic and technological elements, creating a sense of intrigue and curiosity."
Still loving this app!
I just have to say that I still enjoy this app very, very much. My best friend's husband posts a lot of his artwork and nature pictures on Facebook, so if there's one I'm extra curious about, I save it to my photo library and then have PiccyBot tell me all about it. I wish I could somehow use PiccyBot without saving the pictures first, but that's far from a big deal. :) Thanks for all the time and effort you've put into creating such a fabulous app. If I end up getting a new iPhone 16 at some point, I'll be curious whether its camera takes better pictures than my current 13 Pro does.
Chat button disappeared
Is it just for me or has the chat button disappeared? I've had a few images described today but the chat button is not there anymore.
Thanks for the Pixtral feedback
It is turning out to be a good model to have in the list, thanks Gokul and Charlotte. And thanks for the thumbs up, Missy! Not sure what happened with your chat button, blindpk, haven't heard that issue before.
Lots of chatter about the OpenAI voice model release. Hopefully this week. And then let's see how to implement it in PiccyBot..
Re: Chat button
Yes it is a really strange issue, I will try experimenting a bit more to see if I can make it appear again.
I also want to thank you again for making this app! In this AI landscape where new models appear all the time it is fantastic to have many of them available in the same place and the new ones added quickly.
Chat button mystery solved!
Turns out it was, at least mostly, my fault. The chat button is called "microphone" for me, but it works like the chat button would. I didn't check that one out earlier.
Two other minor things as well:
* If you turn off "waiting sound", there is still a very discreet waiting sound in the chat view. Actually, this quiet waiting sound is more pleasant than the standard one, and it would be great to have it as an option for the main screen waiting sound as well.
* The feature where the app gets a new description when you change model and leave the settings screen has a small bug (or is it intentional?) in that it activates even if you choose the same model as before. It only seems to happen if you open the model selection menu; if you don't, nothing happens when you leave settings.
Nothing changed after subscription
Today I subscribed to the premium option, but nothing changed in the app for me. Is that normal?
Subscription should enable settings
Hasajaza, subscribing should enable the settings screen. There, you can select voices, personality on/off, select an AI model, set the length of the description you want, enable longer and better video descriptions, and share the audio of the description.
If this was already working for you, you must have subscribed before? Odd, but if so, please cancel the subscription through Apple.
Blindpk, thanks for figuring this out. Clear points, will adjust the button description and make sure the waiting sound setting also adjusts the chat mode sounds.
Am I missing something?
or does the app not yet have an option to add a second/third picture so as to get comparisons etc?
Multiple images
No, I don't think you can feed the app more than one image at a time (I believe this is the feature that people earlier in this thread gave the name PiccyBatch).
@Martijn
Hi
Unfortunately there is no settings option even after subscribing. In addition, there is still an ad shown in the app. I have canceled the subscription, but I want to know if there is a solution.
No settings after subscribing
Hasajaza, thanks for alerting me. I know there is a problem with App Store Connect today; it could influence the reading of the subscription status and be causing this issue. I'm not sure and have to check, but if so it should be working fine again soon.