Hello guys,
I have created the free app PiccyBot that speaks out the description of the photo/image you give it. And you can then ask detailed questions about it.
I have adjusted the app to make it as low vision friendly as I could, but I would love to receive feedback on how to improve it further!
The App Store link can be found here:
https://apps.apple.com/us/app/piccybot/id6476859317
I am really hoping it will be of use to some. I have earlier created the app 'Talking Goggles' which was well received by the low vision community, but PiccyBot is a lot more powerful and hopefully useful!
Thanks and best regards,
Martijn van der Spek
Comments
Something's wonky with copying text from the app
Normally I have the app generate a description, copy the description it gave me, and use that as the image's title. For some reason, it's not copying now. I copy it, go to the file, rename the file, paste: nothing. Just to test it out, I copy the description and try to paste it into a message to myself: nothing. To be sure it's the app and not something else, I randomly copied text from Safari, Google, Dropbox, the home screen... and pasted it into a message to myself. That all works, but when I copy the description from the app, it doesn't paste.
Copy and paste
Sorry bit slow to post this, but just wanted to say thanks for the change to copy and paste. This works really well. I was worried it would keep hassling me if I switched back to PiccyBot but it seems to know that once it has described what is on the clipboard then it doesn't need to do it again. If I quit the app and restart then it does, but that's not a big deal.
This is a fantastic new feature and really opens up Facebook. Thanks for continuing to release such great updates.
Problems with descriptions on the app
I think something got broken in the latest update with the description feature on the app. Not sure exactly when it started happening, but here’s what’s going on: when I try to copy the text description—whether it’s through the option on the screen, the share sheet, or using the direct method on iPhone by tapping four times with four fingers—it doesn’t actually copy the description.
The only way I can currently get the text description is by sharing it to another app like WhatsApp, Messenger, or my email, and then sending it to myself. So yes, I just wanted to comment and confirm that this is definitely a bug happening on the app right now.
Winter Roses
Tapping the screen four times with four fingers? Is that something you set up? Only asking because, by default, it should be tapping the screen four times with three fingers.
Just a heads up. :-)
Grok 4 issue
In the last 3, maybe 4 days, I've hardly been able to get Grok 4 to work at all. 9 times out of 10 it times out or just sits there "processing" forever. 'Tis a shame, because Grok 4 is by far my favorite AI model at the moment.
Copy paste gesture
I don't remember changing the setting, but I must have at some point, so, yeah. Maybe that's why.
Fair enough
Just wanted to clarify in case you were performing the wrong gesture by mistake. 🙂
Having the same issue with Grok 4
All I get is a timeout message in response to follow-up questions, even after I upload a photo to Grok 4 and do get an initial description.
Grok and copy
Guys, thanks for pointing out the copy issue. This was a side effect of handling the automatic paste. It will be fixed in an update within a day or two. The Grok 4 model should be working fine again now.
how exactly does share description work?
It seems Share Description only works with the last answer. I send a pic and get a description, then use the "ask more" function and ask two more questions. When I share the description and save it to a file, I only get the second (last) response from the "ask more" section. Is this how it is supposed to work? If so, can we add an option to share the entire transcript from beginning to end?
Share description
LaBoheme, this is how it currently works. After each response there are a trio of buttons, including a share button. If you use that, it will share that response. So if you want all responses, you'd have to share all of them one by one. I'll see if I can change that for the general share button on the main screen, for example.
I have just released another update of PiccyBot, which streamlines the settings screen somewhat and adds a few more options. It also includes short descriptions of each AI model, which was a common request.
please add .webp processing
It has become very common on many sites. Since PiccyBot doesn't support it, sharing won't work; you'd need to save the image to the photo album first.
Updates
LaBoheme, I'll look into support of .webp, hope to add that soon.
The Claude 4.5 Sonnet model is now available within PiccyBot, replacing the Claude 4 Sonnet one. I find it particularly good in describing emotions within scenes.
The PiccyBot interface has had an overhaul and should work smoother now. Some earlier video processing glitches have been resolved as well.
I received a request for guidance while taking a selfie within PiccyBot. It's quite some effort to do this though, giving audio feedback to get your face in the correct frame. Do you think it is worth adding or are there alternatives for this?
Thanks for all support!
re: Updates
Hello, thank you very much for constantly updating the software. Yes, it would be excellent if we could receive an audio guide for taking pictures, whether with the rear camera or the front camera. It would be absolutely great if we could then save the photo in Photos.
My thoughts, and a couple of suggestions
I know you said you don’t want to remove any of the models, but I’m making a suggestion here. You already have a pretty active model base, but most of the models don’t really differ that much from each other. I was only suggesting this because I thought you’d maybe like to simplify the platform a bit; that way we could focus on getting higher-quality models without having so many versions doing almost the same thing.

There’s virtually no major difference between most of them, but I was thinking maybe you could keep the latest few and remove the ones that aren’t needed. The best way to handle it might be to ask for feedback first, then decide which models are worth keeping as standalone and which could be consolidated.

You’re never going to be able to please everyone, of course, but asking the community what they think might help. People could vote on which models should stay and which ones they hardly use. That way, we’d get a more consistent setup and more consolidated descriptions, especially for videos, since these descriptions tend to be pretty short even when the output is set to 100.
As for the question you asked about feedback and taking photos: I’m not entirely sure how that would work, but here’s what I was thinking. Some apps out there guide you while taking a photo; they give you cues such as whether your face is in the frame, then instruct you to move up or down. The only issue is, if your hands shake or shift slightly while following the instructions, by the time it tells you to move, you might’ve already missed the perfect position, and the photo could come out blurry or slightly off.

Then again, I don’t take photos of myself for social media or public sharing. If I do take one, I usually have someone else take it for me: family or friends, people I know personally. I’m not one to post much on social media, but that’s a separate topic.

Anyway, I don’t know if VoiceOver speaks while you’re taking photos, so I’m not sure this exact idea is possible. What I was thinking instead is that maybe the model could record a short video instead of taking a static photo. For example, when you’re using your iPhone’s camera, you can record a video and also capture a photo at the same time.
So maybe the model could use that approach — where instead of snapping a single image, it starts recording a short video. While you’re moving the phone to find the right angle, the system could detect when your face is properly in frame, then automatically capture the perfect still image from that video. That way, you wouldn’t have to worry about exact timing. It could either (a) take the photo automatically once your face is detected clearly in frame, or (b) analyze the short video afterward and extract the best single image from it.
You could also include a timer function — like how the phone’s camera takes a photo after 3 or 10 seconds — except this would be automated once the framing is correct. It wouldn’t need to be complicated. You could just have the user hold the phone, let it record briefly while adjusting the angle, and once it detects the face is clear, it captures the image automatically or selects the best frame from the clip.
You might need to talk to the users to see if this is technically possible, but that’s the general idea I was thinking of. It could be a useful feature — maybe something premium, if it takes more processing power or development work.
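For what it's worth, the best-frame idea above can be sketched as a simple scoring pass over detected faces. This is only an illustration, not how PiccyBot actually works: the face-detector output format, the target face size (40% of frame height) and the weighting are all assumptions.

```python
def framing_score(face_box, frame_size):
    """Score how well a detected face is framed: 1.0 is perfectly
    centered at a good size, lower is worse."""
    fx, fy, fw, fh = face_box          # face bounding box in pixels
    W, H = frame_size
    # Distance of the face center from the frame center, normalized.
    cx, cy = fx + fw / 2, fy + fh / 2
    center_err = ((cx - W / 2) / W) ** 2 + ((cy - H / 2) / H) ** 2
    # How far the face size is from an assumed ideal fraction of frame height.
    size_err = abs(fh / H - 0.4)
    return 1.0 - min(1.0, center_err * 4 + size_err)

def best_frame(frames):
    """Pick (index, score) of the best-framed face in a clip.
    `frames` is a list of (face_box_or_None, frame_size) tuples,
    one per sampled video frame."""
    scored = [(i, framing_score(box, size))
              for i, (box, size) in enumerate(frames) if box]
    return max(scored, key=lambda t: t[1], default=None)
```

A real implementation would also have to weigh sharpness (to reject blurry frames), but the "record briefly, keep the best frame" idea reduces to exactly this kind of per-frame scoring.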
Maybe changing voicing to Grok4?
In "voice mode" in Grok, you can prompt Grok to "read this with a Chinese accent" or "read this in an ASMR-style soothing tone" and it does really well. Is there a way for us to do this with the current voices for Piccy, or maybe change the voice setup to be voiced by Grok? Personally, I find the Grok voices more human and expressive, especially in other languages like Mandarin Chinese.
Guidance when taking pictures
Yeah, I was the one who suggested that feature the other day. I would even be willing to pay for it, but I think it’s a very important feature! It is true that there are other apps that can kind of help, but not really, because even once you point at a face correctly, by the time you actually take the picture, the phone has moved and the picture comes out terrible! So what we need is for the app to guide us and, once we’re pointing correctly, to automatically take the picture! 🤓

As for the models, I don’t change them anymore because I don’t really see the difference. Ever since I reinstalled the app, I’ve been using the model that was already selected when I downloaded it.
I have to say
I have to say I am a huge fan of the Grok 4 model. I hope dictation is spelling that correctly. It doesn't give extremely detailed descriptions, but I can view things that may be filtered out in the other AI models.
PiccyBot on MacOS
PiccyBot is now available for MacOS as well. You need an M1 or higher Mac for this. You can benefit from any existing subscription by using the same Apple account. Just go to the App Store on your Mac and search for PiccyBot; it should pop right up.
The camera on the Mac won't be available, but you can describe any video or image stored on your Mac and it has all the regular PiccyBot features.
Guidance for taking pictures
I'm already subscribed, but would pay again for this. It'd be a wonderful thing to have, especially if it can be refined to work fairly decently.
Pasting an image
Firstly, great news about the Mac app - thank you. I was initially unsure I was downloading the right thing as the App Store suggested it wasn't a Mac app, but I installed it and opened it up and it seems to work well.
On the Mac, my main use case for this sort of thing is getting images on the clipboard described. But being on my work computer, I don't really want the automatic detection of the clipboard on. I noticed there was an option to turn it off. Is there another way I can paste the image in? I tried, but I only ever managed to get the last text I copied into the text box, as opposed to the last image.
Whilst the feature is fantastic, I'm still a little unsure about how much I like the automatic option on the phone either. For example, I could be in an app and share an image to PiccyBot. So the app opens and immediately I'm prompted if I want to paste in the text I happened to have copied to clipboard before. It's a minor thing but I think I would prefer to be in control of when it happens. Is the only other way to do this in iOS as per the original version? I never really got the hang of it because I don't think it was really set for VoiceOver.
I would quite like an easy way to paste on my own terms in both applications.
And going back to the Mac, the other thing I would really, really like is a way to have the image described locally without going to the cloud. I'm a bit reluctant to put work images in here - I probably will if I have to, but I would really rather have something that was all local in those cases. Please correct me if I am wrong but I don't think an option like this exists now, so can it maybe be added as a feature request?
Thanks again for continuing to work on this. The amount of new features that have come in since I first subscribed is incredible.
Photo guidance iOS and shortcuts for MacOS version
Guys,
In the latest update of PiccyBot, I have added a photo guidance mode. Switch to the front-facing camera while using VoiceOver, and you will get spoken guidance on whether you have centered your face and whether you are at the correct distance. Hope this helps! It even works in all PiccyBot languages.
For MacOS, the keyboard shortcuts Command-I and Command-V now work to select images or videos. This should allow easier keyboard-only control of PiccyBot on MacOS.
Thanks for the feedback as always!
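As an aside, the kind of centering-and-distance guidance described above can be sketched roughly as follows. This is a hypothetical outline, not PiccyBot's actual code; the tolerance `tol` and the 0.3-0.5 face-size band are assumed values.

```python
def guidance_cue(face_box, frame_size, tol=0.1):
    """Turn a detected face bounding box into a spoken cue,
    e.g. 'move left', 'move closer', or 'hold still'."""
    fx, fy, fw, fh = face_box
    W, H = frame_size
    cx = (fx + fw / 2) / W - 0.5       # horizontal offset from center
    cy = (fy + fh / 2) / H - 0.5       # vertical offset from center
    size = fh / H                      # face height as fraction of frame
    # Note: with a front camera the preview is usually mirrored, so a
    # real implementation may need to flip the left/right cues.
    if cx > tol:
        return "move left"
    if cx < -tol:
        return "move right"
    if cy > tol:
        return "move up"
    if cy < -tol:
        return "move down"
    if size < 0.3:
        return "move closer"
    if size > 0.5:
        return "move further away"
    return "hold still"
```

Centering is checked before distance so the user fixes one axis at a time; the "hold still" cue is the natural trigger point for the haptic feedback and auto-capture mentioned later in the thread.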
Tried it out.
It works as well as the Guided Frame feature on Google Pixel devices. It's come out really well for a first implementation, but it'd be really nice and useful if it could be developed into a dedicated photography tool for the visually impaired, with detailed instructions on how to frame and capture pictures, using both cameras, and not just portraits but also landscapes etc. Maybe one will have to surrender some creative autonomy to the AI in such a tool, but I'd be fine with that.
Added haptic feedback as well
The latest version adds haptic feedback when you are centered and at the correct distance for a selfie.
This selfie mode is quite popular and I am considering adding a separate app for just this feature. PiccyBot is getting a bit heavy on features, of which many are not used often. Separate 'one thing' apps may be more practical.
Dedicated app
I feel that'd be great. The way I see it, the primary function of PiccyBot is image/video recognition and the primary function of the new app should be photography. Like I said in my last comment, I'm really looking forward to that app becoming an actual photography tool for the blind camera user. That would be empowering for so many at so many levels.
Feature Request: Rear Camera Guidance with Multiple Face Detection
Hello Martijn and the PiccyBot community,
Thank you for developing such a fantastic and useful app! I find PiccyBot's image and video description features incredibly helpful.
I have a suggestion for a feature that would greatly enhance the experience for taking photos of people:
Could you please consider adding guidance for using the rear camera that also incorporates the ability to detect and count multiple faces?
Currently, using the rear camera for photos of people, especially groups, can be challenging. Adding audio guidance (like "Move left," "Two faces detected," or "Closer," "Further away") would make it much easier to frame the shot correctly and ensure everyone is in view before taking the picture.
This would be a game-changer for group photos and is a feature many users would appreciate.
Thanks again for all your hard work!
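To illustrate the request, a multi-face version of such guidance might look like the sketch below. Everything here is hypothetical (the detector output format, the 5% margin, the cue wording): the idea is simply to take the union of all face boxes and check whether it fits in the frame.

```python
def group_cue(face_boxes, frame_size, margin=0.05):
    """Count detected faces and check whether all of them fit in the
    frame with a small safety margin; return a spoken cue."""
    W, H = frame_size
    if not face_boxes:
        return "no faces detected"
    n = len(face_boxes)
    # Bounding box around all detected faces.
    left = min(x for x, y, w, h in face_boxes)
    top = min(y for x, y, w, h in face_boxes)
    right = max(x + w for x, y, w, h in face_boxes)
    bottom = max(y + h for x, y, w, h in face_boxes)
    faces = f"{n} face{'s' if n > 1 else ''} detected"
    # If the group spills toward a frame edge, tell the user to aim
    # the camera toward that side to bring everyone back into view.
    if left < margin * W:
        return f"{faces}, move left"
    if right > (1 - margin) * W:
        return f"{faces}, move right"
    if top < margin * H:
        return f"{faces}, move up"
    if bottom > (1 - margin) * H:
        return f"{faces}, move down"
    return f"{faces}, everyone is in frame"
```

Reporting the count together with the direction, as in the comment above ("Two faces detected, move left"), lets the photographer confirm nobody has walked out of frame before the shutter fires.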
Agreed
Yes thanks so much for continuing to improve the app.
I agree that having this kind of feature for the rear camera too would be great. And in both cases, having it work for both single and multiple faces or subjects.
If you were to go down the road of spinning it off into a separate app, I wonder could you designate it as an app that can be launched with the camera control button?
Dave
Added another setting
The idea of a camera spinoff app, with rough initial feedback as a quick guide, is interesting and I am looking into it. For now, I have added an option in settings that allows PiccyBot to automatically take a photo and describe it when using the front-facing camera, once it finds a face properly in focus. This was a requested feature and is available in the latest update.
Updates
PiccyBot has been updated with the latest models this week: GPT 5.1, Gemini 3 Pro and Grok 4.1. Note that these are used for image descriptions only; for video descriptions, PiccyBot still relies on earlier versions.
Working on integration with Meta glasses
Hi guys, with Meta now gradually releasing their SDK for their glasses, developers can now access the live feed from the glasses within third party apps.
This is the first test I have done with PiccyBot processing this feed. The next step should be processing it hands-free and doing video descriptions.
https://www.youtube.com/watch?v=L-0U7bc3ucE
This is brilliant!
@Martijn, you continue to be one of the first to bring these promised exciting features to the community! All the best with the good work!
super excited for this!
I'm super excited to get my hands on this!
Having alternatives to meta AI will be a welcome change for many of us.
I do have a suggestion for the mobile app.
Would it be possible for the guided selfie mode to have the option to start a countdown and automatically capture selfies?
This was one of my favorite features of selfieX before it died.
Sounds great
Thanks again for the update, Martijn. I love seeing all these new things appear in PiccyBot and can't wait to give this a go.
Next step in the Meta integration: handsfree
Gokul, Quinton, Mr Grieves, thanks a lot! I have taken it a step further by adding a voice trigger to process images from the Meta video stream. The API is limited, and they promise more features by the end of next month, but let's see what we can cobble together already:
https://youtu.be/a1Ue8M6dWaM
It's definitely coming along
I'm looking forward to seeing this evolve, as more tools become available. :-)
Does this work right now?
For everyone I mean?
Hands-free
That sounds amazing. Can you just clarify what is going on?
I think you are opening up PiccyBot as normal. Is it then sitting there listening out for a voice command, which you can speak through the microphone in the glasses? And so at that point it takes a picture and does its thing?
So if I was going out and about, could I just leave PiccyBot running and then talk to either meta or PiccyBot as needed? Does PiccyBot need to be in the foreground? Does it matter if the phone is locked?
Anyway really excited by this. I love how this app always seem to be ahead of the pack with new features, and genuinely useful ones at that.
Clarification
Gokul, no this is not yet available, I am working on it. Expect an integrated release next month.
Mr Grieves, you open PiccyBot as usual, in settings you select that you want to link with Meta glasses. It will then start streaming the Meta output to the PiccyBot app.
The voice command is currently only picked up by the app, not from the glasses. Meta has indicated they will add this to the SDK in January.
Right now (in the development version), PiccyBot needs to be running in the foreground, with a separate audio input. So you can start it and say either "Hey Meta" (picked up by the glasses) or "Capture" (picked up by the phone). But with the current version you have to run it constantly, which would not be practical or good for your phone's and glasses' battery life. Still lots of work to be done.
Description and a bug
I found a bug in the app: on the subscription page, tapping the price of the different purchase options with VoiceOver on doesn't activate the corresponding purchase option; I had to turn off VoiceOver and tap the right spot. Secondly, a question: I sent a video and the app described it in a text description, which the voice I chose read sequentially, without the audio of the video underneath it. Would it be possible to have the video and its audio playing at the same time, with the description generated and spoken so that it respects the silences of the video, like a real audio description for films or TV shows? Meaning, it tells you what happens when it actually happens in the video.
Lastly, could we have more voice options for more languages? OpenAI voices such as Fable, Onix etc. are very good in English, but struggle with other languages like Italian, while ElevenLabs voices are much better.
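The gap-filling behaviour requested above (speaking descriptions only during silences, the way broadcast audio description does) could be sketched as a simple scheduler. This is purely illustrative: it assumes silence intervals coming from some voice-activity detection step and a rough characters-per-second speech rate, neither of which is part of PiccyBot as described.

```python
def schedule_descriptions(silences, segments, rate=15.0):
    """Fit description segments into silent gaps of a video.
    silences: list of (start_sec, end_sec) intervals with no dialogue.
    segments: list of (event_time_sec, text) descriptions.
    rate: assumed speech rate in characters per second.
    Returns (start_time, text) pairs; a segment that fits no gap is
    dropped (a real system might speed up the voice instead)."""
    placed = []
    free = list(silences)  # remaining usable parts of each gap
    for event_time, text in sorted(segments):
        duration = len(text) / rate
        for i, (start, end) in enumerate(free):
            # Never speak before the event actually happens on screen.
            begin = max(start, event_time)
            if end - begin >= duration:
                placed.append((begin, text))
                free[i] = (begin + duration, end)  # consume that slice
                break
    return placed
```

The point is that synchronised audio description is mostly a placement problem: once the model has timestamped descriptions and the silence map is known, each line just needs the first silent slot at or after its event.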
Voiceover bug
Knut, thanks for pointing this out! I will release a fix for this either today or tomorrow.
Regarding the audio description, I will look at it. Earlier attempts to synchronise video and audio description didn't work out due to model costs and slow performance, but now with new AI models such as Gemini 3 available, I will check it again.
Regarding the AI voices: quite a few users actually don't use them at all, and just rely on their preferred VoiceOver voice. To use the Elevenlabs voices through API was very expensive last time I checked. I would have to raise the subscription costs of PiccyBot by quite a bit, which I fear would not be appreciated.
ask more
When I'm in ask more to ask follow up questions, the app hangs. I'm noticing this on the latest update and this seems to happen quite frequently across the different LLM engines.
re: the "ask more" bug
It appears to happen only with devices running iOS 26.2; hopefully this will help Martijn diagnose the problem. Basically, when you tap "ask more", type in a message and tap send, or attach an additional image, the app hangs, totally dead. If you then exit to the home screen and relaunch PiccyBot, it will crash and abort.
The 'ask more' issue
Michael, LaBoheme, thanks for pointing this out. It is indeed related to iOS 26.2, which has introduced a number of new restrictions. Working on a way around that, hope to have an update soon.
Fix and upcoming changes
Michael, LaBoheme, I have just released an update for PiccyBot that should fix the 'ask more' issue. Since it is not so easy to reproduce I hope you can confirm it all works now.
Focus at the moment is on a revamp of the interface, adding more features to the Home Screen. And I am still working on a practical live AI and Meta glasses integration, hopefully in the coming weeks.
ask more
Unfortunately, it is still freezing for me. I find that it is fine with the first couple of questions, but as you ask more and more, the app freezes up completely, rendering it unusable until restarted.
question
Hi,
Question: if I ever want to try the features with the Meta glasses, can I use them for free? If yes, can I continue using the app without paying a subscription for basic tasks? Also, when will hands-free use be available?
Ask more and Meta
Michael, thanks, will look into it further. Will take the fix along with the interface update.
JC, regarding the Meta glasses, it will be possible to enable it in settings, which are only available to pro users of PiccyBot.
another question
Hi,
Another question: do you have a lifetime subscription available, so you can pay once and use the app forever?
new interface
Since Martijn is discussing interface changes, I’d like to share my thoughts.
It might be a good idea to replace the “ask more” button with an “attach additional image” button. As it stands, you can ask another question from the main screen, but you cannot attach more information—so why have the extra step? An “attach additional image” button would handle this directly. The existing buttons, like camera and photo attachment, start a new session; this change would streamline the process and help avoid the hanging issue we’re experiencing. The fact that few people have complained about the “ask more” problem likely indicates that not many are using that function anyway.
"Ask more" is definitely useful in current form
"Ask more" works differently to asking another question from the main screen. I use "Ask more" when the initial description is quite good, but I am curious about some aspects and/or details. Think of "Ask more" as a sort of interactive process, where we can refine the given information or enhance it if we wish. The next question we ask depends on the previous answers, and with "Ask more" the model in use has access to the whole conversation, and it does in fact tailor its answers to the conversation context very well.
"likely indicates that not many are using that function anyway" - LaBoheme, I am not so sure about this. Please note that not everybody who uses PiccyBot has an iOS 26 compatible device. I myself, for example, am using an iPhone XS, which is not iOS 26 compatible. As the bug is specific to iOS 26.2, folks like me simply don't experience it and keep using this useful feature free of problems.

Secondly, it seems from the descriptions that the bug depends on the conversation length, at least after the initial fix. In a lot of cases, 2-3 follow-up questions are more than enough to clear up all the details we are interested in, especially if we are skilled at asking well-formed and clear questions, and that many questions won't necessarily trigger the bug, if I understand everything correctly.

Thirdly, in a lot of cases it is not necessary to ask anything more after the initial description, because it satisfies all of our curiosity. What we are curious about in an image or video is a deeply subjective matter and differs greatly, even on a case-by-case basis. But if the initial description doesn't satisfy that curiosity for some reason, then "ask more" is a true life-saver and a very handy, valuable and practical tool.
I personally find the "Ask more" interface very neat and practical in its current form, and I wouldn't vote for any change to it. Instead, I wish Martijn the best of luck in uncovering and completely fixing this nasty iOS 26.2-specific "Ask more" chat bug.