New AI app for describing images and video: PiccyBot

Issues

Hi Carter Wu, there was an issue with the speech last week while we were trying to upgrade the speech engine. For now, this has been rolled back, so it should work as before. I'm aiming to add the new voices with an update next week instead, after full testing.

Winter Roses, there has been an issue with PiccyBot getting stuck while trying to download private/copyrighted YouTube videos. It should have given an alert, but it didn't. In the next update (within a week), this will be handled differently. Private videos will be described directly, without downloading them. Note that you won't be able to do any audio mixing on those YouTube videos, but at least you'll get a description of them.

Thanks for the feedback!

Many thoughts

As a voice actor, I hate AI audio description. AI voices in general make me mad because not only do they just not usually sound very good, but they are a crutch that takes away from my vocation in the worst way.

As an audio editor, I utterly loathe how audio ducking is used with audio description. It is almost never ever justified in any way and takes away from music and sound as someone said. I will die on that hill speaking about this practice. Don't do it.

But as an avid gamer and lover of AI, I'm all for this. I'd rather have AI than no audio description. And if AI can describe in real time and be customized, that would be really cool. Right now I really want to have game cutscenes audio described. There still isn't a practical way to do this. I thought Google Gemini was on the verge, but not yet. And PiccyBot is pretty interesting, even if I have to essentially watch the cutscene two or three times. It too is not all that practical, because I have to record the video with my phone or upload it to YouTube first. I love some of the customization suggestions listed above. I think AI has a lot of potential for audio description. I also love the idea of pausing a scene and getting a more in-depth description. So much audio description is surface level at best. It gives an overview, but sometimes a bit more depth would be nice.

Other ideas, maybe?

OK, so I have another couple of ideas. I don’t know if anybody else actually wants this, but I figured I’d just throw it out there in case it’s something that can be done. And I really hope this doesn’t come across as annoying or anything, because I feel like I’m asking for a feature that maybe nobody else would use or even think about. But hey, maybe by putting it out there, something happens, right? So first off, I’m really excited for the new update. I’m currently on the one-month plan, and I do have the money for the lifetime subscription, but I’m holding off a little bit until some of the current issues get sorted. But yeah, as it stands now, I’m honestly very satisfied with what I’ve got. One thing I saw mentioned was that there are going to be some new voices, which I’m genuinely excited about! So here’s my slightly selfish ask: would it be possible to somehow sign into Amazon Kindle and use one of these voices to read Kindle books directly? I know some text-to-speech apps let you do this, and I’m not sure if it would clash with your current goals for the app or if it’s something that can even be implemented, but I figured it couldn’t hurt to ask.

Basically, I’d love a feature where I can sign into my Kindle account and have one of your voices read the books in the library. The voices are really good, and I hope they keep getting better. Honestly, one of the best I’ve heard so far in the TTS space is Speechify. I don’t know where they get their voices, but the quality is top-tier. It's pretty expensive though. Seriously, $140 per year? I don't know if it's because it's promoted by MrBeast, Gwyneth Paltrow, and Snoop Dogg, but, yeah. Must be for the rich. They seem to be promoting it as a product for individuals with dyslexia, but again, super expensive. I’d even love the option to create virtual voices. I’ve also seen features in other apps where you can turn text or books into podcasts using virtual voices, kind of like generating a little audiobook from your reading material. That would be amazing. Again, I'm not trying to step on any audiobook publishers’ toes here; that’s not my intention. Like, imagine clicking on your Kindle book (once linked), choosing one of the voices, and having it read to you right in the app: no downloads, just direct reading. And maybe with uploaded files or personal documents, there could be more freedom. That’d be next-level accessibility.

Anyway, I know this might be a long shot or not a priority, but thought I’d throw it out there. Thanks again for listening and for continuing to improve the app—it really does make a difference.

YouTube

Could anyone remind me how getting YouTube videos described works? I tried sharing one from the YouTube app to PiccyBot and it just says it isn't valid, no matter what video I try.

Latest update

Hi guys,

There is an update available in the App Store that improves YouTube video handling (this should fix your issue, Icosa).
A new set of onboarding screens has been added to guide new users through the app.
In addition, the personality voices have received an update. They will be less extreme but still have extra intonation and style. Try them out if you are subscribed to the app.
Also, the latest Gemini Pro 2.5 I/O model has been added to the list. It's good, but slow, so keep that in mind.

Good luck!

Models in the update

Hi Martin,
First of all thank you for the update!
There are more changes in the models list than "just" adding Gemini Pro 2.5 I/O.
A model with an interesting name has also appeared there: "Native Blind Style". What's that exactly?
Furthermore, there are two entries named "GPT4O mini" (next to each other in the list), and Deepseek Janus seems to be gone. Is this just a mistake, and is one of them actually still Deepseek Janus? If I remember correctly, Deepseek Janus used to come after GPT4O mini in the list, which is what gave me the idea to ask.
Thanks!

Models

Hi Lazlo, the 'Native Blind Style' is a model based upon GPT o4-mini that describes images from the perspective of someone who has been blind their entire life. The focus is on how objects feel, mention of colours is avoided, etc.

And GPT o4-mini is different from GPT 4o-mini. The latter is a small version of GPT4o while the former is a small version of O4. GPT o4-mini is actually currently the latest model from OpenAI. Their naming is bizarre, they agree on that themselves. But it is what it is now.

DeepSeek Janus has been running on a local machine that I now use to develop a voice-driven assistant agent for the blind. This will be a separate product from PiccyBot. DeepSeek will be back when I find a good place to host it.

Good luck with the update!

Thanks much!

Thank you for the info and the very quick response! Ah yes, the names of the two GPT variants sounded so similar to each other with the Hungarian voice of VoiceOver that I perceived them to be the same. Now I have just had a closer listen (and had them spelled out) and I see the difference.

Thanks

Very much appreciated. Is there any chance you can signpost which models are useable for video or does the app always use a preset model without the ability to change it?

Image and video

Hi Icosa,
Right now, Amazon Nova Lite, Amazon Nova Pro, Gemini Flash 2.5, Gemini Flash 2.0 Lite, Gemini Pro 2.5 I/O and Reka will give video descriptions. For any others not in this list, it will default to Gemini Flash 2.5. So if you set GPT O4-Mini it will use that for image descriptions, and Flash 2.5 for video. If you set Pro 2.5 I/O it will use that for both image and video.
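The fallback rule above can be written out as a small lookup. This is only a sketch of the behaviour as described in this post, not PiccyBot's actual code; the names `VIDEO_CAPABLE`, `DEFAULT_VIDEO_MODEL` and `resolve_model` are made up for illustration.

```python
# Hypothetical sketch of the rule described above; not PiccyBot's real code.
# Models that, per the post, can describe video themselves.
VIDEO_CAPABLE = {
    "Amazon Nova Lite",
    "Amazon Nova Pro",
    "Gemini Flash 2.5",
    "Gemini Flash 2.0 Lite",
    "Gemini Pro 2.5 I/O",
    "Reka",
}

# Model used for video when the selected model is not in the set above.
DEFAULT_VIDEO_MODEL = "Gemini Flash 2.5"


def resolve_model(selected: str, media_type: str) -> str:
    """Return the model that would handle a request of the given media type."""
    if media_type == "video" and selected not in VIDEO_CAPABLE:
        return DEFAULT_VIDEO_MODEL
    return selected


print(resolve_model("GPT O4-Mini", "image"))
print(resolve_model("GPT O4-Mini", "video"))
print(resolve_model("Gemini Pro 2.5 I/O", "video"))
```

So the selected model is always used for images, and only video requests are ever redirected.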

Hope that helps!

Re: Update, personalities won't turn off

First I'll say that the voices do sound better with personalities on, but... once I switch the personality on, it won't switch off, even if the toggle says off. I have to physically restart the app after turning off personality for it to go away.
Also, how come the personalities are more censored than no personality? I had a piece of erotic art described, and when I tried to ask questions with personality on, it showed the text on the screen but the voice just kept repeating "Sorry I can't describe that" over and over and over again... like it was stuck.

possible bug with voice personalities?

Hi! First of all, I have to say that I absolutely love that PiccyBot can describe YouTube videos now. I know it's kind of old news at this point, but I only gave it a try and got it working yesterday.

Unfortunately, I think I may have stumbled across a bug. Normally, I have personality for voices turned off; at least in the past, the personalities didn't really add anything, at least not in my opinion. However, when I read that the personalities had been revised, I wanted to give them a try, most specifically for describing YouTube videos. Unfortunately, when I request a description of a video and personalities are enabled, a text description appears, but nothing is ever spoken, even if I use the play button. However, when I turn personalities back off, YouTube videos are described just fine.

I'm using the Gemini 2.5 model; I think a previous commenter may have been on to something when pointing out that models are not labeled accurately at the moment. I think I'm using Gemini Pro, but it says Flash? It works well, so not a huge deal. My primary concern is the failure of descriptions to be spoken when personalities are enabled. If this is an issue that can't be fixed, it's not a deal breaker for me by any means. I enjoy the descriptions I get without personalities, so I probably never would have even discovered this particular bug if I hadn't read that personalities had been given a makeover. In closing, I love PiccyBot and really appreciate all the hard work that has gone into it.
Thanks!

Personality problems

The thing is, I thought it was only me until I sat down to read these comments, but yeah. I only tried it once. I turned off the personality feature because it’s not really my thing—it’s kind of funny, I don’t like the overly descriptive tone. So when I heard the personality feature had been revamped, I figured, OK, let me try it again and see if the issue’s been fixed. I love the fact that it’s not overly exaggerated anymore and it works really well.

But here’s the problem—like the other comment mentioned, once you turn on the personality feature, the voice doesn’t work. The first time I tried it with personality on, I actually heard the voice when the result appeared on the screen. But now it doesn’t seem to work anymore. Once I turn off the personality feature, everything goes back to normal and the voice works fine. For some reason, as soon as the personality feature is turned on, you don’t hear anything—the voice stops speaking.

Personally, I like having the voice on, and I also keep the sound on while the video is being prepared for description so I know something is happening on the screen. When I don’t hear the sound anymore, I assume the video’s done and that’s when I go read it. Normally, I’d prefer to read the description with VoiceOver on my iPhone directly, but for something like this, where the description pops up on the screen, I like having the voice so I know when the result is ready. I can always pause the speaking and read the text that way, but it’s good to have that audio feedback.

I don’t know if this would be overkill or not, but for people who choose to have the voice on, I feel like it’s helpful. If someone prefers to turn the voice off and only read the text with VoiceOver, it would be nice to have an audio cue to let them know the video is done processing. I’d love this feature. I’m not planning to turn the voice off because I really like the voices right now, but if I ever did, I’d want some kind of audio cue, like a chime or a little tone to let me know the description’s ready, instead of relying on silence when the preparation sound stops. Maybe it could even be a vibration, or give users the option to choose sound, vibration, or both. I think that would be really useful and a nice addition. Plus, it would be helpful if you decided to turn off the preparation sound along with the voice. When the video is done processing and the description is on the screen, you wouldn't have to rely on dead silence to tell you that.

Oh, another thing I forgot to mention: does it work with the Magic Tap, the two-finger double tap, to start and stop the speech while the description is being read? I’d love to be able to pause or stop the voice without having to search around the screen for the play or pause button. It would make it way easier and faster if the Magic Tap could control the speech while the description is being read out loud.

Thanks again

Thanks for the list of models available for video. It would still be nice to have some kind of information on this in the app, whether this is a label on the ones that support video or a second selection for video model. Really do love the video descriptions though, especially for videos of my nieces.

The startup guide is very good

Hello Martijn,
I think that for an increasingly mature app, a startup guide is very necessary, and I'm glad you implemented this feature. However, I noticed an issue: your app supports over fifty languages, but at least on the startup guide page, many texts still only display in English. Taking Chinese, the language I use, as an example, I found that some parts of the startup guide text are already displayed in Chinese, but other parts still show in English. Could you further localize and translate these texts? If you need help, at least for the Chinese part, I can offer assistance.

retry button

The retry button, I believe, is right below the play button? It is accessible by swipe but not by touch. Can we update this so it is accessible by explore/touch? Thanks.

No audio for long video descriptions

As Missy and Winter reported, currently the new audio descriptions with personality 'on' won't work for very long descriptions. The TTS model can't cope with these longer descriptions. I'm looking into a way around this. For now, the only thing you can do is to reduce the length of the description (set length to 40 or so). Then the audio description should work. If you just want to use VoiceOver instead, set voice to 'none' and length to 100.

Carter, I will look at the translation of the intro pages, thanks for pointing out Chinese is not working properly yet. I appreciate the offer of help; could you please contact me privately?
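One common way around a TTS length limit is to split the text into smaller pieces at sentence boundaries and speak them one after another. The sketch below only illustrates that general idea; it is not necessarily the fix PiccyBot will use, and the function name `split_for_tts` and the 500-character limit are assumptions made up for the example.

```python
import re


def split_for_tts(text: str, max_chars: int = 500) -> list[str]:
    """Split text into chunks of at most max_chars, breaking between sentences.

    max_chars is an assumed limit for illustration only. A single sentence
    longer than max_chars is kept whole rather than cut mid-sentence.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            # Still fits, or this is the first sentence of a new chunk.
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


for chunk in split_for_tts("One. Two. Three.", max_chars=9):
    print(chunk)
```

Each chunk could then be sent to the speech engine separately and played back in order.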

LaBoheme, thanks for noting the retry accessibility, should be no issue to improve that, it's on the list now.

Thanks again guys!

agree about magic tap and processing complete sounds

I really like both of these suggestions. A processing complete sound would be really nice, maybe toggleable in settings, although, as the other commenter said, once the voice starts talking we already know processing is complete. I keep the sound on as well, so I know when it's processing stuff, but a processing complete sound would be great. Magic Tap to start and stop speech would also be an excellent quality of life improvement, if making it an option is possible.

Voice mode

It would also be nice to have a voice mode where you could talk to the AI and get the description that way, and even ask follow-up questions. I know most AI programs that use voice also give you a text transcript of what you said and what the AI responded, to help keep track of the conversation. Since this is focused on descriptions and it’s more of a one-off chat instead of an ongoing conversation, I’m not sure if that’s fully necessary, but I do think it would still be really useful.

For reading purposes, it would be great if the transcript stayed on the screen, so if I’m speaking into the microphone, I’d love to have a text version of both what I said and what PiccyBot replied. If this can be implemented, I say definitely go for it! Of course, it should be optional, so people who prefer to type or use both could customize everything in the settings to fit their needs.

Prompt history and Chinese

Hi guys,

There is an update available that re-introduces the prompt history, an issue noted by privateai. The last ten prompts (in the main view) will be stored. You can remove them as needed.
Carter, the Chinese descriptions have been adjusted, can you check them?

Thanks for the feedback as always!
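For anyone curious how a "last ten prompts" history like the one described above can behave, here is a minimal sketch. It is an illustration only, not the app's real implementation; the class name `PromptHistory` and its methods are made up.

```python
from collections import deque


class PromptHistory:
    """Keeps only the most recent prompts (hypothetical illustration)."""

    def __init__(self, limit: int = 10):
        # A deque with maxlen drops the oldest entry automatically when full.
        self._items = deque(maxlen=limit)

    def add(self, prompt: str) -> None:
        self._items.append(prompt)

    def remove(self, prompt: str) -> None:
        # Remove a stored prompt if present; ignore unknown prompts.
        if prompt in self._items:
            self._items.remove(prompt)

    def recent(self) -> list[str]:
        # Newest prompt first.
        return list(reversed(self._items))


history = PromptHistory()
for i in range(12):
    history.add(f"prompt {i}")
print(len(history.recent()))
```

After twelve prompts have been added, only the ten newest remain, and any of them can still be removed by hand.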

How To Play My Original Video

How can I play my original video? I had a video described from my photo album, and now when I play it, I get the description. I've closed the Photos app as well as PiccyBot, but when I open the Photos app and find the video, it still plays just the description. Is it possible to get my original video back?

Thank you for improving the Chinese translation

Hi Martijn,
I noticed that today's app update log mentioned improvements to Chinese translations. That's really great, and your efficiency is impressive. But to be honest, there are still many untranslated parts in this app, such as the Tips on the startup page, which haven't been translated at all. If you think this is an issue that needs improvement, I'd be happy to help.
By the way, how should I contact you?

Tips translated and prompt history optional

Carter, a new update is available that includes translated tips and a setting to turn the prompt history off altogether. You can contact me with a direct message in this forum or on piccybot.com.
Earle, let me look at an option to replay the original video in the same way as it is done for the original image. Thanks for the feedback as always!

Comparing pictures

Hey, quick question—is there any way you could set it up so we can share more than one image in a chat session? Sometimes I generate different versions of a picture using AI, and I want to be able to compare them side by side. It would be great to mix and match elements or just figure out which image turned out best.

I’m not asking for a lot, maybe the ability to upload two or three images in a single session. That way, it doesn’t get too overwhelming or cluttered, but it still gives enough room to compare the aspects properly. Also, I don’t think this would work with video, especially anything longer—that would probably get too messy. But for images? It’d be super helpful. Or maybe there’s a way to connect us in real-time to PiccyBot to receive the feedback on which image looks better? I don’t know if screen sharing or syncing like that is possible, but throwing the idea out there.

Raise in subscription costs

Hi guys,

Recently, some AI providers have raised the prices of their services, Google in particular. To ensure the continuity of PiccyBot, I am afraid I have no choice but to raise the subscription costs. This will impact new lifetime subscriptions and yearly subscriptions from next month onwards.

This is for your information; I hope you guys understand.

What will the new costs be?

Hi,
I absolutely love this app!
Are you able to reveal what the new costs will be? Also, I have something I would like to see changed. When you get a description of an image or video and save the item back to your camera roll, I believe the buttons under that menu could use more descriptive titles. For example, the button at the top could say, “save image and description.” The one under that could say, “save image only, without description.” Is this something you would be willing to implement?
As the labels are currently, it’s not immediately clear that the top button does, in fact, save the image with the description. I was able to test it, though, and that’s exactly what it does.

New prices and lifetime subscription

Will there still be a one-time purchase option, and if we’ve already bought it, will we have to repurchase when the price increases?

Re:New prices and lifetime

They did say in their post that this would affect the prices of new lifetime subscriptions going forward. This logically means that yes it will affect the one time cost and no it won't mean people who already paid for it will need to pay again.

Only affects new subscriptions

Icosa is correct. This will only affect new subscriptions from next month onwards. It's a heads up for those who don't have a subscription yet and would still like to take advantage of the original price.

@martijn

Is there any way that those of us who have bought the lifetime subscription can still support you? I'd love to donate to the app in some way.

Just bought lifetime

Hi,
I just bought the lifetime option. I want to support this developer and do it before the price goes up in June. I definitely appreciate his hard work.

Added donation option

Inforover, thanks for the offer of support. I have added a PayPal donation button to the piccybot.com website. Any help is appreciated!

Impressive progress, feature suggestion, and a WhatsApp question

Hi everyone,
I first tested PiccyBot when it was very new and back then didn’t find a unique use for it since I was already using other apps with similar functionality. Out of curiosity, I re-downloaded the app a few days ago and was really impressed by how much it has grown and improved since my first try. The added features, especially video descriptions and multiple AI models, make it a very powerful and handy tool.

I also recently discovered from the comments that the developer has created a WhatsApp service for PiccyBot, allowing users to add PiccyBot as a contact to send images and videos and receive descriptions as voice messages. This seems especially designed for Ray-Ban Meta Glasses users to get hands-free descriptions via voice commands. Since I have Meta glasses, I was very excited about this because the AI descriptions from the glasses themselves are much less detailed than what PiccyBot can provide.

I registered and added the contact, and sending images from my phone’s WhatsApp works perfectly—I get detailed descriptions back each time. However, when I send images directly from the glasses through WhatsApp, the pictures do send, but I never receive any description back. I’ve checked on my phone’s WhatsApp, and the images are definitely sent (my partner even confirmed they are not empty), but the replies never come. Sending images again from the phone works without issues. Is this a known problem or something I might be missing?

One feature I’d love to see in the app is the ability to add multiple images within the same chat thread for follow-up questions. For example, on Be My Eyes, you can take additional pictures during an ongoing conversation if the first image didn’t capture the needed details or angle. This would help keep the context intact and make follow-ups smoother without starting a new chat each time.

Thanks for all the hard work—it’s clear the developer listens closely to feedback and is dedicated to making this app better.

New Lifetime Price

Has there been any word on what the new lifetime price will be? Thanks in advance.

Setting up WhatsApp for description

How do I save PiccyBot as a contact on my phone? I don't have the glasses, but, yeah. It would be nice to be able to send images to WhatsApp and have them described that way.

Re: setting up WhatsApp

There’s a section on the PiccyBot website where you register with your WhatsApp email and phone number, and then PiccyBot will message you so you can save the contact.

I purchased the lifetime subscription. Now I have some questions

Hi there, I wanted to let you know that I’ve officially purchased the lifetime subscription to PiccyBot! First of all, thank you for creating such an amazing product. When I was younger, I used to feel kind of sad that I couldn’t enjoy music videos the same way my friends did—but now, this is a ghost of the past. What you’ve built here has genuinely made a difference. The image descriptions are well done. I’m looking forward to the new features you’re planning to release in the future. And yes—don’t worry—I’ll still be donating from time to time, even with the lifetime access. You deserve it. I sincerely wish you all the best in life. I hope you’re able to secure the funding and support you need to continue building and improving this project. I also wish you good health, a positive mindset, and the strength to keep going even when the odds are against you.
I’m incredibly happy with my purchase, and I’m so glad I managed to grab the lifetime deal before the price increase—not that I wouldn’t have paid the full price anyway. It’s nice to have gotten in early. So thank you again—for your kindness, your patience, and your willingness to answer my endless questions and engage with curiosity, not just as a developer, but as a genuinely thoughtful person.

Entering and viewing descriptions stored in metadata of photos

I'm trying to figure out how to save the descriptions generated by PiccyBot into the metadata of photos in my photo library. Once the description is saved in the metadata, how do I view the description?

Donation

I subscribed to the lifetime plan a good while back and it felt like a steal at the time, so I'm happy to donate.

However, PayPal was forcing me to add my address and, in particular, my phone number, which I'd rather not provide.

I don't suppose you would consider adding an option to use Apple Pay, would you? I don't know how much of a pain this is, but I'd be happy to donate if I don't have to hand out my phone number.

start a new conversation in piccyBot

How do I clear my current conversation and start a new one, without closing and re-opening the app?

Re: start a new conversation in piccyBot

Unfortunately it doesn't have such an option unless I'm missing something. A feature for @Martijn - Spar… to consider implementing.

Is PiccyBot down?

OK. I tried several times to have images described. I am using the latest version of the app, but after describing two or three images, it now just says "analyzing image" and stops. I even tried multiple AI models, including Gemini 2.5 Pro. But Be My Eyes is working. Not sure what is wrong now.

Cloudflare is down.

If any of your models use that, then that could be why they're not working.

It's affecting quite a few services at the moment.

Google Cloud had issues

Brad already indicated Cloudflare. It was a quite extensive global Google Cloud issue that lasted over 7 hours. It should all be working again, but expect somewhat slower performance for a while.
Google Cloud is still a bottleneck for PiccyBot, despite having a choice of models, as I am storing model data in Google Firebase. I'll look at making that more redundant. On the other hand, an outage like this is rare, fortunately.

Support for adding more images

In the latest update you can add additional images to a conversation in the chat interface. Really neat. This is a feature that has been requested here multiple times, so once again, thanks for listening and implementing our suggestions!
Another feature request that I think has been mentioned before: an advanced setting to customize the system prompt that is sent to the LLMs. As I understand it, the model selector currently changes both the model and sometimes the system prompt that is used (e.g. the "Native blind style" option). It would be nice if these were separate, so you could have different "styles" for all the LLMs and combine them freely, preferably also with the option to add your own system prompt (I have one that works very well for me for multiple AIs). You could have "quick description", "very detailed description", "native blind style", etc. as standards.
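To make the suggestion above concrete, here is a rough sketch of what keeping the model and the style as two separate choices could look like. Everything in it is hypothetical: the names `STYLES`, `DescriptionRequest` and `build_request` are made up, and the prompt texts are placeholders, not the prompts the app really sends.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder prompt texts for illustration; not the app's real prompts.
STYLES = {
    "quick description": "Describe the image in one or two sentences.",
    "very detailed description": "Describe the image in as much detail as possible.",
    "native blind style": (
        "Describe the image for someone who has been blind their entire life. "
        "Focus on how objects feel and avoid mentioning colours."
    ),
}


@dataclass
class DescriptionRequest:
    model: str          # which LLM handles the request
    system_prompt: str  # which style or custom prompt it is given


def build_request(
    model: str,
    style: str = "quick description",
    custom_prompt: Optional[str] = None,
) -> DescriptionRequest:
    """Combine any model with any style; a custom prompt overrides the style."""
    prompt = custom_prompt if custom_prompt else STYLES[style]
    return DescriptionRequest(model=model, system_prompt=prompt)


request = build_request("Gemini Flash 2.5", style="native blind style")
print(request.model)
print(request.system_prompt)
```

With this split, any style can be paired with any model, and a user's own system prompt simply takes the place of the built-in ones.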

Prompt and llm?

Hello guys! Is there any prompt and LLM that respects the video's timing? I haven't found any that can do this yet.

Copying images from clipboard

I think the idea of copying an image and pasting it directly into a text box on this app is an interesting feature. It’s not an angle I’d considered before. From my experience on the iPhone, I believe you can copy an image directly to your clipboard from the Camera Roll. When you select a photo, on the share sheet, you get two options: “Copy iCloud Link” and “Copy photo.” The “Copy photo” option, if I’m remembering correctly, copies the actual image to the clipboard, not just a link.

I’ve done this before, so I know it’s possible to copy an image this way. On Safari, I can use the share sheet to copy a link to an image, but not the image itself. On the Google Chrome app, though, there’s an option called “Copy Image” that lets you copy the actual picture. I’ve successfully copied images from Chrome and pasted them into the Notes app. The issue is, I’m not always sure what format the image is in when it’s copied; I’d see something like image PNG or JPG. Unfortunately, I don't know about the size of the image or how it's displayed on the screen. Other apps would not allow me to paste the image directly into their text box, even though I know it's copied to the clipboard.

I’m not certain how well this would work on here. If an app doesn't offer a direct option to copy and paste images, users might need to use Chrome to copy the images from these websites. And from what I know, there is no way to copy multiple photos from a post. You would have to focus on each picture directly, copy the image from the share sheet, and then share it to PiccyBot. I don’t think this would work through Safari or the Facebook app itself, as they don’t seem to offer a “Copy Image” option.

My usual workaround for describing photos on Facebook is to take a screenshot of the picture I want described. I don’t save it to my Camera Roll. Instead, I immediately run the screenshot through PiccyBot to get a description and then delete the screenshot right away. If that doesn’t work for some reason, I’ll save the image to my Camera Roll, have it described, and then delete it afterward. This process works, but being able to copy and paste an image directly into the text field would be nice, if you're able to figure out the logistics of how to make this feature work reliably.

Hello, first of all, I…

Hello, first of all, I apologize for my English, I don't master the language. I would like to mention that I am very surprised and delighted with the application. Something I would like to suggest is that the camera guide you when taking a picture. I don't know if this is possible with iPhone, but I use the app on Android.
