Hello guys,
I have created the free app PiccyBot, which speaks out a description of the photo or image you give it. You can then ask detailed questions about it.
I have adjusted the app to make it as low vision friendly as I could, but I would love to receive feedback on how to improve it further!
The App Store link can be found here:
https://apps.apple.com/us/app/piccybot/id6476859317
I am really hoping it will be of use to some. I earlier created the app 'Talking Goggles', which was well received by the low vision community, but PiccyBot is a lot more powerful and hopefully even more useful!
Thanks and best regards,
Martijn van der Spek
Comments
Alternative video models
I have added the new Amazon models Nova Lite and Nova Pro to PiccyBot. They don't seem to be as descriptive as some of the other models, but they may be able to describe videos that are rejected by others. Another possible alternative is Reka, which follows a different approach to description. Again, these models don't seem great at the moment, but we can count on them to improve in the coming months.
adding my voice to requesting longer video and description time
I have not tried out the new AI models yet, but I'm definitely looking forward to it. Thank you so much for the constant updates. I do want to add my voice to those who have mentioned needing longer video processing and descriptions. As I mentioned before, I have begun using this app to generate descriptions for some of my documentaries, and I have run into a problem: most of my documentaries are between 6 and 10 minutes, right above the cut-off point as it stands. So if the processing limit could be extended to 10 minutes rather than the five-minute mark where it seems to be capped right now, it would be a lot more helpful for projects such as what I am attempting. I understand that there is a cost associated with longer processing, but it may be worth looking into a professional membership fee for that kind of thing?
Censorship, models and a New Year's wish
In my experience, for images Mistral Pixtral is the go-to model when I want an image described whose content all other models would reject. It is stated on their site that this model has virtually no censorship in place. Its descriptions are of course not on par with much larger models, e.g. Gemini Experimental 1206, but they are quite good nevertheless, and with follow-up questions it does the job very well indeed.
For videos I have no such experience, because I haven't dealt with such a video yet.
By the way, Martijn, could you please list for us the models in PiccyBot that can describe both images and videos? It would be quite useful to know. E.g. I highly doubt that if I set the model to Claude (any version) or Mistral Pixtral, that would affect video descriptions, as these models process only images and not video, don't they?
Last, but not least, Martijn, you were among the people who made 2024 a really special year for me. You know there were 17 years in my life when I had some vision: not too strong by far, but quite usable, and it produced tons of memories and imagery that I still live on many years later. And now PiccyBot is among the pieces of technology that bring me much closer to that part of my life when I had some eyesight. And all this without complicated, risky, invasive and very expensive surgery. I cannot thank you enough for your commitment in this field! So I wish you a very merry New Year from the bottom of my heart, and let's walk further interesting roads in 2025!!!
Mixer problem
Hey guys, I activated the audio description mixer mode for videos, but when it processes or shares the video, it only plays the voice audio, not mixed together with the video. What should I do? I have a Galaxy S23. Sorry for posting in the iPhone thread, but since the discussions are here, I thought I'd comment. Once again, I apologize if this is the wrong place.
prompt customization
This app is cool, but I would like to be able to modify the initial prompt to say something other than "what is in this video".
Additionally, I'd like to be able to stop it from mentioning what is said in the audio if possible and maybe allow for haptic feedback?
Otherwise, this app is quite intriguing and I do appreciate your engagement with the community.
It completely changed a story!
So I wanted to try this app. I'm an avid gamer, so I had it describe the opening movie of a game I was playing. It was awkward to record the movie and then listen to the descriptions after the fact, but it did a decent job. It inserted its own commentary, which was ... interesting and kind of weird, but it was helpful.
Then I went into the in-game menu to the "story so far" summary page, just to see if it could read me the text of the summary. It's a long multi-paragraph document you can scroll, so I recorded a video of me slowly scrolling the text until the end. I wish I could share a comparison of what it actually said with what PiccyBot came up with, because it was hilarious, creative and so very, VERY inaccurate. I'm completely flabbergasted by how it came up with what it did. If anyone wants, I could actually try it again and show the comparison.
Video feedback
Laszlo, thanks so much for including me among the people who made 2024 special for you. It really means so much to me. I have been creating software all my life, but I never received the kind of feedback people like yourself are giving me. As for the video-capable models, they are currently Nova Lite, Nova Pro, Gemini Flash and Reka. This will change in the coming weeks, I am sure.
Diego, thanks for using the Android version. When you have subscribed, PiccyBot will include the description audio track when sharing the video in the main view. The audio mixing is not done yet on Android, however; I hope to include that in an update soon.
Quinton, you can interrupt the processing of the video and change the prompt for it. But you are right: initially it defaults to 'what is in this video' even if you have written something in the field beforehand. I will adjust that.
Remy, sorry about the hallucinating model. If you suspect something is amiss, try different models or take multiple shots; the recording may not have been clear enough, and the model will proceed anyway rather than admit that it is speculating.
Hope you all have a fabulous New Year and looking forward to improving PiccyBot further in 2025!
how long can a video be?
What's the limit on length of a video for PiccyBot to describe? I would love for it to describe my wedding, but it only worked with the first 5 minutes or so.
I believe it's five minutes.
Pretty sure that is the current limit.
Images and videos problem
Guys, I've been trying to use the app to recognize photos and videos using various models for 2 days now, but none of them work. Is anyone else having this problem? What can I do?
Working fine here
If it helps, these are my settings:
Nova voice, with personality enabled.
Speech rate: 120%.
AI model: Google Gemini Flash 2.0.
Length: 60.
Video quality: medium.
bugs, observations and questions
So, I've had much more time to play with this app and love it.
I'd like to mention what I've noticed, as well as ask a couple of questions.
How is it able to understand audio nuances, like voice genders?
It's been correct every time I've used it.
Would it be possible to have the app send a notification once video processing completes, rather than needing to keep the app in the foreground?
ElevenLabs' GenFM has this feature.
Selecting the "none option in the voices menu does not appear to work.
Could we have an option in settings to change the prompt from always saying "what is in this video?"
I know that had been previously discussed but didn't know if that was something which would actually be coming in the future.
I must reiterate, this app is fantastic.
It's been great for understanding videos, as well as extracting text from them.
Thank you for doing what you're doing.
I, and many others really do appreciate the effort you've put into this.
Edit: the "None" option seemed to work this time
I'm not sure what happened, but almost every time I tried using the "None" option it wouldn't select; of course, it works now that I've mentioned it on here, lol.
mixing audio: why does it fail?
I've been experimenting a lot this last month with different videos and mixing audio with descriptions. I used to believe that when it fails to mix, it means the generated description is too long, or longer than the video. To figure out whether that assumption is right, I started requesting the total word count at the end of each description, and I am starting to think that the length of the description has nothing to do with why it fails to mix. Quite often, a description that is 650 words long mixes fine one minute, and the next time a description of equal word count fails. What have you guys discovered? Can anyone shine a light on why it fails to mix so often? Also, I have noticed that some AI models will produce descriptions according to the word count you specify and give you an accurate word count, while others will totally disregard the word count altogether or act like they're unable to count how many words were used.
Also, has anyone successfully come up with a prompt that will generate a description of a specified length, for example, "give me a description that is four minutes and 30 seconds long"? I find that by tailoring the word count, or using prompts like "describe each scene of this video using 80 words", I am able to somewhat tailor the length of the description, but the result is very inconsistent.
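For what it's worth, one rough trick is to turn the target duration into a word budget first, assuming a typical narration speed of around 150 words per minute. The little sketch below shows the arithmetic; the speaking rate and the prompt wording are just my assumptions, not anything PiccyBot promises to honour:

```python
def word_budget(minutes: float, seconds: float = 0, words_per_minute: int = 150) -> int:
    """Approximate how many words of narration fit in the target duration."""
    return round((minutes + seconds / 60) * words_per_minute)

# A 4 minute 30 second target at ~150 words per minute is roughly 675 words.
target = word_budget(4, 30)
prompt = (
    f"Describe this video scene by scene, using about {target} words in total, "
    "so the narration lasts roughly four and a half minutes."
)
print(prompt)
```

The models that actually respect word counts then tend to land near the requested length; the ones that ignore word counts will of course ignore this too.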
DeepSeek
Today, just out of a feeling brought by my intuition, I peeked into the model list in PiccyBot. And my intuition didn't fail me, as there was a surprise there: a new name on the list! A name the world is learning hyper-fast these days, after they made quite a big noise with their announcement timed just before the Chinese Lunar New Year, of course. It's DeepSeek itself!
Martijn, do you happen to know which model this is: DeepSeek V3, or R1 maybe, or some other model of theirs - they have lots on e.g. Hugging Face. Can this process images only, or video too?
By the way, Martijn, and anyone who reads this and just feels like it: have a happy and lucky Chinese Lunar New Year!!! I am in style with this wish, as I am writing this post with ZDSR, a Chinese development in the field of screen readers. ZDSR has been my daily driver since last Christmas, after an almost 15-year "marriage" with NVDA, and I simply love it!
@Laszlo, I sent you some emails.
I'd like to know more about the screen reader, thanks.
I love the speed of the Zhengdu screen reader but...
If anyone else wants to try it, you won't be able to navigate using headings, buttons, or any of those features.
Like I said, I love the speed of this thing, it's so smooth, but if I can't use navigation keys in the public version then I'd not want to buy the more enhanced one.
Sorry for being off topic but I just thought I'd let other blind people know.
Off-topic: Zhengdu web navigation
Hi Brad,
You CAN now use those navigation features in the public welfare version with Chromium-based browsers (e.g. Chrome, Edge etc.). So this restriction was partly lifted.
For a heap of further information, please check your e-mail and you will find my detailed reply to all your questions. I did my best to answer them.
last off topic.
Thanks, I will do so.
DeepSeek
Laszlo, thanks for noticing the DeepSeek addition. It's the 7B model, which I installed locally on one of my own servers. So not very powerful, as this server is not the best; it is more a proof of concept. One of the good things about it is that I have full control over it. I love open source, and DeepSeek clearly builds on the open source ecosystem, including models like Meta's Llama, with a lot of smart optimisation steps.
The version I am running for PiccyBot only describes images; for video it will default to Gemini at the moment.
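For anyone curious what running a model yourself looks like in practice, here is a minimal sketch. It assumes the model is served behind an OpenAI-compatible endpoint (the style of API that tools like vLLM and Ollama expose), and the URL and model name are placeholders, so treat it as an illustration of the approach rather than PiccyBot's actual backend code:

```python
import base64
from openai import OpenAI

# Point the standard OpenAI client at a locally hosted, OpenAI-compatible server.
# Both the base_url and the model name below are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

def describe_image(path: str) -> str:
    """Send a local image to the self-hosted vision model and return its description."""
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="deepseek-vl-7b",  # placeholder name for a locally hosted vision model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail for a blind user."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```

The nice part is that the same pattern works whether the weights run on a rented server or, one day, on the device itself.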
Now the stage is set. With these kinds of open source models available, it shouldn't be too expensive to train a model specifically tailored for blind and low vision use.
Another point is the censorship. At the moment the model will still follow the Chinese government's rules and limit output that way. I am sure there will soon be models that strip these restrictions. The current local model may be less censored as far as sexuality and such goes; I still have to check that.
I have also updated PiccyBot; it should be more stable now, as earlier it could get 'stuck' after many requests. It also includes a push notification to tell you when the processing of a video is finished. And you can now minimise PiccyBot while it is processing: it will play the description in the background even when you have moved on to another app.
Another development is the PiccyBot WhatsApp service, particularly useful for Meta Ray-Ban users who are locked out of the 'look and tell' function. Sending a video or image to PiccyBot on WhatsApp will result in an audio description. It is a bit slow and somewhat clunky, but at least it enables hands-free video descriptions while wearing the glasses.
Good luck with the app guys, let me know how things work for you?
WhatsApp service
This sounds great - is it available now? If so, how do I use it?
Please a Mac Version
Please make it available on macOS. We need an app like this.
A model for VI
@Martijn Exactly, that's what I am excited about! Even with DeepSeek, everything is open source and available out there, isn't it? Speaking of which, what about Llama?
re: New update
Thanks for the new update! I have tried it and can confirm that the audio will continue to play even when you lock your phone or go to another app. However, if I lock my phone or minimize the app and go to another app while it is still processing, it seems to stop, because when I come back to the screen, all it shows me is Retry and no description was generated.
Incidentally, I don't know if anyone has requested this feature yet, but it would be nice to have a setting to make the app auto-retry when a description fails or fails to mix audio, etc. Waiting 4-5 minutes only to come back and have to manually hit retry again and again gets a little tedious. Especially now, if the goal is to allow us to have it processing in the background, it makes sense for it to auto-retry on failure. Maybe not indefinitely? Maybe auto-retry five times or something, and then send a notification that says it has failed five times, please check the video, or something like that?
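Just to illustrate the idea, something like this retry loop is what I mean (purely a sketch, with placeholder callbacks rather than anything from the actual app):

```python
import time
from typing import Callable, Optional

MAX_ATTEMPTS = 5

def with_retry(request_description: Callable[[], Optional[str]],
               notify: Callable[[str], None]) -> Optional[str]:
    """Try the description request up to MAX_ATTEMPTS times, waiting a bit longer
    after each failure, and notify the user only if every attempt fails."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        result = request_description()
        if result is not None:
            return result
        if attempt < MAX_ATTEMPTS:
            time.sleep(30 * attempt)  # back off: 30s, then 60s, then 90s...
    notify("Description failed five times, please check the video.")
    return None
```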
Was there an AI censorship crackdown or something? ROFL
As of yesterday, both Llama and Mistral are acting very frigid and refuse to describe or answer anything even remotely sensitive. They act like they're stuck on GPT-4. Anyone else experiencing this?
No more Gemini Experimental?
I have just opened PiccyBot, went into settings and discovered that the AI model was somehow set to Mistral Pixtral. I have used Gemini Experimental 1206 since it appeared in the list. I browsed through the available models and saw that it was no longer listed, which is probably why the selection changed to Mistral Pixtral. Does the disappearance of Gemini Experimental seem permanent, because there is no more experimental version of Gemini, or is it for other reasons? And which model is now preferred, then, that generates descriptions closest to the superb ones Gemini Experimental used to produce? Gemini 2.0 Pro maybe?
Google experimental replacement
Yes, I would say that it is 2.0 Pro that is the replacement for the experimental model.
Whatsapp integration and various updates
Hi guys,
There have been some model updates as you noticed, with Google Gemini 2.0 replacing some of the earlier models. The experimental one has been replaced by Gemini 2.0 Pro as already indicated by blindpk.
The main update is the WhatsApp support, especially handy for users of Meta Glasses, who have been locked out of the 'look and tell' feature unless they are in the US.
I think any method that opens up other ways to describe images and videos hands-free is useful. I am using WhatsApp to connect PiccyBot to the glasses, and even though this may be a clumsy method, it does work.
Use PiccyBot on your Meta Glasses via WhatsApp
Steps to follow:
1. Register the PiccyBot WhatsApp Service: https://piccybot.com/register
2. After registration, you can use the PiccyBot WhatsApp Service to describe images and videos. Save the contact ‘PiccyBot’ to your WhatsApp contacts.
3. In the Meta View app, go to Settings, then "Communication," and connect via WhatsApp.
4. Your device is now ready for hands-free usage.
5. Put on your glasses, then use the following prompts to send a message to PiccyBot via WhatsApp:
For photos:
* "Hey Meta, take a photo and send it to PiccyBot."
* "Hey Meta, take a photo and WhatsApp it to PiccyBot."
* "Hey Meta, take a photo." After the photo is taken, you’ll hear a click sound. Then say, "Hey Meta, send the latest photo to PiccyBot."
For videos:
* First, say "Hey Meta, take a video," and the glasses will start recording. To stop recording, say, "Hey Meta, stop." After it stops, say, "Hey Meta, send the latest video to PiccyBot."
* Note that your Meta Glasses will capture a video of no more than 15 seconds.
* After sending the media, your glasses will ask, "Send photo/video to PiccyBot?" Respond with a confirming statement, like "Yes." Then it will reply, "Sending photo/video."
The message (image or video) is then sent to the PiccyBot account on WhatsApp.
Once the media is sent, you will receive an audio message saying, "PiccyBot is processing your image/video, please wait."
After processing, an audio description will play. To play the audio hands-free, ensure WhatsApp on your phone isn’t open. Your Meta Glasses will receive a notification about the audio message and say, "Voice message from PiccyBot." To listen, say, "Play the voice message," and the audio will play on your glasses.
For a video description of the process, please check this video by Dave Taylor-Page, with whom I have been working to get this done: https://www.youtube.com/watch?v=2KBH3y64rHk
Good luck with PiccyBot, let me know what you think?
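For the technically curious: behind the scenes, a service like this simply receives your photo or video through a WhatsApp webhook, generates a description, converts it to speech, and sends the audio back. Below is a minimal sketch of that last step using the public WhatsApp Business Cloud API; the phone number ID, token and file name are placeholders, and it is meant as an illustration of the mechanism rather than PiccyBot's actual code:

```python
import requests

GRAPH = "https://graph.facebook.com/v19.0"
PHONE_NUMBER_ID = "123456789012345"   # placeholder WhatsApp Business phone number ID
TOKEN = "YOUR_ACCESS_TOKEN"           # placeholder access token
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def send_audio_reply(recipient: str, audio_path: str) -> None:
    """Upload a generated audio description and send it back to the user."""
    # 1. Upload the audio file to the Cloud API media endpoint.
    with open(audio_path, "rb") as f:
        upload = requests.post(
            f"{GRAPH}/{PHONE_NUMBER_ID}/media",
            headers=HEADERS,
            data={"messaging_product": "whatsapp"},
            files={"file": ("description.mp3", f, "audio/mpeg")},
        )
    media_id = upload.json()["id"]

    # 2. Send the uploaded media to the user as an audio message.
    requests.post(
        f"{GRAPH}/{PHONE_NUMBER_ID}/messages",
        headers={**HEADERS, "Content-Type": "application/json"},
        json={
            "messaging_product": "whatsapp",
            "to": recipient,   # recipient phone number in international format
            "type": "audio",
            "audio": {"id": media_id},
        },
    )
```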
Problem registering with WhatsApp
I read your instructions above and immediately registered for the WhatsApp service as I'm very excited by the prospect of using this on my Meta Ray-bans.
However, I couldn't figure out how to get PiccyBot into my WhatsApp contacts. Then I watched the video, and it sounds like I should have registered on my phone, not my Mac. I believe at the end of registration it should have given me the option to add the contact.
I tried again on my phone and was told that the number was already taken. Is there another way to add the contact?
this is brilliant!
Going to try this! Even if you have access to Meta's look and tell feature, this'll still be worthwhile to have since 1: it can describe videos, and 2: the image descriptions will be superior.
Phenomenal
Hello, I just added the PiccyBot service to WhatsApp on my phone. This is great and PiccyBot is superior; the voice is great on the WhatsApp version too :)
I think Martijn just solved…
I think Martijn just solved Meta ... 🤯
WhatsApp registration problem
I have registered using the link in the instructions above. I got a 'registration successful' message on the web, but no contact to add to my WhatsApp.
Tried again and got a message that the number is already registered.
So how can I make the connection now?
Like many, I'm keen to try this out ASAP, as I have really missed look and see on my Meta Ray-Bans since the VPN option was throttled.
Registration Problem
I'm also experiencing a problem with registering. I registered using the link above, on my iPhone. After hitting submit, I received a registration success message, but a PiccyBot contact never appeared, so I was unable to save it to my contacts. I tried to re-register, but received a message saying my number was already registered. How can this be fixed?
Mrs B, Earle, Mr Grieves,…
Mrs B, Earle, Mr Grieves, please add the PiccyBot contact separately if you missed it during setup somehow. This is the PiccyBot Whatsapp business account: https://api.whatsapp.com/send/?phone=917736089657
I will be enhancing this further. Translating the messages for non-English users as a first step. Then adding the option to ask follow up questions (like in the regular PiccyBot app).
Let me know how it works for you?
Added The Contact
Thank you. I added the contact. Now I just have to test it.
Maybe
Maybe change the 'number already registered' message to include instructions for anyone who runs into this issue?
Also, just in case anyone gets confused when discussing technical matters in future: Meta blocked VPN use, they didn't throttle it. Throttling means limiting the speed of something. For example, a person who runs their internet connection at 100% 24/7 may find their connection being throttled.
@Icosa
Yes, a poor choice of words on my part. Meta killed the VPN option.
@Martjin
Thanks for sharing that contact link. It works fine.
This does seem like a useful workaround, although there’s obviously an issue around the speed of response.
Appreciate many use cases and you can’t tailor this to everyone, but I think I would prefer the WhatsApp bot to provide much shorter answers. Perhaps that might also help with response time?
If there were scope for anyone subscribing to the app to have an element of control over the type of responses that come through WhatsApp that might be one solution…
Thank you so much for the work you have done and continue to do in this visual interpretation space. It’s coming on leaps and bounds and just makes me more excited to see what’s around the corner…
Meta Ray-bans not understanding
I added the contact and called it "the pixies" (sorry).
So the first time, I asked the glasses to send a photo to the pixies. I think it asked if I wanted to send it to someone else in my contacts via WhatsApp. I said no, but I think what it did was send it to that person using messages instead. Fortunately it was only a test picture of my dog, but I did get a confused reply.
I tried again, saying "send photo to the pixies". It asked me to confirm if I wanted to send it to the pixies, so I said yes. It sent it through messages, but it did reach the right place as I could see it getting rejected.
So I did the same and added "on whatsapp" to the end. It then asked me if I wanted to send it to some random WhatsApp group which had a name that couldn't have sounded less like the pixies if it tried.
So I went into WhatsApp and sent a test message to that contact so it now appears in my Chats list. I then repeated the command to send via WhatsApp but it is absolutely determined to send it to this other random group instead. Fortunately when I say "no" it is not sending it anyway.
I'll keep messing about when I have a little more time - maybe I need to delete it all and start again, or try a more sensible contact name. I thought "PiccyBot" might be tricky for the glasses to understand, which is partly why I used the pixies instead. That and because I am an imbecile, of course.
How is everyone else finding this? When I get it working I think it is going to be amazing.
Working Fine For Me
Now that I have everything set correctly, it is working fine for me. I named the contact PiccyBot. I have to pronounce it as PeekyBot to get it to work. Now that I've started doing that, it will work every time.
sharing screenshots and pictures?
Hello,
I just wanted to say how much I love this app. I love the current functions and how it constantly expands so I'm excited for the future.
My question is: is there a way to share screenshots or pictures from apps like Reddit, Dystopia and so on? Every time I try, it just blanks out on me. One of the ways I've found to use AI is to read comics through Marvel Unlimited. Previously this wasn't doable, so I'd love an easier way to do it, as currently I'm sending screenshots through Be My Eyes, and there doesn't seem to be a shortcut for this, so it's pretty slow.
Meta
I tried it again just now and it worked great, so maybe it took a little while for WhatsApp to figure out the contact. Anyway, this is awesome - thanks so much, Martijn.
I noticed that when I ask, I get two messages back and have to ask Meta to play each one. The first one says processing and the second one is the actual content. Not sure if the first one is really needed, unless there is a technical reason for it.
I'll keep playing with it anyway as it might make more sense over time.
I would have loved this feature when I was last on holiday. PiccyBot really is leading the way with this kind of thing.
Shouldn't we have this in the first post?
@Martijn, would be useful for late-comers if you edit the first post to add this functionality.
Out of curiosity
Which service and model does the WhatsApp integration use for descriptions? I ask because I noticed a slight degradation in the quality of the descriptions generated.
I'd love to see this system expand
I've tried this on a few videos from my iPhone, and, despite its personal commentary making everything sound like it's the best thing ever, it worked really well. I used it to describe a cutscene from a video game, taken with my phone's camera from my monitor. The only problem is I first have to play the video, then have it analyzed. No big deal, I can wait. But what I'd love is a way to do this on my PC directly, either by having it share the screen with me and describe the video from start to end, or even by recording the video myself first, then loading it into the app to have it described later. Using the phone as a middleman, so to speak, is a bit awkward. I tried sharing the video to Dropbox, then trying to share that video with the bot, but unless I take the video myself or specifically have it on my camera roll, it doesn't seem to work that way either. So what I'm saying is a few more sharing options, or having it on a PC, would be extremely welcome. Overall though, it's a fascinating bit of technology. Haha, except once when I had it read some scrolling paragraphs of text summarizing part of a story, and it made up something completely different, both in its summary and when I asked it to read out the full text. That was ... weird.
@Remy
If you subscribe to the app, there is an option to turn off the personality of the voice. This makes the descriptions far less opinionated. You may already be doing that, but I thought I would mention it. I found the personality mode amusing for a short space of time, and my wife loved being described as some sort of celebrity, but I find the app is so much better with it turned off.
Service used
Gokul, the PiccyBot WhatsApp service uses GPT-4o for images and Gemini 2.0 Flash Lite for videos. These are not the current best models, but they are generally superior to Meta AI. Using the PiccyBot app will give you the best results though.
DeepSeek
Hello guys! I have a suggestion, although I don't know if it would be very useful since I haven't tested this LLM: since DeepSeek is open source, would it be possible to download it so that we can process images or videos on the device itself, for more speed and privacy?