New AI app for describing images and video: PiccyBot

Realtime voice?

Does it mean what I think it means? No right?

Hi all,
unfortunately I haven't got a short enough video to try this out on, but I checked out this demo on YouTube. For those interested, it's about 10 minutes in. It's a really cool feature.
https://www.youtube.com/watch?v=AGGKaw6V7Y8

2.6 update

After installing the latest update that - among others - aims at improving video processing stability, I once more get those much more detailed, much more accurate and much more useful video descriptions that I had got before the "server error crisis" of 29 October. Thank you much for the fix, I highly praise it!

Could you look into how seeing AI does their descriptions?

They manage to play a bit, describe it, then play the next bit. Honestly if you could do this, or have it as a toggle, you'd have them beat in my oppinion, as you can already describe short youtube videos.

How seeing AI Does it

Could this be a toggle if you do look into doing this? I prefer the way PiccyBot does it rather than the way Seeing AI does it.
Thanks :)

Seeing AI vs the Pixies

I am repeating myself a little from the Seeing AI thread, but the way I see it is that PiccyBot and Seeing AI are providing an entirely different perspective on a video and I really appreciate having both options.

The pixies describe a video like someone reporting back on what happened. It goes into more detail and paints a pretty vivid picture of what is happening.

Seeing AI on the other hand feels like I am watching the video myself but is giving me a lot less detail.

I honestly like having both options available because they both serve very different purposes. You couldn't get the level of details the pixies give you if you were to use the Seeing AI way.

Having said that, a lot of people are basically for asking for Audio Description of the videos so there is clearly an interest in users of PiccyBot. But I'd hate to sacrifice the level of detail I get to achieve it.

I personally am happy switching between two different apps for this as I usually use the share sheet to get videos into PiccyBot and Seeing AI anyway. But if something like this does come to PiccyBot I too would like it to be optional. Actually if it was a toggle in the UI that appeared in the main interface when I was watching a video that would be even better so I cold quickly listen to one format or the other.

I think I'll take back what I said.

I keep thinking of audio description then realising that we're not there yet.

I don't use this app so will let those that do right more about it and will stop. I don't need this app so shouldn't ask for things if i'm not going to use it.

I've tried the video feature and while it's not for me, I can see that a lot of work went into it, perhaps one day in the future we'll have an app, perhaps this one, who knows? That can be trained on Audio Description, let's see what the future holds for us :)

New model and progress on video

I have added a new AI image description model to my app PiccyBot. It's called 'Gemini experimental 114'. Some comparisons indicate that this might be the best image description model yet.
If you are a PiccyBot Pro user, please try this model for image description and compare it with 'regular' models like GPT4o, Claude Sonnet or Mistral Pixtral. I wonder what you guys think. If it is indeed the best model at the moment I will replace the default OpenAI model in PiccyBot with this one, for free users as well.

I am also working on enhancing the video descriptions. A lot of you have been asking for descriptions of long videos. I can't do very long videos as that would just become too slow and expensive for me, but I am working on a compromise where PiccyBot describes videos upto 5 minutes completely, and if it is longer the app will look for any available transcription of the video and summarise it with that.

Still working on realtime speech and interacting with live video. That one is tough though, and OpenAI is only gradually reducing the API pricing for it. Progressing, but will be another few weeks I feel.

As always, really appreciate the feedback guys! Thanks!

Hi Martijn

I am currently not a pro user. However, if you do enable this application to audio describe up to 5 minute videos, I will absolutely consider it. Because then I could use it to audio describe videos I take with my Meta smart glasses, which currently max out at 3 minutes.

Gemini Experimental didn't work for me.

Hi,
I tried the new model, but it just said processing, and then initializing, but never did describe the image. I switched back to GPT 4.0, and it worked as expected.
I do have pro.

Hope this helps.

The new model

I tried Gemini Experimental and I have to say this, it's brilliant! of all the models so far, it's given me the most detailed, factual descriptions, describing almost all the elements that can be described. I only tried with pictures taken in professional settings, so can't say if it can bring into focus the more abstract, emotive elements the way gpt4 can... Will try it some more. But it looks interesting, especially if what you want is detailed, vivid descriptions the way I want.

I only created this account…

I only created this account because I was dismayed at some of the responses from individuals. Often times as blind individuals. The first thing we say is that we don’t like something or there’s something else that could be made. Yet innovation and different ideas is what this world is all about. That’s why we’ve moved forward as a community. as an individual that does accessibility testing I am thankful and grateful that there are different apps that have different functionality based on someone’s beautiful creative mind. Yes, there is be my AI that is a part of Be My Eyes, Seeing AI and too many more to name yet what I find amazing as a person that supports hundreds of blind content, creators is the fact that this gives funny descriptions for both video and audio, and we must remember that everybody does not have equitable access to specific elements of technology advancements or someone to teach them about these things. essentially, I’m learning that more people in this community need to learn how to deliver a response in a better way Lastly, if everyone just stopped at McDonald’s and believ A burger joint because there was only McDonald’s and there wouldn’t be Burger King, Wendy’s, Nations, red Robin ECT. There’s room for every technology advancement that is going to help blind people thrive. absolutely love the app 10 out of 10 recommend so many people are using it on TikTok in the blind community on the application. Thank you for your work. I love how it has sensitivity to detail such as how it describes African-American people, the scenery and so forth. This has been the first app that I’ve had easy access to that can describe things for me as I’m on my content, creator journey. thank you for so much care to detail and the price point was amazing! I wish you the best of luck in your endeavor.

Stuck on please wait

Hey Martijn. I can't seem to get videos described, it seems as though it's just stuck on please wait. This is for both videos on my phone, and when sharing from social media sites.

New update available

Hi guys,

I released a new update today, which increases the duration of the videos PiccyBot can describe to 5 minutes. This is for the pro version. Hope this helps!

Inforover, I am not sure what is the issue, might have been a temporary glitch. If not, please restart phone and/or enable/disable any VPN you might have running?

Gokul, Louise, thanks for trying out the new Gemini model. It may have been down due to high popularity initially. It seems fine now, although it still may be slow.

BrassKnuckleBeauty, really appreciate your feedback, very encouraging. Thanks!

I can't choose an artificial intelligence model

I can't select an AI model in the app's settings. When I double click on the relevant button, instead of opening the model list, the Select AI model button is selected and tapping again deselects the button.

Gemini

Yeah, the new gemini model is a bit slow, but it does give incredible descriptions.
I really wish you'd at some point consider adding an option to have multiple images and a comparetive description, especially now with the new gemini model in, that'd be incredible.

photos taken not following orientation of the device

so if there is a table, and there is a vase at the upper right corner of the table. now the user holds her phone landscape and take a perfect picture, the vase would now at the lower right corner of the table. didn't we fix this before?

also, reka has become very difficult to get through

reka is excellent for certain kind of tasks.

LaBoheme - Orientation

Regarding the auto orientation, it should work. Could you have turned on the orientation lock on your iPhone?
Reka should work, but it is not a huge company and the model could be facing slower responses now and then. Still, it is good to have smaller players remain there as an option.

auto orientation seems to work now,

i don't know what happened. wouldn't it be better if the camera function invokes the native ios camera when taking photos? it seems a better interface for voiceover users, one can adjust zoom, exposure time, focus lock, etc. i don't know if one can do this from the current interface, but it's certainly not doable for vo users.

Variety of models

The availability of multiple models is great and I found the price reasonable, thanks. If one model is less good or refuses to describe something it's handy being able to switch to another, and the refusal can absolutely be a legitimate issue when shopping for clothing whether for yourself or a friend/partner. GPT4O is very quick to say lets talk about something else.

Native camera

LaBoheme: You are right in that the native iOS camera offers great features. But the use of custom camera helps to keep the app’s workflow simple and efficient. It lets you take photos instantly with the volume button, avoids the extra "retake/use" screen, and ensures front-camera images aren’t flipped by default. But I'll keep it in mind for updates, as the new iOS versions might further enhance the default camera.

Audio description merged with video when sharing

Hi guys,

I released an update today that adds the audio description to the video when sharing it, using the share button on the home screen. For subscribed users only. This was a feature requested often, and I feel it will be quite helpful.

I am looking at adding a live video mode as well, either with OpenAI or Gemini or both. But have to figure out if this is feasible, OpenAI's live speech mode was horribly expensive as a third party developer. But now that they already have competition it may be more economical from the start.

I am also working at running a Whatsapp service to describe videos, so PiccyBot can be indirectly used with Meta Raybans and possibly other smartglasses.

Exciting times indeed!

Absolutely!

Yes, it's more than worth it; and you got almost all the models you can think of as far as visual processing is concerned in there. And if you find a model that isn't there and which is really good, the dev is been really, really responsive so far.

Gaming with Meta Ray-Bans and PiccyBot!

So I finally gave this app a whirl, used my Meta Smart glasses to record myself playing a round of Mortal Kombat 11 on my PC, and then used PiccyBot to describe the video. It was awesome!
I am using the monthly subscription, but will absolutely be purchasing the lifetime access. This is a marvelous application!
Also, and I hope I do not get in trouble for this, but I may or may not have voted for PiccyBot for the Golden Apples of 2024. 😇

PS I went with Gemini Pro for my AI engine, not sure if that one is any better or any worse than the others, but it is what I went with.

Yes it is worth it

The life-time subscription is a bargain. So yes it's definitely worth it, if only to be able to disable personality mode. The dev deserves all the credit he gets for his dedication to the app.

I'm also really excited by the thought of being able to use this via WhatsApp on the Meta Ray-bans.

Video recording bug and a suggestion

Sometimes it happens that the "Record video" button is greyed out in the video recording interface. I tried to find any system in when this takes place, but to no avail, this seems entirely random. I first noticed this with 2.6, but waited in order to see whether it would go away. But no, today I updated to 2.8 and still saw this multiple times.
When this happens, the "record video" button stays greyed out and is stuck in this state regardless how many times I press "cancel" and so return to the main screen and retry recording video. The only really reliable method to bring it back to life is to close Piccybot from the app switcher and to start it again.
Furthermore I suggest that there should be a "retry" button in the main screen near to the description area. Sometimes the "server is overloaded, please try again in some time" error message appears instead of the description, and then there is no way to resend the video or image I have just taken to the server. It is only possible to take another video/image and to try with that, but the interesting moment captured beforehand would be lost this way. For images it also happens sometimes tthat no error is displayed, but simply no description is presented. The "retry" facility would be an immense help in all these scenarios.
Now Gemini experimental 1206 seems quite stable, that is the image doesn't get rerouted to some other "inferiour" model due to overloading, which was quite often the case 2-3 weeks ago. So now I especially like this model, as it provides all the details I am interested in: people, shapes, actions, spatial positioning of each content element, colours, lighting, atmosphere etc. And all this in vivid detail, but in a balanced and not "overdone" way and what's more practically hallucination-freely.
What I particularly like about Piccybot is that it is extremely light on battery. 3 weeks ago I watched a famous soccer match with the help of Piccybot and I took about half-an-hour of video altogether in several pieces of course to get them described. During all this it consumed only about 15 % of battery charge. This is very impressive! So keep up the good work!

Retry button and models

I second Laszlo's idea of a "retry" button. It doesn't happen often for me that the request doesn't go through, but when it does such a button would be great.
As I've said before, one of the coolest things with PiccyBot is the amount of models to choose from, both the "pro" and "fast" ones. There are some models that I'd like to try out that is not in the app now (or maybe they are, but not under those names):
OpenAI: Chatgpt-4o-latest (2024-11-20) (I mentioned this before, but there was some bug with it if I recall correctly). We also have the o1 model getting image support on the horizon, but that might be too expensive to be practical.
Meta: In the app there is a Llama 3, but there has been a Llama 3.2 Vision released (maybe 3.3 too but I'm not sure if this has vision support).
Anthropic: There is a Claude 3.5 Haiku model out now that maybe could (or already has) replace the Claude 3 Haiku model already in the app.
Mixtral: Pixtral-large-2411 (might also already be in the app under the "Mixtral pixtral" name)

Sharing Tiktoks To PiccyBot

Like the subject says, I'm wanting to share TikToks to PiccyBot. I thought I read somewhere in this thread that I could choose share while viewing a TikTok and share it directly to PiccyBot. This isn't working for me. I see messages, WhatsApp, and other options, but not PiccyBot. I don't even see an option to see a list of additional apps. Saving the TikTok to my photos works, and I can share to PiccyBot from there, but I thought you could share directly to PiccyBot from TikTok. What am I doing wrong, or am I just misunderstanding how this is supposed to work? Any help is appreciated. I'm loving PiccyBot and it's worth every penny.

I'd like to know this as well

Never even thought of describing TikTok vids, but if we can then cool. 🙂

Describing TikToks

Eerie, Brian, you can share the TikTok video to PiccyBot and it should describe it. PiccyBot is usually a bit hidden in the share sheet under 'more', but it is there.

Describing TikToks

Note that TikTok sharing is not 100% stable for sure, they seem to change the format on a regular basis. But most of the time it works.

New update: Retry added, plus audio mixer for share

Laszlo, Blindpk, thanks for the suggestion on the retry button. I added it with the latest update and I have to say it helped me as well, since sometimes for whatever network or model reason, the result doesn't come the first time.
I have also added an audio mixer option for subscribed users. In settings, you can set a percentage how loud you want the original audio and the PiccyBot description audio of a video to be. PiccyBot will now combine the two audio streams when you are sharing the video. So this should give complete freedom in whether you want a description only video, or some of the original sound, or whatever you like.
I know it adds to an already complex settings screen, so if you have any recommendations, please let me know.

Thanks for the retry button

Thanky you very much for the fast implementation. As for the settings screen, I don't personally find it that cluttered. You could of course put some settings under separate screens, like "video settings" and "voice settings", with the obvious drawback that it would take longer if the user wants to change a setting.

I would like to see shortcuts for this app

like speakaboo, It would be great if we could assign a shortcut to the action button that could directly capture and describe the seen.

using only English and without relying on visual references?

when describing video using claude 3-5 sonic and the "ask more" function, it always starts the answer with "using only English and without relying on visual references".
for example, when asked to describe the fingernails of the person, it said "Okay, let's focus on the fingernails of the hand, using only English and without relying on visual references."
why is it doing this? how can the model describe anything without any visual references? and i didn't ask the model to describe in other languages.

So since

So since the ability to share the new audio described videos with yourself or another device has been implemented, I have been able to do something very special. Last week, my wife, unfortunately lost her battle with leukemia. This has been an extremely difficult in trying time for me. It’s been very difficult for me and is still difficult for me to come to terms with this. But I found something that may make it a little less painful. I have taken all of the videos that my wife and I have ever done together on my phone, run them through PiccyBot, had them audio described and then saved the new audio described video to my Device, which now includes the original audio alongside it. So now I can look back on all of our videos and remember each good memory as if it was happening all over again. So I’d like to say a personal thank you to the developers of this app.

To Firefly

First I just wanted to say that, while I will never know what you are going through, nor could I ever know just what your significant other meant to you, I am truly sorry for your loss. For what it's worth.
Second, I think it is really awesome that you are using software such as PiccyBot, to enhance your digital memories of the life you shared with your wife.
May they bring you some semblance of joy in your darkest hour. 🙇‍♂️

Firefly

Thanks for sharing your experience, I cannot imagine how hard it must be, but I appreciate you sharing this feedback and thanks, despite it all. It is very rewarding for me to know that the effort on PiccyBot can be so impactful. It definitely motivates me to keep improving the app further. Thank you..

merging audio fail

I am using mainly the Mistral Pixtralb with videos share from YouTube. Not sure if that makes a difference. But when I get the audio description, it gives it like a summary of the video rather than scene by scene in sequential order. So I go back and ask it to do the description in sequential scene by scene order, and after processing, it says, something like merging audio, failed, and just display the text on the screen with the new result.

update to fix merging audio fail

Privateai, I have just released an update that should fix the merging audio fail. Please try it out and let me know if this works fine? How did the description scene by scene work out?

after trying the new update

first, let me say that I am truly amazed at how quick issues get addressed. Now until the testing result. Yes, the merge audio error message is gone, but the new result is not being read out by AI, it just displayed on the screen. No problem, I thought to myself that it should be OK if I save the video with the new description, which is better because it goes seen by scene sequentialy. When I saved the video, it has this new description in the video file name, but when I played the video, the audio track doing the description is still describing using the original description that sounds like a summary rather than describing each scene. So it looks like the audio merge is done initially when the video is first described, but when you ask for a new description, it does not merge the audio again?

another audio merging glitch, maybe?

following up on my previous post, today I came across another possible glitch or maybe it's intentional? I generated audio description for a short documentary I did, and the audio for the description turned out to be shorter than the actual video, so what it ended up doing was, at the end of the audio track, it started the audio over again. But here's the problem, the video is not long enough for the audio track to place through completely a second time, so about a minute in, it just ends. I am guessing the setting is for the audio track to keep looping until it matches the video lengths? Is there a way to maybe not do that but instead insert silent blocks in between paragraphs so the lengths would match? For example, if the video is one minute long, but the description is only 40 second long, just insert blocks of five second passes in between paragraphs to make that the right lengths? I am wanting to use this audio mixing and description for describing some of my earlier documentary works, but if it is going to loop and and in completely, it's going to be a lot of editing.
here is the link to the file I experimented on so you can see what I'm talking about
https://youtu.be/ib8BC7HEqHM?si=D57Id7FeNCT7TraQ

incidentally, I am noticing that in cases where the description generated as longer than the actual video, it doesn't do the audio mixing?

Question and Possible Bugs

Hi there,

As a Christmas gift, I treated myself to the lifetime subscription to PiccyBot, and I’ve been having so much fun exploring its features! I’ve spent a good amount of time experimenting with the various AI models, and it’s fascinating to see how their outputs can vary based on the same photo. The ability to choose from different voices and personalities is a fantastic touch. I also really appreciate how easily you can adjust the description length—whether you want something minimal or more elaborate, the choice is entirely up to you. Well done!

I do have a few questions I’d like to share:

Personality Toggle in Settings:
The personality toggle in the Settings menu doesn’t seem to be working as expected. When I double-tap the option, nothing happens the first time, but on the second double-tap, it switches between “on” and “off.” However, after closing the Settings menu, the generated descriptions still appear as if the personality is enabled. When I return to Settings, the personality toggle I just turned off has switched back on. Is this a bug?
Issue with Llama Model:
There seems to be a potential bug when switching to the Llama AI model. If I select Llama in the Settings while a photo is being described, I sometimes get an error message like, “There seems to be a hiccup,” followed by phrases like, “Please rephrase that” or “What personality did you want?” Interestingly, if Llama is already set as the AI model before taking or selecting a photo, it works just fine. The issue only arises when switching from another model to Llama. Could you look into this?
AI Model Variations in Descriptions:
This might be related to the models themselves rather than the app, but I’ve noticed that some models do an excellent job describing both me (in a selfie, for example) and the background, while others focus entirely on the background and seem to ignore me altogether. Is this kind of behavior typical for certain models?

Thank you for the incredible work you’ve done with this app—it’s absolutely amazing, and I love using it!

please hear me out, I know many of you would agree with me!

Hello, I'm Labron, another blind user of AI apps like yours, and i have to say, your app is amazing, but there is one major problem you need to deal with pronto! Just like every other AI app out there, we're paying money on your app, and we have the same issue of, all things bad and negative not being described to us at all! it's not fair and you know it! so, what you need to do now is make your AI apps describe things that AI just likes to avoid, like all things bad and wrong. you all can't just barrier us to the positive, we want to see the negative also, like regular sighted people. if I have a video of someone getting killed, your AI apps will try their best to not describe the bad aspects of the video, and that is really wrong, and you all know that! it's really not fair for people like us to have to deal with things of that sort! so please make your AI describe everything and don't keep the negative hidden from us! please, and thank you!
PS: I also have this same message on your YouTube! check it out and write back to me as soon as you see this! Thank you!

@labron3

This is quite literally nothing to do with the developers of the apps. It's the limits set by models such as Open AI and anthropic. Unfortunately , it's the current form that we can't have everything described to us, perhaps that will change in future but, like I say, it's nothing to do with the dev.

@inforover

Oh wow, that is so bad! why don't they fix this stuff they too like being positive! this is not the way life should work, especially with people like us!

Agents are the Solution I think

Well, we should have AI agents built specifically to help out visually impaired people, and it should be done with the full support of and in collaboration with one of the LLM providers. This way, the provider should know to relax several of these current limitations to this particular agent given the purpose and the justification.

Love this app.

First off, I want to thank the developer for creating such an amazing app, I use it almost every day to describe my random videos. This app has definitely come a long way since it was first released. I have some observations and feedback. I have started using this app around late August. I think sometime around late October to early November possibly, there was an update that changed a few things one of them being the processing or waiting sound and the other about the personality. Back with version 2.4 I believe was the one I was using in August, I feel like when the personality toggle was turned on the voice description had a lot more personality. like for example response was a lot more dramatic if that makes sense and it seems that it is not as much anymore. i’m not sure if that change was due to the actual AI model side of things or if it was on the PiccyBot side of things, but I liked those opinionated responses like it had before if that makes sense. I also think it would be nice if there could be a feature in settings where you could change the waiting sound effect so you can pick between different ones like the sound that was an earlier versions of the app or the sound that there is now.

To labron3

Well, yesterday I used it to describe a plane crash video and it did quite well with the traumatic scenes.

New AI app for describing images and video: PiccyBot

Options

Comments