By the Power of Grayscale

263 points by surprisetalk 6 days ago

shoo a day ago

If you enjoyed this post you may also like the 2024 book Foundations of Computer Vision: https://visionbook.mit.edu/

prior hn thread: https://news.ycombinator.com/item?id=44281506

i don't have any background in computer vision but enjoyed how the introductory chapter gets right into it illustrating how to build a limited but working simple vision system

yunnpp a day ago

Thanks for the reference. Looks very from-the-ground-up and comprehensive.

Cthulhu_ 20 hours ago

About 15, 20 years ago I was still in uni and we had a computer vision lab, the main guy there had been working on that subject for years and dealt with businesses where his stuff was used for quality control.

Without fail, step one of computer vision was to bring the image down to grayscale and / or filter for specific colours so you ended up with a 1 bit representation.

My "algorithm" for a robot that was to follow a line drawn on the floor boiled down to "filter out the colour green, then look at the bottom rows of the image and find the black pixels. If they're to the left, adjust to the left, if to the right adjust to the right". Roughly. I'm sure it could be done a lot more cleverly but I was pretty proud of it AND the whole tool suite was custom made, from editing environment to programming language. Expensive cameras and robot, too.
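
For anyone who wants to play with the idea, here is a rough sketch of that steering rule. It is not the original lab code (that ran on a custom tool suite); the function name `steer`, the `bottom_rows` and `dead_zone` parameters, and the list-of-lists image format are all assumptions made for illustration. It expects the image to have already been reduced to 1 bit, with 1 marking a line pixel.

```python
def steer(binary_rows, bottom_rows=3, dead_zone=0.1):
    """Return 'left', 'right', 'straight', or 'lost' from a 1-bit image.

    binary_rows: list of rows, each a list of 0/1 where 1 marks a line pixel.
    bottom_rows: how many rows at the bottom of the image to inspect.
    dead_zone:   fraction of the image width treated as "close enough".
    """
    height = len(binary_rows)
    width = len(binary_rows[0])

    # Collect the x coordinates of every line pixel in the bottom rows.
    xs = [
        x
        for row in binary_rows[max(0, height - bottom_rows):]
        for x, pixel in enumerate(row)
        if pixel
    ]
    if not xs:
        return "lost"

    # Signed offset of the line's mean position from the image centre,
    # expressed as a fraction of the image width.
    centre = (width - 1) / 2
    offset = (sum(xs) / len(xs) - centre) / width

    if offset < -dead_zone:
        return "left"
    if offset > dead_zone:
        return "right"
    return "straight"
```

A real robot would probably turn the signed offset into a proportional steering command rather than three discrete states, but the structure stays the same.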

ryukoposting a day ago

It may come as a surprise to some that a lot of industrial computer vision is done in grayscale. In a lot of industrial CV tasks, the only things that matter are cost, speed, and dynamic range. Every approach we have to making color images compromises on one of those three characteristics.

I think this kind of thing might have real, practical use cases in industry if it's fast enough.

vincenthwt a day ago

Ah, I think you work in the same industry as me, machine vision. I completely agree with you, most applications use grayscale images unless it’s a color-based application.
Which vision library are you using? I’m using Halcon by MVTec.
- ryukoposting a day ago
  
  I used to work in industrial automation, I was mostly making the process control equipment that your stuff would plug into. PLCs and whatnot. We had a close relationship with Cognex, I don't remember the exact details of their software stack.
gridspy a day ago

Also resolution & uniformity
Color makes major compromises physically also, since it seems like the Red, Green and Blue channels are sampling from the same physical location but the actual sensor buckets are offset from each other.

teiferer 2 days ago

Appreciate the old school non-AI approach.

Sharlin a day ago

Classical machine vision and pattern recognition is absolutely AI. Or at least it was AI before it became too mature to be called that. As they say, any AI problem that gets solved stops being AI and becomes just normal algorithmics.
- fragmede 19 hours ago
  
  Classical computer vision is no more AI than quicksort or BFS are. What they say is ML is AI that works. But classic computer vision (CV) is hand rolled algorithms like Eigenfaces to detect faces or Mixture of Gaussians for background subtraction. There's no magic black box model in classic CV, no training on data, no generated pile of "if"s that no one knows how it works. Just linear algebra written and implemented by hand.
  Not AI, not even ML.
  
  jrmg 15 hours ago
  
  ML, at least historically, has been considered a subset of AI, not a superset.
  Until the recent rise of LLMs, using human-designed deterministic algorithms to perform ‘intelligent’ tasks (like image processing, and especially image recognition) was absolutely considered AI.
  AI encompasses (encompassed?...) everything that uses computation to produce intelligence-like results.
  I fear the terminology battle has been lost, though, and nowadays most people consider at least neural networks - perhaps also non-determinism of output - to be a prerequisite for something being “AI” - which is actually _less_ meaningful to the end-user.
  
  throwway120385 18 hours ago
  
  You might say that the loss function is the human in the loop deciding whether or not the algorithm addresses the problem.
  
  Sharlin 14 hours ago
  
  You’re moving the goalposts, which is exactly what I referred to. Search algorithms and pathfinding have absolutely been AI historically, just go take a look at the table of contents of Norvig’s AI:MA. And I mean the 4th edition that was published in 2020. A good 90% of the book is classical algorithmics.
  It’s pretty hilarious and history-blind to claim that AI is just 2015+ era deep neural magic black boxes or something, as if the field wasn’t invented until then. As if neural networks themselves hadn’t been tried several times at that point and found okay for classification tasks but not much more.
  As if for a long time, most AI researchers didn’t even want to talk about neural networks because they feared that their "cool" factor takes focus away from real AI research, and because the last time NNs were a big deal it was followed by one of the AI winters of broken promises and dwindling budgets.
  
  fragmede 10 hours ago
  
  > It’s pretty hilarious and history-blind to claim that AI is just 2015+ era deep neural magic black boxes or something, as if the field wasn’t invented until then.
  I didn't make that claim. Was this written by a hallucinating LLM?
amelius a day ago

But have a look at the "Thresholding" section. It appears to me that AI would be much better at this operation.
- vincenthwt a day ago
  
  It really depends on the application. If the illumination is consistent, such as in many machine vision tasks, traditional thresholding is often the better choice. It’s straightforward, debuggable, and produces consistent, predictable results. On the other hand, in more complex and unpredictable scenes with variable lighting, textures, or object sizes, AI-based thresholding can perform better.
  That said, I still prefer traditional thresholding in controlled environments because the algorithm is understandable and transparent.
  Debugging issues in AI systems can be challenging due to their "black box" nature. If the AI fails, you might need to analyze the model, adjust training data, or retrain, a process that is neither simple nor guaranteed to succeed. Traditional methods, however, allow for more direct tuning and certainty in their behavior. For consistent, explainable results in controlled settings, they are often the better option.
  
  shash a day ago
  
  Not to mention performance. So often, the traditional method is the only thing that can keep up with performance requirements without needing massive hardware upgrades.
  Counterintuitively, I’ve often found that CNNs are worse at thresholding in many circumstances than a simple Otsu or adaptive threshold. My usual technique is to use the least complex algorithm and work my way up the ladder only when needed.
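
For readers who haven't met it, Otsu's method picks the global threshold that maximises the between-class variance of the grayscale histogram. Below is a minimal pure-Python sketch, assuming 8-bit pixel values passed as a flat iterable; the function name `otsu_threshold` is made up for this example and the code is not taken from the article or from any particular library.

```python
def otsu_threshold(pixels):
    """Return a threshold t in 0..255 that maximises between-class variance.

    pixels: iterable of ints in 0..255.
    Pixels <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of the intensities of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break     # no pixels left in the upper class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t
```

Binarising is then a single comparison per pixel, e.g. `[1 if p > t else 0 for p in pixels]`, which is a large part of why this kind of method keeps up with tight performance budgets.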
  
  MassPikeMike a day ago
  
  I am usually working with historical documents, where both Otsu and adaptive thresholding are frustratingly almost but not quite good enough. My go-to approach lately is "DeepOtsu" [1]. I like that it combines the best of both the traditional and deep learning worlds: a deep neural net enhances the image such that Otsu thresholding is likely to work well.
  [1] https://arxiv.org/abs/1901.06081
  
  shash a day ago
  
  Ok. Those are impressive results. Nice addition to the toolbox
  
  hansvm a day ago
  
  Something I've had a lot of success with (in cases where you're automating the same task with the same lighting) is having a human operator manually choose a variety of in-sample and out-of-sample regions, ideally with some of those being near real boundaries. Then train a (very simple -- details matter, but not a ton) local model to operate on small image patches and output probabilities for each pixel.
  One fun thing is that with a simple model it's not much slower than techniques like Otsu (you're still doing a roughly constant amount of vectorized, fast math for each pixel), but you can grab an alpha channel for free even when working in colored spaces, allowing you to near-perfectly segment the background out from an image.
  The UX is also dead-simple. If a human operator doesn't like the results, they just click around the image to refine the segmentation. They can then apply directly to a batch of images, or if each image might need some refinement then there are straightforward solutions for allowing most of the learned information to transfer from one image to the next, requiring much less operator input for the rest of the batch.
  As an added plus, it also works well even for gridlines and other stranger backgrounds, still without needing any fancy algorithms.
- Greamy a day ago
  
  It can benefit from more complex algorithms, but I would stay away from "AI" as much as possible unless there is indeed need of it. You can analyse your data and make some dynamic thresholds, you can make some small ML models, even some tiny DL models, and I would try the options in this order. Some cases do need more complex techniques, but more often than not, you can solve most of your problems by preprocessing your data. I've seen too many solutions where a tiny algorithm could do exactly what a junior implemented using a giant model that takes forever to run.
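
As one example of a "dynamic threshold" in this spirit, here is a sketch of a local mean threshold: a pixel counts as foreground when it is darker than the average of its neighbourhood by some margin. The function name `adaptive_threshold` and the parameters `radius` and `offset` are assumptions for illustration, and this is only one of several ways to make a threshold adapt to the data. A summed-area table keeps the cost per pixel constant regardless of window size.

```python
def adaptive_threshold(image, radius=1, offset=5):
    """Mark a pixel 1 when it is darker than its local mean minus `offset`.

    image:  list of rows of grayscale ints.
    radius: half-width of the square neighbourhood (clipped at the borders).
    """
    h, w = len(image), len(image[0])

    # Summed-area table with one extra row and column of zeros.
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum

    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            area = (y1 - y0) * (x1 - x0)
            total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
            # Compare pixel < mean - offset without dividing.
            if image[y][x] * area < total - offset * area:
                out[y][x] = 1
    return out
```

Because the comparison is against the local mean, a slow lighting gradient across the image shifts the threshold along with it, which a single global threshold cannot do.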
- Legend2440 a day ago
  
  It indeed would be much better. There’s a reason the old CV methods aren’t used much anymore.
  If you want to do anything even moderately complex, deep learning is the only game in town.
  
  shash a day ago
  
  I’ve found exactly the opposite. In domain after domain the performance of a pure deep learning method is orders of magnitude less than that of either a traditional algorithm or a combination.
  And often the CNNs are so finicky about noise or distortion that you need something as an input stage to clean up the data.
- spookie a day ago
  
  There are also many other classical thresholding algos. Don't worry about it :)
- do_not_redeem a day ago
  
  sure, if you don't mind it hallucinating different numbers into your image
  
  Legend2440 a day ago
  
  Right, but the non-deep learning OCR methods also do that. And they have a much much lower overall accuracy.
  There’s a reason deep learning took over computer vision.
  
  vincenthwt a day ago
  
  You're absolutely right, deep learning OCR often delivers better results for complex tasks like handwriting or noisy text. It uses advanced models like CNNs or CRNNs to learn patterns from large datasets, making it highly versatile in challenging scenarios.
  However, if I can’t understand the system, how can I debug it if there are any issues? Part of an engineer's job is to understand the system they’re working with, and deep learning models often act as a "black box," which makes this difficult.
  Debugging issues in these systems can be a major challenge. It often requires specialized tools like saliency maps or attention visualizations, analyzing training data for problems, and sometimes retraining the entire model. This process is not only time-consuming but also may not guarantee clear answers.
  
  Legend2440 a day ago
  
  No matter how much you tinker and debug, classical methods can’t match the accuracy of deep learning. They are brittle and require extensive hand-tuning.
  What good is being able to understand a system if this understanding doesn’t improve performance anyway?
  
  vincenthwt a day ago
  
  I agree, Deep Learning OCR often outperforms traditional methods.
  But as engineers, it’s essential to understand and maintain the systems we build. If everything is a black box, how can we control it? Without understanding, we risk becoming dependent on systems we can’t troubleshoot or improve. Don’t you think it’s important for engineers to maintain control and not rely entirely on something they don’t fully understand?
  That said, there are scenarios where using a black-box system is justifiable, such as in non-critical applications where performance outweighs the need for complete control. However, for critical applications, black-box systems may not be suitable due to the risks involved. Ultimately, what is "responsible" depends on the potential consequences of a system failure.
  
  throwway120385 17 hours ago
  
  This is a classic trade-off and the decision should be made based on the business and technical context that the solution exists within.
  
  shash a day ago
  
  OCR is one of those places where you can just skip algorithm discovery and go straight to deep learning. But there are precious few of those kinds of places actually.
  
  do_not_redeem a day ago
  
  GP is talking about thresholding and thresholding is used in more than just OCR. Thresholding algorithms do not hallucinate numbers.

atum47 2 days ago

I was working on an image editor in the browser, https://victorribeiro.com/customFilter

Right now the neat feature it has is the ability to run custom filters of varied window sizes on images, and to use custom formulas to blend several images

I don't have a tutorial at hand on how to use it, but I have a YouTube video where I show some of its features

https://youtube.com/playlist?list=PL3pnEx5_eGm9rVr1_u1Hm_LK6...
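
For readers unfamiliar with window-based custom filters, the general idea can be sketched as below. This is not the code from the linked editor; the function name `apply_filter`, the integer `divisor` parameter, and the clamp-to-edge border handling are all assumptions for illustration. Strictly speaking it computes a correlation (the kernel is not flipped), which is what most image editors mean by a "custom filter".

```python
def apply_filter(image, kernel, divisor=1):
    """Apply a square, odd-sized kernel to a grayscale image.

    image:   list of rows of ints in 0..255.
    kernel:  list of rows of numbers, e.g. 3x3 or 5x5.
    divisor: the weighted sum is divided by this before clamping.
    Coordinates outside the image are clamped to the nearest edge pixel.
    """
    h, w = len(image), len(image[0])
    k = len(kernel) // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += image[yy][xx] * kernel[dy + k][dx + k]
            # Keep the result inside the 8-bit range.
            out[y][x] = min(255, max(0, round(acc / divisor)))
    return out
```

A 3x3 box blur would be `apply_filter(img, [[1, 1, 1]] * 3, divisor=9)`; swapping in a larger kernel changes the window size without touching the loop.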

atum47 2 days ago

At some point I would like to add more features like the ones described in your article: feature detection, image stitching...
Here's the source code if anyone's interested https://github.com/victorqribeiro/customFilter
smusamashah a day ago

I vaguely remember XnView having these matrix-based custom filters.

swiftcoder a day ago

This is a really solid intro to computer vision, bravo!

grep_it a day ago

Really enjoyed this article, thanks for sharing!

I had recently learned about using image pyramids[1] in conjunction with template matching algorithms like SAD to do simple and efficient object recognition, it was quite fun.

1: https://en.wikipedia.org/wiki/Pyramid_%28image_processing%29
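
A minimal sketch of the combination, assuming grayscale images as lists of rows, a single pyramid level, and even image and template dimensions. All function names here are made up for the example. The coarse pass searches every position on half-size images; the fine pass only re-checks a small neighbourhood at full resolution.

```python
def downsample(image):
    """Halve each dimension by averaging 2x2 blocks (one pyramid level)."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
             + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) // 4
            for x in range(w)
        ]
        for y in range(h)
    ]


def sad(image, template, top, left):
    """Sum of absolute differences with the template placed at (top, left)."""
    return sum(
        abs(image[top + y][left + x] - template[y][x])
        for y in range(len(template))
        for x in range(len(template[0]))
    )


def all_positions(image, template):
    """Every (top, left) where the template fits inside the image."""
    rows = len(image) - len(template) + 1
    cols = len(image[0]) - len(template[0]) + 1
    return [(y, x) for y in range(rows) for x in range(cols)]


def best_match(image, template, candidates):
    """Return the candidate position with the lowest SAD."""
    return min(candidates, key=lambda pos: sad(image, template, pos[0], pos[1]))


def pyramid_match(image, template, radius=2):
    """Coarse search at half resolution, then refine at full resolution."""
    small_image = downsample(image)
    small_template = downsample(template)
    coarse_y, coarse_x = best_match(
        small_image, small_template, all_positions(small_image, small_template)
    )

    # Only re-check full-resolution positions near the scaled-up coarse hit.
    valid = set(all_positions(image, template))
    nearby = [
        (y, x)
        for y in range(2 * coarse_y - radius, 2 * coarse_y + radius + 1)
        for x in range(2 * coarse_x - radius, 2 * coarse_x + radius + 1)
        if (y, x) in valid
    ]
    return best_match(image, template, nearby)
```

The saving comes from the coarse pass touching roughly a sixteenth of the pixel comparisons of a full-resolution search, with more levels compounding the effect. The trade-off is that fine detail lost in downsampling can send the coarse pass to the wrong neighbourhood.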

aDyslecticCrow a day ago

Image pyramids are a brilliant method. The technique is hiding in many of the FCNN image segmentation models I've read.
A truly clever image processing method.

rmonvfer a day ago

I’m not a “C” person but I’ve really enjoyed reading this, it’s quite approachable and well written. Thank you for writing it.

jsmailes 20 hours ago

The blob-finding algorithm makes me think of the "advent of code" problems - I wouldn't have thought to do a two-pass approach, but now that I see it set out in front of me it's obviously a great idea. Seems like this technique could quite easily be generalised to work with a range of problems.
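
For anyone curious how a two-pass approach generalises, here is a sketch of classic two-pass connected-component labelling with a small union-find table. It is not necessarily how the article implements it; the function name `label_blobs`, the 4-connectivity choice, and the list-of-lists input are assumptions for illustration.

```python
def label_blobs(binary):
    """Two-pass, 4-connected labelling. Returns (labels, blob_count).

    binary: list of rows of 0/1 values, where 1 is a foreground pixel.
    """
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    parent = [0]  # parent[i] is the union-find parent of provisional label i

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Pass 1: assign provisional labels and record which ones touch.
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            up = labels[y - 1][x] if y > 0 else 0
            left = labels[y][x - 1] if x > 0 else 0
            if not up and not left:
                parent.append(len(parent))      # brand new label
                labels[y][x] = len(parent) - 1
            elif up and left:
                a, b = find(up), find(left)
                parent[max(a, b)] = min(a, b)   # merge the two sets
                labels[y][x] = min(a, b)
            else:
                labels[y][x] = up or left

    # Pass 2: replace each provisional label with a compact final label.
    final = {}
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = final.setdefault(root, len(final) + 1)
    return labels, len(final)
```

The same "label provisionally, record equivalences, resolve later" shape shows up in flood-fill style puzzles, grid region counting, and similar problems, which is probably why it feels so Advent-of-Code-like.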

kazinator 2 days ago

Referencing "By the power of Grayskull!"

macleginn a day ago

As an aside, "For the honor of grayscale" would work no worse here.
mkaic 2 days ago

IIIII HHHAAAAAVE THE POWERRRRR

hu3 a day ago

For those who don't know, the author is a very prolific dev:

https://github.com/zserge?tab=repositories&q=&type=&language...

sethammons a day ago

This was a fantastic post. I've never really thought much about image processing, and this was a great introduction.

cestith 16 hours ago

nakamoto_damacy 2 days ago

From a 70s kid to an 80s kid, well done!

Xenoamorphous 2 days ago

Ditto. I’ve upvoted this based solely on the amazing title. Best toyline ever.
dcminter 2 days ago

I too applaud this terrible (amazing) pun.

ggm a day ago

Didn't recognize George Smiley in those photos. Which makes sense, given he's an espiocrat.

nusl 20 hours ago

This title is excellent.

ranger_danger 2 days ago

Quality He-Man reference.