Computers do better than pathologists in predicting lung cancer type, severity (stanford.edu)
204 points by rch on Aug 22, 2016 | 62 comments


Yet another press release that fails to reference the study it's talking about. Not to mention giving no quantitative details about its lead ('Trounced'? By how much?).

The study is "Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features" by Kun-Hsing Yu et al. It's open access: http://www.nature.com/ncomms/2016/160816/ncomms12474/full/nc...


Computers trounce* pathologists...

* by an un-quantified amount using only histology slides -- a limitation not enforced in actual medical practice.


Yeah... seriously... what academic journal would accept that kind of statement?


Nature, apparently.


Nature Communications, which publishes a dozen articles every working day.

(2191 so far in 2016)


Nature and Science appear to accept a lot of "controversial" articles that don't really live up to their reputation for rigor.


Indeed, this is a press release and not journalism.


I worked on a similar pipeline for renal cell carcinoma a few years ago [1], although we only published a small subset of results since parts of the pipeline (e.g., finding representative tiles, survival prediction) had better results being produced elsewhere in the lab.

Regarding the hook in the headline -- computers surpassing pathologists -- it's a bit like automated driving in that even if true the immediate problem is the social and economic system. That is, we're not going to be removing the pathologist from the diagnostic and prognostic process anytime soon for many reasons, so how instead do we leverage machine learning in concert with the human observer to improve the diagnostic system? For that reason, decision introspection may be as valuable a topic of research as improving classification accuracy: justifying a particular automated classification to the pathologist, directing them to representative regions, and describing regions of feature space in biological terms.

[1] http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6945104
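To make that last point about decision introspection a bit more concrete, here's a toy sketch (mine, not from either paper): even a classical model such as a random forest over named morphology features can surface which features drive its predictions, which is a first step toward explanations a pathologist can actually read. The feature names and data below are invented.

    # Hypothetical sketch: rank named morphology features by how much the
    # model relies on them, as a crude form of "decision introspection".
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    feature_names = ["nucleus_area", "nucleus_eccentricity",
                     "cytoplasm_texture", "cell_density"]   # invented names
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, len(feature_names)))          # stand-in feature matrix
    y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)           # stand-in labels

    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    for name, importance in sorted(zip(feature_names, clf.feature_importances_),
                                   key=lambda t: -t[1]):
        print(f"{name}: {importance:.2f}")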


Very good point. Even if this particular study isn't a "breakthrough", it is likely only a matter of time before AI / deep learning (what term am I supposed to use?) is able to interpret visual data better than a human. The real barriers to adoption of such a technology are significant and can be witnessed in the popular self-driving car technology race. Many people seem to believe the technology problem has already been solved or will be shortly, but regulations, public perception, and adjacent industries (e.g. insurance) must adapt before we actually see self-driving cars for the common man. These tend to be slow-moving beasts.

Given how many applications we're seeing for AI, it seems there may be a battle for lawmakers' attention over which AI application we put into production first, given the limited bandwidth our regulators have; it's not like Congress has been especially productive over the last many years.


This! As far as I can remember, the experience in chess is that a human+computer team beats either of them working alone (sorry, can't find a reference just now).


I would love to believe this is true, unfortunately I do not think it is.

Computer chess algorithms completely blow away human competitors. The strongest human, Magnus Carlsen, has an Elo rating of 2857 [0].

Stockfish, the strongest chess engine (open source, btw), has an Elo rating of 3445 [1].

Computer chess algorithms are so much stronger than humans that if the human second-guesses the algorithm, the human is probably wrong.

You may have been thinking of this BBC article [2], in which amateur cyborg players beat grandmaster cyborg players -- the amateurs were crunching additional metadata about which situations were best for their play. However, they didn't beat Stockfish; they beat other cyborg players.

[0] https://ratings.fide.com/card.phtml?event=1503014

[1] http://www.computerchess.org.uk/ccrl/404/

[2] http://www.bbc.com/future/story/20151201-the-cyborg-chess-pl...


> I would love to believe this is true, unfortunately I do not think it is.

This is the blog post that introduced me to the idea that human + computer might be better than a computer alone: https://rjlipton.wordpress.com/2015/07/28/playing-chess-with...

It's called Freestyle Chess or Advanced Chess. Humans have beaten the best computers this way, but I'm not sure it is clear that human + computer outperforms a computer alone consistently.

Incidentally, that's a great blog to browse if you like chess and math/CS theory.


You must be right about the cyborgs. I read the same thing, in "Race Against the Machine," by Erik Brynjolfsson and Andrew McAfee [1] on page 54, where the author cites Kasparov, who wrote (in 2010, regarding the cyborg thing) [2]:

The surprise came at the conclusion of the event. The winner was revealed to be not a grandmaster with a state-of-the-art PC but a pair of amateur American chess players using three computers at the same time. Their skill at manipulating and “coaching” their computers to look very deeply into positions effectively counteracted the superior chess understanding of their grandmaster opponents and the greater computational power of other participants. Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process.

It sounds like since then computers have improved enough such that humans no longer help.

[1] https://www.amazon.com/Race-Against-Machine-Accelerating-Pro...

[2] http://www.nybooks.com/articles/2010/02/11/the-chess-master-...


The top humans and top computers never play each other anymore, so the rating pools are independent and not comparable. I suppose Stockfish would grind Magnus down in a serious match, but only because the human player is susceptible to fatigue. The teams that tune the engines for engine vs. engine matches would never dream of letting them make up their own moves in the opening phase; instead they use human openings, an acknowledgement that humans understand the game better.


>an acknowledgement that humans understand the game better. //

How is that true? I'd imagine it's to reduce the search space against human players and to make use of records of games played - a large corpus of which have involved traditional openings. It doesn't seem to imply the acknowledgement claimed?


Sorry I "replied" in a sibling node since no reply link was present at the time (maybe reply links only appear after a timeout?)


pbhjpbhj's comment doesn't have a reply link for some reason so I will reply here. Left to their own devices, computers will play crude, simple development moves in the opening. That's because there are no tactics until the pieces come into contact and it's all strategy, where humans still have an advantage. Decades of experience have resulted in a corpus of classic, subtle opening strategies that the computers don't rediscover -- for example, in some King's Indian positions Black has precisely one good plan. A good human player will initiate this plan with the move ...f5 automatically; an engine will juggle ...f5 with a bunch of other slow moves (all of which are irrelevant) and might not play it at all. If you look through the pages of "New In Chess" magazine you will see the top masters analysing their games and saying things like "The computer does not understand this type of position -- it rejects $some-good-human-move, but eventually when you lead it down the right path it changes its mind".

All of this is not to say that computers have not overtaken humans in chess. They have. But primarily because they have superb qualifications as practical players - they never make crude errors and all humans on occasion do. This trumps all other considerations. But the vast gap you see when comparing the human lists and the computer lists is exaggerated - 500 Elo points means a 95% expectation. I am 100% convinced that if Magnus could be properly motivated (think big $$$ and somehow convincing him that scoring a decent proportion of draws as White would be a "win") he could deliver for humanity :-)
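For reference, the 95% figure follows from the standard Elo expectancy formula (nothing engine-specific here, just the textbook definition):

    # Expected score of player A against player B under the Elo model.
    def expected_score(rating_a, rating_b):
        return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

    print(expected_score(500, 0))       # a flat 500-point gap: ~0.95
    print(expected_score(3445, 2857))   # the ratings quoted upthread: ~0.97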


> I am 100% convinced that if Magnus could be properly motivated (think big $$$ and somehow convincing him that scoring a decent proportion of draws as White would be a "win") he could deliver for humanity :-)

Interesting theory. Personally, I'm not sure it's clear that good strategy can overcome ruthless tactical precision. I'm also not sure a human could ever be motivated to achieve a 0.00% tactical blunder rate. (Much as I would love to see human strategy defeat computer tactics.)


> I am 100% convinced that if Magnus could be properly motivated (think big $$$ and somehow convincing him that scoring a decent proportion of draws as White would be a "win") he could deliver for humanity :-)

You are 100% wrong. Computers overtook humans at chess in 1995. Unless there is some insight Magnus Carlsen or any other GM has which they are not sharing, it will remain that way.


Even at equal capability a person who never tired, seldom erred, never forgot a sequence, never got distracted or emotional would surely be superior at chess?


If you think there is no way Magnus Carlsen could steer a few games to draws, you really don't know much about chess.


I am a 2200 Elo rated player. There will be more to the game than Magnus steering "a few games to draws". I suppose it would also depend on how many games are agreed for the match. For a suitably large number of games, the computer could simply force many dull draws and then lash out with a strong tactical game. Humans have the extra dimension of emotions and fatigue to deal with. Switching from positional play to long tactical sequences does not play well to human strengths; in addition, getting a draw from a "mission programmed" computer may not be trivial, as there is the added dimension that it does not need to choose the most direct route to the draw.

Another factor is that it is trivial to change the computer's repertoire of openings, and there is a wide choice of these. Humans, including Magnus, require weeks to months of preparation before they are ready to play new openings or deviate from prepared lines.

Finally (and I freely admit that this is my own personal opinion), Magnus Carlsen may not be our best choice of human to play against a computer. There is an unmistakable emotional fragility to him which manifests when he is losing (cf. his games with Anand); a good deal of his strength lies in the early middle game and the ending, but computers are superior in the latter; and he often wins games out of sheer stamina -- a strategy that won't work against the silicon beast.


The original point I was making was simply that the >500 point Elo delta is an exaggeration. So I only need Magnus to steer a few games to draws to be correct.


Agreed. I published a paper on exactly that (augmenting scientists searching for novel materials, rather than replacing scientists entirely). One of the benefits is that you can often model simpler problems which are more tractable.

http://www.nature.com/nature/journal/v533/n7601/full/nature1...


My friend worked in a pathology lab run by a recognized world expert in the field (wrote college texts, gave lectures, etc.), yet the performance was 'mediocre' due to recurring mistakes made by humans:

* Doctors disagreed with each other frequently. Most senior won.

* Results from one patient were given to another patient

* ... or results were lost entirely

* Lab destroyed samples before examination or stained them incorrectly

* Samples never received, lost in mail or never picked up from airport

These folks did have a tough job, looking into microscopes for many hours per day. At some point I imagine things just start looking the same as fatigue sets in.


Maybe the next medical breakthrough is really going to be TODO lists.

Which reminds me of Lotus Notes. 20 years ago it was widely used to automate workflows, but I haven't seen anything similar widely used since.


It already is:

From "The effects of safety checklists in medicine: a systematic review" [1]:

Safety checklists appear to be effective tools for improving patient safety in various clinical settings by strengthening compliance with guidelines, improving human factors, reducing the incidence of adverse events, and decreasing mortality and morbidity. None of the included studies reported negative effects on safety.

[1] http://www.ncbi.nlm.nih.gov/pubmed/24116973


To me, this is a generalization of the benefit of software/automation: what we're really discovering is that when we force people to think through all the edge-cases in a process in order to explain that process to something as dumb as a computer—or a checklist—we dig up a lot of implicit institutional knowledge that, crucially, not everyone has. (Everybody that had it just assumed everyone else knew!)

The result of such a process can be a checklist, or a workflow diagram, or a program. The form of it isn't really important, so much as the fact that the business logic being applied (by a human or a computer) to each case now embeds all the previously-scattered knowledge of the domain explicitly, making each node applying it behave as the union of its institution's knowledge-bases, rather than as the intersection of those knowledge-bases.


Also, a checklist doesn't get tired and forget a step.


You've heard of The Checklist Manifesto?

Medicine features in it.


Protocol books formalize checklists for ER medicine, over-the-phone triage, etc.


> Doctors disagreed with each other frequently. Most senior won.

I'm told this is a huge problem in medicine. They might be even more hierarchical than the military.


I wonder how the pathologists would fare if they were also put through the same process as the ANN, i.e., given training data along with immediate feedback on whether their prognosis was right or wrong, then tested on the reserved data. Pathologists give prognoses daily but only get feedback, if at all, many years later.


This is actually the crucial point for me. The care and effort put into training the ANN likely far exceeds the care and effort put into training the pathologist. Pathologists learn their craft like most doctors do -- a mixture of consulting textbooks and supervision by seniors, but mostly just by working it out themselves. Pathologists also tend to optimise for dealing with rare and unusual cases rather than small incremental gains in their performance on routine diagnoses, because of how they are assessed in exams (i.e., more breadth than depth of knowledge). This gives them good 'general AI' for whatever case walks in the door, rather than for a highly constrained test set of one type of case.


I don't know much about the medical field, so excuse my ignorance here, but couldn't we train pathologists on the same data we train ANNs on? That way we could establish similarly fast feedback loops.


I think this is already done. Pathologists-in-training have to learn somewhere. Machine learning researchers didn't exactly invent the concept of studying practice problems with provided answers, then taking a test.


It's nice that the article is open access and even includes the R code used to perform the analysis.

[1] http://www.nature.com/ncomms/2016/160816/ncomms12474/full/nc...

[2] http://www.nature.com/article-assets/npg/ncomms/2016/160816/...


I am surprised they only used ~2000 samples. That's a very tiny amount of data to train a decently sized CNN, even with data augmentation.

I presume using a pretrained net probably won't help you since the features found in pathology slides are so unlike those of natural images.


I've done a bit with deep learning and medical images (brain, lung, skin). One thing that surprised me, but kind of makes sense, is the net seemed to perform well without a ton of training data. I think it might actually be easier for the model because the images are much more similar than in a "natural image" setting where there are more degrees of freedom in the subject itself.
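For what it's worth, the setup I mean looks roughly like the sketch below -- fine-tuning a pretrained network on a small set of labelled tiles with heavy augmentation. This is not the paper's pipeline (they used hand-crafted features and random forests, not deep learning), and the tile directory layout and two-class setup are assumptions.

    # Minimal transfer-learning sketch for small medical-image datasets:
    # freeze the ImageNet features, retrain only the classification head.
    import torch
    import torch.nn as nn
    from torchvision import datasets, models, transforms

    augment = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.RandomHorizontalFlip(),
        transforms.RandomVerticalFlip(),        # tiles have no canonical "up"
        transforms.ColorJitter(0.1, 0.1, 0.1),  # mimic staining variation
        transforms.ToTensor(),
    ])
    train_set = datasets.ImageFolder("tiles/train", transform=augment)  # hypothetical layout
    loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in model.parameters():
        p.requires_grad = False                 # keep pretrained features fixed
    model.fc = nn.Linear(model.fc.in_features, 2)   # new 2-class head

    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(5):
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()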


Each image can be multiple gigabytes, so it could be computationally difficult. Also, those are probably all the lung cancer slides that TCGA has (edit: there seem to be 3500+ lung cancer slides at [1], so I was wrong. Maybe they're not all H&E stained).

[1] http://cancer.digitalslidearchive.net/


You also need to retain a significant set of slides the classifier wasn't trained with so you can verify that it does work for data outside the training set.


Somewhat surprisingly, the paper doesn't actually use deep learning.


Everyone is missing this. They used simple random forests and survival models with basic k-fold cross validation. There's no breakthrough in ML here!
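In other words, the modelling step is along these lines (a sketch of my reading, not the authors' code; the feature file and column names are placeholders for the ~10,000 quantitative features they extract per slide):

    # Random forest + stratified k-fold cross-validation over per-slide features.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    df = pd.read_csv("slide_features.csv")          # one row per slide (placeholder file)
    X = df.drop(columns=["label"]).values
    y = df["label"].values                          # e.g. adenocarcinoma vs. squamous

    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    scores = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
    print(scores.mean(), scores.std())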


Pattern recognition has been getting results for decades now on various problems and for some of those problems deep learning doesn't help that much.


So, making shit up is standard for this field of biomedical imaging.


The world is full of constant hype about "breakthrough" studies, and cures for diseases that are "coming soon". Somehow the cures seldom materialize, and too often, if anyone bothers to check, they can't replicate the studies.

I wonder if the secret sauce that makes science effective isn't so much the scientific method, as the spirit of skepticism behind it. Without that, it's easy to make errors. And wishful thinking is pervasive because there are strong incentives: career advancement, corporate influence, ego, and so on.

My view, assuming AI doesn't kill us, is that it's going to save all of our asses (from ourselves).


While I have serious reservations, I do appreciate philosopher Paul Feyerabend's idea of science as "epistemological anarchism." In the end, all rule-based concepts of how to accumulate knowledge and understanding fall victim to methodological problems: the only way to really do science is a no-holds-barred, anything goes attitude of a mind against the universe.


Why isn't this already in practice given ML advances in recent years?

Why are mammograms still being subjectively interpreted?

Are there not already companies today where you can upload medical imaging results and get back results that beat humans?


>Why isn't this already in practice given ML advances in recent years?

Well, it would be nice to first validate the results somewhere else (peer review should actually be completed), and no doubt some real-world trials would need to proceed and their results be evaluated. This is what any FDA approval, at a minimum, would entail. Keep in mind what the authors themselves note about how this study may differ from the "real world":

"One limitation of this study is that cases submitted for TCGA and TMA databases might be biased in terms of having mostly images in which the morphological patterns of disease are definitive, which could be different from what pathologists encounter at their day-to-day practice."

Putting this into practice also requires looking at the bigger picture; not just the accuracy of the diagnosis vs. humans.

For example, I have found in studies that doctors are much more conservative in their diagnoses than an academic algorithm that has no implications for patient outcome. For instance, if the question is a) "cancer" or b) "not cancer", the doctor -- fearing patient harm, malpractice suits, and career derailment -- will be biased toward identifying "a) cancer", because the perceived costs of a false positive (treating a nonexistent cancer, which may have significant ill effects) are lower than those of a false negative (not treating a real cancer, leading potentially to death). This will reduce the human doctor's accuracy on diagnosis.
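To put a number on that bias (a toy decision-theory sketch, not something from the study): if missing a real cancer is judged, say, ten times as costly as a false alarm, the rational threshold for calling "cancer" drops to about 9%, so many borderline cases get labelled positive.

    # Toy cost-sensitive decision rule with made-up costs.
    COST_FALSE_NEGATIVE = 10.0   # missing a real cancer
    COST_FALSE_POSITIVE = 1.0    # treating a nonexistent cancer

    def call_cancer(p_cancer):
        # Minimising expected cost: predict cancer when
        # p * C_FN > (1 - p) * C_FP, i.e. p > C_FP / (C_FP + C_FN).
        threshold = COST_FALSE_POSITIVE / (COST_FALSE_POSITIVE + COST_FALSE_NEGATIVE)
        return p_cancer > threshold

    print(call_cancer(0.2))   # True: threshold is ~0.09 with a 10:1 cost ratio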

What the "right" bias is in an individual case can bring in many more factors and moral questions out of scope for this study.

This is not to argue that tools like these should not be developed and utilized but practical application is often more difficult than just solving the technology challenge. I'm certain that we'll see many advances in healthcare due to ML.


> Why are mammograms still being subjectively interpreted?

Computer-aided detection and diagnosis software is used in clinical practice, e.g. see [1]. I believe that false positives per scan are still somewhat of an issue.

[1]: http://www.hologic.com/products/imaging/mammography/image-an...


Human error comes from a lot more than just pure assessment of whether something is cancer or not.

Fatigue, willpower drain, emotion, and many other factors can make a top-tier human expert into a mediocre one (or worse) in a way that is very hard to detect.

Even if computers aren't perfectly matching the ideal human (which they will inevitably pass, of course) they're still massively valuable in that they can maintain their level of expertise when humans cannot.


Um, computers do better than humans in the vast majority of applications they are programmed for.

By which I mean, the news is not that computer > human, but that medical industry/profession/jobs are taking first baby steps at being automated away.


No, I think any task where computers are newly enabled to beat humans is newsworthy. I'd like to know about each of the dominoes as they continue to fall.


How soon until computers can make better MVC apps than people?


The article mentioned they only had a dataset of 2,186 slides. Isn't this a _tiny_ dataset to do any learning on images? They also mentioned they blew up the predictors to over 10,000 features. This sounds like a recipe for overfitting to me. I can't access the actual paper (getting a 500 error), but does anyone know how they built these algorithms?


The images are 1-5 GB each. They tile them. Then they did some simple sorting (remove all tiles with lots of white, emphasize tiles with lots of blue), and then random forests, survival models, and k-fold cross-validation.
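A toy version of that tile-triage step, as I read the description (not the authors' code; a real whole-slide image would need a reader like OpenSlide, here I just assume a PNG crop and made-up thresholds):

    # Split an image into tiles, drop mostly-white background, and score
    # how blue (hematoxylin-rich) the remaining tiles are.
    import numpy as np
    from PIL import Image

    Image.MAX_IMAGE_PIXELS = None                       # slide crops can be huge
    slide = np.asarray(Image.open("slide_region.png").convert("RGB"))

    TILE = 512
    kept = []
    for i in range(0, slide.shape[0] - TILE + 1, TILE):
        for j in range(0, slide.shape[1] - TILE + 1, TILE):
            tile = slide[i:i + TILE, j:j + TILE]
            white_frac = (tile.mean(axis=2) > 220).mean()               # near-white pixels
            if white_frac < 0.5:                                        # mostly tissue, keep it
                blue_score = tile[..., 2].mean() - tile[..., 0].mean()  # crude "blueness"
                kept.append((i, j, blue_score))
    print(f"kept {len(kept)} tissue tiles")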

The test set is interesting: they used a completely unrelated dataset of tissue microarray spots. These are tiny, 1-2 mm circles, whereas the TCGA images are full-size tissue cassettes, up to 1" on the short side.


There's a LOT of biological data out there. With a little bit of effort, researchers can design AI systems that beat doctors at most specialized tasks, in both computer vision and genetic data analysis. The question is how such systems can actually be used in practice.


Pathology slides are analogue data (a lot of data, more than radiology scans). Even tertiary care hospitals do not routinely digitize slides unless they have to be returned to another hospital or used for a teaching conference.


Does anyone know about how to get involved in work like this? I'm outcome-oriented so disease diagnosis would be motivating for me. I'm looking to help out on an open-source project along these lines.


http://cellprofiler.org/ is open source and was used for this paper


Maybe help improve http://ilastik.org/ ?


Join Histowiz, we have 10x the data! It's one of the largest preclinical pathology slide databases, plus metadata like biomarkers, organs, disease, outcome, and experimental conditions... to do correlations with histopathology image features.



