Hacker News | kuzee's comments

We secretly agree and aren't sure why it's faster to do two queries, but it measurably is. We're going to try some of the suggestions littered in this conversation and will report back, this time with some EXPLAIN output. We appreciate the suggestions and theories.


We shared the mistake so others don't accidentally make the same assumption; I'm sure a few people either learned it for the first time or appreciated the reminder. Rest easy this never made it to prod.


Thanks. I have a lot of time for those who are willing to admit their mistakes.


I'll gladly read anything you've written on the topic, sounds like you're pretty knowledgeable.


I wrote up my recent experience optimizing Postgres text search on a database with a few million records, without relying on a new service like Elasticsearch.


Awesome, thanks for the write-up. Definitely cool to see that breaking the query into two queries had such a marked improvement! I know I can often get too obsessed with doing everything in a single query. Cheers.
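The thread doesn't show the actual queries, so here's a hypothetical sketch of the two-query pattern in Python against any DB-API connection (e.g. psycopg2). The table name `documents` and the columns `id`, `tsv`, `title`, and `body` are my assumptions, not details from the article:

```python
# Query 1: touch only the GIN-indexed tsvector column, so Postgres can
# match, rank, and limit without reading the wide rows.
ID_QUERY = """
    SELECT id
    FROM documents
    WHERE tsv @@ plainto_tsquery('english', %(q)s)
    ORDER BY ts_rank(tsv, plainto_tsquery('english', %(q)s)) DESC
    LIMIT %(limit)s
"""

# Query 2: fetch the full rows (and any joins you need) only for the
# handful of ids that survived the first query.
FETCH_QUERY = """
    SELECT id, title, body
    FROM documents
    WHERE id = ANY(%(ids)s)
"""

def search(conn, q, limit=20):
    """Run the two queries with any DB-API connection (e.g. psycopg2)."""
    with conn.cursor() as cur:
        cur.execute(ID_QUERY, {"q": q, "limit": limit})
        ids = [row[0] for row in cur.fetchall()]
        if not ids:
            return []
        cur.execute(FETCH_QUERY, {"ids": ids})
        return cur.fetchall()
```

The usual intuition for why this can beat a single query: the first query stays entirely inside the full-text index, and the second only touches a bounded number of heap rows, whereas a combined query can push the planner into ranking or joining far more rows than it returns.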


This thread from the publisher addresses this: https://twitter.com/burk504/status/1501618260466233344


Taking the author's post at face value, I think the solution should be:

Upwork action item: Robin needs to add a new payment instrument to cover the $12,000 debt. Freeze Robin's account until then. Any fraud being perpetrated requires Robin's involvement, so lean in there.

Upwork action item: tell the author that if they hire a lawyer, Upwork will share the information necessary to pursue a case against Robin.

Author action item: have a lawyer look at a case against Upwork and Robin as joint defendants. Make the defendants sort it out.

Kind of an aside, but if Upwork takes zero responsibility and provides no value for the manual payments, it seems like all manually tracked work ought to be billed directly between freelancer and buyer. They won't even pursue Robin, the apparent holder of the bad credit card, even though Upwork knows who Robin is.


This makes sense, and I think you've taken the correct route. I look forward to trying this in one of my projects and comparing it to my current Postgres-only search strategy. For my use case, losing the index between restarts isn't a deal breaker, so hopefully I'll have some useful feedback.


That's great. I will be looking forward to this.


I think it's a strong move to give away so many tools for free and consolidate their income to align so closely with the charity's goals, and while time will tell, I think charities will find this works well for them.

I'm invested in their success.


Sounds sweet. I bet a lot of companies relying on mturk built this for themselves and then sold a higher-value service with better margins. You could build something right in the middle.

I know Stanford's research teams all use a common interface to mturk that keeps profiles of turkers on their side so they know who to solicit for upcoming surveys, conduct longitudinal studies, etc. I've always wondered why more universities didn't follow suit.

I built a side project called cogmint based on the insight that simple scoring and ranking of workers was valuable. I ended up building my own worker interface instead of using mturk because it wasn't much additional effort on top of the scoring logic I was building anyway. Perhaps other serious companies came to the same conclusion I did with my hobby project.


Same!

I created Cogmint.com ("cognition minting") to solve this problem for myself.

You can submit known correct answers for questions, and those questions are then used as ground truth to score worker accuracy. Workers are scored on their similarity to the known correct answers and to other workers who have accurately answered questions. It works surprisingly well for how simple it is. It's been a fun challenge to create simple methods of scoring similarity across different task types.
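A minimal sketch of that scoring scheme; the data shapes, function names, trust threshold, and 50/50 blend are my own illustrative assumptions, not Cogmint's actual implementation:

```python
def accuracy_on_gold(answers, gold):
    """Fraction of a worker's answers on gold questions that match the
    known correct answer. `answers` maps question id -> answer."""
    graded = [qid for qid in answers if qid in gold]
    if not graded:
        return 0.0
    correct = sum(1 for qid in graded if answers[qid] == gold[qid])
    return correct / len(graded)

def score_workers(all_answers, gold, trust_threshold=0.8):
    """Two-pass scoring: grade everyone directly on gold questions,
    then also credit agreement with workers who did well on gold."""
    gold_scores = {w: accuracy_on_gold(a, gold) for w, a in all_answers.items()}
    trusted = {w for w, s in gold_scores.items() if s >= trust_threshold}

    scores = {}
    for worker, answers in all_answers.items():
        agreements, comparisons = 0, 0
        for qid, ans in answers.items():
            if qid in gold:
                continue  # already graded directly against ground truth
            for other in trusted - {worker}:
                if qid in all_answers[other]:
                    comparisons += 1
                    agreements += (ans == all_answers[other][qid])
        consensus = agreements / comparisons if comparisons else gold_scores[worker]
        # Blend direct gold accuracy with agreement against trusted workers.
        scores[worker] = 0.5 * gold_scores[worker] + 0.5 * consensus
    return scores
```

The appeal of the approach is that trust propagates transitively: only a small seed of gold questions is needed, and everything else is graded by agreement with workers who earned trust on that seed.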

It's a side project, so don't rely on it for mission-critical things; that said, I use it for some production tasks myself, and it's been stable.

It currently supports classification (choose from a set of possible answers) and has beta support for bounding box task types. String input task types are coming very soon.

I'd love to see if it can help you out, and I'll waive the fees: I'm not in it for the money; I just like making things useful and reliable. Reach out and say hi!


The problem with quality control is that there are a lot of edge cases; even known "high accuracy" turkers may show bad judgement sometimes, which means every piece of data needs to be validated anyway, whether by the researchers themselves or by another paid contractor.

My undergrad thesis was to build https://tagbull.com, where we tried to have turkers validate the work of other turkers by breaking a label into sub-tasks and getting multi-turker consensus on each before moving forward.
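A minimal sketch of that kind of consensus gate; the vote counts, thresholds, and function names are illustrative assumptions, not TagBull's actual code:

```python
from collections import Counter

def consensus(votes, min_votes=3, min_agreement=0.66):
    """Return the winning answer for one sub-task if enough turkers
    agree, else None (meaning: collect more votes before moving on)."""
    if len(votes) < min_votes:
        return None
    answer, count = Counter(votes).most_common(1)[0]
    return answer if count / len(votes) >= min_agreement else None

def label_ready(subtask_votes):
    """A full label moves forward only when every sub-task has consensus."""
    results = {name: consensus(v) for name, v in subtask_votes.items()}
    return results if all(r is not None for r in results.values()) else None
```

For example, `label_ready({"region": ["a", "a", "a"], "class": ["cat", "cat", "dog"]})` passes both gates, while a split vote on any sub-task holds the whole label back.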

The main issue we ran into is that the incentive system is incredibly misaligned with the responsibility that the turkers have. It’s very difficult to build trust, especially with a crowd of people who haven’t signed contracts, and who face virtually no repercussions for doing bad work, whether intentionally or unintentionally.


> You can submit known correct answers for questions, and those questions are then used as ground truth to score worker accuracy. Workers are then scored on their similarity to known correct answers and other workers that have accurately answered questions. It works surprisingly well for how simple it is.

Have you noticed problems with questions whose answers have a bimodal distribution (i.e., the gold-standard question actually has two or more correct answers)?

In one sense, this is just a labeling quality problem with the 'gold standard' data, but to a lesser extent the same issues may crop up in the data being labeled, when similarity or clustering is used to rate or classify the workers and that rating is transitively applied to the other results they produce.


Anecdotally, yes, it's a problem when two classes (button choices) are similar, resulting in two "top answers" for a given task. It seems most common for yes/no task types, where there are only two options and distinguishing between them is the hard part.

I haven't dug into the data on this across the platform, but you've given me the idea to go see if I can find evidence of it, and see if I can improve somehow. There are only low hundreds of projects, so I might be able to find some that have this problem.


We can find your site, but not your email. (Email field in your HN profile is always hidden; to share an email, put it in “about” on your profile.)

