In search of the perfect URL

mysterypie · on Sept 19, 2015

Xvideos uses the same technique to good effect as well. All of their video links follow the format:

www xvideos com/videoNNNNNNN/description_of_activity_which_can_be_easily_updated (NSFW)

I wanted to mention a mystery concerning Xvideos. Here's a business that is very much in-your-face (i.e. it is not a defense contractor or an organization that wants to be discreet), but its ownership is totally unknown.

I researched it. There are literally zero articles or information about who owns it. No interviews with the founders. Nothing. I haven't been able to even figure out what country it's based in.

Somewhere out there is a very rich person whose family and friends probably don't realize that he founded a major Internet business.

Yes, a major Internet business: they have several million videos (far more than competing "tube" sites), hundreds or thousands of fast servers, and an Alexa rank of 47 which is higher than imdb.com and only a couple steps below microsoft.com.

But in this age of little privacy, they've managed to be super private.

icebraining · on Sept 19, 2015

It was actually blown recently (Aug. 15th) thanks to a lawsuit:

Another infringement suit has been waged by the MetArt Network against a well-known online adult brand. (...) This time around the target is adult tube site XVideos.com and two related web properties (...) along with defendant owners Stephane and Malorie Pacaud of France

http://www.xbiz.com/news/197942

Of course, those names could just be covers.

eli · on Sept 19, 2015

Or maybe some corporate conglomerate that doesn't want the association to harm the reputation of other holdings. I once worked for a company that also operated an adult brand, and they went out of their way to obscure their ownership.

camillomiller · on Sept 19, 2015

Hey, you should try and ramble off-topic sometimes!

lwf · on Sept 19, 2015

It also means you can trick people:

http://www.amazon.com/Intel-Quantum-Computing-Module/dp/B001...

nmjohn · on Sept 19, 2015

Amazon's URLs are actually quite interesting:

    Original: 
        http://www.amazon.com/Structure-Interpretation-Computer-Programs-Engineering/dp/0262510871/

    Equivalent:
        http://www.amazon.com/dp/0262510871
        http://www.amazon.com/dp/0262510871/something-else
        http://www.amazon.com/something/dp/0262510871
        http://www.amazon.com/something/dp/0262510871/something-else

It appears that so long as 'dp/0262510871' is in the URL (without another dp/# appearing before it; a second one after it is fine), it works.
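A minimal sketch of the behaviour described above, assuming the rule really is "the first `/dp/<ID>` pair in the path wins". The regex, the 10-character ID length, and the function name `extract_product_id` are assumptions made for illustration; none of this is Amazon's actual routing code.

```python
import re
from urllib.parse import urlsplit

# Assumed pattern: "/dp/" followed by a 10-character alphanumeric ID,
# terminated by a slash or the end of the path.
_DP_PATTERN = re.compile(r"/dp/([0-9A-Za-z]{10})(?=/|$)")


def extract_product_id(url):
    """Return the ID after the first /dp/ segment, or None if there is none."""
    path = urlsplit(url).path
    match = _DP_PATTERN.search(path)  # search() returns the leftmost match
    return match.group(1) if match else None
```

Under these assumptions, everything before `/dp/` and everything after the ID is ignored, which is why all of the variants listed above would land on the same product.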

cpeterso · on Sept 19, 2015

Or simply http://amzn.com/0262510871

rawdisk · on Sept 19, 2015

HTTP/1.1 301 Moved Permanently

This is a URL shortener that just redirects to the full URL that has the same number. It's easier to type but otherwise accomplishes nothing: the server with the content still needs the full URL. All this shorter URL gets you is the full URL.

ademarre · on Sept 19, 2015

You can trick people, and possibly even search engines. I've wondered if blackhat SEOs could abuse such URLs to discredit content on competitors' sites.

I believe it can be a negative signal when sites stuff too many keywords in their URLs, especially if those keywords aren't relevant to the page's content. A server accepting arbitrary URLs is in a way blindly sanctioning loaded URLs.

Granted, Google's algorithms are surely very sophisticated in this regard, but fighting web spam is hard.

MartijnHoutman · on Sept 19, 2015

Surely, a correct canonical URL will prevent this from happening.

X-Istence · on Sept 19, 2015

Stack overflow does this too:

http://stackoverflow.com/questions/32672492/python-3-5-start...

Is the same as:

http://stackoverflow.com/questions/32672492/

adventured · on Sept 19, 2015

I find it interesting the author mentions making an effort to remove the numeric ID from the URL.

I love using numeric IDs in the URL, for one specific reason: perma-short-link.

http://qz.com/365810/whats-missing-from-this-13-year-old-gir...

Becomes:

http://qz.com/365810

Which then redirects to the proper full url. Total effort: almost nil.
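A rough sketch of how such a perma-short-link could work, assuming the server looks only at the leading numeric ID and answers with a 301 whenever the rest of the path is not the current slug. The `ARTICLES` table, the ID `12345`, the slug, and the `resolve` function are all hypothetical; this is not how qz.com is known to implement it.

```python
import re

# Hypothetical article table: numeric ID -> current slug.
ARTICLES = {12345: "example-article-title"}

# "/<digits>" optionally followed by "/<anything>".
_PATH = re.compile(r"/([0-9]+)(?:/(.*))?")


def resolve(path):
    """Return (status, location) for a request path."""
    match = _PATH.fullmatch(path)
    if match is None:
        return 404, None
    article_id = int(match.group(1))
    slug = ARTICLES.get(article_id)
    if slug is None:
        return 404, None
    canonical = f"/{article_id}/{slug}"
    if path == canonical:
        return 200, canonical  # already at the full URL, serve it
    return 301, canonical      # short or stale URL, redirect to the full one
```

With this table, both `/12345` and `/12345/some-old-title` resolve to a 301 pointing at `/12345/example-article-title`.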

userbinator · on Sept 20, 2015

Not only numeric but alphanumeric IDs; they also work as a nice shorthand in communication. I've seen plenty of people referring to e.g. "video jI3i9Lq4BcX on YouTube" on sites which would otherwise censor actual URLs.

franze · on Sept 19, 2015

my battle-proven URL rules. important: rule 1 is more important than rules 2 to 6 added up, rule 2 is more important than rules 3 to 6 totaled, rule 3 is more important than 4 to 6 together, rule 4 is more important than 5 + 6, rule 5 and rule 6 are a tradeoff (it's short, not shortest possible URL).

the targeted phrase is term(s) you want to get found for (i.e.: in google search)

URL-Rule 1: unique (1 URL == 1 resource, 1 resource == 1 URL)

URL-Rule 2: permanent (they do not change, no dependencies to anything)

URL-Rule 3: manageable (measurable, 1 logic per site section, no complicated exceptions, no exceptions)

URL-Rule 4: easily scalable logic

URL-Rule 5: short

URL-Rule 6: with a variation of the targeted phrase

most common mistake, rule 6 (least important) invalidates rule 1 (most important)

i stand by these url-rules. every time you compromise on them, or change the priority among them, you (your company/startup/business/website/webapp) will regret it in the long term.

about: >This is the sort of solution that I really like. The SEO folks can fiddle with the URL until the cows come home, the engineers have the luxury of a straightforward rule, and the user never sees a broken link. Is this simple structure enough to keep everybody happy?

NO

every redirect has a cost:

- server resources

- (web)performance a.k.a. speed

- long term project costs: redirects need to be maintained (they will not be) and documented (they are not)

- added complexity (redirect complexity adds up fast, for more info see https://news.ycombinator.com/item?id=8891553 )

lwf · on Sept 19, 2015

> every redirect has a cost:

If you are actually just keying your content lookup on the ID and don't redirect the user, what's the performance problem?

And use rel=canonical so search engines do the right thing.
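To make the suggestion concrete, here is a small sketch, assuming an `/<id>/<slug>` scheme where the slug in the request is ignored for lookup and the page carries a `rel=canonical` link to the current URL. The `PAGES` table, `BASE_URL`, and `render` are made-up names for illustration.

```python
import re
from html import escape

# Hypothetical content store keyed on the numeric ID only.
PAGES = {42: {"slug": "current-title", "body": "<p>Article text</p>"}}
BASE_URL = "https://example.com"  # assumed site origin

_PATH = re.compile(r"/([0-9]+)(?:/.*)?")


def render(path):
    """Serve the page for any slug; only the ID is used for the lookup."""
    match = _PATH.fullmatch(path)
    page = PAGES.get(int(match.group(1))) if match else None
    if page is None:
        return 404, "<h1>Not found</h1>"
    canonical = f"{BASE_URL}/{match.group(1)}/{page['slug']}"
    head = f'<link rel="canonical" href="{escape(canonical, quote=True)}">'
    return 200, f"<html><head>{head}</head><body>{page['body']}</body></html>"
```

No redirect is issued here: a request for `/42/some-old-title` gets a 200 with the same body, and the canonical link is the only hint about the preferred URL.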

franze · on Sept 20, 2015

no

simplified google works like this

discovery (queue) -(quality check)-> crawling(optional) -QC-> indexing

google does not "follow" canonicals, but whenever google discovers (during crawling) a canonical it pushes it back to the discovery queue -> needs to crawl again -> needs to figure out indexing

canonical is an indexing directive

so basically there are two quality checks before google can actually apply the indexing directive after it has discovered the canonical during crawling. also you can never be sure when - if ever - it will fetch the canonical URL or choose to canonicalize it.

for small sites this is not a big issue (you will have internal duplicate pages for google for an unknown amount of time, but at some point they will probably be canonicalized). for big sites with millions and millions of URLs this is a big issue. basically your example is the worst case: URL rule 6 (least important) breaks rule nr 1. then why do it at all

additionally, you communicate different URLs to users (depending on how they came to your site), which is just bad UX.

don't do it.

ckluis · on Sept 19, 2015

I like this solution.

Essentially qz.com/122345/{anything-here} will redirect to the canonical url allowing for experimentation on the title of articles and urls.

thephyber · on Sept 19, 2015

I thought this was fairly common knowledge.

Using a DB PKID is a faster lookup than a text slug and uses much less storage space in the DB.

For SEO / URL permanence reasons, the PKID is always the authoritative key while the slug can be updated to represent the current content of the URL.
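As an illustration of the ID being authoritative while the slug stays editable, here is a sketch using an in-memory SQLite table. The schema, the ID `1001`, and the helper names are assumptions for the example, not anything from a specific CMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The integer primary key is the permanent identifier; the slug is just a column.
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, slug TEXT NOT NULL)")
conn.execute("INSERT INTO articles (id, slug) VALUES (?, ?)", (1001, "first-title"))


def rename(conn, article_id, new_slug):
    """Change the slug; the ID, and therefore every old link, stays valid."""
    conn.execute("UPDATE articles SET slug = ? WHERE id = ?", (new_slug, article_id))


def canonical_path(conn, article_id):
    """Build the current /<id>/<slug> path from a primary-key lookup."""
    row = conn.execute(
        "SELECT slug FROM articles WHERE id = ?", (article_id,)
    ).fetchone()
    return f"/{article_id}/{row[0]}" if row else None
```

After `rename(conn, 1001, "better-title")`, a lookup for ID 1001 yields the new path, and nothing about previous slugs needs to be stored.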

jjsewell-ff · on Sept 19, 2015

When building content management systems, we've taken an approach similar to this to keep URLs constant when the names of articles, posts, or other objects might get changed by a site admin. The first time I noticed this approach was on Trello.

Here's an example trello URL: https://trello.com/x/1234567/203-make-the-buttons-bigger

If you change the name of the card, the ID (203) stays the same, but the friendly part of the URL changes. When directing you to the card, the system doesn't care about anything past the ID.

giancarlostoro · on Sept 19, 2015

Interestingly enough I think I tried the same thing when I saw a link from the same site. It is indeed a great workaround to the changing-URLs dilemma.

ambirex · on Sept 19, 2015

We have reversed it to be example.com/seo-go-nuts/%d/ to bring the text closer together.

kissgyorgy · on Sept 19, 2015

The problem with that is if the user chops off the last bits (e.g. pasting it somewhere where it cannot fit), the ID is lost and you can't look it up. It happens more than you would think. It's important to have it early.

eli · on Sept 19, 2015

We use this scheme as well. If only the last part with the ID is cut off and you keep the slug text unique, you can still redirect to the correct article.
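A sketch of that fallback, assuming a `/<slug>/<id>/` layout where slugs are kept unique: try the trailing ID first, and only if it is missing or unknown fall back to the slug. The `ARTICLES` data and `find_article` are hypothetical.

```python
# Hypothetical data: ID -> current slug, with slugs assumed to be unique.
ARTICLES = {101: "seo-go-nuts", 102: "another-story"}
SLUG_TO_ID = {slug: article_id for article_id, slug in ARTICLES.items()}


def find_article(path):
    """Resolve "/<slug>/<id>/"; fall back to the slug if the ID is unusable."""
    segments = [s for s in path.split("/") if s]
    if not segments:
        return None
    last = segments[-1]
    # Preferred: a trailing numeric ID that exists.
    if last.isascii() and last.isdigit() and int(last) in ARTICLES:
        return int(last)
    # Fallback: the ID was cut off or damaged, so try the current slug.
    return SLUG_TO_ID.get(segments[0])
```

The fallback only helps while the slug in the truncated link is still the current one; a link with both an old slug and a missing ID cannot be resolved this way.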

kissgyorgy · on Sept 20, 2015

Then you don't need the ID at all :) because you use the slug

eli · on Sept 20, 2015

I want the slug to be able to change and I'd prefer not to have to keep track of every variation ever assigned to that piece of content.

kissgyorgy · on Sept 21, 2015

if you go with /id/slug, you can redirect anything that is not exactly the same as the current, so any older links would still work because no matter what the slug was, you can redirect because the ID doesn't change.

eli · on Sept 22, 2015

right, same as /slug/id

kissgyorgy · on Sept 23, 2015

no :D /id/slug is safer