I think the 8 trillion parameter figure is accurate: Tangora is an N-gram model with a vocabulary size of 20,000 words and N = 3.
Parameters for an N-gram model = V^(N-1) * (V-1). Plugging in V = 20,000 and N = 3 for Tangora, you get about 7.9996E12.
Most of the parameters are likely zero or close to it because many 3-grams are possible but not likely to occur. (However the aggregate probability of all 3-grams is substantial and thus they have to be included.)
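As a quick sanity check on the arithmetic above (a sketch of the counting argument, not anything from a Tangora source):

```python
# Parameter count for a full N-gram model over a vocabulary of size V.
# There are V^(N-1) possible contexts, and each context's distribution
# over next words has V - 1 free parameters (the last probability is
# determined because the distribution must sum to 1).
def ngram_params(V: int, N: int) -> int:
    return V ** (N - 1) * (V - 1)

# Tangora: V = 20,000, N = 3
print(ngram_params(20_000, 3))  # 7999600000000, i.e. ~8 trillion
```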