$S_{sim}(x, y)$ is the semantic similarity between typed-terms $x$ and $y$, which can be calculated directly as the cosine similarity between their concept cluster vectors $C_x$ and $C_y$:

$$S_{sim}(x, y) = \mathrm{cosine}(C_x, C_y) \qquad (2)$$
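As a rough illustration of the cosine computation described above, the following Python sketch assumes each concept cluster vector is stored as a sparse dictionary mapping a cluster identifier to a weight. This representation, the function name, and the example vectors are illustrative assumptions, not details taken from this work.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors given as {cluster_id: weight}."""
    # Only clusters present in both vectors contribute to the dot product.
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_u * norm_v)


# Hypothetical concept cluster vectors for three typed-terms.
vec_a = {"company": 0.8, "brand": 0.6}
vec_b = {"company": 0.6, "brand": 0.8}
vec_c = {"fruit": 1.0}

sim_ab = cosine_similarity(vec_a, vec_b)  # vectors share both clusters
sim_ac = cosine_similarity(vec_a, vec_c)  # vectors share no cluster
```

Vectors with no cluster in common get a similarity of 0, which is the case where the small positive edge weight discussed later becomes relevant.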

$S_{co}(x, y)$ measures semantic relatedness between typed-terms $x$ and $y$. We denote the co-occurrence concept cluster vector of typed-term $x$ as $C_{co(x)}$, which can be retrieved from the compressed co-occurrence network, and the concept cluster vector of typed-term $y$ as $C_y$. We observe that the larger the overlap between these two concept cluster vectors, the stronger the relatedness between typed-terms $x$ and $y$. Therefore, we calculate $S_{co}(x, y)$ as follows:

$$S_{co}(x, y) = \mathrm{cosine}(C_{co(x)}, C_y) \qquad (3)$$

To determine a valid segmentation, the following heuristics are used.

(a) Except for stop words, each word belongs to one and only one term.

(b) Terms are coherent (i.e., terms mutually reinforce each other).
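Heuristic (a) can be checked mechanically. The sketch below is one possible reading, assuming the short text is a list of tokens and each term is a half-open `(start, end)` span of token indices. The stop-word list, the function name, and the example query are hypothetical. Heuristic (b) depends on semantic scores, so it is not checked here.

```python
STOP_WORDS = {"a", "an", "the", "in", "of"}  # illustrative list only


def is_valid_segmentation(words, terms, stop_words=STOP_WORDS):
    """Return True if every non-stop word is covered by exactly one term.

    words: list of tokens in the short text
    terms: list of (start, end) half-open spans over token indices
    """
    cover = [0] * len(words)
    for start, end in terms:
        for i in range(start, end):
            cover[i] += 1
    for word, count in zip(words, cover):
        if word in stop_words:
            continue  # stop words are exempt from the coverage constraint
        if count != 1:
            return False
    return True


words = ["april", "in", "paris", "lyrics"]
ok = is_valid_segmentation(words, [(0, 3), (3, 4)])          # both content spans covered once
uncovered = is_valid_segmentation(words, [(0, 1), (2, 3)])   # "lyrics" is left out
overlapping = is_valid_segmentation(words, [(0, 3), (2, 4)]) # "paris" is covered twice
```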

We use a graph to represent candidate terms and their relationships. In this work, we define two types of relations among candidate terms.

Mutual Exclusion: Candidate terms that contain the same word are mutually exclusive.

Mutual Reinforcement: Candidate terms that are related mutually reinforce each other (i.e., they are semantically related).
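To make the mutual exclusion relation concrete, the following sketch treats each candidate term as a half-open `(start, end)` span of token positions, so that two candidates contain the same word exactly when their spans overlap. The span representation and the function name are assumptions made for illustration.

```python
def mutually_exclusive(span_a, span_b):
    """True if two half-open (start, end) token spans share at least one position."""
    return span_a[0] < span_b[1] and span_b[0] < span_a[1]


# Hypothetical candidates for the tokens ["april", "in", "paris", "lyrics"].
april_in_paris = (0, 3)
paris = (2, 3)
lyrics = (3, 4)

shares_word = mutually_exclusive(april_in_paris, paris)  # both contain "paris"
disjoint = mutually_exclusive(april_in_paris, lyrics)    # no shared token
```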

A. Term graph construction

Based on the above two types of relations, we construct a term graph TG, where each node is a candidate term. We associate each node with a weight representing its coverage of words in the short text, excluding stop words. We add an edge between two candidate terms when they are not mutually exclusive, and set the edge weight to reflect the strength of mutual reinforcement as

$$w(x, y) = \max\left(\epsilon, \max_{i,j} S(x_i, y_j)\right) \qquad (4)$$

where $\epsilon$ is a small positive weight, $\{x_1, \dots, x_m\}$ is the set of typed-terms for term $x$, $\{y_1, \dots, y_n\}$ is the set of typed-terms for term $y$, and $S(x_i, y_j)$ is the affinity score between typed-terms $x_i$ and $y_j$ defined in Eq. (1).

Since a term may map to multiple typed-terms, we define the edge weight between two candidate terms as the maximum affinity score between their corresponding typed-terms. When two terms are not related, the edge weight is set to be slightly larger than 0 (to guarantee the feasibility of a Monte Carlo algorithm).
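The construction above can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not part of the original text: candidate terms are half-open token spans, node weight is the fraction of non-stop words a term covers, the small positive weight is `1e-3`, and `toy_affinity` with its lookup table is a hypothetical stand-in for the affinity score of Eq. (1).

```python
from itertools import combinations


def build_term_graph(words, candidates, typed_terms, affinity, stop_words, epsilon=1e-3):
    """Build node and edge weights for a term graph.

    words:       list of tokens in the short text
    candidates:  {term: (start, end)} half-open token spans
    typed_terms: {term: [typed_term, ...]}
    affinity:    callable(typed_x, typed_y) -> float
    """
    content_total = sum(1 for w in words if w not in stop_words)
    node_weight = {}
    for term, (start, end) in candidates.items():
        covered = sum(1 for w in words[start:end] if w not in stop_words)
        # Assumed definition of coverage: share of non-stop words covered.
        node_weight[term] = covered / content_total if content_total else 0.0

    edge_weight = {}
    for (tx, sx), (ty, sy) in combinations(candidates.items(), 2):
        if sx[0] < sy[1] and sy[0] < sx[1]:
            continue  # spans overlap: mutually exclusive, so no edge
        best = max(
            (affinity(a, b) for a in typed_terms[tx] for b in typed_terms[ty]),
            default=0.0,
        )
        # Unrelated terms still get a weight slightly above zero.
        edge_weight[(tx, ty)] = max(epsilon, best)
    return node_weight, edge_weight


# Hypothetical affinity table standing in for the affinity score.
TOY_SCORES = {("april in paris/song", "lyrics/concept"): 0.9}


def toy_affinity(a, b):
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), 0.0))


words = ["april", "in", "paris", "lyrics"]
candidates = {"april in paris": (0, 3), "april": (0, 1), "paris": (2, 3), "lyrics": (3, 4)}
typed_terms = {
    "april in paris": ["april in paris/song"],
    "april": ["april/month"],
    "paris": ["paris/city"],
    "lyrics": ["lyrics/concept"],
}
node_weight, edge_weight = build_term_graph(words, candidates, typed_terms, toy_affinity, {"in"})
```

In this toy input, "april in paris" and "paris" share a token and therefore get no edge, while "april" and "paris" are unrelated under the toy scores and fall back to the small positive weight.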